
12. Things to Try

Hands-on work cements understanding. A short list:

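  1. Iron Law calculation. Take a real benchmark on your machine. Use perf stat ./program to read instruction count, cycles, and elapsed time (perf also reports the effective clock frequency). Compute IPC as instructions / cycles (the reciprocal of CPI), and check that instructions × CPI × cycle time matches the measured run time. Rebuild at a different compiler optimization level and watch IPC and instruction count both move.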
  2. Cache simulation. Run SimpleScalar or Valgrind's Cachegrind on a small program. Observe miss rates as you vary cache size, associativity, and line size (in Cachegrind, via options such as --D1=<size>,<associativity>,<line size>). See how they trade off.
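  3. Compile to assembly. gcc -S -O0 hello.c -o hello_O0.s then gcc -S -O3 hello.c -o hello_O3.s (without -o, both commands write hello.s and the second overwrites the first). Compare the two files. The optimizer's tricks (loop unrolling, vectorization, constant propagation) become visible.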
  4. Compiler Explorer (godbolt.org). Paste C code; pick x86, ARM, RISC-V; see instructions side by side. Identical algorithms compile differently.
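  5. Probe the cache. Write a microbenchmark that walks buffers of increasing size (for example, chasing pointers with a stride of at least one cache line); plot average access time vs working-set size. The plateaus reveal L1, L2, L3, and DRAM. Varying the stride within a fixed buffer instead exposes the cache line size. (Ulrich Drepper's "What Every Programmer Should Know About Memory" walks through this in detail.)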
  6. Build a tiny CPU on FPGA. A 32-bit RISC-V core in Verilog: fetch, decode, execute, write back. ~500 lines. Many open-source designs exist to learn from (PicoRV32 in Verilog, VexRiscv in SpinalHDL).
  7. Read the Intel SDM (Software Developer's Manual). Volume 3A is the System Programming Guide, Part 1. Skim the chapter on paging. Try to write the address-translation pseudocode yourself.
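  8. Run a Spectre PoC. Public proof-of-concept code is widely available. On a vulnerable machine (or with mitigations off), watch a program leak memory it never architecturally reads; the classic Spectre v1 demo leaks a secret inside its own process, while the closely related Meltdown PoCs let a user-space program read kernel memory. Then enable mitigations and watch it fail.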
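
For item 1, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name iron_law and every count below are made-up placeholders, not measurements; substitute the instruction count, cycle count, and elapsed time that perf stat reports for your own program.

```python
# Iron Law sketch: time = instructions * CPI * cycle_time.
# All numbers below are hypothetical placeholders, not real measurements.

def iron_law(instructions, cycles, seconds):
    """Derive IPC, CPI, effective frequency, and the Iron Law run time."""
    ipc = instructions / cycles
    cpi = cycles / instructions
    freq_hz = cycles / seconds           # effective clock rate over the run
    cycle_time = 1.0 / freq_hz
    predicted = instructions * cpi * cycle_time
    return {"ipc": ipc, "cpi": cpi,
            "freq_ghz": freq_hz / 1e9, "time_s": predicted}

# Hypothetical counts for one program built at two optimization levels.
runs = {
    "-O0": dict(instructions=8_000_000_000, cycles=4_000_000_000, seconds=1.25),
    "-O2": dict(instructions=2_000_000_000, cycles=1_600_000_000, seconds=0.5),
}

for level, counts in runs.items():
    r = iron_law(**counts)
    print(f"{level}: IPC={r['ipc']:.2f}  CPI={r['cpi']:.2f}  "
          f"freq={r['freq_ghz']:.2f} GHz  time={r['time_s']:.3f} s")
```

In these invented numbers the optimized build has a lower IPC yet finishes sooner, because it executes far fewer instructions. That is the reason to watch both terms rather than IPC alone.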
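
For item 2, it can help to see what a cache simulator does internally before running a real one. The following is a minimal sketch of a set-associative cache with LRU replacement, assuming a single cache level and a trace of plain byte addresses. The function miss_rate and both traces are hypothetical; this is not how SimpleScalar or Cachegrind are implemented.

```python
# Toy set-associative cache with LRU replacement (illustrative only).
from collections import OrderedDict

def miss_rate(addresses, cache_bytes, ways, line_bytes):
    """Replay a list of byte addresses and return the fraction that miss."""
    num_sets = cache_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]  # one LRU queue per set
    misses = 0
    for addr in addresses:
        line = addr // line_bytes        # which cache line the byte is in
        index = line % num_sets          # which set that line maps to
        tag = line // num_sets           # identifies the line within the set
        lru = sets[index]
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[tag] = True
    return misses / len(addresses)

# Hypothetical trace 1: four sequential passes over a 16 KiB array of 4-byte words.
sweep = [4 * i for _ in range(4) for i in range(4096)]
# Hypothetical trace 2: two addresses that collide in an 8 KiB direct-mapped cache.
conflict = [0, 8192] * 100

for size in (8 * 1024, 32 * 1024):
    print("sweep   ", size, "bytes, 2-way:", miss_rate(sweep, size, 2, 64))
for ways in (1, 2):
    print("conflict", ways, "way(s), 8 KiB:", miss_rate(conflict, 8192, ways, 64))
```

The sweep trace shows capacity effects (the array either fits or thrashes under LRU), while the conflict trace shows what associativity buys when two hot lines map to the same set.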
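
For item 7, one possible shape for the address-translation exercise is sketched below. It assumes 4-level x86-64 paging with 4 KiB pages only and deliberately ignores large pages, permission bits, and the TLB, so check it against the SDM rather than treating it as complete. The read_entry callback and the toy memory dictionary are hypothetical stand-ins for reading physical memory.

```python
# Simplified x86-64 4-level page walk (4 KiB pages only).
# Assumptions: no huge pages, no permission checks, no TLB.

PRESENT = 1 << 0                        # bit 0 of an entry: present flag
ADDR_MASK = 0x000F_FFFF_FFFF_F000       # bits 12..51: next table or frame

def translate(cr3, vaddr, read_entry):
    """Walk PML4 -> PDPT -> PD -> PT and return the physical address.

    read_entry(table_phys, index) is a hypothetical helper that returns
    the 64-bit entry stored in the given slot of the given table.
    """
    table = cr3 & ADDR_MASK
    for shift in (39, 30, 21, 12):      # 9 index bits per level
        index = (vaddr >> shift) & 0x1FF
        entry = read_entry(table, index)
        if not entry & PRESENT:
            raise LookupError(f"page fault at {vaddr:#x}")
        table = entry & ADDR_MASK
    return table | (vaddr & 0xFFF)      # frame base plus 12-bit page offset

# Toy "physical memory": (table address, index) -> entry. All values invented.
memory = {
    (0x1000, 1): 0x2000 | PRESENT,      # PML4 entry -> PDPT
    (0x2000, 2): 0x3000 | PRESENT,      # PDPT entry -> page directory
    (0x3000, 3): 0x4000 | PRESENT,      # PD entry   -> page table
    (0x4000, 4): 0xABC000 | PRESENT,    # PT entry   -> 4 KiB frame
}

def read_entry(table, index):
    return memory.get((table, index), 0)  # missing slots read as not present

vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(hex(translate(0x1000, vaddr, read_entry)))
```

A natural next step is to extend the walk with the page-size bit and the permission checks described in the paging chapter.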