Hands-on work cements understanding. A short list of exercises:
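- Iron Law calculation. Take a real benchmark on your machine. Use `perf stat ./program` to read instruction count, cycles, and elapsed time (perf also prints the effective clock rate). Compute IPC (instructions / cycles, i.e. 1/CPI). Rebuild at a different compiler optimization level and watch both IPC and instruction count move.
- Cache simulation. Run SimpleScalar's `sim-cache` or Valgrind's Cachegrind (`valgrind --tool=cachegrind --cache-sim=yes ./program`, with `--D1=<size>,<assoc>,<line size>` to reshape the L1 data cache) on a small program. Observe miss rates as you vary cache size, associativity, and line size, and see how they trade off.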
- Compile to assembly. Run `gcc -S -O0 hello.c`, then `gcc -S -O3 hello.c`, and compare the output (both write `hello.s`, so rename the first or pass `-o`). The optimizer's tricks (loop unrolling, vectorization, constant propagation) become visible.
- Compiler Explorer (godbolt.org). Paste C code, pick x86, ARM, or RISC-V, and see the instructions side by side. The same algorithm compiles to very different instruction sequences on each.
- Probe the cache. Write a microbenchmark that walks arrays of increasing size; plot access time against working-set size. The plateaus reveal L1, L2, L3, and DRAM; sweeping the stride over a large array additionally exposes the cache-line size. (Ulrich Drepper's "What Every Programmer Should Know About Memory" walks through this in detail.)
- Build a tiny CPU on an FPGA. A 32-bit RISC-V core in Verilog (fetch, decode, execute, write back) takes roughly 500 lines. Open-source designs such as PicoRV32 (Verilog) and VexRiscv (SpinalHDL, which generates Verilog) are good references.
- Read the Intel SDM (Software Developer's Manual). Volume 3A covers system programming. Skim the chapter on paging, then try to write the address-translation pseudocode yourself.
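- Run a Spectre PoC. Public proof-of-concept code is widely available. On a vulnerable machine, watch a program leak a secret it never architecturally reads: a mispredicted bounds check lets a speculative load leave a footprint in the cache, and a timing probe recovers the byte. Then add a mitigation (for example an `lfence` after the bounds check) and watch the leak fail. Reading kernel memory from user space is Meltdown's trick; its PoC needs a vulnerable CPU with the kernel's mitigations (KPTI) turned off.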
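For the Iron Law exercise, the arithmetic is: execution time = instructions × CPI × cycle time = instructions × CPI / frequency. The C sketch below just encodes that identity; the struct and function names are hypothetical, and you fill the fields with the `instructions` and `cycles` counts and the effective clock rate that `perf stat` reports for your own program.

```c
#include <stdio.h>

/* Iron Law: time = instructions * CPI * cycle_time
 *               = instructions * CPI / frequency             */
typedef struct {
    double instructions;  /* perf stat "instructions" count         */
    double cycles;        /* perf stat "cycles" count               */
    double freq_hz;       /* effective clock rate (cycles / second) */
} counters;

double cpi(const counters *c) { return c->cycles / c->instructions; }
double ipc(const counters *c) { return c->instructions / c->cycles; } /* = 1/CPI */

/* Predicted run time; algebraically this reduces to cycles / freq_hz,
 * so it should roughly match the elapsed time perf prints. */
double iron_law_seconds(const counters *c) {
    return c->instructions * cpi(c) / c->freq_hz;
}

/* Print one line per build, e.g. one for -O0 and one for -O3. */
void report(const char *label, const counters *c) {
    printf("%-4s IPC=%.2f CPI=%.2f time=%.3f s\n",
           label, ipc(c), cpi(c), iron_law_seconds(c));
}
```

With hypothetical counts of 4e9 instructions and 2e9 cycles at 2 GHz, IPC is 2, CPI is 0.5, and the predicted time is 1 s. Comparing two `report` lines side by side shows whether a faster build won by executing fewer instructions, by raising IPC, or both.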
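To see what the cache-simulation exercise is measuring before reaching for SimpleScalar or Cachegrind, here is a toy set-associative cache with LRU replacement. It is a hypothetical sketch (names `cache_new`, `cache_access`, `sweep` are made up, and it is not code from either tool); it counts misses for a program that sweeps an array of `int`s, so you can vary size, associativity, and line size by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy set-associative cache with LRU replacement; counts misses only. */
typedef struct {
    size_t sets, ways, line;
    uint64_t *tags;      /* sets * ways tag slots                 */
    uint64_t *last_use;  /* LRU timestamps; 0 marks an empty slot */
    uint64_t clock, accesses, misses;
} cache;

cache *cache_new(size_t size, size_t ways, size_t line) {
    assert(size % (ways * line) == 0);           /* whole number of sets */
    cache *c = calloc(1, sizeof *c);             /* allocation checks omitted */
    c->ways = ways;
    c->line = line;
    c->sets = size / (ways * line);
    c->tags = calloc(c->sets * ways, sizeof *c->tags);
    c->last_use = calloc(c->sets * ways, sizeof *c->last_use);
    return c;
}

void cache_access(cache *c, uint64_t addr) {
    uint64_t block = addr / c->line;             /* which memory line     */
    size_t set = block % c->sets;                /* index bits pick a set */
    uint64_t tag = block / c->sets;              /* remaining bits = tag  */
    uint64_t *tags = &c->tags[set * c->ways];
    uint64_t *used = &c->last_use[set * c->ways];
    size_t victim = 0;
    c->accesses++;
    c->clock++;
    for (size_t w = 0; w < c->ways; w++) {
        if (used[w] && tags[w] == tag) {         /* hit: refresh LRU stamp */
            used[w] = c->clock;
            return;
        }
        if (used[w] < used[victim]) victim = w;  /* empty or least recent */
    }
    c->misses++;                                 /* miss: fill the victim */
    tags[victim] = tag;
    used[victim] = c->clock;
}

void cache_free(cache *c) { free(c->tags); free(c->last_use); free(c); }

/* Sweep n ints `passes` times through a fresh cache; return the miss rate. */
double sweep(size_t size, size_t ways, size_t line, size_t n, int passes) {
    cache *c = cache_new(size, ways, line);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            cache_access(c, (uint64_t)i * sizeof(int));
    double rate = (double)c->misses / (double)c->accesses;
    cache_free(c);
    return rate;
}
```

Assuming 4-byte `int`s, `sweep(32 * 1024, 8, 64, 1024, 2)` returns 0.03125: the 4 KiB array's 64 lines each miss once and the second pass hits. `sweep(32 * 1024, 8, 64, 16384, 2)` returns 0.0625, because cycling a 64 KiB array through a 32 KiB LRU cache evicts every line before it is reused. Doubling the line size to 128 bytes halves the cold-miss rate of the small sweep.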
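For the compile-to-assembly and Compiler Explorer exercises, a small input with one loop and one constant expression gives the optimizer something visible to do. The file name `sum.c` and both functions are hypothetical; compile with `gcc -S -O0 sum.c -o sum_O0.s` and `gcc -S -O3 sum.c -o sum_O3.s`, or paste the file into godbolt.org. Whether a given GCC version unrolls or vectorizes the loop depends on the version and target, so read the comments as what to look for rather than guaranteed output.

```c
/* sum.c: hypothetical input for comparing -O0 and -O3 assembly. */
#include <stddef.h>

/* A simple reduction loop: a candidate for vectorization and unrolling
 * at -O3; at -O0 expect one scalar add per iteration with stack traffic. */
int sum(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Both operands are compile-time constants, so constant propagation
 * typically reduces this to returning 3600 at -O1 and above, while -O0
 * keeps the stores, loads, and multiply. */
int seconds_per_hour(void) {
    int minutes = 60;
    int seconds = 60;
    return minutes * seconds;
}
```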
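One way to run the cache-probing experiment is a pointer chase: link one slot per cache line into a single cycle in shuffled order, then time dependent loads around it as the working set grows. This is a swapped-in variant of the plain strided walk, chosen because random order keeps hardware prefetchers from hiding latency. The 64-byte line size, the function names, and the step counts are assumptions; adjust them for your machine.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE 64  /* assumed cache-line size in bytes */

/* Average ns per dependent load over `steps` loads, chasing a cycle
 * through `bytes` of memory with one pointer per (assumed) line. */
double chase_ns(size_t bytes, size_t steps) {
    size_t n = bytes / LINE;                 /* number of lines, >= 1 */
    size_t stride = LINE / sizeof(size_t);   /* size_t slots per line */
    size_t *buf = malloc(n * LINE);
    size_t *order = malloc(n * sizeof *order);
    for (size_t i = 0; i < n; i++) order[i] = i;
    /* Fisher-Yates shuffle (rand() is adequate for a demo). */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Linking lines in shuffled order yields one cycle through all of them. */
    for (size_t i = 0; i < n; i++)
        buf[order[i] * stride] = order[(i + 1) % n] * stride;

    struct timespec t0, t1;
    volatile size_t sink;                    /* keeps the loop from being removed */
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = buf[p];                          /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;
    (void)sink;
    free(buf);
    free(order);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}

/* Print ns/load for working sets from 4 KiB to 64 MiB; plot and look
 * for the steps where the set outgrows each cache level. */
void sweep_working_sets(void) {
    for (size_t kib = 4; kib <= 64 * 1024; kib *= 2)
        printf("%8zu KiB  %6.2f ns/load\n", kib, chase_ns(kib * 1024, 10000000));
}
```

Call `sweep_working_sets()` from your own `main`. Expect roughly flat segments separated by jumps near your L1, L2, and L3 capacities; the absolute numbers depend entirely on the machine.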
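Before writing Verilog for the FPGA exercise, it can help to have a software golden model of the same four stages to compare against in simulation. The C sketch below handles only a two-instruction RV32I subset (ADDI and ADD); the names `cpu` and `run` are hypothetical, and it is not derived from PicoRV32 or VexRiscv.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of a tiny RV32I subset: ADDI and ADD only. */
typedef struct {
    uint32_t pc;
    uint32_t x[32];  /* x0 is hardwired to zero */
} cpu;

/* Runs until an unsupported instruction or the end of `mem` (in words).
 * Returns the number of instructions retired. */
size_t run(cpu *c, const uint32_t *mem, size_t words) {
    size_t retired = 0;
    while (c->pc / 4 < words) {
        uint32_t inst = mem[c->pc / 4];          /* fetch */

        uint32_t opcode = inst & 0x7F;           /* decode */
        uint32_t rd     = (inst >> 7) & 0x1F;
        uint32_t funct3 = (inst >> 12) & 0x7;
        uint32_t rs1    = (inst >> 15) & 0x1F;
        uint32_t rs2    = (inst >> 20) & 0x1F;
        uint32_t funct7 = inst >> 25;
        /* Sign-extended I-type immediate; relies on two's-complement
         * conversion and arithmetic right shift, as GCC and Clang do. */
        int32_t imm_i = (int32_t)inst >> 20;

        uint32_t result;                         /* execute */
        if (opcode == 0x13 && funct3 == 0)                      /* ADDI */
            result = c->x[rs1] + (uint32_t)imm_i;
        else if (opcode == 0x33 && funct3 == 0 && funct7 == 0)  /* ADD */
            result = c->x[rs1] + c->x[rs2];
        else
            break;                               /* unsupported: stop */

        if (rd != 0) c->x[rd] = result;          /* write back; x0 stays 0 */
        c->pc += 4;
        retired++;
    }
    return retired;
}
```

As a check, the words 0x00500093 (`addi x1, x0, 5`), 0x00700113 (`addi x2, x0, 7`), and 0x002081b3 (`add x3, x1, x2`) should retire three instructions and leave 12 in `x3`. Extending the model one instruction at a time, then matching it in Verilog, keeps each step testable.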
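To check your own attempt at the SDM exercise, here is a minimal C sketch of the 4-level x86-64 page walk: PML4, PDPT, PD, and PT, each indexed by 9 bits of the virtual address, followed by a 12-bit page offset. Physical memory is simulated by a hypothetical array `phys` of 4 KiB frames; `FRAMES` and `translate` are made-up names. The sketch deliberately ignores large pages (the PS bit), permission and accessed/dirty bits, the canonical-address check, and 5-level paging.

```c
#include <stdint.h>

/* Hypothetical simulated physical memory: 16 frames of 4 KiB,
 * each holding 512 64-bit paging-structure entries. */
#define FRAMES 16
static uint64_t phys[FRAMES][512];

#define PRESENT   1ULL                    /* bit 0 of every entry */
#define ADDR_MASK 0x000FFFFFFFFFF000ULL   /* bits 51:12: next table or frame */

/* Walk PML4 -> PDPT -> PD -> PT for a 48-bit virtual address.
 * Returns 0 and sets *pa on success, -1 on a not-present entry. */
int translate(uint64_t cr3, uint64_t va, uint64_t *pa) {
    uint64_t table = cr3 & ADDR_MASK;     /* CR3 holds the PML4 base */
    for (int level = 3; level >= 0; level--) {
        /* level 3 uses bits 47:39, level 0 uses bits 20:12 */
        unsigned idx = (unsigned)((va >> (12 + 9 * level)) & 0x1FF);
        if ((table >> 12) >= FRAMES)
            return -1;                    /* outside the simulated memory */
        uint64_t entry = phys[table >> 12][idx];
        if (!(entry & PRESENT))
            return -1;                    /* a real CPU raises #PF here */
        table = entry & ADDR_MASK;
    }
    *pa = table | (va & 0xFFF);           /* frame base + page offset */
    return 0;
}
```

For example, if CR3 points at frame 1 and the chain PML4[1] → frame 2, PDPT[2] → frame 3, PD[3] → frame 4, PT[4] → frame 9 is present, the virtual address with those four indices and offset 0xABC translates to physical 0x9ABC.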