12. Things to Try // Computer Architecture and Organization // bhaswanth

Hands-on work cements understanding. A short list:

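Iron Law calculation. Take a real benchmark on your machine. Use perf stat ./program to read instruction count, cycles, and effective clock frequency. Compute IPC = instructions / cycles (the reciprocal of CPI), then check it against the Iron Law: time = instructions x CPI x cycle time. Rebuild at a different compiler optimization level and measure again. Watch IPC and instruction count both move.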
Cache simulation. Run SimpleScalar or Cachegrind on a small program. Observe miss rates as you vary cache size, associativity, and line size (Cachegrind takes the geometry as an option such as --D1=size,associativity,line size). See how the three trade off.
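Compile to assembly. gcc -S -O0 hello.c -o hello_O0.s then gcc -S -O3 hello.c -o hello_O3.s (without -o, both commands write hello.s and the second overwrites the first). Compare the two files. The optimizer's tricks (loop unrolling, vectorization, constant propagation) become visible.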
Compiler Explorer (godbolt.org). Paste C code; pick x86, ARM, and RISC-V compilers; see the generated instructions side by side. The same source compiles to noticeably different instruction sequences on each ISA.
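Probe the cache. Write a microbenchmark that walks arrays of increasing size (chasing pointers in a random order keeps the prefetcher from hiding the latency); plot time per access vs working-set size. The plateaus reveal L1, L2, L3, and DRAM. Then fix a large array and vary the stride instead: time per access typically levels off once the stride reaches the cache-line size. (Ulrich Drepper's "What Every Programmer Should Know About Memory" walks through this in detail.)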
Build a tiny CPU on FPGA. A 32-bit RISC-V core in Verilog: fetch, decode, execute, write back. Roughly 500 lines. Many open-source designs exist to learn from (PicoRV32 in Verilog, VexRiscv in SpinalHDL).
Read the Intel SDM (Software Developer's Manual). Volume 3A is the first part of the System Programming Guide. Skim the chapter on paging. Try to write the address-translation pseudocode yourself.
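Run a Spectre PoC. Public proof-of-concept code is widely available. The commonly circulated Spectre v1 example leaks a secret from within its own process by mistraining a bounds check; a user-space program reading kernel memory is the closely related Meltdown attack. On a vulnerable machine (or with mitigations off), watch the leak succeed. Then enable mitigations (kernel page-table isolation, in Meltdown's case) and watch it fail.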
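
For the Iron Law item at the top of the list, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name iron_law and the counter values are made up, not measurements; substitute the instruction and cycle counts that perf stat reports for your own program.

```python
def iron_law(instructions: int, cycles: int, clock_hz: float) -> dict:
    """Derive IPC, CPI, and execution time from raw counter values."""
    ipc = instructions / cycles
    cpi = cycles / instructions
    # Iron Law: time = instructions * CPI * cycle time, with cycle time = 1 / clock_hz.
    seconds = instructions * cpi / clock_hz
    return {"ipc": ipc, "cpi": cpi, "seconds": seconds}


# Hypothetical counter values for one program built at two optimization levels.
o0 = iron_law(instructions=8_000_000_000, cycles=4_000_000_000, clock_hz=4.0e9)
o3 = iron_law(instructions=2_000_000_000, cycles=1_600_000_000, clock_hz=4.0e9)

for label, r in (("-O0", o0), ("-O3", o3)):
    print(f"{label}: IPC={r['ipc']:.2f} CPI={r['cpi']:.2f} time={r['seconds']:.2f}s")
```

In this invented example the optimized build has the lower IPC yet finishes sooner, because it executes far fewer instructions. IPC alone is not a performance metric; the product in the Iron Law is.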
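
For the cache simulation item, it can help to see how little machinery the core idea needs before reaching for a full tool. The sketch below is a toy set-associative cache with LRU replacement, not how SimpleScalar or Cachegrind are implemented; the names miss_rate and sweep and the workload are assumptions chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, line_bytes, ways):
    """Simulate a set-associative LRU cache and return the miss rate."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes        # cache line that holds this byte
        index = line % num_sets          # low line bits select the set
        tag = line // num_sets           # remaining bits identify the line
        lru = sets[index]
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used tag
            lru[tag] = True
    return misses / total


def sweep(array_bytes, passes, step=4):
    """Yield byte addresses for repeated sequential passes over an array."""
    for _ in range(passes):
        yield from range(0, array_bytes, step)


# Hypothetical workload: four passes over a 16 KiB array of 4-byte elements.
configs = [(8192, 64, 2), (32768, 64, 2), (32768, 32, 2), (8192, 64, 8)]
for cache_bytes, line_bytes, ways in configs:
    rate = miss_rate(sweep(16384, 4), cache_bytes, line_bytes, ways)
    print(f"{cache_bytes:6d} B, {line_bytes} B lines, {ways}-way: miss rate {rate:.4f}")
```

Try replacing the sweep with an address trace of your own. A cyclic sweep larger than the cache is a worst case for LRU, so extra associativity does not help there, while two addresses that collide in a direct-mapped cache stop missing as soon as the cache is 2-way.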
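
For the paging exercise, one possible shape for the address-translation pseudocode is sketched below, assuming 4-level paging with 4 KiB pages. It is deliberately simplified: it checks only the present bit and ignores large pages, permission bits, canonical-address checks, and the TLB. The memory dictionary and every table address in it are hypothetical stand-ins for physical memory. Check each field against the SDM rather than trusting this sketch.

```python
PRESENT = 1 << 0                    # bit 0 of every entry: present flag
ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 51:12: physical address of next table or frame


def split(vaddr: int):
    """Return the four 9-bit table indexes and the 12-bit page offset."""
    indexes = [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indexes, vaddr & 0xFFF


def translate(memory: dict, cr3: int, vaddr: int) -> int:
    """Walk PML4 -> PDPT -> PD -> PT; memory maps physical address -> 64-bit entry."""
    indexes, offset = split(vaddr)
    table = cr3 & ADDR_MASK
    for index in indexes:
        entry = memory.get(table + index * 8, 0)   # each entry is 8 bytes
        if not entry & PRESENT:
            raise LookupError(f"page fault at {vaddr:#x}")
        table = entry & ADDR_MASK                   # next table, or the final frame
    return table | offset


# Toy page tables (hypothetical addresses) that map a single virtual page.
vaddr = 0x0000_7F12_3456_7ABC
(i4, i3, i2, i1), _ = split(vaddr)
memory = {
    0x1000 + i4 * 8: 0x2000 | PRESENT,     # PML4 entry -> PDPT at 0x2000
    0x2000 + i3 * 8: 0x3000 | PRESENT,     # PDPT entry -> PD at 0x3000
    0x3000 + i2 * 8: 0x4000 | PRESENT,     # PD entry   -> PT at 0x4000
    0x4000 + i1 * 8: 0xABCD000 | PRESENT,  # PT entry   -> 4 KiB frame
}
print(hex(translate(memory, cr3=0x1000, vaddr=vaddr)))
```

A natural next step is to handle the page-size bit in the directory entries, which ends the walk early for 2 MiB and 1 GiB pages.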