3. RISC vs CISC: A History Lesson with Modern Twists // Computer Architecture and Organization // bhaswanth

The dispute that defined a generation of CPU design.

3.1 Where CISC came from

In the 1970s, memory was expensive and slow. Compilers were primitive. Hand-written assembly was common. Machine designers reasoned: if memory is expensive, instructions should be dense (do a lot of work per byte). If compilers are primitive, instructions should map closely to high-level concepts (STRING_COMPARE, LOOP_AND_DECREMENT). The Digital Equipment VAX in 1977 was the apex of this style; it had instructions for inserting an entry into a doubly-linked list and for evaluating polynomials.

The result: CISC (Complex Instruction Set Computer). Many instructions, variable length (1 to 15+ bytes), many addressing modes, many specialized operations.

3.2 The RISC counterargument

The pushback began in the mid-1970s with John Cocke's 801 project at IBM and was taken up in the early 1980s by David Patterson at Berkeley and John Hennessy at Stanford. All three groups noticed that compilers almost never emitted most of the CISC instructions. Compiled programs spent the large majority of their cycles (roughly 90% is the figure usually quoted) in a small subset of simple instructions. The complex ones existed but did not pay rent.

Worse, the complex instructions slowed down the simple ones. Supporting an opcode like STRING_COMPARE required multi-cycle microcode and a complicated decoder, and that machinery sat on the path of every instruction, so even a plain ADD paid for it in decode time and cycle time.

The RISC manifesto: keep the instruction set simple, regular, fixed-length, mostly load-store. Pay the cost of more instructions per program, win the cost back by clocking faster and pipelining cleanly. Berkeley RISC-I (1981) and Stanford MIPS (1981) demonstrated the approach. ARM (Acorn RISC Machine, 1985) and SPARC (1987) followed.
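The trade-off in the manifesto can be made concrete with the standard CPU performance equation: time = instruction count x cycles per instruction x clock period. The sketch below plugs in made-up numbers (illustrative assumptions, not measurements of any real machine) to show how a design that executes more instructions can still finish sooner.

```python
def execution_time_ns(instruction_count, cycles_per_instruction, clock_period_ns):
    """CPU time = instructions x CPI x clock period."""
    return instruction_count * cycles_per_instruction * clock_period_ns

# Hypothetical numbers, chosen only to illustrate the trade-off.
cisc_time = execution_time_ns(instruction_count=1_000_000,
                              cycles_per_instruction=6.0,
                              clock_period_ns=100.0)
risc_time = execution_time_ns(instruction_count=1_500_000,  # 50% more instructions
                              cycles_per_instruction=1.5,   # cleaner pipelining
                              clock_period_ns=80.0)         # faster clock

speedup = cisc_time / risc_time
print(f"CISC-style: {cisc_time / 1e6:.1f} ms")
print(f"RISC-style: {risc_time / 1e6:.1f} ms")
print(f"speedup:    {speedup:.2f}x")
```

With these assumed inputs the RISC-style machine wins despite the larger instruction count, because the drop in cycles per instruction and clock period outweighs it. Different inputs can reverse the result, which is exactly what the original dispute was about.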

3.3 The cage match

Property	RISC	CISC
Number of instructions	Tens to a few hundred	Hundreds to a few thousand
Instruction length	Fixed (4 bytes typical)	Variable (1-15 bytes on x86)
Operands	Mostly register; only load/store touch memory	Memory operands allowed in many instructions
Addressing modes	Few (3-5)	Many (10+)
Cycles per instruction	Mostly 1 (with pipeline)	Variable (1 to dozens)
Decoder complexity	Simple, parallelizable	Complex, often serial
Code density	Lower	Higher
Examples	ARM, MIPS, RISC-V, PowerPC, SPARC	x86, x86-64, IBM z/Architecture, m68k

3.4 The modern hybrid: how x86 won by becoming RISC inside

Here is the punchline that nobody saw coming in 1985: x86 did not lose. It just stopped being CISC inside.

Starting with the Intel Pentium Pro in 1995, x86 CPUs internally translate each CISC instruction into one or more µops (micro-operations) that look like RISC instructions: simple, fixed-length, three-operand, register-to-register. A mov [edi+ecx*4], eax becomes (roughly) one address-generation µop and one store µop. The complex pushf becomes a sequence of µops handled by microcode.

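The decoder produces µops; the rest of the pipeline (rename, schedule, execute, retire) handles only µops. The chip is a RISC machine wearing a CISC hat. AMD adopted the same approach with the K5 and K6 in the mid-1990s and kept it in the Athlon and its successors, and essentially every high-performance x86 core since works this way.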
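As a rough illustration of the decode step, here is a toy "cracker" that splits x86-style instructions into simple µops. The µop names (LOAD, STORE_ADDR, STORE_DATA), the temporaries t0 and t1, and the tuple format are all hypothetical; real µop formats are not public, and real cores often fuse or split differently.

```python
def is_mem(operand):
    """Memory operands are written in brackets, e.g. '[edi+ecx*4]'."""
    return operand.startswith("[") and operand.endswith("]")

def crack(op, dst, src):
    """Split one toy x86-style instruction into a list of simple µops.

    Everything here is an assumption made for illustration: the µop names,
    the temporaries t0/t1, and the number of µops per instruction.
    """
    if op == "mov":
        if is_mem(dst):                        # store: mov [mem], reg
            return [("STORE_ADDR", "t0", dst),
                    ("STORE_DATA", "t0", src)]
        if is_mem(src):                        # load: mov reg, [mem]
            return [("LOAD", dst, src)]
        return [("MOV", dst, src)]             # register-to-register move
    alu = op.upper()                           # add, sub, and, ...
    if is_mem(dst):                            # read-modify-write on memory
        return [("LOAD", "t1", dst),
                (alu, "t1", src),
                ("STORE_ADDR", "t0", dst),
                ("STORE_DATA", "t0", "t1")]
    if is_mem(src):                            # ALU op with a memory source
        return [("LOAD", "t1", src),
                (alu, dst, "t1")]
    return [(alu, dst, src)]                   # already RISC-like

for insn in [("mov", "[edi+ecx*4]", "eax"),
             ("add", "eax", "ebx"),
             ("add", "eax", "[esi]"),
             ("add", "[esi]", "eax")]:
    print(insn, "->", crack(*insn))
```

The point is only the shape of the transformation: register-to-register instructions pass through as a single µop, while anything touching memory is broken into load, compute, and store steps that the back end can schedule independently.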

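Intel went further and added a µop cache in Sandy Bridge (2011) so that recently decoded x86 instructions skip the slow CISC decoder on the hot path. High-performance ARM cores, including Apple's M1 (2020) and later, also split some instructions into µops, but for a different reason: not to escape a complex encoding, but to break the few multi-step instructions into pieces that each fit a single execution port. Apple publishes little about its microarchitecture, so the details there are inferred rather than documented.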
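The idea behind a µop cache can be sketched in a few lines. This is a deliberately simplified model, assuming a cache keyed by instruction address with unlimited capacity and no eviction; the class name UopCache and the slow_decode stand-in are hypothetical and do not reflect how any real core organizes its cache.

```python
class UopCache:
    """Toy µop cache: maps an instruction address to its decoded µops."""

    def __init__(self, decode):
        self.decode = decode      # the slow path: instruction -> list of µops
        self.lines = {}           # address -> list of µops (never evicted here)
        self.hits = 0
        self.misses = 0

    def fetch(self, address, instruction):
        if address in self.lines:
            self.hits += 1        # hot path: the decoder is bypassed
        else:
            self.misses += 1      # cold path: decode once, then remember
            self.lines[address] = self.decode(instruction)
        return self.lines[address]

def slow_decode(instruction):
    # Stand-in for the legacy decoder; returns one placeholder µop.
    return [("UOP", instruction)]

cache = UopCache(slow_decode)
loop_body = [(0x1000, "add eax, ebx"),
             (0x1002, "dec ecx"),
             (0x1003, "jnz 0x1000")]

for _ in range(100):              # the same three instructions, 100 iterations
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print("hits:", cache.hits, "misses:", cache.misses)
```

In a tight loop the decoder runs only on the first iteration; every later iteration is served from the cache, which is why the benefit shows up on hot paths.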

Meanwhile, ARM has accumulated specialized instructions over the decades (NEON SIMD, crypto extensions, AMX matrix instructions on Apple silicon). RISC-V started spartan and is gaining vector and bit-manipulation extensions. At the microarchitecture level, the classical RISC-CISC distinction is now largely academic; what still differs is the instruction encoding.

3.5 Encoding: fixed vs variable

The encoding choice has real consequences:

Fixed-length (4-byte classic ARM, MIPS, base RISC-V). The CPU can decode the next $N$ instructions in parallel because every instruction starts at a known multiple of 4 bytes. Decoders are simple. Code is bigger.
Variable-length (1 to 15-byte x86). Each instruction starts at the byte after the previous one ends, so finding instruction boundaries is inherently sequential. To decode 4 instructions in parallel, the chip runs a "length pre-decoder" ahead of the main decoders to mark where each instruction starts. Code is smaller.
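The difference in boundary finding can be shown with a toy model. The variable-length encoding below is hypothetical (the low two bits of the first byte give a length of 1 to 4 bytes); it is not x86, whose length rules are far more involved, but it has the same sequential dependency.

```python
def fixed_boundaries(code, width=4):
    """Every start offset is known up front: 0, 4, 8, ..."""
    return list(range(0, len(code), width))

def toy_length(first_byte):
    """Hypothetical encoding: low 2 bits of the first byte give length 1..4."""
    return (first_byte & 0b11) + 1

def variable_boundaries(code):
    """Each start offset depends on the length of the previous instruction."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        pc += toy_length(code[pc])   # must be known before the next step
    return offsets

code = bytes([0x02, 0xAA, 0xBB,          # length 3
              0x00,                      # length 1
              0x03, 0x11, 0x22, 0x33,    # length 4
              0x01, 0x99,                # length 2
              0x00, 0x00])               # two length-1 instructions

print("fixed:   ", fixed_boundaries(code))
print("variable:", variable_boundaries(code))
```

In the fixed case each offset is computed independently of the others, so hardware can compute them all at once. In the variable case the loop carries pc from one iteration to the next, and that dependency is what the pre-decoder exists to hide.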

The instruction-cache footprint matters because L1I is small (~32 KB is typical). x86's compactness is a real win there. ARM Thumb (a 16-bit instruction encoding that covers a subset of ARM operations) was added precisely to recover code density.
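A back-of-the-envelope calculation shows why density matters for a cache of this size. The average instruction lengths below are illustrative assumptions, not measurements of real code.

```python
L1I_BYTES = 32 * 1024    # 32 KB instruction cache

def instructions_that_fit(avg_instruction_bytes, cache_bytes=L1I_BYTES):
    """How many instructions of a given average size the cache can hold."""
    return int(cache_bytes // avg_instruction_bytes)

# Assumed averages, for illustration only.
fixed_4_byte = instructions_that_fit(4.0)   # fixed 4-byte encoding
dense_3_byte = instructions_that_fit(3.0)   # hypothetical variable-length average
thumb_2_byte = instructions_that_fit(2.0)   # if every instruction were 16 bits

print(fixed_4_byte, dense_3_byte, thumb_2_byte)
```

Instruction counts are not directly comparable across ISAs, since a load-store ISA usually needs more instructions for the same work, so this understates the density gap if anything.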

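Hardware-security tie-in. Variable-length encoding interacts badly with code-reuse attacks such as ROP (return-oriented programming). On x86, an attacker can jump into the middle of an instruction and have the bytes from that point decode as different (useful, malicious) instructions. Because any byte offset is a legal entry point, the x86 instruction stream is dense with these unintended gadgets. An ISA with fixed 4-byte instructions and enforced alignment, such as AArch64, exposes only the intended instruction boundaries, so it has far fewer gadget locations. ROP is still possible there, but the attacker has less material to work with. (32-bit ARM is a partial exception, because switching between ARM and Thumb state lets the same bytes decode in two ways.) We will revisit in Chapter 24.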
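A minimal sketch of how an unintended gadget arises, using two real x86 encodings: B8 followed by a 32-bit immediate is mov eax, imm32, and C3 is ret. The scan is heavily simplified; a real gadget finder also decodes the bytes leading up to each ret. The helper names are hypothetical.

```python
RET = 0xC3   # x86 opcode for a near return

# Intended instruction stream (32-bit x86):
#   b8 c3 00 00 00    mov eax, 0xc3
#   c3                ret
code = bytes([0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3])
intended_starts = {0, 5}   # offsets where the assembler placed instructions

def ret_offsets(code):
    """Every byte offset that would decode as 'ret' if execution jumped there."""
    return [i for i, b in enumerate(code) if b == RET]

def aligned_entry_points(code_len, width=4):
    """With fixed-width, aligned instructions only these offsets are legal."""
    return list(range(0, code_len - width + 1, width))

all_rets = ret_offsets(code)
unintended = [i for i in all_rets if i not in intended_starts]

print("ret bytes at offsets:", all_rets)
print("unintended ret at:   ", unintended)
print("x86 entry points in 16 bytes:      ", 16)
print("aligned entry points in 16 bytes:  ", len(aligned_entry_points(16)))
```

The immediate operand of the mov happens to contain the byte C3, so jumping to offset 1 executes a ret the programmer never wrote. With 4-byte alignment enforced, offset 1 is simply not a valid target.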