1. The 8086: Where x86 Began // Microprocessors and Microcontrollers // bhaswanth

In June 1978, Intel released the 8086. It was a 16-bit microprocessor running at 5 MHz, packing 29,000 transistors, costing about $360 in 1978 dollars. From those humble specs grew, by direct evolutionary descent, every x86 CPU in every Windows or Linux PC and server you have ever touched. The Core Ultra in a 2026 laptop still decodes the 8086's instruction encodings, with decades of extensions layered on top, internal translation to micro-ops, speculative execution, SMT, and many cores. Underneath, if you put it in real mode and feed it a 1978 binary, it still runs.

Why study an "obsolete" chip in 2026? Because every quirk of modern x86 traces back here. The strange names of the registers (EAX, EBX, ECX, EDX) are 32-bit extensions of AX, BX, CX, DX. The segmentation in modern x86, fossilized but still present, is an 8086 leftover. Real mode at boot is 8086 mode. The number 0xFFFF0 where the BIOS jump-vector lives is an 8086 reset address. The historical layer is real silicon real estate on every modern CPU. Reading the 8086 is like reading the Latin etymology under English: once you see it, you cannot unsee it.

1.1 What was a microprocessor in 1978?

Before the 8086, computers were either gigantic (mainframes, minicomputers) or built from discrete logic chips (early home computers, calculators). A "microprocessor" — a CPU on a single chip — was a relatively new idea: the Intel 4004 (1971), the 8008 (1972), the 8080 (1974), and competitors like the Motorola 6800 and Zilog Z80. These were 4-bit and 8-bit machines, capable of running BASIC interpreters, simple games, and embedded controllers. The 8086 was Intel's leap to 16 bits, with enough address space (1 MB) and computational throughput to run applications a serious business might actually want.

The IBM PC, launched in August 1981, used the 8086's cousin the 8088: same instruction set, but an 8-bit external bus, allowing IBM to use cheaper 8-bit memory chips on the motherboard. That single decision made x86 the architecture of personal computing. From the 8088 came the 80286 (1982, used in the 1984 PC/AT), the 80386 (1985), the 80486 (1989), the Pentium (1993), Pentium Pro/II/III, Pentium 4, Core, Core 2, i3/5/7, Xeon, all the way to today's Ryzen and Core Ultra. Forty-eight years of compatibility. No other commercial architecture has held that lineage so faithfully.

1.2 Main features of the 8086

16-bit data bus internal and external.
20-bit address bus (multiplexed with data on most pins) → 1 MB (2²⁰ bytes) addressable.
Clock: 5 MHz initially; 8 and 10 MHz parts followed.
Eight 16-bit general-purpose registers, four segment registers, instruction pointer, flags.
A 6-byte prefetch queue for instructions.
256-entry interrupt vector table at the bottom of memory.
Two operating modes selected by the MN/MX pin: minimum (small single-CPU systems) and maximum (multiprocessor or coprocessor systems built around an external 8288 bus controller).
About 29,000 NMOS transistors on a 40-pin DIP package.

For comparison: a 2026 smartphone SoC has tens of billions of transistors and runs at 3 GHz. The 8086 had roughly 0.0001% as many transistors (29,000 against a few tens of billions, about one part in a million) and ran 600 times slower. Yet the concepts in the 8086 (fetch-decode-execute, interrupts, prefetching) are still alive in that smartphone. The transistor count grew. The ideas mostly didn't.

1.3 Pin diagram (40-pin DIP, minimum mode)

                  ┌─────────────┐
        GND ──┤ 1│             │40├── VCC (+5 V)
       AD14 ──┤ 2│             │39├── AD15
       AD13 ──┤ 3│             │38├── A16/S3
       AD12 ──┤ 4│             │37├── A17/S4
       AD11 ──┤ 5│             │36├── A18/S5
       AD10 ──┤ 6│             │35├── A19/S6
        AD9 ──┤ 7│             │34├── BHE/S7
        AD8 ──┤ 8│   8086 CPU  │33├── MN/MX
        AD7 ──┤ 9│             │32├── RD
        AD6 ──┤10│             │31├── HOLD
        AD5 ──┤11│             │30├── HLDA
        AD4 ──┤12│             │29├── WR
        AD3 ──┤13│             │28├── M/IO
        AD2 ──┤14│             │27├── DT/R
        AD1 ──┤15│             │26├── DEN
        AD0 ──┤16│             │25├── ALE
        NMI ──┤17│             │24├── INTA
       INTR ──┤18│             │23├── TEST
        CLK ──┤19│             │22├── READY
        GND ──┤20│             │21├── RESET
                  └─────────────┘

The pins fall into a few logical groups.

AD0–AD15 (pins 2–16, 39): time-multiplexed address and data lines. During the first clock of a bus cycle they carry the low 16 bits of the 20-bit address; for the rest of the cycle they carry data. The CPU asserts ALE (Address Latch Enable, pin 25) for one clock during the address phase so external logic can latch the address into a 74LS373 transparent latch and free the pins to carry data. This multiplexing trick saved pins (a precious resource on a 40-pin package); modern CPUs have hundreds of separate pins and do not bother.
A16/S3 through A19/S6 (pins 35–38): the upper four address lines, also multiplexed (with status bits S3–S6 in the data phase). BHE/S7 on pin 34 is the bus-high-enable for the upper byte.
CLK (pin 19), RESET (pin 21), READY (pin 22): clock and timing. The CPU inserts wait states (it stretches the bus cycle) whenever READY is held low, allowing slow memory or peripherals to keep up.
INTR (pin 18), NMI (pin 17), INTA (pin 24): interrupt lines. INTR is the maskable interrupt request (the CPU listens to it only when its IF flag is set); NMI is non-maskable (always serviced, and meant for catastrophic events like memory parity errors); INTA is the acknowledgment the CPU sends back to whoever interrupted it.
MN/MX (pin 33): the mode select. Tie it high for minimum mode (the CPU itself drives the bus control signals RD, WR, M/IO, DT/R, DEN, and ALE directly), or tie it low for maximum mode (the meaning of pins 24–31 changes; an external 8288 bus controller decodes status outputs and generates the same control signals, and an 8289 bus arbiter can be added for multi-CPU systems).
HOLD/HLDA (pins 31, 30): for letting another bus master (typically a DMA controller) take over the bus.
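The ALE latching described above can be sketched in a few lines. This is a behavioral toy, not a timing-accurate model: the class and function names are invented for illustration, memory is a plain dictionary of 16-bit words, and all 20 address bits are treated as one value on "the pins".

```python
class TransparentLatch:
    """Follows its input while enable is high, holds the last value otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


def read_cycle(memory, address):
    """Simplified read: returns (latched address, data seen on the shared pins)."""
    latch = TransparentLatch()

    # Address phase: the 20 address bits are on the pins and ALE is high.
    pins = address & 0xFFFFF
    latch.update(pins, enable=True)

    # Data phase: ALE is low, so the latch ignores whatever the pins carry now.
    pins = memory.get(latch.q, 0xFFFF)  # memory answers the latched address
    latch.update(pins, enable=False)
    return latch.q, pins


ram = {0x179B8: 0xBEEF}  # hypothetical memory contents
addr, data = read_cycle(ram, 0x179B8)
print(hex(addr), hex(data))
```

The point of the sketch is that the address survives in the latch after the pins have been reused for data, which is what lets memory keep decoding a stable address for the whole bus cycle.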

1.4 The 8086 family tree to modern x86

8086  (1978, 16-bit, 5 MHz, 1 MB)
  │
  ├── 8088 (1979, 8-bit external bus) ── IBM PC (1981)
  │
  └── 80186 (1982, integrated peripherals: DMA, timers, interrupt controller)
       │
       └── 80286 (1982, "Protected mode": 16 MB address, memory protection rings) ── IBM PC/AT
            │
            └── 80386 (1985, 32-bit, paging, virtual-8086 mode, 4 GB)
                 │
                 └── 80486 (1989, integrated FPU, 8 KB on-chip cache, 5-stage pipeline)
                      │
                      └── Pentium (1993, superscalar, branch prediction)
                           │
                           └── Pentium Pro/II/III (out-of-order, MMX, SSE)
                                │
                                └── Pentium 4 (deep pipeline, hyperthreading)
                                     │
                                     └── Core / Core 2 (multi-core, 64-bit via x86-64)
                                          │
                                          └── i3 / i5 / i7 / i9 / Xeon / Core Ultra ─ today

Each step preserved the previous step's instruction set as a subset and added new things. Real mode (8086 behavior) is still the boot mode of every x86 CPU manufactured today; the BIOS or UEFI firmware switches the CPU to protected and then long mode during early boot. So even your laptop, which runs 64-bit Linux or Windows in long mode at 4 GHz with 8 cores, starts by executing 16-bit 8086 instructions.

1.5 Internal architecture: BIU and EU as early pipelining

The 8086 is split internally into two units that operate in parallel:

                 ┌────────────────────────────────┐
                 │           BIU                  │
                 │  (Bus Interface Unit)          │
                 │  ┌──────────────────────────┐  │
                 │  │ Segment registers        │  │
                 │  │  CS, DS, SS, ES (16-bit) │  │
                 │  ├──────────────────────────┤  │
                 │  │ Instruction Pointer (IP) │  │
                 │  ├──────────────────────────┤  │
                 │  │ Adder (segment×16+offset)│  │
                 │  ├──────────────────────────┤  │
                 │  │ 6-byte Prefetch Queue    │  │
                 │  └──────────────────────────┘  │
                 │  Bus control logic to pins     │
                 └────────────────────────────────┘
                                ▲
                                │ instructions
                                ▼
                 ┌────────────────────────────────┐
                 │           EU                   │
                 │  (Execution Unit)              │
                 │  ┌──────────────────────────┐  │
                 │  │ AX BX CX DX (16-bit)     │  │
                 │  │ SI DI BP SP (16-bit)     │  │
                 │  ├──────────────────────────┤  │
                 │  │ 16-bit ALU               │  │
                 │  ├──────────────────────────┤  │
                 │  │ Flags (CF PF AF ZF SF    │  │
                 │  │        TF IF DF OF)      │  │
                 │  ├──────────────────────────┤  │
                 │  │ Instruction Decoder      │  │
                 │  └──────────────────────────┘  │
                 └────────────────────────────────┘

The Bus Interface Unit (BIU) owns the world outside the chip. It holds the segment registers (CS, DS, SS, ES), the instruction pointer (IP), and a 6-byte queue of pre-fetched instruction bytes. Whenever the bus is idle and the queue has at least two bytes free, the BIU issues a memory read at the address CS:IP, fetches a 16-bit word, puts it in the queue, and advances IP. (The 8088, with its 8-bit bus, fetches one byte at a time.) It does this autonomously, not when the EU asks for it.

The Execution Unit (EU) owns the inside. It pulls the next instruction byte from the prefetch queue, decodes it, optionally fetches one or two more bytes for operands, and executes — using the general-purpose registers, the ALU, the flags. When the EU needs data from memory (say, a MOV AX, [BX]), it asks the BIU to do a memory read on its behalf.

Why this matters. In a chip without prefetching, every instruction takes (instruction fetch time) + (decode time) + (execute time), end-to-end, because the fetch only starts when the previous instruction finished. The 8086's BIU can fetch the next instruction while the EU is still chewing on the current one, so as long as the EU runs slower than the BIU (which it usually does — many instructions take several clocks to execute), the fetch is "hidden" inside the execution time. This is the embryonic form of pipelining. Modern CPUs use the same idea, except they have 15–20 stage pipelines, fetch wider chunks (32+ bytes at once), have multiple decoders running in parallel, and dispatch micro-ops out of order to multiple execution units. The principle was already there in 1978.
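To put rough numbers on the overlap, here is an idealized two-stage model. The clock counts are made up for illustration, and the model ignores the queue depth and the bus time the EU steals for data accesses, so treat it as a sketch of the principle and not as 8086 timing.

```python
def serial_clocks(instrs):
    """No prefetch: each instruction is fetched, then executed, one after another."""
    return sum(fetch + execute for fetch, execute in instrs)


def overlapped_clocks(instrs):
    """Idealized overlap: while instruction i executes, instruction i+1 is fetched."""
    total = instrs[0][0]  # the very first fetch cannot be hidden
    for i, (_, execute) in enumerate(instrs):
        next_fetch = instrs[i + 1][0] if i + 1 < len(instrs) else 0
        total += max(execute, next_fetch)  # the slower of the two sets the pace
    return total


# (fetch clocks, execute clocks) per instruction; hypothetical values
program = [(4, 10), (4, 3), (4, 15)]
print(serial_clocks(program), overlapped_clocks(program))
```

In this model a fetch only costs extra time when it takes longer than the execution it overlaps with, which is the sense in which the fetch is "hidden".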

The prefetch queue gets flushed whenever a branch is taken. The BIU has been stuffing the queue with bytes from past CS:IP; when a JMP modifies IP, those bytes are now stale, and the BIU has to refill from the new address. Modern CPUs hide this with branch prediction, but the 8086 just paid the cost.
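A minimal sketch of the queue and its flush behavior follows. The class is hypothetical, memory is a flat `bytes` object with no segmentation, and the BIU here fetches one byte per step to keep the code short.

```python
from collections import deque


class PrefetchQueue:
    CAPACITY = 6  # bytes, as on the 8086

    def __init__(self, memory, fetch_addr=0):
        self.memory = memory          # flat bytes object standing in for RAM
        self.fetch_addr = fetch_addr  # where the BIU will read next
        self.queue = deque()

    def prefetch(self):
        """BIU side: fetch one byte if there is room. Returns True if it fetched."""
        if len(self.queue) < self.CAPACITY and self.fetch_addr < len(self.memory):
            self.queue.append(self.memory[self.fetch_addr])
            self.fetch_addr += 1
            return True
        return False

    def next_byte(self):
        """EU side: take the oldest byte, or None if the EU has to wait."""
        return self.queue.popleft() if self.queue else None

    def flush(self, target):
        """Taken branch: everything already fetched is stale."""
        self.queue.clear()
        self.fetch_addr = target


q = PrefetchQueue(bytes(range(16)))
while q.prefetch():
    pass                      # BIU fills the queue while the EU is busy
print(len(q.queue))           # queue is full
q.flush(12)                   # a jump to address 12 discards the prefetched bytes
print(q.next_byte())          # EU finds the queue empty and must wait
```

After the flush the EU stalls until the BIU has fetched from the new address, which is the branch penalty the paragraph above describes.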

1.6 The register set in detail

General-purpose 16-bit registers (each accessible as a full 16-bit, or as two 8-bit halves):

   AX  =  AH | AL     ; Accumulator. Default for many ALU ops, port I/O, MUL/DIV.
   BX  =  BH | BL     ; Base. Default base address register for [BX] indirection.
   CX  =  CH | CL     ; Count. Default for LOOP, REP, shift counts.
   DX  =  DH | DL     ; Data. Used in MUL/DIV (high half), port I/O addresses (>255).

Pointer/index registers (16-bit only, no halves):

   SI                 ; Source Index. String operations source.
   DI                 ; Destination Index. String operations destination.
   BP                 ; Base Pointer. Default base for stack-frame access.
   SP                 ; Stack Pointer. Top of stack.

Segment registers (16-bit, used to compute physical addresses):

   CS                 ; Code Segment.  Instruction fetches use CS:IP.
   DS                 ; Data Segment.  Default for data references.
   ES                 ; Extra Segment. Default destination for string ops (with DI).
   SS                 ; Stack Segment. Stack accesses use SS:SP and SS:BP.

Special:

   IP                 ; Instruction Pointer (no MOV access; changed only by jumps, calls, returns, interrupts).
   FLAGS              ; 16-bit status word (only 9 of 16 bits used in 8086).
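The "two 8-bit halves" view of AX, BX, CX, and DX can be modeled as one 16-bit value with two windows onto it. The `Reg16` class below is an illustrative assumption, not anything from an 8086 toolchain.

```python
class Reg16:
    """A 16-bit register whose high and low bytes can be read or written separately."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        return self.value >> 8

    @high.setter
    def high(self, byte):
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Reg16(0x1234)            # AX = 1234h, so AH = 12h and AL = 34h
ax.low = 0xFF                 # like MOV AL, 0FFh: only the low byte changes
print(hex(ax.value), hex(ax.high), hex(ax.low))
```

Writing one half never disturbs the other, which is why 8-bit code inherited from the 8080 era could keep using AL and AH as if they were independent registers.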

FLAGS bits (the ones that matter):

Bit	Name	Meaning
0	CF (Carry)	Last add carried out / last sub borrowed.
2	PF (Parity)	Low byte of result has even number of 1s.
4	AF (Aux)	Carry between bits 3 and 4 (BCD arithmetic).
6	ZF (Zero)	Last result was zero.
7	SF (Sign)	Last result had MSB = 1 (negative two's-complement).
8	TF (Trap)	If set, single-step trap after every instruction (used by debuggers).
9	IF (Interrupt enable)	If set, INTR is honored.
10	DF (Direction)	String ops increment SI/DI (DF=0) or decrement (DF=1).
11	OF (Overflow)	Signed-arithmetic overflow.

The CPU updates these flags as a side effect of most arithmetic and logical instructions (a few, such as NOT, leave them alone, and INC/DEC leave CF untouched). Conditional jumps then read the flags to decide whether to branch.
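As a sketch of how the six status flags fall out of a single 16-bit ADD, the function below computes them from the two operands. The function name and the dictionary return format are choices made for this example.

```python
def add16_flags(a, b):
    """Return (16-bit result, status flags) for an 8086-style 16-bit ADD."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF
    flags = {
        "CF": int(full > 0xFFFF),                           # carry out of bit 15
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity, low byte only
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),             # carry from bit 3 into bit 4
        "ZF": int(result == 0),
        "SF": result >> 15,                                 # copy of the result's MSB
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
    }
    return result, flags


print(add16_flags(0x7FFF, 0x0001))  # largest positive + 1: signed overflow, no carry
print(add16_flags(0xFFFF, 0x0001))  # unsigned wraparound: carry and zero, no overflow
```

The two calls show why CF and OF are separate flags: the first overflows as a signed addition without carrying, and the second carries without overflowing.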

1.7 Segmentation: the hack that gave us 1 MB from 16-bit registers

Here is one of the great real-world engineering hacks, and a source of misery for an entire generation of programmers. The 8086 has 16-bit registers but a 20-bit address bus. Sixteen bits hold values from 0 to 65,535 — only 64 KB. But the address bus is 20 bits — addresses 0 to 1,048,575, a full megabyte. How do you specify a 20-bit address using only 16-bit registers?

Intel's answer: segmented addressing. Every memory address is given as two parts:

$\text{physical address} = \text{segment} \times 16 + \text{offset}$

The segment is a 16-bit value from one of the segment registers (CS, DS, SS, ES). The offset is a 16-bit value from a pointer register or direct address. The CPU's address adder does the math: shift segment left by 4 bits (multiply by 16), then add the offset. Result: a 20-bit physical address.

Worked derivation. Suppose CS = 0x1234 and IP = 0x5678. Where does the CPU fetch its next instruction?

   CS  = 0001 0010 0011 0100                          (16 bits)
   shift left 4:
       = 0001 0010 0011 0100 0000                     (20 bits)
       = 0x12340
 
   IP                          0101 0110 0111 1000    (16 bits)
   add:
   physical = 0001 0111 1001 1011 1000               (20 bits)
            = 0x179B8

So the next instruction is fetched from physical address 0x179B8. The notation 1234h:5678h always refers to this combination.
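The same calculation as a small function, for checking such derivations by machine. The name `physical_address` is made up for this sketch; the final mask models the fact that the 8086 has only 20 address lines.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment shifted left by 4 bits, plus offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & 0xFFFFF  # only 20 address lines exist, so bit 20 is dropped


print(hex(physical_address(0x1234, 0x5678)))  # the worked example: CS:IP = 1234h:5678h
print(hex(physical_address(0xFFFF, 0x0000)))  # the reset location, FFFFh:0000h
```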

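The clever part: a 16-bit segment and a 16-bit offset together reach the whole 20-bit space. The largest possible sum, $\text{0xFFFF} \times 16 + \text{0xFFFF} = \text{0x10FFEF}$, is actually slightly more than $2^{20}$; the 8086 has only 20 address lines, so the carry is dropped and those addresses wrap around to the bottom of the 1 MB space. There is overlap: the same physical address can be written many ways (e.g., 0000:0010 and 0001:0000 both equal physical 0x10), but the CPU does not care.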
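To see how much overlap there is, the sketch below enumerates every segment:offset pair that lands on a given physical address. It ignores the wraparound above 1 MB, and `aliases` is a name invented for this example.

```python
def aliases(physical):
    """All (segment, offset) pairs with segment * 16 + offset == physical, no wraparound."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


print(aliases(0x10))           # the two spellings of physical 0x10
print(len(aliases(0x179B8)))   # how many spellings a typical address has
```

Segments start every 16 bytes and each spans 64 KB, so an address in the middle of memory is covered by 65,536 / 16 = 4,096 different segments.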

Why it became infamous. Programs larger than 64 KB had to deal with segmentation explicitly. To copy data between two regions in different segments, you had to load DS, do the read, load ES, do the write: fiddly. The 64 KB segment limit haunted PC programmers all through the 1980s and gave us "memory models" like tiny, small, medium, compact, large, huge, each with different code/data segment rules. When the 80386 added 32-bit flat addressing, programmers everywhere wept with relief. Modern x86-64 still has segmentation, but in long mode the bases of CS, DS, ES, and SS are treated as 0 and their limits are ignored, effectively neutralizing them; only FS and GS keep a usable base, which operating systems use for thread-local and per-CPU data. Segmentation is now a fossil.

Security angle. Segmentation later became a form of memory protection: from the 80286 onwards, protected mode attached access rights to each segment (read-only, execute-only). In real mode, though, a program can load any value into any segment register, so segmentation gave essentially no protection on the 8086 itself. Modern protection is via paging and per-page permissions in protected/long mode.

1.8 Interrupts: 256 doors into the kernel

The 8086 supports 256 interrupt types, numbered 0 through 255. Each has an entry in the Interrupt Vector Table (IVT) at physical address 0x00000–0x003FF. Each entry is 4 bytes: 2 bytes for the offset (new IP), 2 bytes for the segment (new CS).

   Physical addr:  00000  00002  00004  00006  ...   003FE
                  ┌─────┬─────┬─────┬─────┬─    ─┬─────┐
   IVT entries:   │IP_0 │CS_0 │IP_1 │CS_1 │ ...  │CS255│
                  └─────┴─────┴─────┴─────┴─    ─┴─────┘
                   vec0          vec1                  vec255

When interrupt $n$ fires, the CPU:

Pushes FLAGS onto the stack.
Clears IF (so further INTR don't preempt) and TF.
Pushes CS, then IP.
Loads CS from [n*4 + 2] and IP from [n*4].
Begins executing at the new CS:IP.

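The handler ends with IRET, which pops IP, CS, and FLAGS (the reverse of the push order), restoring the interrupted program. Because the saved FLAGS come back too, IF returns to whatever it was before the interrupt, which normally re-enables interrupts.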
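The entry sequence and IRET can be traced with a tiny simulation. Everything here (the `Cpu` class, the helper names, the register values in the usage lines) is hypothetical scaffolding; only the order of pushes, the flag clearing, and the `n*4` table lookup come from the description above.

```python
IF_MASK = 1 << 9
TF_MASK = 1 << 8


class Cpu:
    def __init__(self):
        self.mem = bytearray(1 << 20)  # 1 MB of zeroed memory
        self.cs = self.ip = self.ss = self.sp = self.flags = 0


def read16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)  # little-endian word


def write16(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def push(cpu, value):
    cpu.sp = (cpu.sp - 2) & 0xFFFF
    write16(cpu.mem, ((cpu.ss << 4) + cpu.sp) & 0xFFFFF, value)


def pop(cpu):
    value = read16(cpu.mem, ((cpu.ss << 4) + cpu.sp) & 0xFFFFF)
    cpu.sp = (cpu.sp + 2) & 0xFFFF
    return value


def interrupt(cpu, n):
    push(cpu, cpu.flags)
    cpu.flags &= ~(IF_MASK | TF_MASK) & 0xFFFF  # mask INTR and single-step
    push(cpu, cpu.cs)
    push(cpu, cpu.ip)
    cpu.ip = read16(cpu.mem, n * 4)             # offset word comes first
    cpu.cs = read16(cpu.mem, n * 4 + 2)         # segment word follows


def iret(cpu):
    cpu.ip = pop(cpu)
    cpu.cs = pop(cpu)
    cpu.flags = pop(cpu)


cpu = Cpu()
cpu.cs, cpu.ip, cpu.ss, cpu.sp, cpu.flags = 0x1234, 0x5678, 0x2000, 0x0100, 0x0202
write16(cpu.mem, 0x21 * 4, 0x0040)      # made-up handler offset for vector 21h
write16(cpu.mem, 0x21 * 4 + 2, 0x0F00)  # made-up handler segment
interrupt(cpu, 0x21)
print(hex(cpu.cs), hex(cpu.ip), hex(cpu.sp))
iret(cpu)
print(hex(cpu.cs), hex(cpu.ip), hex(cpu.flags))
```

Six bytes go onto the stack on entry and six come off on IRET, so a handler that balances its own pushes and pops returns the interrupted program to exactly the state it was in.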

Interrupt sources fall into three categories. Hardware interrupts come from physical pins: NMI (pin 17, vector 2, non-maskable, used for catastrophic hardware faults) and INTR (pin 18, vector supplied by the device or interrupt controller, maskable via the IF flag). Software interrupts come from the INT n instruction, which deliberately triggers vector $n$. DOS used INT 21h for system calls; the BIOS used INT 10h for video, INT 13h for disk, and so on. Exceptions are CPU-generated faults and traps: divide-by-zero (vector 0), single-step (vector 1, fired after every instruction when TF=1), breakpoint (vector 3, the INT 3 debugger trap), and overflow (vector 4, raised by the INTO instruction when OF=1).

Security angle. Software interrupts were the original system-call mechanism. 32-bit Linux on x86 originally entered the kernel with INT 80h. Modern kernels prefer the faster SYSENTER and SYSCALL instructions, but INT 80h still works for 32-bit code. Anyone who has read a buffer-overflow exploit for old Linux has seen int 0x80 after the shellcode-builder set up eax for execve. The interrupt-vector mechanism is the kernel-mode entry point. Tampering with the IVT was, on DOS, the way to install a TSR (terminate-and-stay-resident) program, and also the way DOS viruses hooked themselves into the system. Modern protected mode replaces the flat IVT with the Interrupt Descriptor Table (IDT) with proper privilege checks, but the concept of "vector number → handler" is the same.

1.9 8086 system timing: T-states

A bus cycle on the 8086 takes 4 clock cycles, called T1, T2, T3, T4.

T1: CPU drives the address on AD0–AD15, A16–A19, and BHE. ALE pulses high so the external latch can capture it.
T2: On a read, the AD lines float and RD is asserted; on a write, the CPU places the data on the AD lines and asserts WR.
T3: Data is transferred on the AD lines. If READY is low, wait states (Tw) are inserted after T3.
T4: The control strobes go inactive and the bus is released.

If the addressed device is slow, it holds READY low, and the CPU inserts wait states (Tw) between T3 and T4 until READY goes high. This is how slow ROMs and peripherals coexist with the CPU.
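The cost of a wait state is easy to quantify. The helper names below are invented for this sketch, and the bandwidth figure assumes back-to-back 16-bit transfers with no idle clocks, which real code does not sustain.

```python
def bus_cycle_ns(clock_hz, wait_states=0):
    """Duration of one bus cycle: four T-states plus any inserted Tw states."""
    return (4 + wait_states) * 1e9 / clock_hz


def peak_bytes_per_second(clock_hz, wait_states=0, bytes_per_cycle=2):
    """Upper bound on bus bandwidth with one 16-bit transfer per bus cycle."""
    return bytes_per_cycle * clock_hz / (4 + wait_states)


for waits in (0, 1, 2):
    print(waits, bus_cycle_ns(5_000_000, waits), peak_bytes_per_second(5_000_000, waits))
```

At 5 MHz each clock is 200 ns, so a zero-wait cycle lasts 800 ns and every wait state adds another 200 ns to it.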

1.10 Min mode and max mode

                MN/MX̄ tied HIGH             MN/MX̄ tied LOW
                ┌───────────────┐            ┌───────────────┐
   8086 ──pin29│ WR (write)    │   8086 ───┤ S0/S1/S2 status │
        ──pin28│ M/IO (mem-IO) │       │   │ to 8288 bus     │
        ──pin27│ DT/R          │       │   │ controller      │
        ──pin26│ DEN           │       │   │ which generates │
        ──pin25│ ALE           │       │   │ MRDC, MWTC,     │
        ──pin24│ INTA          │       │   │ IORC, IOWC, ALE │
                └───────────────┘            └───────────────┘
                  Minimum mode:               Maximum mode:
                  one CPU, no arbiter          multi-CPU; 8289
                                                arbiter handles bus

In minimum mode, the 8086 itself drives the bus control signals. Simple, single-CPU systems use this. The original IBM PC, by contrast, ran its 8088 in maximum mode with an 8288 bus controller, which left room for an optional 8087 math coprocessor.

In maximum mode, a few of the same pins instead carry encoded status bits (S0, S1, S2). An external 8288 bus controller decodes these and produces the actual memory/I-O read/write strobes (MRDC, MWTC, IORC, IOWC). An 8289 bus arbiter handles bus contention when multiple CPUs share the same bus. Multiprocessor 8086 systems used max mode, as did systems with an 8087 coprocessor. Educational labs sometimes do too, because the 8288/8289 chips are pedagogically interesting.
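The decoding job the 8288 performs can be written as a lookup table. The encoding below follows the standard S2/S1/S0 status table from Intel's 8086 documentation; the Python names are illustrative, and the real command outputs are active-low strobes with timing that this sketch does not model.

```python
# (S2, S1, S0) -> (bus cycle type, 8288 command output, or None if no strobe)
STATUS_TABLE = {
    (0, 0, 0): ("interrupt acknowledge", "INTA"),
    (0, 0, 1): ("read I/O port", "IORC"),
    (0, 1, 0): ("write I/O port", "IOWC"),
    (0, 1, 1): ("halt", None),
    (1, 0, 0): ("instruction fetch", "MRDC"),
    (1, 0, 1): ("read memory", "MRDC"),
    (1, 1, 0): ("write memory", "MWTC"),
    (1, 1, 1): ("passive (no bus cycle)", None),
}


def decode_status(s2, s1, s0):
    """Map the three status lines to the cycle type and the strobe to assert."""
    return STATUS_TABLE[(s2, s1, s0)]


for bits in sorted(STATUS_TABLE):
    print(bits, decode_status(*bits))
```

Three status pins encode eight cycle types, which is how maximum mode frees up CPU pins for other signals while still giving the external controller enough information to generate every strobe.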