4.1 Memory-mapped vs port-mapped I/O
How does the CPU talk to a UART chip or a network card? Two approaches:
Memory-mapped I/O. The device's control and data registers appear at specific memory addresses. To read the UART receive register, do a load from address (or wherever the SoC vendor put it). The same LDR/STR instructions work for memory and for devices. Used by ARM, RISC-V, modern x86 (PCIe configuration space, MMIO BARs). The dominant approach today.
Port-mapped I/O. A separate address space for I/O, accessed by special instructions (IN and OUT on x86). Address spaces don't overlap. Legacy on x86 (parallel port at , keyboard controller at /, PIC at /). Modern x86 still supports it for backward compatibility but new devices use MMIO.
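Memory-mapped wins because it unifies the instruction set: ordinary loads and stores reach devices, so no special instructions are needed. The MMU and cache handling apply to device addresses too. Device regions are marked non-cacheable, so every load and store reaches the device instead of a cached copy.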
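Here is a minimal sketch of how C code typically reaches a memory-mapped register: lay a struct over the device's address and access it through a volatile pointer. The register layout, the RX_READY bit, and the UART0_BASE name are hypothetical. A static struct stands in for the device so the sketch runs on an ordinary machine. Note that volatile only stops the compiler from caching, merging, or dropping these accesses. The non-cacheable attribute for the device region still has to come from the MMU or memory-attribute setup.

```c
#include <stdint.h>

/* Hypothetical UART register layout; real offsets and bit positions
 * come from the SoC vendor's reference manual. */
struct uart_regs {
    uint32_t data;    /* offset 0x0: receive/transmit data register */
    uint32_t status;  /* offset 0x4: status flags */
};

#define RX_READY (1u << 0)  /* hypothetical "byte received" bit */

/* On real hardware this would be something like
 *   #define UART0 ((volatile struct uart_regs *)UART0_BASE)
 * where UART0_BASE (hypothetical) is the vendor-specified address.
 * Here a static struct stands in for the device so the sketch runs anywhere. */
static struct uart_regs fake_uart0;
static volatile struct uart_regs *const UART0 = &fake_uart0;

/* An ordinary load from the mapped address; volatile forces a real read each time. */
static uint32_t uart_read_status(volatile struct uart_regs *u) {
    return u->status;
}

/* An ordinary store, which the device interprets as "transmit this byte". */
static void uart_write_byte(volatile struct uart_regs *u, uint8_t byte) {
    u->data = byte;
}
```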
4.2 Programmed I/O: the polling loop
The simplest way to get data from a device:
while (!(uart->status & RX_READY)) { /* spin */ }
char c = uart->data;
The CPU asks the device "are you ready?" over and over until it answers yes, then reads the data. This is simple and correct, but it wastes a colossal number of CPU cycles on slow devices. A 9600-baud UART delivers a byte roughly every millisecond, because each byte takes about 10 bit times once you count the start and stop bits. A 3 GHz CPU runs about 3 million cycles in that millisecond, and the polling loop burns all of them.
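You can check the byte-time figure in a few lines of C. The sketch assumes the common 8N1 framing: one start bit, eight data bits, and one stop bit, so ten bit times per byte.

```c
/* Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per byte. */
#define BITS_PER_FRAME 10.0

/* Seconds between received bytes at a given baud rate. */
static double byte_time_s(double baud) {
    return BITS_PER_FRAME / baud;
}

/* CPU cycles that elapse while the CPU waits for one byte. */
static double cycles_per_byte(double baud, double cpu_hz) {
    return byte_time_s(baud) * cpu_hz;
}
```

At 9600 baud this gives about 1.04 ms per byte, which is 3,125,000 cycles at 3 GHz.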
Polling is right when:
- The device is fast and the CPU has nothing else to do.
- The latency budget is tight and you cannot afford the overhead of entering and leaving an interrupt handler.
- The system is simple enough that the polling loop is the entire program (a bare-metal sensor reader).
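The sketch below is a slightly more defensive version of the polling loop. It gives up after a bounded number of polls, so a dead or unplugged device cannot hang the program forever. The register layout and status bit are again hypothetical. The spin budget is an arbitrary parameter; a real driver would derive it from the baud rate or a hardware timer.

```c
#include <stdint.h>
#include <stdbool.h>

struct uart_regs {            /* hypothetical register layout */
    uint32_t data;
    uint32_t status;
};
#define RX_READY (1u << 0)    /* hypothetical status bit */

/* Spin until RX_READY is set or max_spins polls have elapsed.
 * Returns true and stores the byte in *out on success, false on timeout. */
static bool uart_poll_getc(volatile struct uart_regs *u, uint32_t max_spins, char *out) {
    for (uint32_t i = 0; i < max_spins; i++) {
        if (u->status & RX_READY) {   /* each iteration re-reads the device */
            *out = (char)u->data;
            return true;
        }
    }
    return false;                     /* device never became ready */
}
```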
4.3 Interrupt-driven I/O
The smarter pattern: the device interrupts the CPU when it has something to say.
Device: "I have data ready!" ──asserts IRQ──▶ CPU
CPU: "Hold on, finishing this instruction..."
CPU: pushes PC, status flags onto stack
CPU: jumps to interrupt service routine (ISR) in the interrupt vector table
ISR: reads device, copies byte to a queue, signals waiting task
CPU: pops PC, status, resumes original work
The CPU does productive work between events instead of spinning. Interrupts are asynchronous (they can happen at any cycle), which means the hardware needs to:
- Recognize the IRQ on a clock edge.
- Finish the currently-executing instruction (so the saved PC points to a clean boundary).
- Save enough state so the ISR can run without trampling the interrupted code's data.
- Look up the ISR address in a vector table indexed by IRQ number.
- Jump to the ISR.
When the ISR finishes, it executes a "return from interrupt" instruction (RTI, RETI, RFE) that restores the saved state and resumes.
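The ISR step in the diagram (read the device, copy the byte to a queue) is commonly paired with a single-producer, single-consumer ring buffer. The ISR only advances head and normal code only advances tail, so no lock is needed. The sketch assumes a single-core microcontroller, where volatile is enough to keep the compiler from reordering the two sides. On a multicore system you would use C11 atomics instead. The fake_uart_data variable is a hypothetical stand-in for the UART data register. Waking a waiting task is left out because it depends on the RTOS.

```c
#include <stdint.h>
#include <stdbool.h>

/* Capacity must be a power of two so the index wrap is a cheap mask.
 * One slot is kept empty to tell "full" from "empty", so it holds RXQ_SIZE - 1 bytes. */
#define RXQ_SIZE 16u
#define RXQ_MASK (RXQ_SIZE - 1u)

struct rx_queue {
    volatile uint8_t buf[RXQ_SIZE];
    volatile uint32_t head;   /* written only by the ISR */
    volatile uint32_t tail;   /* written only by non-interrupt code */
};

static struct rx_queue rxq;

/* Hypothetical stand-in for the UART data register. */
static volatile uint8_t fake_uart_data;

/* Body of the receive ISR: read the device, enqueue, return quickly.
 * On real hardware this function would be installed in the vector table. */
static void uart_rx_isr(void) {
    uint8_t byte = fake_uart_data;             /* on many UARTs this read clears the IRQ */
    uint32_t next = (rxq.head + 1u) & RXQ_MASK;
    if (next != rxq.tail) {                    /* queue full: drop the byte */
        rxq.buf[rxq.head] = byte;
        rxq.head = next;                       /* publish only after the byte is stored */
    }
}

/* Called from normal code; returns false if the queue is empty. */
static bool rxq_pop(uint8_t *out) {
    if (rxq.tail == rxq.head)
        return false;
    *out = rxq.buf[rxq.tail];
    rxq.tail = (rxq.tail + 1u) & RXQ_MASK;
    return true;
}
```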
Vectored interrupts and priorities
Many devices share a small number of CPU interrupt inputs through an interrupt controller:
- PIC (Programmable Interrupt Controller). Original IBM PC: 8259 chip, 8 IRQ lines, cascadable.
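- APIC (Advanced PIC). Modern x86. It combines a local APIC in each core with I/O APICs and offers more lines. It also supports message-signaled interrupts (MSI), where a device raises an interrupt by writing to a special memory address.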
- GIC (Generic Interrupt Controller). ARM systems-on-chip.
- NVIC (Nested Vectored Interrupt Controller). ARM Cortex-M; integrates priority handling.
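The controller does priority arbitration. If a higher-priority interrupt arrives during a lower-priority ISR, the lower ISR is itself interrupted (nested interrupts). Each interrupt source gets its own vector-table entry, so the CPU jumps straight to the right ISR instead of polling every device to find out who raised the interrupt.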
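The next sketch is a simplified model of what a vectored, priority-based controller does. It assumes NVIC-style numbering, where a lower priority value means more urgent. The controller finds the most urgent pending interrupt that outranks the code currently running, clears its pending bit, and jumps through the vector table. The table size, priority values, and handler names are illustrative and do not follow any particular controller's register interface.

```c
#include <stdint.h>

#define NUM_IRQS 8

typedef void (*isr_t)(void);

static isr_t vector_table[NUM_IRQS];   /* one handler per source, indexed by IRQ number */
static uint8_t irq_priority[NUM_IRQS]; /* lower value = more urgent (NVIC-style assumption) */
static uint32_t pending;               /* bit n set => IRQ n is pending */

/* Most urgent pending IRQ that can preempt code running at current_prio,
 * or -1 if none can. Ties go to the lower IRQ number. */
static int select_irq(uint8_t current_prio) {
    int best = -1;
    for (int n = 0; n < NUM_IRQS; n++) {
        if (!(pending & (1u << n)))
            continue;
        if (irq_priority[n] >= current_prio)
            continue;                  /* not urgent enough to preempt */
        if (best < 0 || irq_priority[n] < irq_priority[best])
            best = n;
    }
    return best;
}

/* Acknowledge the winner and jump through the vector table (assumes its entry is set). */
static int dispatch(uint8_t current_prio) {
    int n = select_irq(current_prio);
    if (n >= 0) {
        pending &= ~(1u << n);
        vector_table[n]();
    }
    return n;
}

/* Illustrative handlers that just record which one ran. */
static int last_handled = -1;
static void timer_isr(void) { last_handled = 0; }
static void uart_isr(void)  { last_handled = 3; }
```

With the timer at priority 1 and the UART at priority 4, both pending, the timer is dispatched first. Code already running at priority 2 could not be preempted by the UART.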
Hardware-security tie-in. Interrupts can leak timing. A spy process measuring its own scheduling latency can detect when the OS handles an interrupt on behalf of a victim process, and so learn when the victim receives input or network traffic. This is a timing side channel in its own right; unlike Spectre, it does not rely on speculative execution. Some systems briefly mask interrupts during sensitive cryptographic operations to reduce leakage through this channel.
4.4 DMA: Direct Memory Access
Some devices generate data fast (a 1 Gbps network card, an NVMe SSD at 3 GB/s) or stream it continuously (a sound card). Even with interrupts, having the CPU take an interrupt and copy every byte or word is too slow. Direct Memory Access (DMA) lets the device move data to and from memory without the CPU touching each word.
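Rough arithmetic shows why per-byte interrupts cannot keep up with a 1 Gbps link. Take the same 3 GHz CPU as in the polling example. The CPU would have about 24 cycles per byte to cover interrupt entry, the handler, and the return. That is barely enough for entry and exit alone, let alone the copy or any useful work.

```latex
\frac{1\ \text{Gbit/s}}{8\ \text{bit/byte}} = 125 \times 10^{6}\ \text{byte/s}
\;\Rightarrow\;
t_{\text{byte}} = \frac{1}{125 \times 10^{6}}\ \text{s} = 8\ \text{ns}
\;\Rightarrow\;
8\ \text{ns} \times 3\ \text{GHz} = 24\ \text{cycles per byte}
```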
The flow:
- CPU sets up a DMA descriptor: source address, destination address, length, direction.
- CPU tells the DMA controller to start.
- DMA controller takes the bus and moves data (one byte / word per cycle, or a burst).
- CPU is free to execute other instructions in the meantime.
- When the transfer completes, DMA fires an interrupt.
- CPU handles completion in the ISR.
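The descriptor-based flow above can be sketched with a hypothetical descriptor layout and a software stand-in for the DMA engine. Real controllers define their own descriptor formats, and they take bus addresses (physical or IOMMU-translated) rather than C pointers. Here memcpy plays the part of the engine's bus transfers. The completion ISR just sets a flag that a waiting task could check.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

enum dma_dir { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM, DMA_MEM_TO_MEM };

/* Hypothetical descriptor: source, destination, length, direction. */
struct dma_desc {
    const void *src;
    void *dst;
    size_t len;
    enum dma_dir dir;   /* a real engine uses this to pick address handling; unused here */
};

static volatile bool dma_done;   /* set by the completion ISR */

/* Completion interrupt handler; a driver would also wake the waiting task here. */
static void dma_complete_isr(void) {
    dma_done = true;
}

/* Software stand-in for "CPU writes the start register, engine runs".
 * Real hardware performs the copy concurrently while the CPU keeps executing. */
static void fake_dma_start(const struct dma_desc *d) {
    dma_done = false;
    memcpy(d->dst, d->src, d->len);   /* stands in for the engine's bus transfers */
    dma_complete_isr();               /* engine raises its completion IRQ */
}
```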
DMA modes
- Burst mode. DMA grabs the bus and transfers everything in one shot. Fast but locks out the CPU.
- Cycle stealing. DMA takes the bus for a single cycle, moves one word, and hands the bus back to the CPU, so DMA and CPU bus cycles interleave. Smoother latency for the CPU, slower DMA.
- Transparent (hidden) mode. DMA only takes the bus when the CPU is not using it (during instruction-decode cycles, etc.). No slowdown for the CPU, but the slowest DMA and the closest coordination between CPU and controller.
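A deliberately idealized model makes the trade-off between burst mode and cycle stealing concrete. It assumes every bus cycle moves one word, the CPU wants the bus on every cycle, and cycle stealing strictly alternates DMA and CPU cycles. Transparent mode is left out because it depends on when the CPU happens to be idle.

```c
/* Result of moving `words` words over the bus in the toy model. */
struct bus_result {
    int dma_finish_cycle;   /* 1-based cycle in which the last DMA word moves */
    int max_cpu_stall;      /* longest run of consecutive cycles the CPU is denied the bus */
};

/* Burst mode: DMA holds the bus until every word is moved. */
static struct bus_result burst_mode(int words) {
    struct bus_result r = { words, words };
    return r;
}

/* Cycle stealing: odd cycles go to DMA, even cycles go back to the CPU. */
static struct bus_result cycle_stealing(int words) {
    struct bus_result r = { 0, 0 };
    int moved = 0, stall = 0;
    for (int cycle = 1; moved < words; cycle++) {
        if (cycle % 2 == 1) {      /* DMA steals this cycle */
            moved++;
            stall++;
            r.dma_finish_cycle = cycle;
        } else {                   /* CPU gets the bus */
            stall = 0;
        }
        if (stall > r.max_cpu_stall)
            r.max_cpu_stall = stall;
    }
    return r;
}
```

For a 4-word transfer, burst mode finishes at cycle 4 but locks the CPU out for all 4 cycles. Cycle stealing finishes at cycle 7 and never stalls the CPU for more than 1 cycle in a row.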
DMA is everywhere: every disk controller, network card, USB controller, GPU memory transfer, audio codec, ADC streaming.
Hardware-security tie-in. DMA bypasses normal CPU-mediated access checks. A malicious peripheral plugged into Thunderbolt or PCIe can use DMA to read (or overwrite) all of system memory, including the kernel and the encryption keys it holds. DMA attacks are a well-documented class. Mitigations:
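- IOMMU. A second MMU between devices and memory translates every device DMA request and checks it against mappings the OS set up. On Linux, the IOMMU drivers for Intel VT-d (described to the OS by the ACPI DMAR table) and AMD-Vi provide this. VFIO builds on them to hand devices safely to user space or virtual machines. macOS and Windows provide similar IOMMU-based protection.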
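- Modern OSes restrict which devices may act as bus masters. Bus mastering is enabled per device, and newly attached Thunderbolt devices can be held back until the user or a policy authorizes them.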
- Physical-access DMA attacks (an attacker plugs a hostile device into a running, locked machine) get much harder when the OS programs the IOMMU correctly.
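Here is a toy model of the IOMMU check. Each device gets its own table of page mappings granted by the OS. Any DMA address outside those mappings, or one used with the wrong permission, faults instead of reaching memory. Real IOMMUs such as VT-d and AMD-Vi walk multi-level page tables and cache translations. The flat array, 4 KiB page size, and names here are simplifying assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

enum { IOMMU_READ = 1, IOMMU_WRITE = 2 };

/* One mapping the OS granted to a device: device-visible page -> physical page. */
struct iommu_map {
    uint32_t iova_page;   /* page number the device uses in its DMA requests */
    uint32_t phys_page;   /* physical page number the OS allocated */
    unsigned perms;       /* IOMMU_READ and/or IOMMU_WRITE */
};

/* Translate a device DMA address. Returns true and sets *phys if the page is
 * mapped with the needed permission; false means a fault, and the access
 * never reaches memory. */
static bool iommu_translate(const struct iommu_map *table, size_t n,
                            uint32_t iova, unsigned access, uint32_t *phys) {
    uint32_t page = iova >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (table[i].iova_page == page) {
            if ((table[i].perms & access) != access)
                return false;                  /* permission fault */
            *phys = (table[i].phys_page << PAGE_SHIFT) | (iova & PAGE_MASK);
            return true;
        }
    }
    return false;                              /* unmapped page: translation fault */
}
```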
4.5 Bus standards in the wild
A quick survey of the buses you will meet:
- PCI / PCIe. The dominant expansion bus on PCs and servers. PCIe Gen 5 signals at 32 GT/s per lane, roughly 4 GB/s per direction per lane (about 63 GB/s per direction for an x16 slot). Lanes are bundled into x1, x4, x8, x16 slots.
- USB. The friendly outside-world bus. USB 3.2 runs at 10 Gbps (Gen 2) or 20 Gbps (Gen 2x2); USB4 reaches 40 Gbps.
- I²C. Two-wire low-speed inter-chip bus. Many sensors, EEPROMs, real-time clocks. 100 kHz (standard mode) or 400 kHz (fast mode) typical.
- SPI. Four-wire fast bus. Flash chips, SD cards, displays. 50-100 MHz.
- CAN. Robust automotive bus. Differential, multi-master, used in your car's engine, brakes, infotainment.
- AXI / AHB / APB. ARM AMBA family. Internal SoC buses connecting cores, caches, memory controllers, peripherals.
- UPI / Infinity Fabric / NVLink / CXL. High-speed coherent interconnects between CPUs, between CPU and accelerator, in modern data centers.
Each has its own electrical layer, framing, arbitration, and error detection. The architectural ideas (master/slave, point-to-point vs shared, priorities) recur.
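As one small example of framing, consider a standard I²C transfer with a 7-bit address. After the START condition, it begins with a byte holding the target address in the upper seven bits and the direction in the lowest bit (0 = write, 1 = read). The sketch builds that byte; 10-bit addressing and the electrical details are out of scope.

```c
#include <stdint.h>
#include <stdbool.h>

/* First byte of a 7-bit-address I2C transfer:
 * bits 7..1 = target address, bit 0 = direction (1 = read, 0 = write). */
static uint8_t i2c_address_byte(uint8_t addr7, bool read) {
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}
```

For instance, a device at 7-bit address 0x50 (a common address for serial EEPROMs) is addressed with 0xA0 for a write and 0xA1 for a read.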