2. Memory Technology: Where Bits Actually Live

A digital chip without memory is a calculator. A digital chip with memory is a computer. Every modern system has multiple kinds of memory: register files in the CPU, cache in SRAM, main memory in DRAM, persistent storage in NAND flash, configuration in EEPROM, code in NOR flash, lookup tables in ROM. Each technology trades capacity, speed, persistence, cost, and power differently. This section walks through each one: the cell, the dynamics, and the security angle.

2.1 SRAM: the 6T cell

SRAM (Static RAM) stores each bit in a pair of cross-coupled inverters that hold each other's state. Two access transistors gate the cell to a pair of bit lines. Six transistors per bit, hence "6T cell."

plaintext
              VDD                       VDD
               │                         │
              [P1]                      [P2]
               │                         │
       ────[N3]┤Q                      Q'├[N4]──
       │       │                         │       │
       BL     [N1]                      [N2]    BL_bar
               │                         │
              GND                       GND
                       WL (word line)

                          v (gates of N3, N4)

The two cross-coupled inverters have two stable states: Q high / Q' low (a stored 1) and Q low / Q' high (a stored 0). The inverters pump current into each other to maintain the state. Static means no refresh is needed; as long as the supply is on, the bit stays.

To read: assert the word line WL. Access transistors N3 and N4 connect Q to BL and Q' to BL_bar. A sense amplifier on the bit lines detects which line is being pulled lower (by the cell's storage) and reads out the stored value.

To write: drive BL and BL_bar to the desired values, then assert WL. The bit-line drivers are stronger than the cell's internal feedback and force the cell to flip into the new state.

The 6T cell wins on speed: a few hundred picoseconds per access, the fastest memory technology in production. The cost is density: six transistors per bit. Compared with DRAM's one-transistor, one-capacitor cell, SRAM uses roughly 20× the area per bit (about 120 F² versus 6 F², where F is the minimum feature size). So SRAM is fast and expensive. Where do we use it?

  • CPU register files. Each of the 32 or 128 architectural registers maps to a row of SRAM cells, one cell per bit.
  • CPU caches. L1, L2, L3. On a modern Intel server CPU, the cache memory occupies more than half the die area, and it is all SRAM.
  • FPGA block RAMs. Embedded SRAM blocks (typically 18 Kbit or 36 Kbit) inside the FPGA fabric.
  • Microcontroller scratch RAM. A few kilobytes of SRAM for variables on small chips.
  • Network-switch lookup tables. SRAM stores the routing table because it is fast enough to look up the destination MAC at line rate.

Hardware-security tie-in: SRAM PUFs. When you power up an uninitialized SRAM cell, the cross-coupled inverters race to a steady state. Manufacturing variations make one inverter slightly stronger than the other, biasing the cell toward 0 or 1. The pattern of biases across an SRAM array is unique to that chip — a silicon "fingerprint" — and reproducible across power cycles. Physically Unclonable Functions (PUFs) use this: the chip's startup pattern is its identity, never written to memory and impossible to clone without re-fabricating the silicon at the same atomic-scale variations. Used for device authentication in IoT, key derivation in TPMs, anti-counterfeit tagging. The first successful commercial product was in 2008 (Intrinsic-ID).
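
The power-up race can be sketched as a toy statistical model. Everything numeric here is an illustrative assumption rather than measured silicon data: the per-cell mismatch is drawn from a unit Gaussian, the power-up noise is a tenth of that, and the names `make_chip`, `power_up`, and `hamming_fraction` are hypothetical.

```python
import random

def make_chip(n_cells, rng):
    """Give each cell a fixed inverter mismatch (assumed Gaussian)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

def power_up(chip, rng, noise_sigma=0.1):
    """One power-up: a cell settles to 1 if mismatch plus noise is positive."""
    return [1 if bias + rng.gauss(0.0, noise_sigma) > 0 else 0 for bias in chip]

def hamming_fraction(a, b):
    """Fraction of bit positions where two startup patterns differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(2008)          # fixed seed so the sketch is repeatable
chip_a = make_chip(1024, rng)
chip_b = make_chip(1024, rng)

# Same chip powered up twice versus two different chips powered up once each.
intra = hamming_fraction(power_up(chip_a, rng), power_up(chip_a, rng))
inter = hamming_fraction(power_up(chip_a, rng), power_up(chip_b, rng))
print(f"same chip, two power-ups: {intra:.1%} of bits differ")
print(f"two different chips:      {inter:.1%} of bits differ")
```

Under these assumptions a chip disagrees with itself in only a few percent of cells, while two chips disagree in about half of them. That gap is what makes the startup pattern usable as an identity; practical designs typically add error correction to absorb the residual noise.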

2.2 DRAM: the leaky bucket

DRAM (Dynamic RAM) stores each bit as charge on a tiny capacitor. One transistor connects the capacitor to a bit line. One transistor and one capacitor per bit — the 1T1C cell.

plaintext
              WL (word line, gate of T)

       BL ──────[T]

                  ├── Cs (storage capacitor)

                 GND

To write: drive BL to 0 or $V_{DD}$, assert WL, charge or discharge Cs.

To read: pre-charge BL to $V_{DD}/2$, assert WL. The capacitor's stored charge slightly tugs BL up (if the cell stored a 1) or down (if it stored a 0). A sense amplifier amplifies this tiny difference. The read is destructive: the storage cap's charge equilibrates with the bit-line capacitance, and the original signal is gone. The sense amp's output then writes the value back into the cell automatically. This write-after-read is a fundamental property of DRAM and one reason DRAM is slower than SRAM.

Why one transistor instead of six? Density. SRAM needs four transistors for the cross-coupled inverters plus two for access. DRAM needs one transistor and one capacitor. The capacitor takes some area, but with modern trench or stacked-capacitor processes, the total cell area is roughly $6F^2$ where $F$ is the minimum feature size. SRAM is $\sim 120F^2$. That is a 20× density advantage for DRAM. Your 16 GB DDR4 stick is possible only because DRAM is so much smaller per bit.

The catch: capacitors leak. Even the tiny storage cap, isolated by the access transistor and surrounded by silicon dioxide, loses charge to leakage currents (gate leakage, junction leakage, sub-threshold conduction). After enough time, a stored 1 droops below the sense-amp threshold and reads as 0. The bit forgets itself.

Leaky-bucket analogy. Every DRAM bit is a bucket with a slow leak. Fill it to the top (write a 1) and within seconds, water trickles out. Read the bucket and refill it before it empties. Forget to refill, and the bucket reads empty even though you stored a full bucket. Refilling is refresh.

2.2.1 Deriving the refresh interval

Every DRAM has to be refreshed periodically. How often? Set up the leakage:

Measured relative to the bit-line precharge level $V_{DD}/2$, the cap holds $+V_{DD}/2$ (high state, after sense amp boost) or $-V_{DD}/2$ (low) on storage capacitance $C_s$. Leakage current $I_{leak}$ drains the cap. For a stored 1 to read correctly, the absolute cell voltage must remain above $V_{DD}/2 + V_{margin}$, where $V_{margin}$ is the sense-amp's reliable threshold (a few hundred millivolts).

Allowable voltage drop: $\Delta V = V_{DD}/2 - V_{margin}$.

Time to leak away that much charge:

$$t_{ref} = \frac{C_s \, \Delta V}{I_{leak}}$$

For typical DDR4 numbers: $C_s \approx 25$ fF, $\Delta V \approx 0.4$ V, $I_{leak} \approx 100$ fA per cell (at 85 °C; less at lower temperatures). Plug in:

$$t_{ref} = \frac{25 \times 10^{-15} \times 0.4}{100 \times 10^{-15}} = 100\,\text{ms}$$

Industry pads this with a comfortable margin and specifies refresh every 64 ms (every cell, at the worst-case temperature). At higher temperatures, leakage doubles every 10 °C, and "extended temperature range" DDR4 chips refresh every 32 ms.
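
The arithmetic above, together with the doubling-per-10 °C rule of thumb, can be checked with a few lines of Python. This is a minimal sketch using the section's example values as defaults; the function name `retention_time_s` is hypothetical, and applying the doubling rule as a smooth exponential is an assumption.

```python
def retention_time_s(c_s=25e-15, delta_v=0.4, i_leak_85c=100e-15, temp_c=85.0):
    """t_ref = C_s * dV / I_leak, with leakage doubling every 10 °C (rule of thumb)."""
    i_leak = i_leak_85c * 2.0 ** ((temp_c - 85.0) / 10.0)
    return c_s * delta_v / i_leak

for temp in (85.0, 95.0):
    t_ms = retention_time_s(temp_c=temp) * 1e3
    print(f"{temp:.0f} °C: worst-case retention ≈ {t_ms:.0f} ms")
```

With these inputs the worst-case cell holds its value for about 100 ms at 85 °C and about 50 ms at 95 °C, which is consistent with halving the refresh interval from 64 ms to 32 ms in the extended temperature range.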

Refresh consumes time and energy. A DDR4 chip spends about 5% of its bandwidth on refresh, and that fraction grows as densities scale: DDR5 introduced fine-granularity refresh and same-bank refresh to mitigate the loss.
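
The roughly 5% figure can be reproduced with a back-of-the-envelope calculation. The inputs below are assumptions not stated in the text: 8192 refresh commands spread over each 64 ms window, and about 350 ns of busy time per command, which is typical of an 8 Gb DDR4 device. The function name `refresh_overhead` is hypothetical.

```python
def refresh_overhead(window_s=64e-3, commands_per_window=8192, t_rfc_s=350e-9):
    """Fraction of time spent refreshing instead of serving requests."""
    t_refi = window_s / commands_per_window   # average gap between refresh commands
    return t_rfc_s / t_refi

overhead = refresh_overhead()
print(f"refresh overhead ≈ {overhead:.1%}")
```

Halving the window to 32 ms doubles the overhead under the same assumptions, and denser chips need longer refresh commands, which is why the fraction grows with density.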

Hardware-security tie-in: cold-boot attacks. The 64 ms spec assumes 85 °C. Cool DRAM to -50 °C with a can of compressed air (held upside-down so that it sprays liquid propellant, which chills the chip as it boils off) and leakage drops by orders of magnitude; bits hold their state for minutes, not milliseconds. An attacker who steals a powered-on laptop, freezes the RAM, removes it, and transplants it into their own system can recover the contents, including the full-disk-encryption (FDE) key still resident in memory (Halderman et al., 2008). Mitigations: encrypt RAM contents (Intel TME, AMD SME), wipe keys before suspend, or use SRAM-only secure enclaves whose state decays far faster on power loss.

2.2.2 DRAM organization: rows, columns, banks

A DRAM chip is not a flat array. It is organized in a hierarchy: chip, bank, row, column.

  • Bit lines run vertically; word lines run horizontally.
  • A row is the set of cells sharing a word line. To access any cell in a row, you assert the word line and all cells in the row dump their charge onto the bit lines simultaneously. A row of sense amps reads the entire row at once into the row buffer.
  • The row stays in the row buffer until you close it. Read or write to any column in that row is fast (just selecting a sense amp output).
  • A bank is a group of rows that share one row buffer. Multiple banks let you have multiple rows open at once (in different banks).

Access sequence: send the bank and row address (RAS, "row address strobe"), wait $t_{RCD}$, send the column address (CAS), wait $t_{CAS}$, get data. To switch rows in the same bank: precharge ($t_{RP}$), then RAS again. Typical DDR4 has $t_{RCD} \approx 14$ ns, $t_{CAS} \approx 14$ ns, $t_{RP} \approx 14$ ns. Module timings such as the famous "14-14-14" quote the same three parameters in clock cycles rather than nanoseconds; on DDR4-2133, 14 cycles is about 13 ns.
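
The three access cases (row already open, bank idle, a different row open) can be sketched as a toy latency model. It uses the round 14 ns figures from the paragraph above and ignores everything else a real controller does; the class name `DramBank` is hypothetical.

```python
class DramBank:
    """Toy model of one bank with a single row buffer (timings in ns)."""

    def __init__(self, t_rcd=14, t_cas=14, t_rp=14):
        self.t_rcd, self.t_cas, self.t_rp = t_rcd, t_cas, t_rp
        self.open_row = None                # no row in the row buffer yet

    def read(self, row):
        """Return the latency of a read to `row` and update the open row."""
        if self.open_row == row:            # row hit: column access only
            return self.t_cas
        latency = self.t_rcd + self.t_cas   # activate the row, then column access
        if self.open_row is not None:       # row conflict: precharge first
            latency += self.t_rp
        self.open_row = row
        return latency

bank = DramBank()
latencies = [bank.read(row) for row in (7, 7, 9)]
print(latencies)   # first access, row hit, row conflict
```

The spread between a row hit and a row conflict is a factor of three in this model, which is why memory controllers try to schedule accesses to an open row back to back.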

DDR4, like every DDR generation, takes this architecture and pumps two transfers per clock (data on both edges). DDR5 goes to two independent channels per DIMM and adds on-die ECC. LPDDR5 is the mobile variant with low-power features (deep power-down, partial-array refresh).

2.2.3 Rowhammer

Adjacent rows in a DRAM array sit so close together that repeated activation of one row capacitively couples charge onto adjacent rows, accelerating their leakage. Hammer one row hundreds of thousands of times within the refresh interval, and bits in neighboring rows flip.

This was first reported in 2014 (Kim et al., "Flipping Bits in Memory Without Accessing Them"). Within months, Project Zero demonstrated weaponized Rowhammer that escalated to root on Linux by flipping bits in page-table entries. Subsequent work (Drammer, RAMBleed, RAMBleed Plus, Half-Double) generalized it to mobile RAM, ECC RAM, and remote attacks over JavaScript.

Mitigations:

  • TRR (Targeted Row Refresh). DRAM controllers detect aggressive row activations and refresh nearby rows preemptively.
  • ECC. Single-bit-correct codes catch many Rowhammer flips, but multi-bit flips slip through.
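  • Row activation counters. DDR5 and LPDDR5 added Refresh Management (RFM) commands, which let the controller request extra refreshes when it counts many activations, and a later revision of the DDR5 standard added Per-Row Activation Counting (PRAC).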
  • Software mitigations. Memory layout shuffling, isolating sensitive pages from attacker-controlled pages.
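
The counter-based idea behind several of these mitigations can be sketched in a few lines. This is a toy model, not any vendor's or standard's actual algorithm: the class name `ActivationTracker`, the threshold value, and the choice to refresh only the two immediate neighbors are all assumptions.

```python
from collections import Counter

class ActivationTracker:
    """Toy counter-based mitigation: refresh neighbors of heavily activated rows."""

    def __init__(self, threshold=50_000):
        self.threshold = threshold          # hypothetical activation budget per window
        self.counts = Counter()
        self.refreshed = []                 # log of victim rows refreshed early

    def activate(self, row):
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            # Refresh the adjacent (victim) rows; array edges are ignored here.
            self.refreshed += [row - 1, row + 1]
            self.counts[row] = 0

    def end_refresh_window(self):
        """A full refresh recharges every row, so all counters restart."""
        self.counts.clear()

tracker = ActivationTracker(threshold=4)    # tiny threshold for demonstration
for _ in range(4):
    tracker.activate(5)
print(tracker.refreshed)
```

Hammering row 5 up to the threshold triggers an early refresh of rows 4 and 6. Real hardware cannot afford a counter per row in every design, which is one reason practical schemes sample or approximate the counts.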

Rowhammer is a working example of how analog physics breaks digital abstraction. We will return to it whenever we discuss memory.

2.3 SRAM vs DRAM: a side-by-side

| Property | SRAM | DRAM |
|---|---|---|
| Cell | 6 transistors | 1 transistor + 1 cap |
| Cell area | ~120 F² | ~6 F² |
| Read | non-destructive | destructive (needs write-back) |
| Refresh | none | every 64 ms |
| Access time | <1 ns | 10-50 ns |
| Cost per bit | high | low |
| Volatile | yes | yes |
| Used in | caches, registers | main memory |

Both are volatile: lose power and the data is gone within microseconds (SRAM) or seconds (DRAM, depending on temperature). For non-volatile storage we move to the ROM family.

2.4 ROM family: read-mostly memory

ROM (Read-Only Memory) holds bits without power. Several technologies.

Mask ROM. Bits are encoded as physical patterns in the chip's metal layer at fabrication time. Cheapest in volume (the ROM data is part of the chip's photomask), but fixed for the life of the chip. Used for boot ROMs in mass-produced consumer electronics where the firmware is finalized and the production run is millions. The Nintendo Entertainment System cartridges, the Game Boy boot ROM, the ROM tables of every CRT TV's microcontroller.

PROM (Programmable ROM). Field-programmable once. Each bit is a tiny fuse; programming "blows" the fuse with a high current. Nichrome fuses and polysilicon fuses are the historic technologies. Wasteful but simple; the programming equipment is just a high-current driver. Used for one-off configurations and as eFuses inside modern chips for one-time settings (boot mode selection, debug-port lock).

EPROM (Erasable PROM). Built around a floating-gate transistor. The floating gate sits inside oxide, electrically isolated. Inject electrons onto the floating gate by high-voltage hot-carrier injection (the cell stores a 0); the trapped electrons raise the threshold voltage of the transistor, which now reads as off. To erase, expose the chip to ultraviolet light through a quartz window on top: UV photons excite trapped electrons, kick them off the floating gate, and the cell returns to 1. This is the famous EPROM with a window: every retro-computer pre-1990 had one staring up at you.

EEPROM (Electrically Erasable PROM). Same floating gate, but with a thinner oxide layer (the tunnel oxide) under the gate that allows Fowler-Nordheim tunneling: applying a high voltage across the thin oxide drives electrons on or off the floating gate by quantum tunneling. Erase by tunneling rather than UV. Byte-level program and erase. Slower than RAM, but writeable in-circuit. Used in microcontroller config storage (the EEPROM block in an ATmega328 or PIC18) for a few KB of persistent settings.

Hot-carrier injection vs Fowler-Nordheim tunneling. Both are ways to push electrons onto a floating gate, but they work differently. Hot-carrier injection runs current through the channel of the transistor; some electrons gain enough energy from the channel field to "jump" up into the floating gate. Fast but stresses the device. Fowler-Nordheim tunneling applies a strong electric field across the thin tunnel oxide and lets electrons quantum-tunnel through. Slower per cell, gentler on the device. EPROM uses HCI to program, UV to erase. EEPROM and NAND flash use FN tunneling for both program and erase; NOR flash typically programs with HCI and erases with FN tunneling.

Flash memory. Like EEPROM but the unit of erase is a block (typically 4 KB to 256 KB) rather than a byte. Smaller cells (no per-byte erase circuitry needed), so denser. Two flavors:

  • NOR Flash. Cells in parallel, like NOR gates. Random access (any byte readable in tens of nanoseconds), but slow write and slow erase. Used for execute-in-place code storage: the CPU can fetch instructions directly from NOR flash. BIOS chips, microcontroller program memory, network-switch firmware.
  • NAND Flash. Cells in series, like NAND gates. Higher density (cells share contacts), but reads come in pages (a few KB at a time), not random bytes. Writes and erases are even slower. NAND has two reliability quirks: wear (each cell tolerates only a limited number of program/erase cycles, from about a thousand for dense multi-level cells up to around a hundred thousand for single-level cells, before the tunnel oxide degrades) and bit errors (cells lose retention over time and need ECC). Used in: SD cards, USB sticks, eMMC, SSDs, mobile-phone storage. Every smartphone and SSD on Earth uses NAND flash.
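
The program and erase asymmetry that all flash shares can be sketched as a toy model: programming only moves bits from 1 to 0, and only a block erase brings them back to 1. The class name `FlashBlock`, the block size, and the endurance limit are illustrative assumptions.

```python
class FlashBlock:
    """Toy flash block: program clears bits, only a block erase sets them."""

    def __init__(self, size=4096, max_cycles=10_000):
        self.data = bytearray(b"\xff" * size)   # erased state is all 1s
        self.erase_count = 0
        self.max_cycles = max_cycles            # illustrative endurance limit

    def program(self, offset, value):
        # Programming can only move bits from 1 to 0, so it behaves like an AND.
        self.data[offset] &= value

    def erase(self):
        if self.erase_count >= self.max_cycles:
            raise RuntimeError("block worn out")
        self.data[:] = b"\xff" * len(self.data)
        self.erase_count += 1

block = FlashBlock()
block.program(0, 0xF0)
block.program(0, 0x3C)            # overwrite attempt without an erase
after_overwrite = block.data[0]   # only bits set in both values survive
block.erase()
print(hex(after_overwrite), hex(block.data[0]))
```

Because an in-place overwrite cannot restore a 1, flash controllers write updated data to a fresh page and erase the old block later, and they spread erases across blocks to even out wear.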

NAND flash today is 3D NAND: cells are stacked along vertical channels, 96 or more tiers high, multiplying density without shrinking the lithography. Single chips hold a terabit each. Densities have grown 30× in 10 years.

| Memory | Cells | Read | Write | Erase | Endurance | Used in |
|---|---|---|---|---|---|---|
| Mask ROM | metal mask | 30-50 ns | n/a | n/a | infinite | boot ROMs |
| PROM | fuses | 30-50 ns | once, slow | n/a | 1 cycle | eFuses |
| EPROM | float gate | 50 ns | slow, in programmer | UV, 20 min | ~100 cycles | retro |
| EEPROM | float gate | 100 ns | 1 ms / byte | 1 ms / byte | 100k-1M cycles | MCU config |
| NOR Flash | float gate | 50-100 ns | slow | 100 ms / sector | 100k cycles | code |
| NAND Flash | float gate | µs / page | µs / page | ms / block | 1k-10k cycles | bulk storage |
| FRAM | ferroelectric | 50 ns | 50 ns | included | $10^{14}$ cycles | logging |
| MRAM | magnetic | 5-30 ns | 5-30 ns | included | $10^{15}$ cycles | aerospace |

Hardware-security tie-in: flash data remanence. Erased flash cells are not always cleanly empty. Some residual charge can remain on the floating gate. Specialized lab techniques (electron microscopy, scanning probe microscopy) can read residual states even after a sector erase. Secure-erase procedures perform multiple program/erase cycles or write a known pattern before erasing, increasing the erase fidelity. NIST SP 800-88 specifies acceptable wipe procedures. Many drives also support cryptographic erase: the drive holds a master key, all data is encrypted, and "erase" simply discards the key. Quick but only as strong as the key storage.

Old microcontrollers with EPROM-based code-protect fuses were defeated by exposing the chip to UV through unmarked spots on the package. The same UV that erased EPROM data also erased the lock fuses, freeing the firmware for dumping. PIC microcontrollers from the 1990s were notoriously vulnerable.

2.5 Memory organization: word and bit dimensions

A 1 Mbit memory chip is just a 2D array of cells: $2^{10}$ rows × $2^{10}$ columns, addressed by 10 row and 10 column bits. Not all chips give you parallel access to a row; many have an external word width of 1, 4, 8, 16, or 32 bits.

A "256 K × 8" SRAM is a chip with 256 K addresses, each holding one byte. Internally, the cells are organized as roughly $2^9$ rows × $2^{12}$ columns. Each access activates one row of 4096 cells, and a column multiplexer picks out the 8 bits (one of 512 bytes in the row) that the address selects.
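
The internal split of the 18-bit address can be sketched as follows. Which address bits feed the row decoder and which feed the column multiplexer is a design choice; the assignment below (upper 9 bits select the row, lower 9 bits select the byte within it) is an assumption, as is the function name `split_address`.

```python
ROW_BITS, COL_SELECT_BITS, WORD_BITS = 9, 9, 8

def split_address(addr):
    """Split an 18-bit address into (row, first bit-column of the selected byte)."""
    assert 0 <= addr < 1 << (ROW_BITS + COL_SELECT_BITS)
    row = addr >> COL_SELECT_BITS                  # upper 9 bits pick the word line
    group = addr & ((1 << COL_SELECT_BITS) - 1)    # lower 9 bits pick 1 of 512 bytes
    return row, group * WORD_BITS

print(split_address(0x00000))   # first byte: row 0, bit-columns 0..7
print(split_address(0x3FFFF))   # last byte: last row, last 8 bit-columns
```

The 9 + 9 split accounts for all 18 address bits: 512 rows times 512 bytes per row gives the 256 K addresses.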

To expand the word width: put two chips side by side. Tie their address lines together. Read both in parallel. A 256 K × 8 chip pair becomes a 256 K × 16 memory. Easy.

To expand the address space: use a chip-select decoder. Put two 256 K × 8 chips in series along the address axis. Use the high address bit to select which chip. Two 256 K × 8 chips become 512 K × 8.

plaintext
   CPU's 19-bit address

        ├── A0..A17 ──────────────────────┐
        │                                 │
        └── A18 ──[1-to-2 decoder]        │
                       │  │               │
                       │  └─/CS for chip2 │
                       │                  │
                       └─/CS for chip1    │

   Chip1 (low half):  A0..A17, /CS1, D0..D7
   Chip2 (high half): A0..A17, /CS2, D0..D7
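
The decoder in the diagram amounts to splitting the 19-bit address into a chip index and an in-chip offset. A minimal sketch, with hypothetical names (`decode`, `chips`, `write`, `read`) and Python byte arrays standing in for the two 256 K × 8 chips:

```python
CHIP_ADDR_BITS = 18                       # A0..A17 go to both chips

def decode(addr):
    """Map a 19-bit CPU address to (chip index, address within that chip)."""
    assert 0 <= addr < 1 << (CHIP_ADDR_BITS + 1)
    chip = addr >> CHIP_ADDR_BITS         # A18 drives the 1-to-2 decoder
    offset = addr & ((1 << CHIP_ADDR_BITS) - 1)
    return chip, offset

chips = [bytearray(1 << CHIP_ADDR_BITS) for _ in range(2)]

def write(addr, value):
    chip, offset = decode(addr)
    chips[chip][offset] = value

def read(addr):
    chip, offset = decode(addr)
    return chips[chip][offset]

write(0x3FFFF, 0xAA)     # last byte of Chip1 (low half)
write(0x40000, 0x55)     # first byte of Chip2 (high half)
print(decode(0x3FFFF), decode(0x40000))
```

Index 0 corresponds to Chip1 and index 1 to Chip2. Adding more chips only widens the decoder: two high-order bits would select among four chips in the same way.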

Bigger systems generalize. A motherboard with four DIMM slots has chip-select logic that turns the high-order address bits into one DIMM-select line each. The processor's memory controller does the decoding internally.

Address decoders are usually built from a tree of NAND/NOR gates or, for very wide decoders, a hierarchical decoder with predecode stages. In modern chips they are integrated into the memory's row/column logic and not separately specified.

2.6 Sense amplifiers: the analog heart of digital memory

A sense amp is a small analog differential amplifier sitting between bit lines and the output. Its job: detect the tiny voltage difference (a few hundred millivolts at best, sometimes only tens of millivolts) between a cell's bit line and the precharge reference, and slam the difference into a full-rail digital signal.

Sense amps matter because cells are tiny and their drive is weak. A 25 fF storage cap on a bit line shared with hundreds of other cells (combined bit-line capacitance ~100 fF) only swings the bit line by

$$\Delta V_{BL} = V_{cell} \cdot \frac{C_s}{C_s + C_{BL}} = \frac{V_{DD}}{2} \cdot \frac{25}{25+100} \approx 100\,\text{mV}$$

The sense amp catches this 100 mV signal in the presence of mismatch, noise, and supply variation, and amplifies it. A bad sense amp gives unreliable reads; a good sense amp pushes the technology closer to its theoretical limits.
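
The charge-sharing estimate is easy to evaluate numerically. The sketch below assumes $V_{DD} = 1.2$ V (DDR4's nominal supply, which the text does not state); with that value the formula gives 120 mV, which the text rounds to about 100 mV. The function name `bitline_swing` is hypothetical.

```python
def bitline_swing(v_dd=1.2, c_s=25e-15, c_bl=100e-15):
    """Charge-sharing swing: dV_BL = (V_DD / 2) * C_s / (C_s + C_BL)."""
    return (v_dd / 2) * c_s / (c_s + c_bl)

print(f"{bitline_swing() * 1e3:.0f} mV")
print(f"{bitline_swing(c_bl=225e-15) * 1e3:.0f} mV with a longer bit line")
```

Lengthening the bit line to 225 fF halves the signal under the same assumptions, which is why the number of cells per bit line is limited by what the sense amp can resolve.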

Sense amps are also what make DRAM's destructive reads workable: connecting the cell to the bit line drains the cell's charge into the bit line, after which the sense amp regenerates the full level and writes the row back. Without that automatic write-back, the cell would be empty after every read.

2.7 CAM, FIFO, and register files

Three specialized memory structures worth meeting.

CAM (Content-Addressable Memory). A reverse memory: instead of "give me the data at this address," you ask "what address holds this data?" Each cell has a comparator; assert a search word and every cell that matches asserts a match line. Used in network switches' MAC-address lookup tables (forwarding decisions in nanoseconds), CPU TLBs (translating virtual to physical addresses), and pattern-matching engines.
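
The search-by-content behavior can be sketched in software. Hardware compares every entry in parallel in a single cycle, whereas this toy model loops; the class name `Cam` and the example MAC strings are hypothetical.

```python
class Cam:
    """Toy CAM: search by content, get back every matching address."""

    def __init__(self, depth):
        self.entries = [None] * depth

    def write(self, address, word):
        self.entries[address] = word

    def search(self, key):
        # Hardware asserts all match lines at once; the loop is sequential here.
        return [addr for addr, word in enumerate(self.entries) if word == key]

mac_table = Cam(depth=8)
mac_table.write(3, "aa:bb:cc:00:00:01")
mac_table.write(5, "aa:bb:cc:00:00:02")
print(mac_table.search("aa:bb:cc:00:00:02"))
print(mac_table.search("aa:bb:cc:00:00:ff"))   # no entry matches
```

The per-cell comparator is what makes a hardware CAM fast and also what makes it larger and more power-hungry per bit than plain SRAM.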

FIFO (First-In First-Out). Memory with two pointers (read and write). Push data at the write pointer, pop from the read pointer. Used as buffers between two clock domains running at different rates: USB host writes 60 MB/s into a FIFO; the destination peripheral reads at its own rate. Implemented as a small dual-port SRAM with associated pointer logic.
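
The two-pointer structure can be sketched as a ring buffer. This single-threaded model ignores the clock-domain crossing itself (hardware FIFOs that span two clocks typically pass Gray-coded pointers between the domains); the class name `Fifo` is hypothetical.

```python
class Fifo:
    """Toy FIFO: a fixed-size buffer with separate read and write pointers."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.rd = self.wr = 0          # running counts of pops and pushes

    def full(self):
        return self.wr - self.rd == self.depth

    def empty(self):
        return self.wr == self.rd

    def push(self, item):
        if self.full():
            raise OverflowError("FIFO full")
        self.mem[self.wr % self.depth] = item   # wrap around the buffer
        self.wr += 1

    def pop(self):
        if self.empty():
            raise IndexError("FIFO empty")
        item = self.mem[self.rd % self.depth]
        self.rd += 1
        return item

fifo = Fifo(depth=4)
for byte in (0x10, 0x20, 0x30):
    fifo.push(byte)
drained = [fifo.pop() for _ in range(3)]
print(drained)
```

Keeping the pointers as running counts and wrapping only when indexing makes the full and empty conditions easy to tell apart, at the cost of slightly wider counters.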

Register file. Multi-port SRAM, typically with two read ports and one write port. The CPU's architectural register set lives here. Two read ports means the ALU can fetch both source operands in one cycle. Custom layouts make register files denser and faster than ordinary SRAM.

2.8 Memory hierarchy preview

Different memory technologies trade speed and capacity. Real systems combine them in a hierarchy, with the fastest near the CPU and the slowest farthest away:

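[Diagram: the memory hierarchy, from CPU registers through SRAM caches and DRAM main memory down to NAND-flash SSD storage]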

The factor of ten between each level is no coincidence: it reflects the cost-versus-speed tradeoff of the underlying technologies. We will explore this in detail in Chapter 14 (Computer Architecture and Memory Hierarchy). For now, note that the memory cells we have just met (registers in a CMOS register file, SRAM in caches, DRAM in main memory, NAND in SSDs) all show up on their natural rung of the ladder.