
9. Sheet Resistance, Capacitance, Wire Delay

In older process nodes, wire delay was negligible compared to gate delay. Today wire delay often dominates, and layout strategy is largely about managing wires.

9.1 Sheet resistance

For any thin film, the resistance of a rectangular slab is

$$R = \rho \frac{L}{W t} = R_s \frac{L}{W}, \qquad R_s = \rho / t$$

where $\rho$ is resistivity, $L$ is the length along the direction of current flow, $W$ is the width, $t$ is film thickness, and $R_s$ is the sheet resistance in ohms per square. The "per square" refers to the dimensionless ratio $L/W$: a square of any side length has resistance $R_s$.

Typical $R_s$ values:

  • Polysilicon: 20 to 40 Ω/square (heavily doped).
  • Salicided poly: 5 to 10 Ω/square (a thin metal silicide reduces resistance).
  • Diffusion (salicided): similar to salicided poly.
  • Metal 1 to 4 (thin): 0.1 to 0.3 Ω/square.
  • Metal 9 to 12 (thick): 0.02 to 0.05 Ω/square.

You see immediately why long signal wires belong on upper metals and why poly is bad for anything but short connections.
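As a rough illustration of the "counting squares" idea, the sketch below computes the resistance of one wire geometry on several layers. The sheet-resistance values are simply the midpoints of the ranges listed above, and the wire dimensions (`LENGTH_UM`, `WIDTH_UM`) are hypothetical numbers chosen for comparison only, not data from any particular process.

```python
def wire_resistance(rs, length, width):
    """Resistance in ohms of a rectangular wire: R = Rs * (L / W).

    rs is the sheet resistance in ohms per square; length and width
    may be in any unit as long as both use the same one.
    """
    if width <= 0 or length < 0:
        raise ValueError("width must be positive and length non-negative")
    return rs * (length / width)


# Midpoints of the typical ranges quoted in the text (ohms per square).
LAYERS = {
    "polysilicon": 30.0,
    "salicided poly": 7.5,
    "thin metal (M1-M4)": 0.2,
    "thick metal (M9-M12)": 0.035,
}

# Hypothetical wire used for every layer so the numbers are comparable.
# Real upper-metal wires are usually drawn wider, which lowers R further.
LENGTH_UM = 100.0
WIDTH_UM = 0.1

for name, rs in LAYERS.items():
    squares = LENGTH_UM / WIDTH_UM
    r = wire_resistance(rs, LENGTH_UM, WIDTH_UM)
    print(f"{name:22s} {squares:6.0f} squares -> {r:10.1f} ohms")
```

With the same number of squares on every layer, the resistance ratio between layers is just the ratio of their sheet resistances, which is the quantitative version of the argument for routing long signals on upper metals.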

9.2 Capacitance per unit area

Each metal layer has capacitance to:

  • The substrate or lower metals (vertical, "to ground" cap).
  • Adjacent wires on the same layer (horizontal, "coupling" cap).

Coupling capacitance dominates in modern designs and is the source of crosstalk noise. The fix is to keep parallel runs short or to insert shield wires (tied to $V_{DD}$ or GND) between them.

9.3 RC delay of a wire

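Model the wire as a distributed RC line with total resistance $R_{wire} = R_s L / W$ and total capacitance $C_{wire} = c L W$, where $c$ is the capacitance per unit area. The 50% delay is approximately

$$t_{wire} \approx 0.38 \cdot R_{wire} \cdot C_{wire} = 0.38 \cdot R_s (L/W) \cdot c \cdot L W = 0.38 \, R_s c L^2$$

The more familiar prefactor of 0.69 applies when the same $R$ and $C$ are lumped into a single stage, which overestimates the delay of a distributed line. Either way, the width cancels and the delay grows as $L^2$.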

The squared dependence on length is the fundamental problem: doubling the length of a wire quadruples its delay. This is why long buses are broken into shorter segments by repeaters, which makes the total delay grow roughly linearly with length instead of quadratically.
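The quadratic scaling, and the way repeaters tame it, can be sketched numerically. The code below is a minimal model under stated assumptions: the prefactor is passed in as `coeff` (about 0.38 for a distributed line, 0.69 for a lumped estimate), each repeater is assumed to add a fixed delay `t_repeater`, and the repeater's own drive resistance and input capacitance are ignored. All numeric values are hypothetical and chosen only for illustration.

```python
def wire_delay(rs, c_area, length, width, coeff):
    """Delay in seconds of one unrepeated wire: coeff * R_wire * C_wire."""
    r_wire = rs * length / width        # ohms
    c_wire = c_area * length * width    # farads
    return coeff * r_wire * c_wire


def repeated_wire_delay(rs, c_area, length, width, coeff,
                        n_segments, t_repeater):
    """Total delay when the wire is cut into n equal segments.

    Simplification: every segment is followed by a repeater with a
    fixed delay; its drive resistance and input cap are ignored.
    """
    segment = wire_delay(rs, c_area, length / n_segments, width, coeff)
    return n_segments * (segment + t_repeater)


# Hypothetical illustrative values, not taken from any real process.
RS = 0.2          # ohms per square
C_AREA = 2e-15    # farads per square micron
WIDTH = 0.1       # microns
COEFF = 0.38      # distributed-line prefactor; 0.69 is the lumped value
T_REP = 10e-12    # assumed fixed repeater delay, seconds

d1 = wire_delay(RS, C_AREA, 1000.0, WIDTH, COEFF)
d2 = wire_delay(RS, C_AREA, 2000.0, WIDTH, COEFF)
print(f"1 mm: {d1 * 1e12:.0f} ps, 2 mm: {d2 * 1e12:.0f} ps, "
      f"ratio {d2 / d1:.1f}")

for n in (1, 2, 4, 8):
    t = repeated_wire_delay(RS, C_AREA, 2000.0, WIDTH, COEFF, n, T_REP)
    print(f"{n} segments: {t * 1e12:.0f} ps")
```

The ratio between the 2 mm and 1 mm delays is 4 regardless of the prefactor or the layer parameters. In this simplified model, adding segments keeps helping until the fixed repeater delay starts to outweigh the shrinking per-segment wire delay.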

9.4 Buffer chains and the $e \approx 2.7$ optimum

If you must drive a load $C_L$ much larger than your input capacitance $C_{in}$, a single stage is too slow: making it strong enough to drive the load means enlarging its input capacitance, which slows the stage before it. Instead, insert a chain of inverters, each $f$ times larger than the previous one. The total delay is

$$t_{total} = N \cdot f \cdot \tau_0$$

where $N$ is the number of stages, $\tau_0$ is the intrinsic per-stage delay, and $f$ is the per-stage scaling factor, with $f^N = C_L/C_{in}$. Substituting $N = \ln(C_L/C_{in})/\ln f$:

$$t_{total} = \tau_0 \ln(C_L/C_{in}) \cdot \frac{f}{\ln f}$$

To minimize over $f$, differentiate $f/\ln f$: the derivative is $(\ln f - 1)/(\ln f)^2$, which is zero when $\ln f = 1$, so $f = e \approx 2.718$.

In practice, designers often use $f = 4$. (The "FO4 delay", the propagation delay of an inverter driving four identical inverters, is the standard speed metric of a process.) In this model, using $f = 4$ instead of $e$ costs about 6% more delay but needs fewer stages, which saves area and power.
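The trade-off between $f = e$ and $f = 4$ can be checked directly from the delay formula above. The following is a minimal sketch of the continuous model only: it treats the stage count as a real number (an actual chain needs an integer number of stages), and `LOAD_RATIO` is a hypothetical value chosen for illustration.

```python
import math


def chain_delay(load_ratio, f, tau0=1.0):
    """Delay of the buffer chain: tau0 * ln(C_L/C_in) * f / ln(f)."""
    return tau0 * math.log(load_ratio) * f / math.log(f)


def stage_count(load_ratio, f):
    """Continuous stage count N = ln(C_L/C_in) / ln(f)."""
    return math.log(load_ratio) / math.log(f)


LOAD_RATIO = 1000.0  # hypothetical C_L / C_in

best = chain_delay(LOAD_RATIO, math.e)
for f in (math.e, 3.0, 4.0, 8.0):
    delay = chain_delay(LOAD_RATIO, f)
    stages = stage_count(LOAD_RATIO, f)
    print(f"f = {f:5.3f}: N = {stages:4.2f} stages, "
          f"delay = {delay:6.2f} tau0, "
          f"{100 * (delay / best - 1):4.1f}% above the minimum")
```

Because $f/\ln f$ is flat near its minimum, moving from $f = e$ to $f = 4$ raises the delay by only a few percent while cutting the number of stages from about seven to about five for this load ratio.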

9.5 Wire delay vs gate delay at modern nodes

At 180 nm, gate delay was ~30 ps and wire delay over typical interconnect lengths was small by comparison. At 7 nm, gate delay is ~5 ps, but a millimeter-long wire on M2 has 200+ ps of delay, the equivalent of 40 or more gate delays. Wires are the bottleneck. This drives:

  • Aggressive use of upper, thick metals for long signals.
  • Repeater insertion every few hundred microns.
  • 3D stacking and advanced packaging (HBM, chiplets) to keep distances short.