Power is the binding constraint of modern chip design, more than area or speed. The design tools and methodologies are organized around it.
14.1 Clock gating
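A flip-flop's clock only does useful work when the flop's data is changing. If a register is not loaded this cycle, gating its clock saves the dynamic power otherwise spent toggling the flop's clock input and the clock-tree branch that drives it. Clock gating inserts a gate on the clock to enable it selectively: either a plain AND gate or a latch-based gate, which holds the enable steady while the clock is high so the gated clock cannot glitch. EDA tools insert these gates automatically based on enable conditions in the RTL.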
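To illustrate the difference between the two gate styles, the following Python sketch models both on sampled waveforms. The function name `simulate_clock_gates` and the sample waveforms are hypothetical, and the model ignores real gate delays; it only shows how a latch that is transparent during the clock's low phase keeps a mid-cycle enable change from producing a partial clock pulse.

```python
def simulate_clock_gates(clk, en):
    """Behavioural model of two clock-gate styles (illustrative only).

    clk, en: equal-length lists of 0/1 samples.
    Returns (and_gated, latch_gated) sampled waveforms.
    """
    latched_en = 0
    and_gated, latch_gated = [], []
    for c, e in zip(clk, en):
        if c == 0:
            # The latch is transparent only while the clock is low,
            # so the enable seen by the AND gate is frozen while clk is high.
            latched_en = e
        and_gated.append(c & e)             # plain AND gate
        latch_gated.append(c & latched_en)  # latch followed by AND gate
    return and_gated, latch_gated


# Two samples per clock phase: low, low, high, high, ...
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
# Enable rises midway through the first high phase and
# falls midway through the last high phase (assumed stimulus).
en = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

and_gated, latch_gated = simulate_clock_gates(clk, en)
print("AND gate  :", and_gated)
print("latch gate:", latch_gated)
```

In this stimulus the plain AND gate emits two half-width pulses (at the start and end of the enable window), while the latch-based gate emits only full-width pulses.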
14.2 Power gating
For longer idle periods, the entire block's supply (VDD) is disconnected by a chain of "header" PMOS sleep transistors. This removes nearly all leakage, not just dynamic power. Wake-up takes microseconds, because the block's supply rail must recharge and its state must be restored. State that has to survive the sleep period can be held in small always-on shadow registers.
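Because sleep and wake-up cost energy, power gating only pays off when the idle period is long enough. A minimal sketch of that break-even calculation follows; the function names and all numeric values are assumptions chosen for illustration, not figures from any particular process.

```python
def break_even_idle_time(p_leak_on_w, p_leak_gated_w, e_overhead_j):
    """Idle duration (seconds) beyond which gating saves net energy.

    p_leak_on_w:    leakage power with the block powered but idle
    p_leak_gated_w: residual leakage with the sleep transistors off
    e_overhead_j:   energy spent entering sleep, recharging the rail,
                    and restoring state (assumed lumped into one number)
    """
    saved_power = p_leak_on_w - p_leak_gated_w
    if saved_power <= 0:
        raise ValueError("gating must reduce leakage power")
    return e_overhead_j / saved_power


def net_energy_saved(idle_s, p_leak_on_w, p_leak_gated_w, e_overhead_j):
    """Net energy saved (joules) by gating for idle_s seconds; negative means a loss."""
    return (p_leak_on_w - p_leak_gated_w) * idle_s - e_overhead_j


# Hypothetical block: 10 mW idle leakage, 0.5 mW when gated, 2 uJ overhead.
t_be = break_even_idle_time(10e-3, 0.5e-3, 2e-6)
print(f"break-even idle time: {t_be * 1e6:.1f} us")
print(f"net saving over 1 ms: {net_energy_saved(1e-3, 10e-3, 0.5e-3, 2e-6) * 1e6:.2f} uJ")
```

A power-management policy built on this idea would gate the block only when the predicted idle time exceeds the break-even value.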
14.3 Multi-VDD domains
Different blocks of a chip run at different supply voltages: for example, the CPU at 0.9 V, on-chip SRAM at 1.0 V, and USB I/O at 3.3 V. Level shifters convert signals that cross between domains. Modern SoCs may have 10+ voltage rails.
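As a simple illustration, the sketch below flags which domain crossings would need a level shifter. The function name, the extra `dsp` domain, the list of crossings, and the 0.05 V tolerance are all hypothetical; real flows apply library-specific rules from the power-intent description rather than a single threshold.

```python
def crossings_needing_level_shifters(domain_voltage, crossings, tolerance_v=0.05):
    """Return the (source, destination) crossings whose supplies differ.

    domain_voltage: dict mapping domain name to supply voltage in volts
    crossings:      list of (source_domain, destination_domain) pairs
    tolerance_v:    assumed voltage difference tolerated without a shifter
    """
    needed = []
    for src, dst in crossings:
        if abs(domain_voltage[src] - domain_voltage[dst]) > tolerance_v:
            needed.append((src, dst))
    return needed


# Voltages from the text, plus a hypothetical DSP sharing the CPU rail.
domains = {"cpu": 0.9, "dsp": 0.9, "sram": 1.0, "usb_io": 3.3}
signals = [("cpu", "sram"), ("cpu", "dsp"), ("sram", "usb_io")]

shifters = crossings_needing_level_shifters(domains, signals)
print(shifters)
```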
14.4 Multi-Vt assignment
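We covered HVT/SVT/LVT cells in Section 4.4. Synthesis tools mix and match these cells along each path to meet timing with minimum leakage: fast, leaky LVT cells go on timing-critical paths, and slow, low-leakage HVT cells go everywhere else.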
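The trade-off can be sketched with a greedy assignment on a single path: start with every cell in the low-leakage HVT flavor, then repeatedly make the swap that buys the most delay per unit of added leakage until timing is met. This is only an illustration; the delay and leakage numbers and the `assign_vt` function are hypothetical, and production tools optimize across the whole timing graph rather than one path.

```python
# Hypothetical per-cell data: (delay in ps, leakage in nW) for each flavor.
FLAVORS = {
    "HVT": (30.0, 1.0),
    "SVT": (24.0, 4.0),
    "LVT": (20.0, 15.0),
}
ORDER = ["HVT", "SVT", "LVT"]  # slowest/least leaky to fastest/most leaky


def assign_vt(num_cells, target_delay_ps):
    """Greedily speed up a path of identical cells until it meets timing."""
    assignment = ["HVT"] * num_cells

    def path_delay():
        return sum(FLAVORS[f][0] for f in assignment)

    while path_delay() > target_delay_ps:
        best = None  # (delay gained per leakage added, cell index, new flavor)
        for i, flavor in enumerate(assignment):
            k = ORDER.index(flavor)
            if k + 1 == len(ORDER):
                continue  # already LVT, cannot go faster
            faster = ORDER[k + 1]
            gain = FLAVORS[flavor][0] - FLAVORS[faster][0]
            cost = FLAVORS[faster][1] - FLAVORS[flavor][1]
            if best is None or gain / cost > best[0]:
                best = (gain / cost, i, faster)
        if best is None:
            raise ValueError("timing cannot be met even with all LVT cells")
        assignment[best[1]] = best[2]
    return assignment


def total_leakage(assignment):
    return sum(FLAVORS[f][1] for f in assignment)


result = assign_vt(num_cells=4, target_delay_ps=105.0)
print(result, total_leakage(result))
```

With these assumed numbers the path meets its 105 ps target using three SVT cells and one HVT cell, without paying for any LVT cells.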
14.5 DVFS: Dynamic Voltage and Frequency Scaling
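The CPU monitors workload and lowers its supply voltage and clock frequency when the load is light, raising them when it is busy. Because dynamic power scales with the square of the supply voltage times the frequency, lowering both saves far more than lowering frequency alone. Mobile chips do this aggressively; ARM big.LITTLE and Apple's P-cores/E-cores are extensions of this idea.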
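A minimal sketch of the idea, assuming the standard dynamic power relation P = C * V^2 * f and a small table of voltage/frequency operating points. The table entries, the effective capacitance, and the function names are hypothetical and do not describe any real chip or governor.

```python
# Hypothetical operating points, sorted by frequency: (GHz, volts).
OPERATING_POINTS = [(0.6, 0.60), (1.2, 0.75), (2.0, 0.90), (3.0, 1.10)]
C_EFF_NF = 1.0  # assumed effective switched capacitance per cycle, in nF


def dynamic_power_w(freq_ghz, vdd):
    """Dynamic power in watts from P = C_eff * V^2 * f."""
    return (C_EFF_NF * 1e-9) * vdd ** 2 * (freq_ghz * 1e9)


def pick_operating_point(required_ghz):
    """Return the slowest point that still meets the demanded frequency.

    Falls back to the fastest point if the demand exceeds every entry.
    """
    for freq_ghz, vdd in OPERATING_POINTS:
        if freq_ghz >= required_ghz:
            return freq_ghz, vdd
    return OPERATING_POINTS[-1]


for demand in (0.5, 1.0, 2.5):
    f, v = pick_operating_point(demand)
    print(f"demand {demand} GHz -> {f} GHz at {v} V, {dynamic_power_w(f, v):.3f} W")
```

In this model the slowest point draws about 0.22 W against 3.63 W at the fastest: a 5x drop in frequency yields roughly a 17x drop in dynamic power, because the voltage falls with it.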
14.6 Body biasing
Modulate the body voltage to shift the threshold voltage (Vt) at runtime. Forward body bias lowers Vt (faster, leakier) for high-performance bursts; reverse body bias raises Vt (slower, less leaky) for idle periods. Used in some specialized processes; harder on FinFET, where the body has much weaker control over the channel.
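The leakage side of this trade can be estimated with a first-order model: subthreshold leakage changes by one decade for every subthreshold swing's worth of threshold shift. The sketch below assumes a linear body factor and an 80 mV/decade swing; both values and the function names are illustrative assumptions, and real devices have a nonlinear body effect.

```python
def vt_shift_mv(body_bias_mv, body_factor=0.08):
    """Threshold shift in mV for a given body bias in mV.

    Positive (forward) bias lowers the threshold; negative (reverse)
    bias raises it. body_factor is an assumed linear coefficient (mV per mV).
    """
    return -body_factor * body_bias_mv


def leakage_ratio(body_bias_mv, swing_mv_per_decade=80.0, body_factor=0.08):
    """Subthreshold leakage relative to zero bias (first-order model)."""
    shift = vt_shift_mv(body_bias_mv, body_factor)
    return 10 ** (-shift / swing_mv_per_decade)


for bias in (-500, 0, 500):
    print(f"bias {bias:+5d} mV: Vt shift {vt_shift_mv(bias):+6.1f} mV, "
          f"leakage x{leakage_ratio(bias):.2f}")
```

Under these assumptions, 500 mV of forward bias roughly triples leakage, and the same amount of reverse bias cuts it to about a third.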
14.7 Architectural levers
- Pipelining and parallelism allow a lower frequency, and with it a lower supply voltage, at constant throughput.
- Specialized accelerators (NPU for matrix multiply, ISP for camera, video codec) can deliver the same computation for roughly 10x less energy than general-purpose cores.
- Heterogeneous cores (big.LITTLE, P/E cores) match workload to the smallest sufficient core.
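The first lever can be made concrete with a small calculation: two parallel units at half the clock rate deliver the same throughput as one unit at full rate, and the relaxed timing permits a lower supply voltage. The sketch below uses normalized units; the 0.7 supply ratio and the 15% capacitance overhead for the duplicated hardware are assumptions for illustration.

```python
def dynamic_power(c_eff, vdd, freq, n_units=1):
    """Total dynamic power of n identical units, P = n * C * V^2 * f."""
    return n_units * c_eff * vdd ** 2 * freq


# Baseline: one unit at nominal voltage and frequency (all normalized to 1).
baseline = dynamic_power(c_eff=1.0, vdd=1.0, freq=1.0)

# Parallel: two units at half frequency and an assumed 0.7x supply,
# each with 15% extra capacitance for routing and multiplexing.
parallel = dynamic_power(c_eff=1.15, vdd=0.7, freq=0.5, n_units=2)

ratio = parallel / baseline
print(f"parallel design uses {ratio:.2%} of the baseline dynamic power")
```

Without the voltage reduction, the parallel design would use slightly more power than the baseline, which is why frequency reduction alone is not the source of the saving.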