>
section 5 of 1310 min read

5. Combinational Circuits

A combinational circuit's output depends only on the current inputs. There is no memory, no history, no clock. Apply inputs, wait for propagation delay, read outputs. We build them out of gates.

5.1 Half adder and full adder

The simplest non-trivial combinational block is the half adder, which adds two single bits.

ABSumCarry
0000
0110
1010
1101

By inspection: Sum = ABA \oplus B (XOR), Carry = ABA \cdot B (AND). Two gates, done.

For multi-bit addition, each bit position has three inputs: the two operand bits plus a carry-in from the bit below. That is the full adder.

ABCinSumCout
00000
00110
01010
01101
10010
10101
11001
11111

Working out the K-maps yields:

Sum=ABCin,Cout=AB+(AB)Cin.\text{Sum} = A \oplus B \oplus C_{in}, \qquad C_{out} = AB + (A \oplus B) C_{in}.

A full adder is two half adders plus an OR gate, structurally. In CMOS it is typically about 28 transistors using shared logic between Sum and Cout.

5.2 Ripple-carry adder

To add two nn-bit numbers, chain nn full adders: the carry-out of bit ii becomes the carry-in of bit i+1i+1.

plaintext
   A0 B0       A1 B1       A2 B2       A3 B3
    \ /         \ /         \ /         \ /
   [FA0]──C──→[FA1]──C──→[FA2]──C──→[FA3]── Cout
    │           │           │           │
    S0          S1          S2          S3

This is the ripple-carry adder. Conceptually it is exactly what you do when you add two decimal numbers on paper: you propagate the carry from right to left.

The drawback is that the carry has to ripple through every bit. Bit nn's sum cannot stabilize until bit n1n-1's carry-out has stabilized, which depends on bit n2n-2's, and so on. A 64-bit ripple adder has worst-case delay roughly 64 full-adder delays, making it slow.

5.3 Carry-lookahead adder

If we are willing to throw more gates at the problem, we can compute all the carries in parallel. Define for each bit:

  • Generate Gi=AiBiG_i = A_i B_i. Bit ii generates a carry no matter what carry came in.
  • Propagate Pi=AiBiP_i = A_i \oplus B_i. Bit ii propagates an incoming carry to its output. (Some texts define Pi=Ai+BiP_i = A_i + B_i; either works for the carry equations.)

The carry equation is then

Ci+1=Gi+PiCi.C_{i+1} = G_i + P_i C_i.

Substitute recursively:

C2=G1+P1C1=G1+P1(G0+P0C0)=G1+P1G0+P1P0C0.C_2 = G_1 + P_1 C_1 = G_1 + P_1(G_0 + P_0 C_0) = G_1 + P_1 G_0 + P_1 P_0 C_0. C3=G2+P2G1+P2P1G0+P2P1P0C0.C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0.

Each CiC_i depends only on the GGs, PPs, and C0C_0, all of which are available immediately at the start. So the carries can be computed in two gate delays (one for the AND-trees, one for the OR), independent of nn. The cost is fan-in: a 16-bit lookahead carry needs 17-input AND/OR gates, which is impractical, so designers organize lookahead in groups of 4 bits, with second-level lookahead between groups.

Modern CPUs use even more aggressive structures: Kogge-Stone, Brent-Kung, Han-Carlson adders organize the carry computation as a parallel-prefix tree with O(logn)O(\log n) depth and various tradeoffs in fan-out and wire count. The 64-bit adders inside an x86 or ARM core are typically Kogge-Stone or hybrid variants.

5.4 Subtractors

Subtraction in 2's complement is just addition with the subtrahend negated. A 2's-complement subtractor is a standard adder with the second input bit-inverted and a forced 1 on the carry-in. So:

AB=A+Bˉ+1.A - B = A + \bar{B} + 1.

The same physical adder can be repurposed for subtraction by toggling a single mode signal that XORs each BiB_i with mode and feeds mode into C0C_0. An 8-bit "adder/subtractor" looks identical to an 8-bit adder plus eight XOR gates.

5.5 Decoders

A decoder has nn inputs and 2n2^n outputs. It activates exactly one output line based on the binary code on its inputs. A 3-to-8 decoder has 3 inputs and 8 outputs; for input 011, output line 3 is high and all others are low.

Decoders are the basic addressing primitive. Your CPU's instruction decoder is a generalized version: it takes the opcode field of an instruction and asserts exactly one "control" line that says "this is an ADD" or "this is a LOAD." Memory address decoders inside RAM chips take the row/column address and assert exactly one row/column line. Display drivers use decoders to select which segment to light. The 74LS138 (3-to-8 decoder) is a workhorse part still widely available.

A 4-to-16 decoder can be built from two 3-to-8 decoders with the highest address bit selecting between them via the chip-enable input. This recursive structure scales: real RAM chips combine row decoders, column decoders, and bank decoders to address billions of cells.

5.6 Encoders

The inverse of a decoder: 2n2^n inputs, nn outputs. Whichever input is high, the encoder asserts the corresponding nn-bit binary code. An 8-to-3 encoder reads 00001000 and outputs 011.

Plain encoders break if more than one input is high simultaneously. The output becomes the OR of the binary codes for all asserted inputs, which is generally garbage.

5.7 Priority encoders

A priority encoder handles multiple-active-input cases by reporting the highest-priority active input (usually the highest-numbered). The classic example is interrupt arbitration: multiple peripherals raise interrupt requests simultaneously, and the priority encoder picks one for the CPU to service first.

The 74LS148 (8-to-3 priority encoder) has been used in everything from disk controllers to keyboards. Inside an ARM Cortex-M's NVIC (Nested Vector Interrupt Controller), a generalized priority encoder picks the highest-priority pending interrupt on every cycle. Inside the OS scheduler, a "ready-task priority encoder" picks the next thread to run.

5.8 Multiplexers

A multiplexer (mux) has 2n2^n data inputs, nn select inputs, and 1 output. The select chooses which data input gets routed to the output. An 8-to-1 mux with select=011 routes data input 3 to its output.

Muxes are everywhere:

  • Datapath selection in CPUs. "Which register do I feed into the ALU's A input?" A 32-to-1 mux with 5-bit select picks one of 32 registers in a typical RISC register file.
  • Bus arbitration. Which device's data goes onto the system bus this cycle? A mux gated by ownership signals.
  • SPI / I2C bus expanders. A part like the TS3A5018 is essentially a tiny analog mux that routes one SPI bus to one of several SD cards based on chip-select.
  • Implementing arbitrary Boolean functions. Any truth table can be realized as a mux with appropriate hard-wired data inputs. A 3-input function is implemented as an 8-to-1 mux with the data inputs being the 8 truth-table outputs and the selects being the 3 input variables.
  • FPGAs essentially are made of muxes. Each "lookup table" (LUT) inside a logic cell is a mux feeding from SRAM cells whose contents define a 4- or 6-input truth table.

5.9 Demultiplexers

The inverse of a mux: 1 input, 2n2^n outputs, nn selects. Routes the input to whichever output the select chooses; other outputs are forced to a default (usually 0 or high-impedance).

Demuxes route serial-coming data into one of many destinations: think of routing a single SPI master to one of multiple slaves, or distributing a single video stream to one of multiple displays.

5.10 Comparators

A comparator compares two nn-bit inputs and outputs three signals: A>B, A=B, A<B.

Equality (A=B) is built from XNORs: bit-by-bit, A_i XNOR B_i is 1 when bits match. AND all those bits and you get a single equality flag.

Magnitude (A>B, A<B) requires a slice-by-slice cascade. Start from the MSB. If A_n > B_n, A is bigger regardless of lower bits; output A>B and stop. If A_n < B_n, A is smaller; output A<B and stop. If A_n == B_n, fall through to the next bit. A chain of these slices implements an nn-bit comparator. The 74HC85 (4-bit comparator) cascades into wider comparators by chaining its expansion inputs.

Comparators are used heavily in CPUs (branch decisions: if (a > b) translates to a subtractor whose sign and zero flags drive the branch), memory protection (address-range checks), and security (constant-time equality comparison for password and HMAC checking).

Hardware-security note: timing attacks on comparators. A naive equality comparator that returns early on the first byte mismatch leaks information through timing: the longer the response, the more bytes matched. Cryptographic equality checks (memcmp-replacements like Python's secrets.compare_digest or C's consttime_memequal) are hardwired or coded to take exactly the same time regardless of the data, by ANDing all bit-equalities together rather than short-circuiting. A comparator that fails this discipline has been the root cause of many practical authentication bypasses, including the 2009 Keyczar HMAC-comparison vulnerability.

5.11 Parity generators and checkers

A parity generator computes the parity bit for an nn-bit input: just XOR all the bits together. Even parity makes the count of 1s (including the parity bit) even; odd parity makes it odd.

A parity checker XORs all n+1n+1 bits (data plus parity) and outputs 0 if parity is satisfied, 1 if there is an error.

UART chips, RAM modules, network interface cards, and most older bus protocols all include parity generation and checking. The 74LS280 9-bit odd/even parity chip was a TTL classic. Parity is enough to detect single-bit errors; correcting them requires Hamming or beyond.

5.12 ALU concept

The arithmetic and logic unit (ALU) of a CPU is a configurable combinational block that does one of several operations on its inputs: add, subtract, AND, OR, XOR, NOT, shift, compare. A "function select" input picks the operation.

Conceptually, an nn-bit ALU has:

  • An adder/subtractor for the arithmetic operations.
  • Bitwise AND/OR/XOR/NOT logic for the logical operations.
  • A barrel shifter for shifts.
  • A multiplexer that picks which of these results becomes the output, based on the function-select inputs.

Status flags fall out for free: Zero (output is all zeros, an nn-input NOR), Negative (MSB of result), Carry (carry-out of MSB for unsigned arithmetic), Overflow (XOR of carry into MSB and carry out of MSB, indicating signed overflow). Conditional-branch instructions read these flags to decide whether to take the branch.

5.13 7-segment decoder: a worked combinational example

Drive a seven-segment digit display. The input is a 4-bit BCD digit (0–9); the output is 7 bits (one per segment a–g), set to light the right segments.

plaintext
     a
    ───
  f│   │b
    ─g─
  e│   │c
    ───
     d

Truth table: 10 rows for input 0–9, the remaining 6 rows (10–15) are don't-cares. K-map each output, simplify, build with gates. Or use the 74LS47 chip that does exactly this. In modern designs you would just put the seven-bit pattern in firmware and bit-bang it to a generic display driver.

The 7-segment decoder is a classic exercise because every step is hands-on:

  1. Decide which segments to light for each digit (e.g., "1" lights b and c only, "8" lights all seven).
  2. Build the truth table.
  3. Build seven 4-variable K-maps with don't-cares in cells 10–15.
  4. Read off seven simplified Boolean expressions.
  5. Wire each one up.

By the end you have built a real, useful chip, in your head, from the parts you have learned in this chapter.