6. Channel Coding: Errors are Cheap if You Plan for Them // Digital Communications // bhaswanth

Once we accept that the channel will introduce errors, we add controlled redundancy to detect or correct them. This is channel coding.

6.1 The redundancy intuition

Redundancy is the spare tire in your car. Most days you do not need it, and it adds weight. The day you blow a tire, it saves you. Channel coding adds extra bits that the receiver can use to figure out where errors happened and undo them.

The cost is rate. A code of rate $k/n$ carries $k$ information bits per $n$ transmitted bits. The reward is coding gain, the SNR improvement at a target BER versus uncoded transmission. Modern codes achieve 5 to 9 dB of coding gain at modest rate penalty, which is the same as raising your transmit power by a factor of 4 to 8. Or, equivalently, doubling or tripling the range of every link.

6.2 Linear block codes and minimum distance

A block code chops the bit stream into blocks of $k$ information bits, encodes each as a codeword of $n$ bits, and transmits. A code is linear if the sum of two codewords (mod 2) is also a codeword. That symmetry buys us a huge mathematical machinery.

The Hamming distance between two binary strings is the number of positions where they differ. The minimum distance $d_\text{min}$ of a code is the smallest Hamming distance between any two distinct codewords. Equivalently for a linear code, $d_\text{min}$ is the minimum weight (number of 1s) of any non-zero codeword.

Why does $d_\text{min}$ matter?

Error detection. A code with minimum distance $d$ can detect any error pattern of weight up to $d - 1$ , because changing $d - 1$ bits of a valid codeword cannot reach another valid codeword.
Error correction. It can correct any error pattern of weight up to $\lfloor (d - 1)/2 \rfloor$ , because each codeword is surrounded by a sphere of radius $\lfloor (d-1)/2 \rfloor$ that contains no other codeword's sphere; a noisy received vector is decoded as the nearest codeword.

For the (7,4) Hamming code, $d_\text{min} = 3$ : detect 2 errors or correct 1.

6.3 Hamming codes, generator and parity-check matrices

A (7,4) Hamming code maps 4 data bits $\mathbf{u} = (u_1, u_2, u_3, u_4)$ to 7 codeword bits $\mathbf{c}$ by $\mathbf{c} = \mathbf{u} G$ , where $G$ is the generator matrix:

plaintext

        [ 1 0 0 0 | 1 1 0 ]
G   =   [ 0 1 0 0 | 1 0 1 ]
        [ 0 0 1 0 | 0 1 1 ]
        [ 0 0 0 1 | 1 1 1 ]

The first four columns form an identity (so the data bits appear unchanged in the codeword). The last three columns are the parity bits: $p_1 = u_1 \oplus u_2 \oplus u_4$ , $p_2 = u_1 \oplus u_3 \oplus u_4$ , $p_3 = u_2 \oplus u_3 \oplus u_4$ .

The companion parity-check matrix $H$ has $\mathbf{c} H^T = 0$ for every codeword:

plaintext

        [ 1 1 0 1 | 1 0 0 ]
H   =   [ 1 0 1 1 | 0 1 0 ]
        [ 0 1 1 1 | 0 0 1 ]

If a codeword $\mathbf{c}$ is sent and a noise vector $\mathbf{e}$ corrupts it, the received word is $\mathbf{r} = \mathbf{c} + \mathbf{e}$ . The receiver computes the syndrome:

$\mathbf{s} = \mathbf{r} H^T = (\mathbf{c} + \mathbf{e}) H^T = \mathbf{e} H^T.$

The syndrome depends only on the error, not on the codeword. For a single-bit error in position $i$ , the syndrome equals the $i$ th column of $H$ . Each column of $H$ is distinct and nonzero, so we can look up which bit was flipped from the syndrome and flip it back. That is syndrome decoding in one paragraph.

python

import numpy as np
 
G = np.array([
    [1,0,0,0, 1,1,0],
    [0,1,0,0, 1,0,1],
    [0,0,1,0, 0,1,1],
    [0,0,0,1, 1,1,1],
], dtype=int)
 
H = np.array([
    [1,1,0,1, 1,0,0],
    [1,0,1,1, 0,1,0],
    [0,1,1,1, 0,0,1],
], dtype=int)
 
def encode(u):
    return (u @ G) % 2
 
def decode(r):
    s = (r @ H.T) % 2
    if not s.any():
        return r[:4]                                        # no error detected
    # find which column of H equals the syndrome
    for i in range(7):
        if np.array_equal(H[:, i], s):
            r[i] ^= 1
            return r[:4]
    return None                                             # uncorrectable
 
u = np.array([1, 0, 1, 1])
c = encode(u)
r = c.copy()
r[3] ^= 1                                                   # inject one error
print(decode(r))                                            # -> [1 0 1 1], correct

The (7,4) Hamming code has rate 4/7 and corrects any single-bit error per block. It is the spiritual ancestor of every ECC scheme on Earth. ECC RAM in servers uses an extended (72,64) SECDED Hamming variant on every memory access. NAND flash controllers use stronger BCH or LDPC codes derived from the same algebraic ideas.

6.4 Cyclic codes and CRC

A code is cyclic if every cyclic shift of a codeword is also a codeword. Cyclic codes have a beautifully compact algebraic description: codewords are multiples of a generator polynomial $g(x)$ , and arithmetic happens modulo $x^n - 1$ in $GF(2)$ . Cyclic encoders fit naturally in shift registers with feedback (LFSRs from Chapter 11), so they are stupendously cheap in hardware.

The cyclic redundancy check (CRC) is a cyclic code used for error detection. Treat the message as a polynomial $m(x)$ , multiply by $x^{n-k}$ (shift left), divide by the generator polynomial $g(x)$ , and take the remainder $r(x)$ . Append $r(x)$ to the shifted message to get $c(x) = x^{n-k} m(x) + r(x)$ . By construction $g(x) | c(x)$ . The receiver divides the received polynomial by $g(x)$ ; a non-zero remainder flags an error.

A worked example. Let $g(x) = x^3 + x + 1$ (binary 1011). Message $m = 1101$ , polynomial $m(x) = x^3 + x^2 + 1$ . Shift left by 3: $1101000$ . Divide by 1011:

plaintext

              1 1 1 0
            ───────────
   1 0 1 1 ) 1 1 0 1 0 0 0
             1 0 1 1
             ───────
               1 1 0 0
               1 0 1 1
               ───────
                 1 1 1 0
                 1 0 1 1
                 ───────
                   1 0 1 ← remainder = CRC

CRC = 101. Transmitted word = $1101 \,||\, 101 = 1101101$ . Receiver divides by 1011, gets remainder zero, accepts. If any single bit (or many error patterns) had flipped, the remainder would be non-zero and the frame rejected.

A few real-world generators:

CRC-8 (ATM, Bluetooth header): $x^8 + x^2 + x + 1$ .
CRC-16-CCITT (used in HDLC, XMODEM): $x^{16} + x^{12} + x^5 + 1$ .
CRC-32 (Ethernet, ZIP, PNG, gzip): $x^{32} + x^{26} + x^{23} + ... + 1$ . It is on every Ethernet frame in your house, every TCP segment that traverses your house, every WAV chunk, every PNG, every gzip. By volume of bytes-checked-per-second, CRC-32 may be the most-executed algorithm on Earth.

A CRC of $r$ bits catches any burst error of length $\le r$ and a fraction $1 - 2^{-r}$ of longer bursts. Excellent error detection at near-zero cost.

Hardware-security implication: CRC is not a MAC. CRC is a linear function of the message. Given a message and its CRC, an attacker can compute the CRC of any modified message in closed form, without knowing a key (because there is no key). CRC was designed to detect random channel errors, not to authenticate. Yet it has been misused for authentication a depressing number of times. WEP (the original Wi-Fi security) authenticated frames with CRC-32 inside RC4 encryption; the linearity of CRC let attackers flip arbitrary bits in the ciphertext and adjust the encrypted CRC to match, breaking integrity. Always use HMAC, AES-GMAC, or Poly1305 for authentication; CRC is a checksum, not a cryptographic primitive.

6.5 BCH and Reed-Solomon

BCH codes generalize Hamming to correct multiple errors per block. Reed-Solomon (RS) codes are non-binary BCH codes operating on $m$ -bit symbols (not bits) over a Galois field $GF(2^m)$ . Because they correct symbol errors, they shine when errors come in bursts (one bad symbol can have all bits wrong yet count as one symbol error).

RS codes are everywhere bursty errors live: CDs (a single scratch is one giant burst), DVDs, Blu-ray, QR codes, RAID 6 disk arrays, deep-space probes (Voyager uses an RS(255, 223) code as its outer code), DSL, 100GBASE-R Ethernet (RS-FEC), and more. They have been workhorses for forty years and they are not going away.

6.6 Convolutional codes

Convolutional codes process the bit stream continuously instead of in blocks. The encoder is a small shift register (length $K$ , the constraint length) plus a few XOR taps that produce $n$ output bits per $k$ input bits, giving rate $k/n$ .

A canonical rate-1/2, $K = 3$ encoder uses two taps:

plaintext

    Input ─┬──[D]──[D]
           │   │    │
           │  XOR──XOR──── output 1 (taps 111)
           │   │
           └──XOR───────── output 2 (taps 101)

Two output bits per input bit, depending on the current bit and the two previous bits stored in the register. The encoder has $2^{K-1} = 4$ states.

6.7 Trellis and Viterbi decoding

Draw the encoder's state on a vertical axis and time on the horizontal axis; mark transitions by the input bit (0 or 1). The result is a trellis diagram that shows every possible state path through time:

plaintext

   t=0      t=1      t=2      t=3
   00 ●────●────●────●  (top row: state 00)
       ╲    ╲    ╲
        ●────●────●────●  (state 01)
       ╲ ╲  ╲ ╲  ╲ ╲
        ●────●────●────●  (state 10)
            ╱    ╱
   11 ●────●────●────●  (state 11)
   ↓ Each branch is labeled with input bit and emitted bits.

The transmitted bit stream corresponds to a unique path through the trellis. The receiver gets a noisy version. The Viterbi algorithm (Andrew Viterbi, 1967) finds the path of minimum cumulative metric (Hamming distance to received bits, or Euclidean distance to received samples) using dynamic programming. At each state, at each time, it remembers the single best path that ends in that state. After the message ends, traceback recovers the maximum-likelihood decoded sequence.

Viterbi is one of the most-implemented algorithms in silicon. Every Wi-Fi chipset, every classic LTE/GSM/UMTS receiver, every GPS receiver, every old voiceband modem, and every digital satellite receiver runs a Viterbi decoder somewhere.

Hardware-security note. Viterbi accelerators are juicy targets. Their data-dependent path metrics turn into data-dependent power consumption, and DPA-style attacks against Viterbi accelerators in early cellular handsets recovered intermediate state. Modern designs use balanced metric units and randomize traceback timing.

6.8 Modern codes: turbo, LDPC, polar

Three discoveries reshaped channel coding in the last 30 years.

Turbo codes (Berrou et al., 1993). Two parallel convolutional encoders separated by a pseudo-random interleaver, with iterative decoding that ping-pongs soft information between two BCJR decoders. Achieves within 0.5 dB of Shannon at modest block lengths. UMTS, LTE, deep-space (Cassini, Mars Reconnaissance Orbiter) all use turbo codes.
LDPC codes (Gallager 1962, rediscovered Mackay 1996). Long sparse parity-check matrices, decoded iteratively by belief propagation on a Tanner graph. Within a hair of Shannon at long blocks. Wi-Fi 6, 5G NR data channels, DVB-S2 satellite, 10GBASE-T Ethernet, NAND flash, and modern hard drives all use LDPC.
Polar codes (Arikan, 2009). The first family proven to achieve Shannon capacity in the limit, via channel polarization. Adopted by 5G NR for control channels.

Why iterative decoding? Modern decoders treat each bit's value as a probability, not a hard 0 or 1, and pass these probabilities (often as log-likelihood ratios) between code constraints. After several rounds, the probabilities firm up into bits. The reason to iterate is that each constraint sees only part of the picture; after several passes, all constraints have integrated all the information.

The progression Hamming → BCH → convolutional → turbo → LDPC → polar parallels the closing of the gap to Shannon. Hamming (1948-ish): 9 dB above Shannon. Convolutional with Viterbi: 4 to 5 dB. Turbo: 0.5 dB. LDPC: 0.1 dB. Polar (asymptotic): exactly Shannon. We are living at the end of an arc.