2. Pulse Code Modulation: The Bedrock of Digital Telephony

Pulse Code Modulation (PCM) is the simplest, oldest, and most widely deployed digitization scheme. Alec Reeves patented it in the late 1930s, Bell Labs turned it into working telephone systems over the following decades, and the world has run on it ever since. Every uncompressed WAV file, every CD audio track, every DS0 voice channel, every classic telephone trunk, and every classic uncompressed video frame is PCM. Every analog-to-digital converter in your house is, at heart, a PCM encoder.

2.1 The three stages

PCM does three things in sequence.

  1. Sample the analog signal at a rate $f_s$ above the Nyquist rate $2W$, where $W$ is the message bandwidth. As Chapter 3 showed, this is what guarantees that the discrete samples can faithfully represent a bandlimited continuous-time signal.
  2. Quantize each sample by rounding it to one of $L = 2^n$ allowed levels.
  3. Encode each quantized sample as an $n$-bit binary word.

The output is a stream of $n \cdot f_s$ bits per second. For 8-bit PCM at 8 kHz (the venerable G.711 codec used in landline telephony), the bit rate is 64 kbps. For 16-bit PCM at 44.1 kHz (one CD audio channel), it is 705.6 kbps.
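The bit-rate arithmetic is easy to check. This is a minimal Python sketch, and the helper name `pcm_bit_rate` is hypothetical, chosen for this example.

```python
def pcm_bit_rate(n_bits: int, fs_hz: float, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in bit/s: R_b = n * f_s for each channel."""
    return n_bits * fs_hz * channels

print(pcm_bit_rate(8, 8_000))                # 64000: one G.711 telephone channel
print(pcm_bit_rate(16, 44_100))              # 705600: one CD audio channel
print(pcm_bit_rate(16, 44_100, channels=2))  # 1411200: CD stereo
```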


An analog low-pass anti-aliasing filter ahead of the sampler is non-negotiable. Without it, anything above $f_s/2$ folds back into the band of interest as irreversible aliasing distortion (Chapter 3 covered this in detail).

2.2 Quantization is the new noise source

Sampling is reversible if Nyquist is satisfied. Quantization is not. The instant you round both 1.4137 V and 1.4138 V to "level 1", you have lost the difference between them forever. The error is irrecoverable but, crucially, it is bounded.

Suppose the quantizer has $L$ levels uniformly spaced by step size $\Delta$ across a peak-to-peak range $V_{pp}$, so $\Delta = V_{pp}/L$. For a sample $x$, let the rounded value be $\hat{x}$. The quantization error is $e = x - \hat{x}$, and as long as $x$ stays inside the quantizer's range it satisfies $-\Delta/2 \le e \le \Delta/2$. We can no longer say what $e$ is for any one sample without knowing $x$ exactly, but we can say what $e$ does statistically. If the input signal varies continuously over many quantization steps (true for almost any non-trivial signal), then $e$ is well-modelled as uniformly distributed on $[-\Delta/2, \Delta/2]$, with zero mean.
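The uniform quantizer just described can be sketched in a few lines of Python. The sketch rests on three assumptions:

- It is a mid-rise design, with each of the $L$ levels at the centre of its step.
- The range is symmetric, $[-V_{pp}/2, V_{pp}/2]$.
- Out-of-range inputs are clipped to the outermost levels.

The name `quantize` is hypothetical. The check confirms $|e| \le \Delta/2$ and a near-zero mean error for in-range inputs.

```python
import math
import random

def quantize(x: float, v_pp: float, n_bits: int) -> float:
    """Mid-rise uniform quantizer with L = 2**n_bits levels over [-v_pp/2, v_pp/2].

    Level k sits at the centre of the k-th step: -v_pp/2 + (k + 0.5) * delta,
    where delta = v_pp / L. Inputs outside the range are clipped to the end levels.
    """
    levels = 2 ** n_bits
    delta = v_pp / levels
    k = math.floor((x + v_pp / 2) / delta)   # index of the step that x falls in
    k = min(max(k, 0), levels - 1)           # clip overload to the outermost levels
    return -v_pp / 2 + (k + 0.5) * delta

# Check the bound |e| <= delta/2 for in-range inputs (3 bits over a 2 V range).
v_pp, n_bits = 2.0, 3
delta = v_pp / 2 ** n_bits
rng = random.Random(0)
samples = [rng.uniform(-v_pp / 2, v_pp / 2) for _ in range(10_000)]
errors = [x - quantize(x, v_pp, n_bits) for x in samples]
max_error = max(abs(e) for e in errors)
mean_error = sum(errors) / len(errors)
print(max_error <= delta / 2 + 1e-12)   # True
print(abs(mean_error) < 0.01)           # True: the error averages to about zero
```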

Pinball-bumper analogy. Imagine a long row of bumpers on a pinball table, each of width $\Delta$. Drop the ball anywhere along the row; whichever bumper it lands on, the centre of that bumper is at most $\Delta/2$ from where the ball actually fell. Over many drops, the offsets fill out uniformly. That is quantization noise.

2.3 Deriving $\Delta^2/12$, the most-used line in DSP

Let us actually work out the noise power. The mean-square value of a uniform random variable $e$ on $[-\Delta/2, \Delta/2]$ is

$$E[e^2] = \frac{1}{\Delta} \int_{-\Delta/2}^{\Delta/2} e^2 \, de = \frac{1}{\Delta} \left[ \frac{e^3}{3} \right]_{-\Delta/2}^{\Delta/2} = \frac{1}{\Delta} \cdot \frac{2}{3} \cdot \left(\frac{\Delta}{2}\right)^3 = \frac{\Delta^2}{12}.$$

So the quantization noise power is $\sigma_q^2 = \Delta^2/12$. That single result is the cornerstone of every ADC's data sheet.
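A quick Monte Carlo check of $\sigma_q^2 = \Delta^2/12$ follows. It assumes a "busy" test signal, modelled as uniform over ±100 steps, and rounding to the centre of each step with no clipping. `quantization_noise_power` is a hypothetical helper name.

```python
import math
import random

def quantization_noise_power(delta: float, num_samples: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[e^2] for rounding to the centre of each step of width delta.

    The 'busy' input is modelled as uniform over +/- 100 steps; no clipping.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-100 * delta, 100 * delta)
        x_hat = delta * (math.floor(x / delta) + 0.5)   # centre of the step containing x
        e = x - x_hat
        total += e * e
    return total / num_samples

delta = 0.01
ratio = quantization_noise_power(delta) / (delta ** 2 / 12)
print(round(ratio, 2))   # close to 1.0
```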

This formula assumes three things, all close to true in practice. First, the signal exercises many levels (busy enough that the error truly looks random) without driving the quantizer into overload. Second, the error is uncorrelated from sample to sample (roughly true if the signal moves at least one step per sample). Third, the levels are uniform. Companding (Section 2.6) violates the third assumption deliberately, and we will adjust accordingly.

2.4 Deriving the SNR formula $6.02n + 1.76$ dB

Now suppose the input is a full-scale sinusoid $x(t) = A\sin(2\pi f t)$, so its peak-to-peak is $2A$ and it just spans the quantizer's range, $V_{pp} = 2A$. The sinusoid's mean-square value is $A^2/2$. With $L = 2^n$ levels and step size $\Delta = 2A/2^n = A \cdot 2^{1-n}$:

$$\sigma_q^2 = \frac{\Delta^2}{12} = \frac{A^2 \cdot 2^{2-2n}}{12} = \frac{A^2}{3 \cdot 2^{2n}}.$$

The signal-to-quantization-noise ratio is

$$\text{SQNR} = \frac{A^2/2}{A^2/(3 \cdot 2^{2n})} = \frac{3}{2} \cdot 2^{2n}.$$

Taking $10\log_{10}$ of both sides:

$$\text{SQNR}_\text{dB} = 10\log_{10}\left(\tfrac{3}{2}\right) + 10\log_{10}\left(2^{2n}\right) = 1.76 + 6.02n \text{ dB}.$$

There is the famous formula. Each extra bit of resolution buys you 6.02 dB of SNR for a sinusoidal input that fully exercises the quantizer.

| Bits | SQNR (full-scale sine) |
|------|------------------------|
| 8 | 49.9 dB |
| 12 | 74.0 dB |
| 14 | 86.0 dB |
| 16 | 98.1 dB |
| 18 | 110.1 dB |
| 20 | 122.2 dB |
| 24 | 146.2 dB |
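To check that the formula and the table agree with an actual quantizer, the sketch below runs a full-scale sine ($A = 1$, $V_{pp} = 2$) through an $n$-bit mid-rise uniform quantizer and measures the ratio directly. The test frequency of 0.1234567 cycles per sample is an arbitrary choice that spreads the samples over many levels. The function names are hypothetical.

```python
import math

def sqnr_formula_db(n_bits: int) -> float:
    """SQNR of a full-scale sine through an n-bit uniform quantizer (6.02 n + 1.76 dB)."""
    return 6.02 * n_bits + 1.76

def sqnr_simulated_db(n_bits: int, num_samples: int = 100_000) -> float:
    """Quantize a full-scale sine (A = 1, so V_pp = 2) and measure signal/noise power."""
    levels = 2 ** n_bits
    delta = 2.0 / levels
    signal_power = noise_power = 0.0
    for i in range(num_samples):
        x = math.sin(2 * math.pi * 0.1234567 * i)                  # full-scale test tone
        k = min(max(math.floor((x + 1.0) / delta), 0), levels - 1)
        x_hat = -1.0 + (k + 0.5) * delta                           # mid-rise level
        signal_power += x * x
        noise_power += (x - x_hat) ** 2
    return 10 * math.log10(signal_power / noise_power)

for n in (8, 12, 16):
    print(n, round(sqnr_formula_db(n), 1), round(sqnr_simulated_db(n), 1))
```

For each bit depth, the simulated value should land within a fraction of a dB of the formula.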

Real ADCs usually fall somewhat below these numbers because of thermal noise, clock jitter, integral and differential nonlinearity, and so on. Vendors report an effective number of bits (ENOB), typically 1 to 4 bits below the nameplate bit count. A 24-bit audio ADC with 21 ENOB is excellent. A 16-bit successive-approximation ADC at 14.5 ENOB is normal.
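An ENOB figure usually comes from inverting the full-scale-sine formula and applying it to a measured SINAD (signal to noise-and-distortion ratio). This is a minimal sketch, and the helper names are hypothetical.

```python
def enob(sinad_db: float) -> float:
    """Effective number of bits from a measured SINAD, inverting 6.02 n + 1.76 dB."""
    return (sinad_db - 1.76) / 6.02

def sinad_for_enob(bits: float) -> float:
    """SINAD in dB that a converter must reach to earn a given ENOB."""
    return 6.02 * bits + 1.76

print(round(sinad_for_enob(21), 2))     # 128.18 dB for the "excellent" 24-bit audio ADC
print(round(sinad_for_enob(14.5), 2))   # 89.05 dB for the 16-bit SAR ADC
print(round(enob(98.08), 2))            # 16.0: an ideal 16-bit converter
```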

2.5 PCM bandwidth

Each sample becomes $n$ bits, all delivered within $1/f_s$ seconds. If the sample rate is $f_s$ and the message bandwidth is $W$, with $f_s \ge 2W$, the bit rate is $R_b = n f_s \ge 2nW$. Binary signalling needs at least $R_b/2$ hertz of bandwidth (the Nyquist limit, reached only with ideal sinc pulses). The minimum PCM bandwidth is therefore on the order of $nW$ for binary signalling (one bit per symbol), or $nW/\log_2 M$ for $M$-ary signalling. Practical pulse shaping adds excess bandwidth on top.

Whatever you gain in quantization fidelity, you pay for in spectrum. Eight-bit PCM telephone audio (4 kHz message, 8 kHz sample rate, 8 bits) needs roughly 32 kHz of bandwidth on the wire. CD-quality stereo (44.1 kHz, 16 bits, two channels) runs at about 1.41 Mbps. It therefore needs roughly 706 kHz even with ideal binary signalling. This is PCM's bandwidth-for-SNR tradeoff: each extra bit buys about 6 dB of SQNR and raises the required bandwidth in proportion to $n$.
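The minimum-bandwidth rule $R_b/(2\log_2 M)$ can be sketched as below. It assumes ideal zero-excess-bandwidth pulses, so real links need more. The function name is hypothetical.

```python
import math

def pcm_min_bandwidth_hz(n_bits: int, fs_hz: float, channels: int = 1, m_ary: int = 2) -> float:
    """Nyquist-minimum bandwidth R_b / (2 log2 M), with R_b = n * f_s * channels.

    Assumes ideal sinc (zero excess bandwidth) pulses; real pulse shaping adds more.
    """
    bit_rate = n_bits * fs_hz * channels
    return bit_rate / (2 * math.log2(m_ary))

print(pcm_min_bandwidth_hz(8, 8_000))                # 32000.0: telephone
print(pcm_min_bandwidth_hz(16, 44_100, channels=2))  # 705600.0: CD stereo
print(pcm_min_bandwidth_hz(8, 8_000, m_ary=4))       # 16000.0: 4-ary halves it
```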

2.6 Companding: throwing bits where they matter

Voice signals are a strange beast. The amplitude distribution is peaked sharply around zero, with long but rare excursions to large amplitudes. Most samples sit in the middle of the range. A uniform quantizer wastes resolution on the rare large samples; small signals (weak voice, low background) are quantized too coarsely.

Companding (compress + expand) is the trick: pass the signal through a logarithmic compressor before quantizing, so small values get more resolution; on the receive side, expand it back with the inverse curve. The net effect is a non-uniform quantizer with finer steps near zero and coarser ones at the extremes, built from an ordinary uniform quantizer sandwiched between the two curves.

Two companding curves dominate.

  • A-law (Europe, ITU-T G.711). Uses a piecewise compressor with $A = 87.6$. Sample at 8 kHz, compand, quantize to 8 bits, transmit at 64 kbps. This is the European telephone standard.
  • µ-law (North America, Japan; also specified in G.711). Uses $\mu = 255$. Same goal, slightly different curve, slightly better performance for very low amplitudes.

Both squeeze a 13-bit (A-law) or 14-bit (µ-law) linear dynamic range into 8-bit codewords, giving roughly 13-bit subjective audio quality from an 8-bit pipe at 64 kbps. The world's telephone backbone ran on this scheme for half a century. Even today, "PSTN-quality" audio means G.711 with one of these companders.
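The compress, quantize, expand pipeline can be tried directly. This sketch assumes the continuous µ-law curve $F(x) = \operatorname{sgn}(x)\,\ln(1+\mu|x|)/\ln(1+\mu)$ with $\mu = 255$, applied to inputs normalized to $[-1, 1]$. It is not G.711's segmented, bit-exact encoder, and the function names are hypothetical. The sketch compares 8-bit uniform and 8-bit µ-law quantization of a full-scale sine and of one 40 dB quieter.

```python
import math

MU = 255.0  # North American / Japanese companding constant

def mu_compress(x: float) -> float:
    """Continuous mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y: float) -> float:
    """Inverse of mu_compress: sgn(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def uniform_quantize(x: float, n_bits: int) -> float:
    """Mid-rise uniform quantizer on [-1, 1] with clipping."""
    levels = 2 ** n_bits
    delta = 2.0 / levels
    k = min(max(math.floor((x + 1.0) / delta), 0), levels - 1)
    return -1.0 + (k + 0.5) * delta

def sqnr_db(amplitude: float, use_mu_law: bool, n_bits: int = 8,
            num_samples: int = 20_000) -> float:
    """SQNR of a sine of the given amplitude, with or without companding."""
    signal_power = noise_power = 0.0
    for i in range(num_samples):
        x = amplitude * math.sin(2 * math.pi * 0.0123457 * i)
        if use_mu_law:
            x_hat = mu_expand(uniform_quantize(mu_compress(x), n_bits))  # compress, quantize, expand
        else:
            x_hat = uniform_quantize(x, n_bits)
        signal_power += x * x
        noise_power += (x - x_hat) ** 2
    return 10 * math.log10(signal_power / noise_power)

for amp in (1.0, 0.01):   # full scale, and 40 dB below it
    print(amp, round(sqnr_db(amp, False), 1), round(sqnr_db(amp, True), 1))
```

At full scale, the uniform quantizer should come out ahead. For the quiet sine, µ-law should win by well over 10 dB, which is exactly the trade that companding is designed to make.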

Suitcase analogy. Imagine packing for a trip. Uniform quantization is like dividing your suitcase into equal-sized cubes, each holding one item regardless of size. Companding is rolling clothes tighter and stacking small items in many small cubes while large items get one big cube. You fit more useful stuff in. Companding gives small voice signals lots of cubes (resolution) and rare loud ones a few big cubes.

2.7 DPCM: encode the difference

Voice samples are highly correlated from one moment to the next. The amplitude at sample $n$ is a great predictor of sample $n+1$. So instead of encoding the full sample, encode the prediction residual. If your predictor is even mediocre, the residual has a much smaller dynamic range than the signal itself, so you can quantize it with fewer bits at the same fidelity.

Differential PCM (DPCM) adds a simple predictor and quantizes only the difference $d[n] = x[n] - \hat{x}[n]$. The predictor is often a single delay, $\hat{x}[n] = x[n-1]$, or a more sophisticated linear filter trained on speech statistics. Here $\hat{x}[n]$ denotes the prediction, not the quantized value of Section 2.2. Real-world DPCM cuts the bit rate roughly in half for the same audio quality.
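Here is a closed-loop, first-order DPCM loop sketched in Python. It makes one deliberate change from the text: the predictor is the previous *reconstructed* sample rather than the original $x[n-1]$. This common arrangement lets the decoder form exactly the same prediction from the transmitted indices alone. The residual quantizer is plain rounding to multiples of `step`, with no level limit, and the names are hypothetical.

```python
import math

def dpcm_encode_decode(x, step):
    """First-order closed-loop DPCM; returns (transmitted indices, reconstruction).

    Prediction = previous reconstructed sample (starting from 0.0), so the decoder,
    which sees only the indices, can form exactly the same prediction.
    """
    indices, recon = [], []
    prev = 0.0
    for sample in x:
        prediction = prev                  # x_hat[n]
        d = sample - prediction            # residual d[n] = x[n] - x_hat[n]
        q = round(d / step)                # integer index sent over the channel
        prev = prediction + q * step       # what the decoder reconstructs
        indices.append(q)
        recon.append(prev)
    return indices, recon

step = 0.01
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]
idx, rec = dpcm_encode_decode(signal, step)
print(max(abs(v) for v in signal) / step)    # signal spans about +/-100 steps
print(max(abs(q) for q in idx))              # residual stays within a few steps
print(max(abs(a - b) for a, b in zip(signal, rec)) <= step / 2 + 1e-9)   # True
```

For this slow sine, the transmitted indices cover only a handful of values around zero, while the signal itself spans about ±100 steps. Far fewer bits per sample are therefore needed, and the reconstruction error never exceeds `step / 2`.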

Adaptive DPCM (ADPCM) adapts the predictor and quantizer step size to track the signal, packing telephone-grade voice into 32 kbps (G.726). DECT cordless phones and some voicemail systems used exactly this scheme. ADPCM is a conceptual ancestor of modern speech codecs: CELP-family codecs such as AMR and EVS, and the SILK layer of Opus, all build on the prediction-and-residual idea, just with much fancier predictors.

2.8 Delta modulation: one bit per sample

Push DPCM to the extreme: one bit per sample. The encoder sends a single bit each step that says only "go up by $\Delta$" or "go down by $\Delta$." The receiver maintains a staircase that follows the signal one step at a time.

```plaintext
Signal m(t): smooth curve
Staircase :  +Δ +Δ +Δ -Δ +Δ -Δ -Δ -Δ -Δ ...
Bit stream:   1  1  1  0  1  0  0  0  0
```

Two failure modes are baked in.

  • Slope overload. If the signal's slope exceeds $\Delta f_s$ (the steepest ramp the staircase can climb, one step per sample), the staircase cannot keep up. The reconstructed signal lags and clips off the fast peaks. For a sinusoid $A\sin(2\pi f t)$, avoiding overload requires $2\pi f A \le \Delta f_s$.
  • Granular noise. When the signal is flat, the staircase oscillates $\pm\Delta$ around it, never settling. You hear a steady background hash.

You can trade these two off by choosing $\Delta$. A small $\Delta$ kills granular noise but worsens slope overload; a large $\Delta$ does the reverse. The two demands fight each other.
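A linear delta modulator takes only a few lines, and running it with three step sizes shows both failure modes. The sketch assumes the following:

- The input is a unit-amplitude sine at 0.01 cycles per sample, so its peak slope is $2\pi \times 0.01 \approx 0.063$ per sample.
- The staircase starts at zero.
- The function names are hypothetical.

```python
import math

def delta_modulate(x, step, start=0.0):
    """Linear delta modulation: send 1 if the input is at or above the staircase,
    else 0, then move the staircase one step up or down. Returns (bits, staircase)."""
    bits, stairs = [], []
    level = start
    for sample in x:
        bit = 1 if sample >= level else 0
        level += step if bit else -step
        bits.append(bit)
        stairs.append(level)
    return bits, stairs

def rms_error(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Unit sine at 0.01 cycles/sample: peak slope 2*pi*0.01 ~ 0.063 per sample.
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(2000)]
for step in (0.01, 0.07, 0.5):
    _, stairs = delta_modulate(signal, step)
    print(step, round(rms_error(signal, stairs), 3))
```

The three step sizes behave as follows:

- The 0.01 step is below the sine's peak slope, so it overloads.
- The 0.5 step hunts coarsely around the signal.
- The 0.07 step sits just above the peak slope and should give the smallest error of the three.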

Adaptive delta modulation (ADM) wins by adjusting $\Delta$ on the fly: shrink it when the signal is slow, grow it when the signal is fast. CVSD (continuously variable slope delta modulation) is a famous variant. It is used in military voice radios and in Bluetooth voice over SCO links (for example, HV3 packets).
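One simple adaptation rule, assumed here for illustration and not the exact CVSD algorithm, is to multiply the step by `grow` when a bit repeats and by `shrink` when it flips. A repeated bit means the staircase is falling behind; a flipped bit means it is hunting. Because the rule uses only the transmitted bits, the decoder can run it too. With `grow = shrink = 1.0`, the rule reduces to fixed-step delta modulation, which gives an easy comparison. Parameter and function names are hypothetical.

```python
def adaptive_delta_modulate(x, step0=0.01, grow=1.5, shrink=0.5,
                            step_min=0.001, step_max=1.0):
    """Adaptive delta modulation sketch; returns (bits, staircase, step sizes).

    Step rule (an illustrative assumption): repeated bit -> step *= grow,
    flipped bit -> step *= shrink, then clamp to [step_min, step_max].
    """
    bits, stairs, steps = [], [], []
    level, step, prev_bit = 0.0, step0, None
    for sample in x:
        bit = 1 if sample >= level else 0
        if prev_bit is not None:
            step *= (grow if bit == prev_bit else shrink)
            step = min(max(step, step_min), step_max)
        level += step if bit else -step
        bits.append(bit)
        stairs.append(level)
        steps.append(step)
        prev_bit = bit
    return bits, stairs, steps

ramp = [0.05 * n for n in range(50)]          # far steeper than the initial step
_, adm_stairs, _ = adaptive_delta_modulate(ramp)
_, dm_stairs, _ = adaptive_delta_modulate(ramp, grow=1.0, shrink=1.0)
print(abs(ramp[-1] - dm_stairs[-1]))    # fixed step lags far behind
print(abs(ramp[-1] - adm_stairs[-1]))   # adaptive step keeps up

_, _, quiet_steps = adaptive_delta_modulate([0.0] * 100)
print(quiet_steps[-1])                  # 0.001: on a flat input the step collapses to step_min
```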

Delta modulation never replaced PCM in the trunk network, but its descendants survive in audio and oversampled converters. A modern delta-sigma ADC (the $\Sigma\Delta$ in your phone's audio codec) is essentially a heavily oversampled, feedback-stabilized relative of the delta modulator. It samples far faster than Nyquist (64 to 256 times is common in audio) and pushes the quantization noise out of the audio band. It then filters and decimates digitally to deliver 24-bit samples.
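A first-order sigma-delta loop with a 1-bit quantizer shows the principle: the bit stream runs at a high rate, and averaging it recovers the input. The sketch makes three simplifying assumptions:

- The input is constant.
- The oversampling ratio is 256.
- A boxcar average stands in for the decimation filter.

Production audio converters typically use higher-order loops and sharper multistage decimation filters. Names are hypothetical.

```python
def sigma_delta_first_order(x):
    """First-order sigma-delta modulator with a 1-bit (+/-1) quantizer.

    Inputs are assumed to lie in (-1, 1). The integrator accumulates the
    difference between the input and the previous output bit, so it stays
    bounded and the bit stream's running average tracks the input, while the
    quantization error is pushed toward high frequencies (first-order noise shaping).
    """
    integrator, y_prev, bits = 0.0, 0.0, []
    for sample in x:
        integrator += sample - y_prev     # feedback: subtract what was already sent
        y = 1.0 if integrator >= 0 else -1.0
        bits.append(y)
        y_prev = y
    return bits

def decimate_by_averaging(bits, osr):
    """Crude decimation: average each block of `osr` bits (a boxcar low-pass filter)."""
    return [sum(bits[i:i + osr]) / osr for i in range(0, len(bits) - osr + 1, osr)]

osr = 256                                     # oversampling ratio (assumed)
bits = sigma_delta_first_order([0.3] * (8 * osr))
print([round(v, 3) for v in decimate_by_averaging(bits, osr)])   # each near 0.3
```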

2.9 Noise considerations in PCM systems

Three sources of error matter.

  1. Quantization noise ($\Delta^2/12$, derived in Section 2.3). It is irreducible at the encoder and dominant when the channel is good.
  2. Channel noise, which causes bit errors when the received symbols are demodulated. With bit error rate $p$, each erroneous bit changes a sample by anywhere from $\Delta$ (least significant bit flipped) to half the full-scale range (most significant bit flipped). For a uniform $n$-bit natural-binary code, summing $p\,(2^k\Delta)^2$ over the $n$ bit positions gives a mean-square channel-error noise of $p\,\Delta^2(4^n - 1)/3 \approx p\,V_{pp}^2/3$, that is, $4pA^2/3$ for peak amplitude $A$. This equals the quantization noise $\Delta^2/12$ at $p \approx 2^{-2(n+1)}$ (about $4 \times 10^{-6}$ for 8-bit PCM) and dominates for any larger $p$.
  3. Sampling jitter in the clock. A jitter of $\sigma_t$ on the sample clock turns into amplitude noise of up to $2\pi f A \sigma_t$ for a sinusoid of amplitude $A$ at frequency $f$, reached at the zero crossings, where the slope is steepest. Averaged over a cycle, the rms error is $2\pi f A \sigma_t/\sqrt{2}$, which caps the SNR at $-20\log_{10}(2\pi f \sigma_t)$ dB. The faster the signal, the more jitter hurts. Modern audio ADCs spec jitter in tens of femtoseconds for the 24-bit / 192 kHz market.
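Both error formulas can be checked numerically. The sketch below flips each bit of random $n$-bit natural-binary words independently with probability $p$. It then compares the measured mean-square error with $p\,\Delta^2(4^n-1)/3$, which is the sum of $p\,(2^k\Delta)^2$ over the bit positions. It also evaluates the jitter-limited SNR, $-20\log_{10}(2\pi f\sigma_t)$, which follows from the rms slope of a full-scale sine. Uniformly random code words and independent flips are assumptions of the sketch, and the function names are hypothetical.

```python
import math
import random

def channel_error_noise_power(n_bits: int, v_pp: float, p: float,
                              num_words: int = 100_000, seed: int = 2) -> float:
    """Monte Carlo mean-square sample error caused by independent bit flips.

    Code words are uniformly random n-bit natural-binary integers; flipping
    bit k moves the decoded sample by 2**k * delta, with delta = v_pp / 2**n_bits.
    """
    rng = random.Random(seed)
    delta = v_pp / 2 ** n_bits
    total = 0.0
    for _ in range(num_words):
        word = rng.randrange(2 ** n_bits)
        received = word
        for k in range(n_bits):
            if rng.random() < p:          # bit k corrupted by the channel
                received ^= 1 << k
        total += ((received - word) * delta) ** 2
    return total / num_words

def jitter_limited_snr_db(f_hz: float, sigma_t_s: float) -> float:
    """SNR ceiling set by rms clock jitter sigma_t on a full-scale sine at f_hz."""
    return -20 * math.log10(2 * math.pi * f_hz * sigma_t_s)

n_bits, v_pp, p = 8, 2.0, 0.02
delta = v_pp / 2 ** n_bits
predicted = p * delta ** 2 * (4 ** n_bits - 1) / 3    # ~ p * v_pp**2 / 3
measured = channel_error_noise_power(n_bits, v_pp, p)
print(measured / predicted)                           # close to 1
print(round(jitter_limited_snr_db(20e3, 1e-12), 1))   # about 138 dB for 1 ps at 20 kHz
```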

The dominant story is simple. With PCM through a clean channel, quantization noise sets the floor. With PCM through a noisy channel and no coding, random bit flips trash the audio. With PCM and a strong channel code, operating above the code's threshold SNR drives the decoded bit error rate close to zero, and quantization noise reasserts itself as the floor. Channel coding lets PCM enjoy quantization-limited fidelity over a wide range of channel conditions, rather than degrading as soon as bit errors start to appear.