2. Pulse Code Modulation: The Bedrock of Digital Telephony // Digital Communications // bhaswanth

Pulse Code Modulation (PCM) is the simplest, oldest, and most widely deployed digitization scheme. Alec Reeves conceived it at ITT's Paris laboratory in 1937, Bell Labs turned it into working systems in the 1940s, and the world has run on it ever since. Every uncompressed WAV file, every CD audio track, every ISDN voice channel, every classic telephone trunk, and every uncompressed video frame is PCM. Every analog-to-digital converter in your house is, at heart, a PCM encoder.

2.1 The three stages

PCM does three things in sequence.

Sample the analog signal at a rate $f_s$ above the Nyquist rate $2W$ , where $W$ is the message bandwidth. From Chapter 3, this is the only way the discrete samples can faithfully represent the continuous-time signal.
Quantize each sample by rounding to one of $L = 2^n$ allowed levels.
Encode each quantized sample as an $n$ -bit binary word.

The output is a stream of $n \cdot f_s$ bits per second. For 8-bit PCM at 8 kHz (the venerable G.711 codec used in landline telephony), the bit rate is 64 kbps. For 16-bit PCM at 44.1 kHz (a CD audio channel), it is 705.6 kbps per channel.
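The three stages can be sketched in a few lines of Python. This is a minimal illustration, not a reference encoder: the function names `pcm_encode` and `pcm_bit_rate`, the input range of -1 to +1, the 1 kHz test tone, and the choice to map each sample to the index of the cell it falls in are all assumptions made for the example.

```python
import math

def pcm_encode(samples, n_bits, v_min=-1.0, v_max=1.0):
    """Quantize samples to 2**n_bits uniform cells and return n-bit binary words."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels            # step size Delta = V_pp / L
    words = []
    for x in samples:
        # Stage 2: index of the quantization cell containing x, clamped to range.
        index = int((x - v_min) / step)
        index = max(0, min(levels - 1, index))
        # Stage 3: encode the index as an n-bit binary word.
        words.append(format(index, f"0{n_bits}b"))
    return words

def pcm_bit_rate(n_bits, fs_hz):
    """Output bit rate n * f_s in bits per second."""
    return n_bits * fs_hz

# Stage 1: sample a 1 kHz sine at 8 kHz (assumed test signal).
fs = 8000
samples = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(8)]
words = pcm_encode(samples, n_bits=8)
print(words)
print(pcm_bit_rate(8, 8000))       # telephone channel: 64000
print(pcm_bit_rate(16, 44100))     # one CD channel: 705600
```

A real encoder would place an anti-aliasing filter ahead of the sampling step; the sketch skips it because the test tone is already band-limited.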

The anti-aliasing filter ahead of the sampler is non-negotiable. Without it, anything above $f_s / 2$ folds back into the band of interest as irreversible noise (Chapter 3 covered this in detail).

2.2 Quantization is the new noise source

Sampling is reversible if Nyquist is satisfied. Quantization is not. The instant you round 1.4137 V to "level 1" and 1.4138 V to "level 1" you have lost the difference between them forever. The error is irrecoverable but, crucially, it is bounded.

Suppose the quantizer has $L$ levels uniformly spaced by step size $\Delta$ across a peak-to-peak range $V_{pp}$ , so $\Delta = V_{pp} / L$ . For a sample $x$ , let the rounded value be $\hat{x}$ . The quantization error is $e = x - \hat{x}$ , and it satisfies $-\Delta/2 \le e \le \Delta/2$ . We can no longer say what $e$ is for any one sample without knowing $x$ exactly, but we can say what $e$ does statistically. If the input signal varies continuously over many quantization steps (true for almost any non-trivial signal), then $e$ is well-modelled as uniformly distributed on $[-\Delta/2, \Delta/2]$ , with zero mean.

Pinball-bumper analogy. Imagine a long row of bumpers on a pinball table, each of width $\Delta$ . Drop the ball anywhere along the row; whichever bumper it lands on, the centre of that bumper is $\Delta/2$ at most from where the ball actually fell. Over many drops, the offsets fill out uniformly. That is quantization noise.

2.3 Deriving $\Delta^2 / 12$ , the most-used line in DSP

Let us actually work out the noise power. The mean-square value of a uniform random variable $e$ on $[-\Delta/2, \Delta/2]$ is

$E[e^2] = \frac{1}{\Delta} \int_{-\Delta/2}^{\Delta/2} e^2 \, de = \frac{1}{\Delta} \left[ \frac{e^3}{3} \right]_{-\Delta/2}^{\Delta/2} = \frac{1}{\Delta} \cdot \frac{2}{3}\cdot\left(\frac{\Delta}{2}\right)^3 = \frac{\Delta^2}{12}.$

So the quantization noise power is $\sigma_q^2 = \Delta^2 / 12$ . That single result is the cornerstone of every ADC's data sheet.

This formula assumes three things, all close to true in practice. First, the signal exercises many levels (busy enough that the error truly looks random). Second, the noise is uncorrelated from sample to sample (true if the signal moves at least one step per sample). Third, the levels are uniform. Companding (later) violates the third assumption deliberately and we will adjust.
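A quick Monte Carlo check makes the $\Delta^2/12$ result tangible. The sketch below assumes a round-to-nearest quantizer and a test input drawn uniformly over many steps; the step size, range, and sample count are arbitrary choices for illustration.

```python
import random

def quantize(x, step):
    """Round x to the nearest multiple of step (uniform quantizer)."""
    return step * round(x / step)

random.seed(0)
step = 0.1
errors = []
for _ in range(200_000):
    x = random.uniform(-5.0, 5.0)          # input spans about 100 steps
    errors.append(x - quantize(x, step))

measured = sum(e * e for e in errors) / len(errors)
predicted = step ** 2 / 12
print(f"measured  E[e^2] = {measured:.6e}")
print(f"predicted D^2/12 = {predicted:.6e}")
```

The two numbers should agree to within a percent or so. Shrinking the input range to less than one step breaks the first assumption, and the agreement goes with it.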

2.4 Deriving the SNR formula $6.02 n + 1.76$ dB

Now suppose the input is a full-scale sinusoid $x(t) = A \sin(2\pi f t)$ , so its peak-to-peak is $2A$ and it just spans the quantizer's range, $V_{pp} = 2A$ . The sinusoid's mean-square value is $A^2 / 2$ . With $L = 2^n$ levels and step size $\Delta = 2A / 2^n = A \cdot 2^{1-n}$ :

$\sigma_q^2 = \frac{\Delta^2}{12} = \frac{A^2 \cdot 2^{2-2n}}{12} = \frac{A^2}{3 \cdot 2^{2n}}.$

The signal-to-quantization-noise ratio is

$\text{SQNR} = \frac{A^2/2}{A^2 / (3 \cdot 2^{2n})} = \frac{3}{2} \cdot 2^{2n}.$

Take 10 log of both sides:

$\text{SQNR}_\text{dB} = 10 \log_{10}\left(\tfrac{3}{2}\right) + 10 \log_{10}(2^{2n}) = 1.76 + 6.02 n \text{ dB.}$

There is the famous formula. Each extra bit of resolution buys you 6.02 dB of SNR for a sinusoidal input that fully exercises the quantizer.

Bits	SQNR (full-scale sine)
8	49.9 dB
12	74.0 dB
14	86.0 dB
16	98.1 dB
18	110.1 dB
20	122.2 dB
24	146.2 dB

Real ADCs are usually somewhat below these numbers because of thermal noise, jitter, integral and differential nonlinearity, and so on. The vendor reports an effective number of bits (ENOB) that is typically 1 to 4 below the nameplate bit count. A 24-bit audio ADC with 21 ENOB is excellent. A 16-bit successive-approximation ADC at 14.5 ENOB is normal.
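The $6.02n + 1.76$ dB rule can be checked numerically, and the same formula run backwards gives ENOB from a measured signal-to-noise-and-distortion figure. The sketch below is illustrative only: the function names, the mid-rise quantizer, and the test-tone frequency are assumptions, and the simulated ADC is ideal (quantization noise only).

```python
import math

def sqnr_formula_db(n_bits):
    """Ideal SQNR for a full-scale sine: 10 log10(3/2) + 20 n log10(2)."""
    return 10 * math.log10(1.5) + 20 * n_bits * math.log10(2)

def simulated_sqnr_db(n_bits, num_samples=100_000):
    """Quantize a full-scale sine with an ideal mid-rise quantizer and measure SQNR."""
    amplitude = 1.0
    step = 2 * amplitude / 2 ** n_bits
    signal_power = 0.0
    noise_power = 0.0
    for k in range(num_samples):
        # Irrational cycles-per-sample ratio so the samples sweep all phases.
        x = amplitude * math.sin(2 * math.pi * k * math.sqrt(2) / 10)
        xq = step * (math.floor(x / step) + 0.5)   # centre of the cell holding x
        signal_power += x * x
        noise_power += (x - xq) ** 2
    return 10 * math.log10(signal_power / noise_power)

def enob(sinad_db):
    """Effective number of bits implied by a measured SINAD in dB."""
    return (sinad_db - 1.76) / 6.02

for n in (8, 12, 16):
    print(n, round(sqnr_formula_db(n), 1), round(simulated_sqnr_db(n), 1))
print(round(enob(89.0), 2))   # a converter measuring 89 dB SINAD
```

For a real part, the SINAD fed to `enob` comes from a bench measurement or the data sheet, not from a simulation like this one.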

2.5 PCM bandwidth

Each sample becomes $n$ bits, all delivered in $1/f_s$ seconds. If the sample rate is $f_s$ and the message bandwidth is $W$ , with $f_s \ge 2W$ , the bit rate is $R_b = n f_s \ge 2 n W$ . The minimum PCM bandwidth, before any pulse shaping or modulation, is therefore on the order of $n W$ for binary signalling (one bit per channel use) or $n W / \log_2 M$ for $M$ -ary signalling.

Whatever you gain in quantization fidelity, you pay for in spectrum. Eight-bit PCM telephone audio (4 kHz message, 8 kHz sample rate, 8 bits) produces 64 kbps and needs roughly 32 kHz of bandwidth on the wire with binary signalling. CD-quality stereo (44.1 kHz, 16 bits, two channels) produces about 1.41 Mbps and needs roughly 706 kHz. This is the bandwidth-for-fidelity tradeoff at the heart of digitized audio transmission.
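The bandwidth arithmetic is short enough to script. The helper below is a sketch that assumes the ideal Nyquist limit of two symbols per second per hertz (sinc pulses, no excess bandwidth); the name `pcm_min_bandwidth_hz` and its parameters are made up for the example.

```python
import math

def pcm_min_bandwidth_hz(n_bits, fs_hz, channels=1, m_ary=2):
    """Minimum baseband bandwidth for PCM under the ideal Nyquist limit."""
    bit_rate = n_bits * fs_hz * channels          # R_b = n * f_s per channel
    symbol_rate = bit_rate / math.log2(m_ary)     # log2(M) bits per symbol
    return symbol_rate / 2                        # 2 symbols/s per Hz at best

print(pcm_min_bandwidth_hz(8, 8000))              # telephone PCM: 32000.0
print(pcm_min_bandwidth_hz(8, 8000, m_ary=4))     # same stream, 4-ary: 16000.0
print(pcm_min_bandwidth_hz(16, 44100, channels=2))
```

Practical pulse shaping adds excess bandwidth on top of these figures, so treat them as lower bounds.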

2.6 Companding: throwing bits where they matter

Voice signals are a strange beast. The amplitude distribution is peaked sharply around zero, with long but rare excursions to large amplitudes. Most samples sit in the middle of the range. A uniform quantizer wastes resolution on the rare large samples; small signals (weak voice, low background) are quantized too coarsely.

Companding (compress + expand) is the trick: pass the signal through a logarithmic compressor before quantizing, so small values get more resolution; on the receive side, expand it back with the inverse curve. Net effect: a non-uniform quantizer with finer steps near zero and coarser ones at the extremes, but built using a uniform quantizer in the middle.

Two companding curves dominate.

A-law (Europe, ITU-T G.711). Uses a piecewise compressor with $A = 87.6$ . Sample at 8 kHz, compand, quantize to 8 bits, transmit at 64 kbps. This is the European telephone standard.
µ-law (North America, Japan). Uses $\mu = 255$ . Same goal, slightly different curve, slightly better performance for very low amplitudes.

Both squeeze a wider linear PCM word (13 bits for A-law, 14 bits for µ-law) into 8 bits, giving roughly 13-bit subjective audio quality from an 8-bit pipe at 64 kbps. The world's telephone backbone ran on this scheme for half a century. Even today, "PSTN-quality" audio means G.711 with one of these companders.
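To see what the compressor buys, here is a sketch using the continuous µ-law curve $F(x) = \operatorname{sgn}(x)\,\ln(1 + \mu|x|)/\ln(1 + \mu)$ and its inverse. Note the assumptions: deployed G.711 codecs use a segmented approximation of this curve rather than the smooth formula, and the helper names, the 8-bit mid-rise quantizer, and the quiet test tone (40 dB below full scale) are all choices made for illustration.

```python
import math

MU = 255.0

def mu_compress(x, mu=MU):
    """Continuous mu-law compressor, mapping [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_uniform(x, n_bits):
    """Uniform mid-rise quantizer on [-1, 1]; returns the cell centre."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def sqnr_db(signal, reconstructed):
    s = sum(v * v for v in signal)
    e = sum((v - r) ** 2 for v, r in zip(signal, reconstructed))
    return 10 * math.log10(s / e)

# A quiet tone with peak amplitude 0.01 (assumed test signal).
quiet = [0.01 * math.sin(2 * math.pi * k * 0.01234) for k in range(20_000)]
uniform = [quantize_uniform(x, 8) for x in quiet]
companded = [mu_expand(quantize_uniform(mu_compress(x), 8)) for x in quiet]

print(f"uniform 8-bit:   {sqnr_db(quiet, uniform):.1f} dB")
print(f"companded 8-bit: {sqnr_db(quiet, companded):.1f} dB")
```

The quiet tone barely spans a couple of steps of the uniform quantizer, while the companded path spreads it over many more codes, which is where the extra SQNR comes from.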

Suitcase analogy. Imagine packing for a trip. Uniform quantization is like dividing your suitcase into equal-sized cubes, each holding one item regardless of size. Companding is rolling clothes tighter and stacking small items in many small cubes while large items get one big cube. You fit more useful stuff in. Companding gives small voice signals lots of cubes (resolution) and rare loud ones a few big cubes.

2.7 DPCM: encode the difference

Voice samples are highly correlated from one moment to the next. The amplitude at sample $n$ is a great predictor of sample $n+1$ . So instead of encoding the full sample, encode the prediction residual. If your predictor is even mediocre, the residual has much smaller dynamic range than the signal itself, so you can quantize it with fewer bits at the same fidelity.

Differential PCM (DPCM) adds a simple predictor (often a single delay, $\hat{x}[n] = x[n-1]$ , or a more sophisticated linear filter trained on speech statistics) and quantizes only the difference $d[n] = x[n] - \hat{x}[n]$ . Real-world DPCM cuts the bit rate roughly in half for the same audio quality.
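A first-order DPCM loop fits in a few lines. One detail deserves a flag: in this sketch the encoder predicts from the previous reconstructed sample rather than the previous raw sample, so encoder and decoder share the same state and quantization errors do not accumulate. That choice, the function names, the 4-bit residual, the step size, and the 200 Hz test tone are all assumptions for illustration.

```python
import math

def dpcm_encode(samples, step, n_bits):
    """Quantize the prediction residual to signed n-bit codes."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    codes = []
    prediction = 0.0                      # decoder starts from the same state
    for x in samples:
        residual = x - prediction         # d[n] = x[n] - x_hat[n]
        code = max(lo, min(hi, round(residual / step)))
        codes.append(code)
        # Update with the reconstructed value, exactly as the decoder will.
        prediction = prediction + code * step
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized residuals."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction = prediction + code * step
        out.append(prediction)
    return out

fs = 8000
tone = [math.sin(2 * math.pi * 200 * k / fs) for k in range(400)]
step = 0.025
codes = dpcm_encode(tone, step, n_bits=4)
decoded = dpcm_decode(codes, step)
max_error = max(abs(x - y) for x, y in zip(tone, decoded))
print(f"max error with 4-bit DPCM: {max_error:.4f}")
print(f"step of 4-bit plain PCM over the same range: {2 / 2 ** 4:.4f}")
```

Because the tone moves by less than 0.16 per sample, the 4-bit residual never saturates and the error stays within half a DPCM step, far finer than a 4-bit quantizer stretched across the full range could manage.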

Adaptive DPCM (ADPCM) adapts the predictor and quantizer step size to track the signal, packing telephone-grade voice into 32 kbps (G.726). DECT cordless phones and many voicemail systems run on it. ADPCM is the conceptual ancestor of every modern speech codec; CELP, AMR, EVS, and Opus all use the prediction-and-residual idea, just with much fancier predictors.

2.8 Delta modulation: one bit per sample

Push DPCM to the extreme: one bit per sample. The encoder sends a single bit each step that says only "go up by $\Delta$ " or "go down by $\Delta$ ." The receiver maintains a staircase that follows the signal one step at a time.

Signal m(t): smooth curve
Staircase :  +Δ +Δ +Δ -Δ +Δ -Δ -Δ -Δ -Δ ...
Bit stream:   1  1  1  0  1  0  0  0  0

Two failure modes are baked in.

Slope overload. If the signal changes faster than $\Delta f_s$ , the staircase cannot keep up. The reconstructed signal lags and clips off the fast peaks.
Granular noise. When the signal is flat, the staircase oscillates $\pm \Delta$ around it, never settling. You hear a steady background hash.

You can trade these two off by choosing $\Delta$ . Small $\Delta$ kills granular noise but worsens slope overload; large $\Delta$ does the reverse. The two demands fight each other.
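The tug-of-war over $\Delta$ shows up clearly in a small simulation. The sketch below is a plain fixed-step delta modulator; the function names, the 64 kHz sample rate, the 1 kHz test tone, and the three step sizes are assumptions chosen so that one step is too small, one is about right, and one is too large.

```python
import math

def delta_modulate(samples, step):
    """Emit one bit per sample: 1 means step up, 0 means step down."""
    bits = []
    approx = 0.0
    for x in samples:
        bit = 1 if x >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step):
    """Receiver: rebuild the staircase from the bit stream."""
    staircase = []
    approx = 0.0
    for bit in bits:
        approx += step if bit else -step
        staircase.append(approx)
    return staircase

def rms_error(samples, staircase):
    total = sum((x - s) ** 2 for x, s in zip(samples, staircase))
    return math.sqrt(total / len(samples))

fs = 64_000
tone = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(fs // 10)]
# The tone's steepest change is about 2*pi*1000/fs = 0.098 per sample.
results = {}
for step in (0.01, 0.1, 0.5):
    staircase = delta_demodulate(delta_modulate(tone, step), step)
    results[step] = rms_error(tone, staircase)
    print(f"step {step}: rms error {results[step]:.3f}")
```

The smallest step cannot climb fast enough (slope overload), the largest one rattles around the signal (granular noise), and the middle value, close to the tone's maximum per-sample change, gives the lowest error of the three.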

Adaptive delta modulation (ADM) wins by adjusting $\Delta$ on the fly: shrink it when the signal is slow, grow it when the signal is fast. CVSD (continuously variable slope delta) is a famous variant used in early military and Bluetooth voice (HV3 SCO links).

Delta modulation never replaced PCM in the trunk network, but its descendants survive in audio and oversampled converters. A modern delta-sigma ADC (the $\Sigma\Delta$ in your phone's audio codec) is essentially an over-clocked, feedback-stabilized delta modulator that achieves 24-bit equivalent resolution by sampling 256 times faster than Nyquist and pushing quantization noise out of the audio band.

2.9 Noise considerations in PCM systems

Three sources of error matter.

Quantization noise (we just derived $\Delta^2 / 12$ ). It is irreducible at the encoder and dominant when the channel is good.
Channel noise in the analog modem-to-bit path. If the digital symbols are demodulated with bit error rate $p$ , each erroneous bit changes a sample by anywhere from $\Delta$ (least significant bit flipped) to half the full-scale range, $V_{pp}/2$ (most significant bit flipped). Bit $k$ carries weight $2^k \Delta$ , so for a natural-binary $n$ -bit code with small $p$ the mean-square error is $p \Delta^2 \sum_{k=0}^{n-1} 4^k = p \Delta^2 (4^n - 1)/3 \approx p V_{pp}^2 / 3$ . That overtakes the quantization noise $\Delta^2/12$ once $p$ exceeds about $2^{-2n}/4$ (roughly $4 \times 10^{-6}$ for $n = 8$ ).
Sampling jitter in the clock. A jitter of $\sigma_t$ on the sample clock turns into rms amplitude noise of $\sigma_a = (2\pi f) \sigma_t \cdot A / \sqrt{2}$ for a sinusoid of amplitude $A$ at frequency $f$ (the slope peaks at $2\pi f A$ at the zero crossings, and the $\sqrt{2}$ comes from averaging over a cycle). The faster the signal, the more jitter hurts. Modern audio ADCs spec jitter in tens of femtoseconds for the 24-bit / 192 kHz market.
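The channel-error and jitter contributions are easy to put numbers on. For channel errors, bit $k$ of a natural-binary word carries weight $2^k \Delta$ , so independent flips with probability $p$ add a mean-square error of $p \Delta^2 (4^n - 1)/3$ ; the sketch below checks that by brute force. For jitter, comparing the sinusoid's rms value with the rms slope-induced error gives an SNR ceiling of $-20 \log_{10}(2\pi f \sigma_t)$ dB. The function names, the 1 V range, the deliberately high error rate, and the uniformly random code words are assumptions for illustration.

```python
import math
import random

def channel_error_noise_power(n_bits, step, p):
    """Mean-square sample error from independent bit flips at rate p."""
    return p * step ** 2 * (4 ** n_bits - 1) / 3

def simulate_channel_errors(n_bits, step, p, num_words=200_000, seed=1):
    """Flip each bit of random code words with probability p; measure the error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_words):
        sent = rng.getrandbits(n_bits)
        received = sent
        for k in range(n_bits):
            if rng.random() < p:
                received ^= 1 << k
        total += ((received - sent) * step) ** 2
    return total / num_words

def jitter_limited_snr_db(freq_hz, sigma_t):
    """SNR ceiling set by rms clock jitter sigma_t for a sine at freq_hz."""
    return -20 * math.log10(2 * math.pi * freq_hz * sigma_t)

n_bits, v_pp, p = 8, 1.0, 0.01
step = v_pp / 2 ** n_bits
predicted = channel_error_noise_power(n_bits, step, p)
measured = simulate_channel_errors(n_bits, step, p)
quantization = step ** 2 / 12
print(f"channel-error noise: predicted {predicted:.3e}, measured {measured:.3e}")
print(f"quantization noise:  {quantization:.3e}")
print(f"jitter ceiling at 20 kHz, 1 ps rms: {jitter_limited_snr_db(20e3, 1e-12):.1f} dB")
```

At a bit error rate of one percent the channel term sits orders of magnitude above the quantization floor, which is the cliff the closing paragraph describes.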

The dominant story is simple. PCM-encoded data through a clean channel: quantization noise sets the floor. PCM through a noisy channel without coding: random bit flips trash the audio. PCM with a strong channel code: the code holds the bit error rate near zero as long as the channel stays above the code's threshold, and quantization noise reasserts itself as the floor. Channel coding lets PCM enjoy quantization-limited fidelity over a wide range of channel conditions instead of going off a cliff.