Let's apply everything in this chapter to one real-world device. When you record audio with your phone and the file is encoded as MP3:
Reading top-down:
- The microphone captures continuous-time audio: pressure variations in air, converted to voltage by a piezoelectric or condenser sensor.
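- An anti-aliasing filter (analog, cutoff around 20 kHz) removes high frequencies before sampling. This leaves almost nothing above 22.05 kHz, the Nyquist frequency for 44.1 kHz sampling, that could alias. Section 3.4.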
- Sampling and quantization turn the analog signal into discrete-time, discrete-amplitude samples, typically at 44.1 kHz with 16 bits per sample. Now we have a digital signal $x[n]$. Section 3.
- The encoder splits the sample stream into short frames of 1152 samples each (about 26 ms at 44.1 kHz). Audio is roughly stationary on this scale, so each frame can be analyzed largely on its own.
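- Each frame is split into 32 sub-bands by a polyphase filterbank, and each sub-band is then processed with a Modified Discrete Cosine Transform (MDCT). The MDCT is a cosine-based relative of the Fourier transform designed for 50%-overlapping windows that still produces exactly as many new coefficients as new input samples. In MPEG-1 Layer III this gives 576 frequency coefficients per granule (half a frame). Now we know how much energy is at each frequency. Section 2.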
- A psychoacoustic model decides, for each sub-band, the minimum precision (number of bits) needed to encode it without audible artifacts. Quiet sub-bands close to loud ones get fewer bits because the loud sound "masks" them. This is where MP3 earns most of its compression: by throwing away information you cannot hear.
- The MDCT coefficients are quantized with the chosen precision.
- The quantized values are Huffman-coded, a lossless step that squeezes out the remaining statistical redundancy.
- The compressed bitstream is the MP3 file.
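The encoder steps above can be sketched in plain Python. This is a toy under stated assumptions, not an MP3 encoder. It samples a hypothetical 1 kHz test tone and applies one full-band MDCT directly to a 1152-sample frame instead of MP3's 32-band polyphase filterbank followed by per-band MDCTs. There is no psychoacoustic model: a single, arbitrary uniform step size stands in for both its per-band decisions and MP3's own quantizer. The Huffman code is built from the frame's own symbol counts, whereas MP3 selects from fixed tables defined in the standard. All function names (`sample_and_quantize`, `split_frames`, `mdct`, `quantize`, `huffman_code_lengths`) are illustrative.

```python
import heapq
import math
from collections import Counter

FS = 44_100    # sampling rate in Hz, as in the text
BITS = 16      # bits per PCM sample
FRAME = 1152   # samples per MP3 frame


def sample_and_quantize(freq_hz, n_samples, amplitude=0.5):
    """Sample a sine tone at FS and round each sample to a signed 16-bit integer."""
    full_scale = 2 ** (BITS - 1) - 1  # 32767
    return [round(amplitude * full_scale * math.sin(2 * math.pi * freq_hz * n / FS))
            for n in range(n_samples)]


def split_frames(x, size=FRAME):
    """Cut x into consecutive frames of `size` samples, dropping any partial tail."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def mdct(block):
    """Direct O(N^2) MDCT with a sine window: 2N input samples -> N coefficients."""
    two_n = len(block)
    n = two_n // 2
    window = [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]
    return [sum(window[k] * block[k]
                * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                for k in range(two_n))
            for m in range(n)]


def quantize(coeffs, step):
    """Uniform quantizer: a stand-in for MP3's scale factors and power-law quantizer."""
    return [round(c / step) for c in coeffs]


def huffman_code_lengths(symbols):
    """Map each distinct symbol to its Huffman code length in bits."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {s: 1 for s in counts}
    # Heap entries: (subtree count, unique tie-breaker, {symbol: depth in subtree})
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


x = sample_and_quantize(1000.0, 4 * FRAME)   # about 104 ms of a 1 kHz tone
frames = split_frames(x)
print(len(frames), "frames of", round(1000 * FRAME / FS, 1), "ms")  # 4 frames of 26.1 ms
coeffs = mdct(frames[0])                     # 1152 samples -> 576 coefficients
peak = max(range(len(coeffs)), key=lambda m: abs(coeffs[m]))
print("peak bin center:", (peak + 0.5) * FS / (2 * len(coeffs)), "Hz")
q = quantize(coeffs, step=1000.0)
lengths = huffman_code_lengths(q)
print(sum(lengths[v] for v in q), "Huffman bits vs", len(q) * BITS, "PCM bits")
```

Because consecutive MDCT blocks overlap by half, each block of 576 coefficients accounts for 576 new PCM samples, which is why the last line compares against `len(q) * BITS`. The comparison ignores the cost of transmitting the code table itself.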
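Decoding runs the chain in reverse: bitstream → Huffman decode → de-quantize → inverse MDCT and synthesis filterbank → discrete-time audio → DAC → analog audio → speaker. Whatever the encoder discarded during quantization is gone for good; the decoder reconstructs only what was kept. The reconstruction filter at the DAC is the one from Section 3.5.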
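The inverse transform in the decoder works because consecutive MDCT blocks overlap by half, and the time-domain aliasing each block introduces cancels when the outputs are added together. A minimal sketch of that round trip on random data follows. It assumes 18 coefficients per block (the size MP3's long blocks use inside each sub-band), a sine window, and a hypothetical uniform quantizer in place of MP3's. It skips the synthesis filterbank and Huffman stage entirely, and the function names are illustrative. Without quantization the output matches the input to floating-point precision; with quantization it does not, and nothing in the decoder can undo that loss.

```python
import math
import random

N = 18  # coefficients per block; MP3 long blocks map 36 inputs -> 18 outputs per sub-band


def sine_window(two_n):
    """Sine window; satisfies w[k]**2 + w[k + n]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]


def mdct(block, window):
    """Windowed MDCT: 2n samples -> n coefficients."""
    n = len(block) // 2
    return [sum(window[k] * block[k] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                for k in range(2 * n))
            for m in range(n)]


def imdct(coeffs, window):
    """Windowed IMDCT: n coefficients -> 2n time-aliased samples (2/n scaling suits the sine window)."""
    n = len(coeffs)
    return [window[k] * (2.0 / n)
            * sum(coeffs[m] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                  for m in range(n))
            for k in range(2 * n)]


def encode_decode(x, n=N, step=None):
    """MDCT -> optional uniform quantization -> IMDCT -> 50%-overlap add."""
    window = sine_window(2 * n)
    pad = (-len(x)) % n  # round the length up to a multiple of n
    padded = [0.0] * n + list(x) + [0.0] * (pad + n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        coeffs = mdct(padded[start:start + 2 * n], window)
        if step is not None:  # hypothetical stand-in for MP3's quantizer
            coeffs = [step * round(c / step) for c in coeffs]
        for k, v in enumerate(imdct(coeffs, window)):
            out[start + k] += v  # aliasing from neighboring blocks cancels here
    return out[n:n + len(x)]


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(100)]
exact = encode_decode(x)            # no quantization
lossy = encode_decode(x, step=0.5)  # coarse quantization
print("max error, unquantized:", max(abs(a - b) for a, b in zip(x, exact)))
print("max error, quantized:  ", max(abs(a - b) for a, b in zip(x, lossy)))
```

The zero padding at both ends ensures every real sample is covered by two overlapping blocks; samples covered by only one block would keep their aliasing.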
Every concept in this chapter shows up: the sampling theorem (anti-aliasing and sampling), Fourier analysis (the MDCT), LTI systems (the filterbank), and quantization (the 16-bit samples and the coefficient bits). You can't build MP3 without this chapter, and, as the next section makes concrete, you can't understand a side-channel attack without it either.