2.1 Why we need the DFT
The DTFT is continuous in . A computer cannot store a continuous function; it can only store samples. So we sample at uniformly spaced frequencies for , evaluating the DTFT at each:
If is finite-length, of length , only contributes. This gives the Discrete Fourier Transform (DFT):
with inverse
Both transform pairs are length- sequences; the DFT is a one-to-one mapping between two -vectors.
It is convenient to define the twiddle factor . Then
The twiddle factors are th roots of unity living on the unit circle: are evenly spaced points on the circle. We will see them again in the FFT.
2.2 DFT versus DTFT versus DFS, the cast of characters
A common confusion. Let us name names.
- DTFT : discrete in time, continuous in frequency, periodic in with period . The "true" frequency-domain representation of a discrete-time signal. Mathematical, not directly computable.
- DFS (Discrete Fourier Series): for a periodic discrete-time signal of period , this expands it into harmonics. Coefficients are exactly the DFT of one period.
- DFT : discrete in time, discrete in frequency. What you actually compute on a computer. Implicitly assumes the input is one period of a periodic signal; hence the circular wraparound that we are about to meet.
The DFT is what you call when you write numpy.fft.fft(x). The FFT is just a fast algorithm to compute the DFT.
2.3 Properties of the DFT, with their gotchas
Most of the DFT's properties mirror the DTFT's, but one mismatch matters and trips up everyone the first time: circularity.
Treat as if it were defined for all by periodically repeating its -sample window: . The DFT sees this periodic extension. A "shift" in time is not a linear shift but a circular shift: samples that fall off one end of the window come back around the other end.
Properties of the DFT:
- Linearity: .
- Circular shift in time: . The shift is circular, not linear.
- Circular convolution: . The DFT turns circular convolution into multiplication, not linear convolution. This is the gotcha.
- Symmetry: for a real-valued , (conjugate symmetry around ). So real signals' DFTs have only unique values; the rest are mirror copies.
- Parseval's theorem: . Energy in time equals energy in frequency, modulo the scale factor.
Why is circular convolution wrong for filtering? Imagine you want to filter audio with FIR coefficients . Real filtering is linear convolution: each sample of the input excites a copy of , the copies are shifted in time and added, no wraparound. The DFT, applied directly, gives circular convolution, which assumes the input wraps around: so the end of the signal contaminates the beginning. The fix, derived next, is to zero-pad before DFT.
2.4 Linear convolution via the DFT, with zero-padding
To compute linear convolution of (length ) with (length ) via the DFT:
- The linear convolution has length .
- Zero-pad both and to length .
- Compute and via length- DFTs.
- Multiply pointwise: .
- Inverse DFT to get .
The key fact: when , the wraparound region of the circular convolution is zeroed out by the zero-padding, so circular convolution coincides with linear convolution over the relevant indices. Pick a power of two for FFT efficiency and you have the standard fast convolution.
2.5 Filtering long sequences: overlap-add and overlap-save
Now suppose is a million samples long and is a 64-tap filter. Computing one giant DFT of a million points works but is wasteful (and not real-time). The trick is to chunk the input.
Overlap-add (OLA):
Divide into non-overlapping blocks of length (typically or ). For each block of length :
- Zero-pad and to length where .
- FFT each, multiply, IFFT. This gives the linear convolution of and , length .
- Add this into the output, starting at position . Successive output blocks overlap in their first samples; those overlaps are summed.
The output is the correct linear convolution of the entire and , computed block by block.
Overlap-save (OLS):
Different bookkeeping, same final answer.
Divide into overlapping blocks of length , each overlapping the previous by samples. For each block:
- FFT the length- block, multiply by (a length- FFT of zero-padded ), IFFT. This gives a length- block.
- Discard the first samples (they are corrupted by circular wraparound from the previous block).
- The remaining samples are valid output, concatenated end-to-end.
Worked example. Filter a 10000-sample input with a 65-tap FIR. Choose samples per OLA block, then (next power of 2 above ). For each input block:
- FFT of zero-padded length-256 block.
- Multiply pointwise by precomputed (length-256 FFT of zero-padded ).
- IFFT gives length-256 output block.
- Add into output buffer at offset 0, 192, 384, ...
Compare to direct convolution: multiplies. With OLA: each block is one length-256 FFT ( multiplies times 2 for FFT and IFFT), one pointwise multiply (256), so per block. With blocks, multiplies. Roughly speedup, growing rapidly with longer filters.
Real-time audio processing (effects pedals, room equalization, hearing aids, convolution reverbs) uses OLA or OLS with very small block sizes for low latency, often combined with partitioned convolution: split into segments, convolve each with the appropriate input chunk, and combine.
2.6 The FFT, why and not
A straightforward computation of the DFT, as defined, is : for each of output bins, sum products. For , that is over a million complex multiplies. For (a small audio FFT), four billion. Not practical.
The Fast Fourier Transform, published by Cooley and Tukey in 1965 (rediscovered, in fact: Gauss had something similar around 1805), gets the same answer in operations. For , about ten thousand multiplies. A hundred-fold speedup. For , a million-fold speedup. The FFT is one of the most important algorithms ever discovered; it is what made real-time spectrum analysis, CT scanners, JPEG compression, OFDM modems, and modern radar/sonar feasible.
Matryoshka analogy. A length- DFT can be computed by combining two length- DFTs. Each of those length- DFTs can be computed by combining two length- DFTs. Recurse all the way down to length-1 DFTs, which are trivial. Like Russian nesting dolls, each FFT contains two smaller FFTs, those contain four still-smaller ones, and so on. The total work piles up to instead of because most of the multiplies in the direct method are the same multiplies done over and over; the FFT reuses them.
Here is the derivation. Split the sum by even and odd :
(used ).
Define as the length- DFT of the even-indexed samples and as the length- DFT of the odd-indexed samples. Then:
(used the identity has period in , and ).
So a length- DFT decomposes into two length- DFTs and one combine step that costs complex multiplies and additions. Recurrence:
By the master theorem (or simply unrolling): . That is the magic.
Concretely, complex multiplies (the simplest bookkeeping). For , that is . The direct DFT was . Hundred-fold.
2.7 Radix-2 DIT FFT (decimation-in-time)
The decimation we just did, splitting input by even/odd, is decimation-in-time (DIT). Implemented stage by stage, it produces a beautiful butterfly structure. Let's draw the 4-point DIT FFT:
x[0] ────●──────────● X[0]
│\ /│
│ \ W₄⁰ / │
x[2] ────●──╲────╱──● X[1]
│ ╲ ╱ │
│ ╲╱ │
x[1] ────●────╲╱────● X[2]
│ W₄⁰╲╱ │
│ ╱ ╲W₄¹│
x[3] ────●──╱────╲──● X[3]
W₄²Each "X" pair is a butterfly: two inputs go in, two outputs and come out. Two stages, two butterflies per stage. The input order is , the bit-reversal of (). The output is in natural order.
For an 8-point FFT, three stages of butterflies, butterflies total, with twiddle factors in the last stage, in the middle, and in the first.
8-point radix-2 DIT FFT, schematic:
Stage 1 (W₂): Stage 2 (W₄): Stage 3 (W₈):
x[0] ●──╮ ●──╮ ●──╮
⊕ ⊕ ⊕
x[4] ●──╯ W₂⁰ ●─╯ W₄⁰ ●─╯ W₈⁰
●──╮ ●──╮
⊕ ⊕
x[2] ●──╮ ●─╯ W₄¹ ●─╯ W₈¹
⊕ ●──╮
x[6] ●──╯ W₂⁰ ●──╮ ⊕
⊕ ●─╯ W₈²
x[1] ●──╮ ●─╯ W₄⁰
⊕ ...
x[5] ●──╯ W₂⁰ ●──╮
⊕
x[3] ●──╮ ●─╯ W₄¹
⊕
x[7] ●──╯ W₂⁰The butterflies have a beautifully regular structure. Each stage has butterflies and multiplies. There are stages. Total: complex multiplies. Even better than our naive count.
2.8 Radix-2 DIF FFT (decimation-in-frequency)
A different decomposition: split the output into even-indexed and odd-indexed bins. The algebra leads to splitting the input into first-half and second-half (rather than even-and-odd), with butterflies first and twiddles after. The signal flow graph is a mirror image of DIT: bit-reversal happens at the output this time, with input in natural order.
DIT and DIF have identical complexity and do the same job. Pick whichever you like for your implementation.
2.9 Inverse FFT
The IDFT differs from the DFT only in the sign of the exponent and the scaling:
So you can reuse your FFT code with a sign flip on the twiddle factors. A clever alternative: take the conjugate of the input, do an FFT, take the conjugate again, divide by . Same answer.
In numpy, numpy.fft.ifft(X) does this for you.
2.10 Mixed-radix and split-radix FFTs
Radix-2 requires to be a power of two. Real-life FFT lengths sometimes are not (audio FFTs of 4410 samples for 100 ms at 44.1 kHz, etc.). Mixed-radix FFTs handle composite by recursing with different radices at each level. Cooley-Tukey was originally formulated this way; radix-2 is the simplest special case.
For prime where mixed-radix does not apply, Rader's algorithm and Bluestein's chirp-z transform convert prime-length DFTs into convolutions, which can be done by power-of-two FFT and zero-padding. Modern FFT libraries (FFTW, MKL, cuFFT) use a portfolio of such algorithms, choosing automatically by length.
Split-radix FFT uses a clever 2/4 split (radix-2 for even-indexed outputs, radix-4 for odd) that achieves the lowest known multiplier count for power-of-two . About real multiplies, slightly better than radix-2's count.
2.11 FFT, in code
Here is the FFT from scratch, the recursive radix-2 DIT, in numpy. Notice how the math from 2.6 maps to code line for line.
import numpy as np
def fft_radix2(x):
"""Recursive radix-2 DIT FFT. Input length must be a power of 2."""
N = len(x)
if N == 1:
return x
if N % 2 != 0:
raise ValueError("FFT length must be a power of 2")
# Recurse on even and odd subsequences
Xe = fft_radix2(x[0::2])
Xo = fft_radix2(x[1::2])
# Twiddle factors
k = np.arange(N // 2)
W = np.exp(-2j * np.pi * k / N)
# Combine
X = np.empty(N, dtype=complex)
X[:N//2] = Xe + W * Xo
X[N//2:] = Xe - W * Xo
return X
# Sanity check against numpy
x = np.random.randn(1024) + 1j * np.random.randn(1024)
X_ours = fft_radix2(x)
X_numpy = np.fft.fft(x)
print(np.allclose(X_ours, X_numpy)) # TrueThe recursion does the work; the explicit divide-and-conquer mirrors the matryoshka picture exactly. Real production FFTs unroll the recursion into iterative loops, use bit-reversal permutations, and exploit SIMD instructions, but the math is identical.
2.12 FFT applications
FFT is everywhere. A short tour, with hardware-security flavors at the end:
- Spectrum analyzers. Every benchtop spectrum analyzer (or, in software, every FFT plot in Audacity or sox) is just a windowed-FFT of a captured time series. Real-time spectrum displays in radio scanners, RF test equipment, and audio spectrograms are FFTs every few milliseconds.
- OFDM modulation. Wi-Fi (802.11 a/g/n/ac/ax/be), LTE, 5G, DVB-T, ADSL: all carry data on hundreds or thousands of parallel sub-carriers, each modulated with QAM. The trick: an OFDM symbol is generated by inverse-FFT of a vector of QAM constellation points (one per sub-carrier), and decoded by FFT at the receiver. Frequency-selective fading channels become a per-sub-carrier scaling that can be equalized one-tap-at-a-time. The whole OFDM revolution depended on the FFT being fast enough to compute every symbol time.
- Image processing. JPEG uses the DCT (a Fourier relative); JPEG-2000 uses wavelets; medical imaging (MRI) reconstructs images by 2D inverse-FFT of measured k-space samples.
- Audio effects. Convolution reverbs do partitioned-overlap-save FFT convolution to apply long impulse responses (Carnegie Hall, your favorite cathedral) to dry recordings.
- Vibration analysis. Industrial machinery (motors, turbines, bearings) generates characteristic vibration spectra. Bearing wear, shaft imbalance, and gear-tooth defects each show up at predictable frequencies. Predictive-maintenance systems FFT a continuous accelerometer feed and watch for telltale spectral changes.
- Radar. Pulse-Doppler radar uses FFT across multiple pulse returns to separate moving targets by velocity. Synthetic-aperture radar (SAR) uses 2D FFT to form images.
- Side-channel attacks (DPA, DEMA). Cross-correlation of a captured power trace with a hypothesis trace is fastest in the frequency domain (multiply FFTs, inverse-FFT). Spectral analysis of leaked EM radiation reveals which clock harmonics carry which sub-circuit's switching, helping attackers locate the cryptographic core on a chip. We will return to this in Chapter 24.