>
section 2 of 1113 min read

2. The DFT and the FFT

2.1 Why we need the DFT

The DTFT X(ejω)X(e^{j\omega}) is continuous in ω\omega. A computer cannot store a continuous function; it can only store samples. So we sample X(ejω)X(e^{j\omega}) at NN uniformly spaced frequencies ωk=2πk/N\omega_k = 2\pi k/N for k=0,1,,N1k = 0, 1, \ldots, N-1, evaluating the DTFT at each:

X[k]=X(ej2πk/N)=n=x[n]ej2πkn/NX[k] = X(e^{j 2\pi k/N}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j 2\pi kn/N}

If x[n]x[n] is finite-length, of length NN, only n=0,,N1n = 0, \ldots, N-1 contributes. This gives the Discrete Fourier Transform (DFT):

X[k]=n=0N1x[n]ej2πkn/N,k=0,1,,N1\boxed{\,X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j 2\pi kn/N}, \quad k = 0, 1, \ldots, N-1\,}

with inverse

x[n]=1Nk=0N1X[k]ej2πkn/N,n=0,1,,N1x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{j 2\pi kn/N}, \quad n = 0, 1, \ldots, N-1

Both transform pairs are length-NN sequences; the DFT is a one-to-one mapping between two NN-vectors.

It is convenient to define the twiddle factor WN=ej2π/NW_N = e^{-j 2\pi/N}. Then

X[k]=n=0N1x[n]WNknX[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}

The twiddle factors are NNth roots of unity living on the unit circle: WN0,WN1,,WNN1W_N^0, W_N^1, \ldots, W_N^{N-1} are evenly spaced points on the circle. We will see them again in the FFT.

2.2 DFT versus DTFT versus DFS, the cast of characters

A common confusion. Let us name names.

  • DTFT X(ejω)X(e^{j\omega}): discrete in time, continuous in frequency, periodic in ω\omega with period 2π2\pi. The "true" frequency-domain representation of a discrete-time signal. Mathematical, not directly computable.
  • DFS (Discrete Fourier Series): for a periodic discrete-time signal of period NN, this expands it into harmonics. Coefficients are exactly the DFT of one period.
  • DFT X[k]X[k]: discrete in time, discrete in frequency. What you actually compute on a computer. Implicitly assumes the input is one period of a periodic signal; hence the circular wraparound that we are about to meet.

The DFT is what you call when you write numpy.fft.fft(x). The FFT is just a fast algorithm to compute the DFT.

2.3 Properties of the DFT, with their gotchas

Most of the DFT's properties mirror the DTFT's, but one mismatch matters and trips up everyone the first time: circularity.

Treat x[n]x[n] as if it were defined for all nn by periodically repeating its NN-sample window: x~[n]=x[nmodN]\tilde{x}[n] = x[n \bmod N]. The DFT sees this periodic extension. A "shift" in time is not a linear shift but a circular shift: samples that fall off one end of the window come back around the other end.

Properties of the DFT:

  • Linearity: ax1[n]+bx2[n]aX1[k]+bX2[k]a x_1[n] + b x_2[n] \leftrightarrow a X_1[k] + b X_2[k].
  • Circular shift in time: x[(nn0)modN]X[k]ej2πkn0/Nx[(n - n_0) \bmod N] \leftrightarrow X[k]\,e^{-j 2\pi k n_0/N}. The shift is circular, not linear.
  • Circular convolution: (x1x2)[n]    X1[k]X2[k]\,(x_1 \circledast x_2)[n] \;\leftrightarrow\; X_1[k]\,X_2[k]. The DFT turns circular convolution into multiplication, not linear convolution. This is the gotcha.
  • Symmetry: for a real-valued x[n]x[n], X[k]=X[Nk]X[k] = X^*[N - k] (conjugate symmetry around k=0k = 0). So real signals' DFTs have only N/2+1N/2 + 1 unique values; the rest are mirror copies.
  • Parseval's theorem: n=0N1x[n]2=(1/N)k=0N1X[k]2\sum_{n=0}^{N-1} |x[n]|^2 = (1/N) \sum_{k=0}^{N-1} |X[k]|^2. Energy in time equals energy in frequency, modulo the scale factor.

Why is circular convolution wrong for filtering? Imagine you want to filter audio x[n]x[n] with FIR coefficients h[n]h[n]. Real filtering is linear convolution: each sample of the input excites a copy of hh, the copies are shifted in time and added, no wraparound. The DFT, applied directly, gives circular convolution, which assumes the input wraps around: so the end of the signal contaminates the beginning. The fix, derived next, is to zero-pad before DFT.

2.4 Linear convolution via the DFT, with zero-padding

To compute linear convolution of x[n]x[n] (length LxL_x) with h[n]h[n] (length LhL_h) via the DFT:

  1. The linear convolution has length Ly=Lx+Lh1L_y = L_x + L_h - 1.
  2. Zero-pad both xx and hh to length NLyN \geq L_y.
  3. Compute X[k]X[k] and H[k]H[k] via length-NN DFTs.
  4. Multiply pointwise: Y[k]=X[k]H[k]Y[k] = X[k] H[k].
  5. Inverse DFT to get y[n]y[n].

The key fact: when NLx+Lh1N \geq L_x + L_h - 1, the wraparound region of the circular convolution is zeroed out by the zero-padding, so circular convolution coincides with linear convolution over the relevant indices. Pick NN a power of two for FFT efficiency and you have the standard fast convolution.

2.5 Filtering long sequences: overlap-add and overlap-save

Now suppose x[n]x[n] is a million samples long and h[n]h[n] is a 64-tap filter. Computing one giant DFT of a million points works but is wasteful (and not real-time). The trick is to chunk the input.

Overlap-add (OLA):

Divide x[n]x[n] into non-overlapping blocks of length LL (typically L256L \sim 256 or 10241024). For each block xm[n]x_m[n] of length LL:

  1. Zero-pad xmx_m and hh to length NL+M1N \geq L + M - 1 where M=LhM = L_h.
  2. FFT each, multiply, IFFT. This gives the linear convolution ymy_m of xmx_m and hh, length L+M1L + M - 1.
  3. Add this ymy_m into the output, starting at position mLmL. Successive output blocks overlap in their first M1M-1 samples; those overlaps are summed.

The output is the correct linear convolution of the entire xx and hh, computed block by block.

Overlap-save (OLS):

Different bookkeeping, same final answer.

Divide x[n]x[n] into overlapping blocks of length NN, each overlapping the previous by M1M - 1 samples. For each block:

  1. FFT the length-NN block, multiply by H[k]H[k] (a length-NN FFT of zero-padded hh), IFFT. This gives a length-NN block.
  2. Discard the first M1M - 1 samples (they are corrupted by circular wraparound from the previous block).
  3. The remaining NM+1N - M + 1 samples are valid output, concatenated end-to-end.

Worked example. Filter a 10000-sample input with a 65-tap FIR. Choose L=192L = 192 samples per OLA block, then N=256N = 256 (next power of 2 above 192+651=256192 + 65 - 1 = 256). For each input block:

  • FFT of zero-padded length-256 block.
  • Multiply pointwise by precomputed H[k]H[k] (length-256 FFT of zero-padded hh).
  • IFFT gives length-256 output block.
  • Add into output buffer at offset 0, 192, 384, ...

Compare to direct convolution: 10000×65=650,00010000 \times 65 = 650{,}000 multiplies. With OLA: each block is one length-256 FFT (256log2256=2048\sim 256 \log_2 256 = 2048 multiplies times 2 for FFT and IFFT), one pointwise multiply (256), so 4400\sim 4400 per block. With 10000/1925310000/192 \approx 53 blocks, 233,000\sim 233{,}000 multiplies. Roughly 3×3\times speedup, growing rapidly with longer filters.

Real-time audio processing (effects pedals, room equalization, hearing aids, convolution reverbs) uses OLA or OLS with very small block sizes for low latency, often combined with partitioned convolution: split hh into segments, convolve each with the appropriate input chunk, and combine.

2.6 The FFT, why O(NlogN)O(N \log N) and not O(N2)O(N^2)

A straightforward computation of the DFT, as defined, is O(N2)O(N^2): for each of NN output bins, sum NN products. For N=1024N = 1024, that is over a million complex multiplies. For N=65536N = 65536 (a small audio FFT), four billion. Not practical.

The Fast Fourier Transform, published by Cooley and Tukey in 1965 (rediscovered, in fact: Gauss had something similar around 1805), gets the same answer in O(NlogN)O(N \log N) operations. For N=1024N = 1024, about ten thousand multiplies. A hundred-fold speedup. For N=65536N = 65536, a million-fold speedup. The FFT is one of the most important algorithms ever discovered; it is what made real-time spectrum analysis, CT scanners, JPEG compression, OFDM modems, and modern radar/sonar feasible.

Matryoshka analogy. A length-NN DFT can be computed by combining two length-N/2N/2 DFTs. Each of those length-N/2N/2 DFTs can be computed by combining two length-N/4N/4 DFTs. Recurse all the way down to length-1 DFTs, which are trivial. Like Russian nesting dolls, each FFT contains two smaller FFTs, those contain four still-smaller ones, and so on. The total work piles up to Nlog2NN \log_2 N instead of N2N^2 because most of the multiplies in the N2N^2 direct method are the same multiplies done over and over; the FFT reuses them.

Here is the derivation. Split the sum by even and odd nn:

X[k]=n=0N1x[n]WNknX[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}         =m=0N/21x[2m]WN2mk+m=0N/21x[2m+1]WN(2m+1)k\;\;\;\;= \sum_{m=0}^{N/2-1} x[2m]\,W_N^{2mk} + \sum_{m=0}^{N/2-1} x[2m+1]\,W_N^{(2m+1)k}         =m=0N/21x[2m]WN/2mk+WNkm=0N/21x[2m+1]WN/2mk\;\;\;\;= \sum_{m=0}^{N/2-1} x[2m]\,W_{N/2}^{mk} + W_N^{k}\sum_{m=0}^{N/2-1} x[2m+1]\,W_{N/2}^{mk}

(used WN2=WN/2W_N^2 = W_{N/2}).

Define Xe[k]X_e[k] as the length-N/2N/2 DFT of the even-indexed samples and Xo[k]X_o[k] as the length-N/2N/2 DFT of the odd-indexed samples. Then:

X[k]=Xe[k]+WNkXo[k],k=0,,N/21X[k] = X_e[k] + W_N^k\,X_o[k], \quad k = 0, \ldots, N/2 - 1 X[k+N/2]=Xe[k]WNkXo[k],k=0,,N/21X[k + N/2] = X_e[k] - W_N^k\,X_o[k], \quad k = 0, \ldots, N/2 - 1

(used the identity WN/2mkW_{N/2}^{mk} has period N/2N/2 in kk, and WNN/2=1W_N^{N/2} = -1).

So a length-NN DFT decomposes into two length-N/2N/2 DFTs and one combine step that costs N/2N/2 complex multiplies and NN additions. Recurrence:

T(N)=2T(N/2)+NT(N) = 2\,T(N/2) + N

By the master theorem (or simply unrolling): T(N)=O(Nlog2N)T(N) = O(N \log_2 N). That is the magic.

Concretely, T(N)=Nlog2NT(N) = N \log_2 N complex multiplies (the simplest bookkeeping). For N=1024N = 1024, that is 102410=10,2401024 \cdot 10 = 10{,}240. The direct DFT was 102421061024^2 \approx 10^6. Hundred-fold.

2.7 Radix-2 DIT FFT (decimation-in-time)

The decimation we just did, splitting input by even/odd, is decimation-in-time (DIT). Implemented stage by stage, it produces a beautiful butterfly structure. Let's draw the 4-point DIT FFT:

plaintext
   x[0] ────●──────────● X[0]
            │\        /│
            │ \  W₄⁰ / │
   x[2] ────●──╲────╱──● X[1]
            │   ╲  ╱   │
            │    ╲╱    │
   x[1] ────●────╲╱────● X[2]
            │ W₄⁰╲╱    │
            │   ╱  ╲W₄¹│
   x[3] ────●──╱────╲──● X[3]
                W₄²

Each "X" pair is a butterfly: two inputs a,ba, b go in, two outputs a+Wba + W b and aWba - W b come out. Two stages, two butterflies per stage. The input order is x[0],x[2],x[1],x[3]x[0], x[2], x[1], x[3], the bit-reversal of 0,1,2,30, 1, 2, 3 (00,10,01,1100, 10, 01, 11). The output is in natural order.

For an 8-point FFT, three stages of butterflies, 4+4+4=124 + 4 + 4 = 12 butterflies total, with twiddle factors W8kW_8^k in the last stage, W82k=W4kW_8^{2k} = W_4^k in the middle, and W84k=W2k=±1W_8^{4k} = W_2^k = \pm 1 in the first.

plaintext
8-point radix-2 DIT FFT, schematic:
 
Stage 1 (W₂):     Stage 2 (W₄):    Stage 3 (W₈):
x[0] ●──╮        ●──╮            ●──╮
        ⊕         ⊕               ⊕
x[4] ●──╯ W₂⁰    ●─╯ W₄⁰          ●─╯ W₈⁰
                 ●──╮              ●──╮
                  ⊕                 ⊕
x[2] ●──╮        ●─╯ W₄¹          ●─╯ W₈¹
        ⊕                          ●──╮
x[6] ●──╯ W₂⁰    ●──╮               ⊕
                 ⊕                 ●─╯ W₈²
x[1] ●──╮        ●─╯ W₄⁰
        ⊕                          ...
x[5] ●──╯ W₂⁰    ●──╮

x[3] ●──╮        ●─╯ W₄¹

x[7] ●──╯ W₂⁰

The butterflies have a beautifully regular structure. Each stage has N/2N/2 butterflies and N/2N/2 multiplies. There are log2N\log_2 N stages. Total: (N/2)log2N(N/2)\log_2 N complex multiplies. Even better than our naive count.

2.8 Radix-2 DIF FFT (decimation-in-frequency)

A different decomposition: split the output into even-indexed and odd-indexed bins. The algebra leads to splitting the input into first-half and second-half (rather than even-and-odd), with butterflies first and twiddles after. The signal flow graph is a mirror image of DIT: bit-reversal happens at the output this time, with input in natural order.

DIT and DIF have identical complexity and do the same job. Pick whichever you like for your implementation.

2.9 Inverse FFT

The IDFT differs from the DFT only in the sign of the exponent and the 1/N1/N scaling:

x[n]=1Nk=0N1X[k]WNknx[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\,W_N^{-kn}

So you can reuse your FFT code with a sign flip on the twiddle factors. A clever alternative: take the conjugate of the input, do an FFT, take the conjugate again, divide by NN. Same answer.

In numpy, numpy.fft.ifft(X) does this for you.

2.10 Mixed-radix and split-radix FFTs

Radix-2 requires NN to be a power of two. Real-life FFT lengths sometimes are not (audio FFTs of 4410 samples for 100 ms at 44.1 kHz, etc.). Mixed-radix FFTs handle composite N=N1N2N = N_1 N_2 \cdots by recursing with different radices at each level. Cooley-Tukey was originally formulated this way; radix-2 is the simplest special case.

For prime NN where mixed-radix does not apply, Rader's algorithm and Bluestein's chirp-z transform convert prime-length DFTs into convolutions, which can be done by power-of-two FFT and zero-padding. Modern FFT libraries (FFTW, MKL, cuFFT) use a portfolio of such algorithms, choosing automatically by length.

Split-radix FFT uses a clever 2/4 split (radix-2 for even-indexed outputs, radix-4 for odd) that achieves the lowest known multiplier count for power-of-two NN. About 4Nlog2N6N\sim 4N \log_2 N - 6N real multiplies, slightly better than radix-2's count.

2.11 FFT, in code

Here is the FFT from scratch, the recursive radix-2 DIT, in numpy. Notice how the math from 2.6 maps to code line for line.

python
import numpy as np
 
def fft_radix2(x):
    """Recursive radix-2 DIT FFT. Input length must be a power of 2."""
    N = len(x)
    if N == 1:
        return x
    if N % 2 != 0:
        raise ValueError("FFT length must be a power of 2")
    # Recurse on even and odd subsequences
    Xe = fft_radix2(x[0::2])
    Xo = fft_radix2(x[1::2])
    # Twiddle factors
    k = np.arange(N // 2)
    W = np.exp(-2j * np.pi * k / N)
    # Combine
    X = np.empty(N, dtype=complex)
    X[:N//2] = Xe + W * Xo
    X[N//2:] = Xe - W * Xo
    return X
 
# Sanity check against numpy
x = np.random.randn(1024) + 1j * np.random.randn(1024)
X_ours = fft_radix2(x)
X_numpy = np.fft.fft(x)
print(np.allclose(X_ours, X_numpy))   # True

The recursion does the work; the explicit divide-and-conquer mirrors the matryoshka picture exactly. Real production FFTs unroll the recursion into iterative loops, use bit-reversal permutations, and exploit SIMD instructions, but the math is identical.

2.12 FFT applications

FFT is everywhere. A short tour, with hardware-security flavors at the end:

  • Spectrum analyzers. Every benchtop spectrum analyzer (or, in software, every FFT plot in Audacity or sox) is just a windowed-FFT of a captured time series. Real-time spectrum displays in radio scanners, RF test equipment, and audio spectrograms are FFTs every few milliseconds.
  • OFDM modulation. Wi-Fi (802.11 a/g/n/ac/ax/be), LTE, 5G, DVB-T, ADSL: all carry data on hundreds or thousands of parallel sub-carriers, each modulated with QAM. The trick: an OFDM symbol is generated by inverse-FFT of a vector of QAM constellation points (one per sub-carrier), and decoded by FFT at the receiver. Frequency-selective fading channels become a per-sub-carrier scaling that can be equalized one-tap-at-a-time. The whole OFDM revolution depended on the FFT being fast enough to compute every symbol time.
  • Image processing. JPEG uses the DCT (a Fourier relative); JPEG-2000 uses wavelets; medical imaging (MRI) reconstructs images by 2D inverse-FFT of measured k-space samples.
  • Audio effects. Convolution reverbs do partitioned-overlap-save FFT convolution to apply long impulse responses (Carnegie Hall, your favorite cathedral) to dry recordings.
  • Vibration analysis. Industrial machinery (motors, turbines, bearings) generates characteristic vibration spectra. Bearing wear, shaft imbalance, and gear-tooth defects each show up at predictable frequencies. Predictive-maintenance systems FFT a continuous accelerometer feed and watch for telltale spectral changes.
  • Radar. Pulse-Doppler radar uses FFT across multiple pulse returns to separate moving targets by velocity. Synthetic-aperture radar (SAR) uses 2D FFT to form images.
  • Side-channel attacks (DPA, DEMA). Cross-correlation of a captured power trace with a hypothesis trace is fastest in the frequency domain (multiply FFTs, inverse-FFT). Spectral analysis of leaked EM radiation reveals which clock harmonics carry which sub-circuit's switching, helping attackers locate the cryptographic core on a chip. We will return to this in Chapter 24.