2. The DFT and the FFT // Digital Signal Processing // bhaswanth

2.1 Why we need the DFT

The DTFT $X(e^{j\omega})$ is continuous in $\omega$ . A computer cannot store a continuous function; it can only store samples. So we sample $X(e^{j\omega})$ at $N$ uniformly spaced frequencies $\omega_k = 2\pi k/N$ for $k = 0, 1, \ldots, N-1$ , evaluating the DTFT at each:

$X[k] = X(e^{j 2\pi k/N}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j 2\pi kn/N}$

If $x[n]$ is finite-length, of length $N$ , only $n = 0, \ldots, N-1$ contributes. This gives the Discrete Fourier Transform (DFT):

$\boxed{\,X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j 2\pi kn/N}, \quad k = 0, 1, \ldots, N-1\,}$

with inverse

$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{j 2\pi kn/N}, \quad n = 0, 1, \ldots, N-1$

Both transform pairs are length- $N$ sequences; the DFT is a one-to-one mapping between two $N$ -vectors.

It is convenient to define the twiddle factor $W_N = e^{-j 2\pi/N}$ . Then

$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}$

The twiddle factors are $N$ th roots of unity living on the unit circle: $W_N^0, W_N^1, \ldots, W_N^{N-1}$ are evenly spaced points on the circle. We will see them again in the FFT.

2.2 DFT versus DTFT versus DFS, the cast of characters

A common confusion. Let us name names.

DTFT $X(e^{j\omega})$ : discrete in time, continuous in frequency, periodic in $\omega$ with period $2\pi$ . The "true" frequency-domain representation of a discrete-time signal. Mathematical, not directly computable.
DFS (Discrete Fourier Series): for a periodic discrete-time signal of period $N$ , this expands it into harmonics. Coefficients are exactly the DFT of one period.
DFT $X[k]$ : discrete in time, discrete in frequency. What you actually compute on a computer. Implicitly assumes the input is one period of a periodic signal; hence the circular wraparound that we are about to meet.

The DFT is what you call when you write numpy.fft.fft(x). The FFT is just a fast algorithm to compute the DFT.

2.3 Properties of the DFT, with their gotchas

Most of the DFT's properties mirror the DTFT's, but one mismatch matters and trips up everyone the first time: circularity.

Treat $x[n]$ as if it were defined for all $n$ by periodically repeating its $N$ -sample window: $\tilde{x}[n] = x[n \bmod N]$ . The DFT sees this periodic extension. A "shift" in time is not a linear shift but a circular shift: samples that fall off one end of the window come back around the other end.

Properties of the DFT:

Linearity: $a x_1[n] + b x_2[n] \leftrightarrow a X_1[k] + b X_2[k]$ .
Circular shift in time: $x[(n - n_0) \bmod N] \leftrightarrow X[k]\,e^{-j 2\pi k n_0/N}$ . The shift is circular, not linear.
Circular convolution: $\,(x_1 \circledast x_2)[n] \;\leftrightarrow\; X_1[k]\,X_2[k]$ . The DFT turns circular convolution into multiplication, not linear convolution. This is the gotcha.
Symmetry: for a real-valued $x[n]$ , $X[k] = X^*[N - k]$ (conjugate symmetry around $k = 0$ ). So real signals' DFTs have only $N/2 + 1$ unique values; the rest are mirror copies.
Parseval's theorem: $\sum_{n=0}^{N-1} |x[n]|^2 = (1/N) \sum_{k=0}^{N-1} |X[k]|^2$ . Energy in time equals energy in frequency, modulo the scale factor.

Why is circular convolution wrong for filtering? Imagine you want to filter audio $x[n]$ with FIR coefficients $h[n]$ . Real filtering is linear convolution: each sample of the input excites a copy of $h$ , the copies are shifted in time and added, no wraparound. The DFT, applied directly, gives circular convolution, which assumes the input wraps around: so the end of the signal contaminates the beginning. The fix, derived next, is to zero-pad before DFT.

2.4 Linear convolution via the DFT, with zero-padding

To compute linear convolution of $x[n]$ (length $L_x$ ) with $h[n]$ (length $L_h$ ) via the DFT:

The linear convolution has length $L_y = L_x + L_h - 1$ .
Zero-pad both $x$ and $h$ to length $N \geq L_y$ .
Compute $X[k]$ and $H[k]$ via length- $N$ DFTs.
Multiply pointwise: $Y[k] = X[k] H[k]$ .
Inverse DFT to get $y[n]$ .

The key fact: when $N \geq L_x + L_h - 1$ , the wraparound region of the circular convolution is zeroed out by the zero-padding, so circular convolution coincides with linear convolution over the relevant indices. Pick $N$ a power of two for FFT efficiency and you have the standard fast convolution.

2.5 Filtering long sequences: overlap-add and overlap-save

Now suppose $x[n]$ is a million samples long and $h[n]$ is a 64-tap filter. Computing one giant DFT of a million points works but is wasteful (and not real-time). The trick is to chunk the input.

Overlap-add (OLA):

Divide $x[n]$ into non-overlapping blocks of length $L$ (typically $L \sim 256$ or $1024$ ). For each block $x_m[n]$ of length $L$ :

Zero-pad $x_m$ and $h$ to length $N \geq L + M - 1$ where $M = L_h$ .
FFT each, multiply, IFFT. This gives the linear convolution $y_m$ of $x_m$ and $h$ , length $L + M - 1$ .
Add this $y_m$ into the output, starting at position $mL$ . Successive output blocks overlap in their first $M-1$ samples; those overlaps are summed.

The output is the correct linear convolution of the entire $x$ and $h$ , computed block by block.

Overlap-save (OLS):

Different bookkeeping, same final answer.

Divide $x[n]$ into overlapping blocks of length $N$ , each overlapping the previous by $M - 1$ samples. For each block:

FFT the length- $N$ block, multiply by $H[k]$ (a length- $N$ FFT of zero-padded $h$ ), IFFT. This gives a length- $N$ block.
Discard the first $M - 1$ samples (they are corrupted by circular wraparound from the previous block).
The remaining $N - M + 1$ samples are valid output, concatenated end-to-end.

Worked example. Filter a 10000-sample input with a 65-tap FIR. Choose $L = 192$ samples per OLA block, then $N = 256$ (next power of 2 above $192 + 65 - 1 = 256$ ). For each input block:

FFT of zero-padded length-256 block.
Multiply pointwise by precomputed $H[k]$ (length-256 FFT of zero-padded $h$ ).
IFFT gives length-256 output block.
Add into output buffer at offset 0, 192, 384, ...

Compare to direct convolution: $10000 \times 65 = 650{,}000$ multiplies. With OLA: each block is one length-256 FFT ( $\sim 256 \log_2 256 = 2048$ multiplies times 2 for FFT and IFFT), one pointwise multiply (256), so $\sim 4400$ per block. With $10000/192 \approx 53$ blocks, $\sim 233{,}000$ multiplies. Roughly $3\times$ speedup, growing rapidly with longer filters.

Real-time audio processing (effects pedals, room equalization, hearing aids, convolution reverbs) uses OLA or OLS with very small block sizes for low latency, often combined with partitioned convolution: split $h$ into segments, convolve each with the appropriate input chunk, and combine.

2.6 The FFT, why $O(N \log N)$ and not $O(N^2)$

A straightforward computation of the DFT, as defined, is $O(N^2)$ : for each of $N$ output bins, sum $N$ products. For $N = 1024$ , that is over a million complex multiplies. For $N = 65536$ (a small audio FFT), four billion. Not practical.

The Fast Fourier Transform, published by Cooley and Tukey in 1965 (rediscovered, in fact: Gauss had something similar around 1805), gets the same answer in $O(N \log N)$ operations. For $N = 1024$ , about ten thousand multiplies. A hundred-fold speedup. For $N = 65536$ , a million-fold speedup. The FFT is one of the most important algorithms ever discovered; it is what made real-time spectrum analysis, CT scanners, JPEG compression, OFDM modems, and modern radar/sonar feasible.

Matryoshka analogy. A length- $N$ DFT can be computed by combining two length- $N/2$ DFTs. Each of those length- $N/2$ DFTs can be computed by combining two length- $N/4$ DFTs. Recurse all the way down to length-1 DFTs, which are trivial. Like Russian nesting dolls, each FFT contains two smaller FFTs, those contain four still-smaller ones, and so on. The total work piles up to $N \log_2 N$ instead of $N^2$ because most of the multiplies in the $N^2$ direct method are the same multiplies done over and over; the FFT reuses them.

Here is the derivation. Split the sum by even and odd $n$ :

$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}$ $\;\;\;\;= \sum_{m=0}^{N/2-1} x[2m]\,W_N^{2mk} + \sum_{m=0}^{N/2-1} x[2m+1]\,W_N^{(2m+1)k}$ $\;\;\;\;= \sum_{m=0}^{N/2-1} x[2m]\,W_{N/2}^{mk} + W_N^{k}\sum_{m=0}^{N/2-1} x[2m+1]\,W_{N/2}^{mk}$

(used $W_N^2 = W_{N/2}$ ).

Define $X_e[k]$ as the length- $N/2$ DFT of the even-indexed samples and $X_o[k]$ as the length- $N/2$ DFT of the odd-indexed samples. Then:

$X[k] = X_e[k] + W_N^k\,X_o[k], \quad k = 0, \ldots, N/2 - 1$ $X[k + N/2] = X_e[k] - W_N^k\,X_o[k], \quad k = 0, \ldots, N/2 - 1$

(used the identity $W_{N/2}^{mk}$ has period $N/2$ in $k$ , and $W_N^{N/2} = -1$ ).

So a length- $N$ DFT decomposes into two length- $N/2$ DFTs and one combine step that costs $N/2$ complex multiplies and $N$ additions. Recurrence:

$T(N) = 2\,T(N/2) + N$

By the master theorem (or simply unrolling): $T(N) = O(N \log_2 N)$ . That is the magic.

Concretely, $T(N) = N \log_2 N$ complex multiplies (the simplest bookkeeping). For $N = 1024$ , that is $1024 \cdot 10 = 10{,}240$ . The direct DFT was $1024^2 \approx 10^6$ . Hundred-fold.

2.7 Radix-2 DIT FFT (decimation-in-time)

The decimation we just did, splitting input by even/odd, is decimation-in-time (DIT). Implemented stage by stage, it produces a beautiful butterfly structure. Let's draw the 4-point DIT FFT:

plaintext

   x[0] ────●──────────● X[0]
            │\        /│
            │ \  W₄⁰ / │
   x[2] ────●──╲────╱──● X[1]
            │   ╲  ╱   │
            │    ╲╱    │
   x[1] ────●────╲╱────● X[2]
            │ W₄⁰╲╱    │
            │   ╱  ╲W₄¹│
   x[3] ────●──╱────╲──● X[3]
                W₄²

Each "X" pair is a butterfly: two inputs $a, b$ go in, two outputs $a + W b$ and $a - W b$ come out. Two stages, two butterflies per stage. The input order is $x[0], x[2], x[1], x[3]$ , the bit-reversal of $0, 1, 2, 3$ ( $00, 10, 01, 11$ ). The output is in natural order.

For an 8-point FFT, three stages of butterflies, $4 + 4 + 4 = 12$ butterflies total, with twiddle factors $W_8^k$ in the last stage, $W_8^{2k} = W_4^k$ in the middle, and $W_8^{4k} = W_2^k = \pm 1$ in the first.

plaintext

8-point radix-2 DIT FFT, schematic:
 
Stage 1 (W₂):     Stage 2 (W₄):    Stage 3 (W₈):
x[0] ●──╮        ●──╮            ●──╮
        ⊕         ⊕               ⊕
x[4] ●──╯ W₂⁰    ●─╯ W₄⁰          ●─╯ W₈⁰
                 ●──╮              ●──╮
                  ⊕                 ⊕
x[2] ●──╮        ●─╯ W₄¹          ●─╯ W₈¹
        ⊕                          ●──╮
x[6] ●──╯ W₂⁰    ●──╮               ⊕
                 ⊕                 ●─╯ W₈²
x[1] ●──╮        ●─╯ W₄⁰
        ⊕                          ...
x[5] ●──╯ W₂⁰    ●──╮
                 ⊕
x[3] ●──╮        ●─╯ W₄¹
        ⊕
x[7] ●──╯ W₂⁰

The butterflies have a beautifully regular structure. Each stage has $N/2$ butterflies and $N/2$ multiplies. There are $\log_2 N$ stages. Total: $(N/2)\log_2 N$ complex multiplies. Even better than our naive count.

2.8 Radix-2 DIF FFT (decimation-in-frequency)

A different decomposition: split the output into even-indexed and odd-indexed bins. The algebra leads to splitting the input into first-half and second-half (rather than even-and-odd), with butterflies first and twiddles after. The signal flow graph is a mirror image of DIT: bit-reversal happens at the output this time, with input in natural order.

DIT and DIF have identical complexity and do the same job. Pick whichever you like for your implementation.

2.9 Inverse FFT

The IDFT differs from the DFT only in the sign of the exponent and the $1/N$ scaling:

$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\,W_N^{-kn}$

So you can reuse your FFT code with a sign flip on the twiddle factors. A clever alternative: take the conjugate of the input, do an FFT, take the conjugate again, divide by $N$ . Same answer.

In numpy, numpy.fft.ifft(X) does this for you.

2.10 Mixed-radix and split-radix FFTs

Radix-2 requires $N$ to be a power of two. Real-life FFT lengths sometimes are not (audio FFTs of 4410 samples for 100 ms at 44.1 kHz, etc.). Mixed-radix FFTs handle composite $N = N_1 N_2 \cdots$ by recursing with different radices at each level. Cooley-Tukey was originally formulated this way; radix-2 is the simplest special case.

For prime $N$ where mixed-radix does not apply, Rader's algorithm and Bluestein's chirp-z transform convert prime-length DFTs into convolutions, which can be done by power-of-two FFT and zero-padding. Modern FFT libraries (FFTW, MKL, cuFFT) use a portfolio of such algorithms, choosing automatically by length.

Split-radix FFT uses a clever 2/4 split (radix-2 for even-indexed outputs, radix-4 for odd) that achieves the lowest known multiplier count for power-of-two $N$ . About $\sim 4N \log_2 N - 6N$ real multiplies, slightly better than radix-2's count.

2.11 FFT, in code

Here is the FFT from scratch, the recursive radix-2 DIT, in numpy. Notice how the math from 2.6 maps to code line for line.

python

import numpy as np
 
def fft_radix2(x):
    """Recursive radix-2 DIT FFT. Input length must be a power of 2."""
    N = len(x)
    if N == 1:
        return x
    if N % 2 != 0:
        raise ValueError("FFT length must be a power of 2")
    # Recurse on even and odd subsequences
    Xe = fft_radix2(x[0::2])
    Xo = fft_radix2(x[1::2])
    # Twiddle factors
    k = np.arange(N // 2)
    W = np.exp(-2j * np.pi * k / N)
    # Combine
    X = np.empty(N, dtype=complex)
    X[:N//2] = Xe + W * Xo
    X[N//2:] = Xe - W * Xo
    return X
 
# Sanity check against numpy
x = np.random.randn(1024) + 1j * np.random.randn(1024)
X_ours = fft_radix2(x)
X_numpy = np.fft.fft(x)
print(np.allclose(X_ours, X_numpy))   # True

The recursion does the work; the explicit divide-and-conquer mirrors the matryoshka picture exactly. Real production FFTs unroll the recursion into iterative loops, use bit-reversal permutations, and exploit SIMD instructions, but the math is identical.

2.12 FFT applications

FFT is everywhere. A short tour, with hardware-security flavors at the end:

Spectrum analyzers. Every benchtop spectrum analyzer (or, in software, every FFT plot in Audacity or sox) is just a windowed-FFT of a captured time series. Real-time spectrum displays in radio scanners, RF test equipment, and audio spectrograms are FFTs every few milliseconds.
OFDM modulation. Wi-Fi (802.11 a/g/n/ac/ax/be), LTE, 5G, DVB-T, ADSL: all carry data on hundreds or thousands of parallel sub-carriers, each modulated with QAM. The trick: an OFDM symbol is generated by inverse-FFT of a vector of QAM constellation points (one per sub-carrier), and decoded by FFT at the receiver. Frequency-selective fading channels become a per-sub-carrier scaling that can be equalized one-tap-at-a-time. The whole OFDM revolution depended on the FFT being fast enough to compute every symbol time.
Image processing. JPEG uses the DCT (a Fourier relative); JPEG-2000 uses wavelets; medical imaging (MRI) reconstructs images by 2D inverse-FFT of measured k-space samples.
Audio effects. Convolution reverbs do partitioned-overlap-save FFT convolution to apply long impulse responses (Carnegie Hall, your favorite cathedral) to dry recordings.
Vibration analysis. Industrial machinery (motors, turbines, bearings) generates characteristic vibration spectra. Bearing wear, shaft imbalance, and gear-tooth defects each show up at predictable frequencies. Predictive-maintenance systems FFT a continuous accelerometer feed and watch for telltale spectral changes.
Radar. Pulse-Doppler radar uses FFT across multiple pulse returns to separate moving targets by velocity. Synthetic-aperture radar (SAR) uses 2D FFT to form images.
Side-channel attacks (DPA, DEMA). Cross-correlation of a captured power trace with a hypothesis trace is fastest in the frequency domain (multiply FFTs, inverse-FFT). Spectral analysis of leaked EM radiation reveals which clock harmonics carry which sub-circuit's switching, helping attackers locate the cryptographic core on a chip. We will return to this in Chapter 24.