3. Discrete-Time Filter Architecture: Difference Equations to Block Diagrams // Digital Signal Processing // bhaswanth

Before we design filters, let us stop and look at what a digital filter actually is: an algorithm that processes one sample at a time, holding its state in delay registers. The block-diagram structure we choose matters for numerical accuracy, memory usage, and hardware mapping.

3.1 The basic operations: delay, multiply, add

Three building blocks suffice:


   Delay (z⁻¹):          Gain:               Sum:
                                              ●
   x[n]  ┌─────┐         x[n]               ╱   ╲
   ──────│ z⁻¹ │──── x[n-1]  ─── × b ───   ●     ●─── y[n]
         └─────┘                            ╲   ╱
                                              ●

A delay (the $z^{-1}$ block) holds the previous sample. A gain multiplies by a constant coefficient. A sum adds two inputs. Every linear time-invariant digital filter can be drawn as a graph of these three blocks.

3.2 Direct Form I

Translate the difference equation literally into hardware:

$y[n] = \sum_{k=0}^{M} b_k\,x[n-k] - \sum_{k=1}^{N} a_k\,y[n-k]$


            b₀
   x[n] ───●───────────────────●─────── y[n]
        │                       ▲
       z⁻¹                      │
        ▼  b₁                   │
   x[n-1]●────●───●             │
           ─a₁│   │             │
              ▼   ▼             │
              ●───●             │
              ▲   ▲             │
   ...        │   │
              z⁻¹ z⁻¹
              ▲   ▲
            y[n-1] x[n-1] etc.

(Schematic; real DF1 has separate feed-forward and feedback delay lines.)

DF1 needs $M + N$ delay elements: a chain of $M$ for past inputs and a chain of $N$ for past outputs. It is simple and intuitive, but it uses more memory than necessary.
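As a minimal sketch of this structure in Python, the loop below keeps one delay line for past inputs and one for past outputs. The function name `df1_filter` and the list-based delay lines are illustrative choices, and the code assumes the denominator is normalized so that $a_0 = 1$, as in the difference equation above.

```python
def df1_filter(b, a, x):
    """Direct Form I: y[n] = sum b[k] x[n-k] - sum a[k] y[n-k], with a[0] == 1.

    b -- feed-forward coefficients b[0..M]
    a -- feedback coefficients a[0..N]; a[0] is assumed to be 1
    x -- iterable of input samples
    """
    M, N = len(b) - 1, len(a) - 1
    x_hist = [0.0] * M   # x[n-1], ..., x[n-M]
    y_hist = [0.0] * N   # y[n-1], ..., y[n-N]
    out = []
    for xn in x:
        acc = b[0] * xn
        for k in range(1, M + 1):
            acc += b[k] * x_hist[k - 1]   # feed-forward taps
        for k in range(1, N + 1):
            acc -= a[k] * y_hist[k - 1]   # feedback taps
        # Shift both delay lines by one sample.
        if M:
            x_hist = [xn] + x_hist[:-1]
        if N:
            y_hist = [acc] + y_hist[:-1]
        out.append(acc)
    return out


# Impulse response of the one-pole filter y[n] = x[n] + 0.5 y[n-1]
impulse_response = df1_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

The two history lists together hold $M + N$ values, which is exactly the delay count quoted for DF1.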

3.3 Direct Form II

Notice that the feed-forward and feedback parts can share their delay line. Move all the delays into a single chain, with the feedback computed first:


   x[n] ──►(+)──────────●──── b₀ ────►(+)──► y[n]
            ▲           │              ▲
            │          z⁻¹             │
            │           │              │
           (+)◄── -a₁ ──●──── b₁ ────►(+)
            ▲           │              ▲
            │          z⁻¹             │
            │           │              │
            └──── -a₂ ──●──── b₂ ──────┘

(Shown for $M = N = 2$; higher orders extend the chain downward.)

Now there is only one chain of delays, tapped by both the feedback ( $a_k$ ) and feed-forward ( $b_k$ ) coefficients. DF2 uses $\max(M, N)$ delays instead of $M + N$, which halves the delay memory when $M = N$, the usual case for an IIR. Because no structure can realize the filter with fewer delays, this is called the "canonical" direct form.
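The shared delay line can be sketched in Python as follows. The intermediate signal `w` (the output of the feedback part) and the function name `df2_filter` are names introduced here for illustration; $a_0 = 1$ is assumed.

```python
def df2_filter(b, a, x):
    """Direct Form II with one shared delay line of length K = max(M, N)."""
    K = max(len(b), len(a)) - 1
    # Pad the shorter coefficient list with zeros so both have K + 1 entries.
    b = list(b) + [0.0] * (K + 1 - len(b))
    a = list(a) + [0.0] * (K + 1 - len(a))
    w = [0.0] * K   # w[n-1], ..., w[n-K]
    out = []
    for xn in x:
        # Feedback part first: w[n] = x[n] - sum a[k] w[n-k]
        wn = xn - sum(a[k] * w[k - 1] for k in range(1, K + 1))
        # Feed-forward part taps the same delay line: y[n] = sum b[k] w[n-k]
        yn = b[0] * wn + sum(b[k] * w[k - 1] for k in range(1, K + 1))
        if K:
            w = [wn] + w[:-1]
        out.append(yn)
    return out


# H(z) = (1 + z^-1) / (1 - 0.5 z^-1), driven by a unit impulse
impulse_response = df2_filter([1.0, 1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

Only the single list `w` carries state between samples, compared with the two history lists a Direct Form I loop would need.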

3.4 Transposed Direct Form II

There is a graph-theoretic trick: take any single-input, single-output signal-flow graph, reverse the direction of every branch (so summing nodes become branch points and vice versa), and swap the input with the output. The new graph has the same transfer function. This is the transposition theorem.

Apply transposition to DF2 and you get the transposed direct form II:


   x[n] ──●──── b₀ ────►(+)────────────●──► y[n]
          │              ▲             │
          │             z⁻¹            │
          │              ▲             │
          ●──── b₁ ────►(+)◄─── -a₁ ───●
          │              ▲             │
          │             z⁻¹            │
          │              ▲             │
          └──── b₂ ────►(+)◄─── -a₂ ───┘

(Shown for $M = N = 2$; higher orders extend the chain downward.)

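Transposed DF2 is often more numerically robust than DF2, particularly in floating point. In DF2 the input passes through the all-pole part first, so the shared delay line can hold values much larger than either the input or the output. In the transposed form each state register holds a sum of coefficient-weighted input and output samples, which usually keeps intermediate values better scaled. It is the default form in many DSP libraries and embedded implementations (SciPy's `lfilter`, for example, is documented as a direct form II transposed structure).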
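For a single second-order section the transposed structure reduces to three update equations per sample. The sketch below assumes $a_0 = 1$; the state names `s1` and `s2` and the function name `tdf2_biquad` are illustrative.

```python
def tdf2_biquad(b, a, x):
    """Transposed Direct Form II for one second-order section.

    b = (b0, b1, b2), a = (1, a1, a2); s1 and s2 are the two state registers.
    """
    b0, b1, b2 = b
    _, a1, a2 = a
    s1 = s2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + s1                # output uses the first state register
        s1 = b1 * xn - a1 * yn + s2      # update states from input and output
        s2 = b2 * xn - a2 * yn
        out.append(yn)
    return out


# Same one-pole example as a degenerate biquad: y[n] = x[n] + 0.5 y[n-1]
impulse_response = tdf2_biquad((1.0, 0.0, 0.0), (1.0, -0.5, 0.0), [1.0, 0.0, 0.0, 0.0])
```

Each coefficient multiplies either the current input or the current output, and the products are accumulated directly into the state registers.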

3.5 Cascade of biquads

For higher-order IIRs, none of the direct forms is numerically friendly: small coefficient errors in a high-order polynomial can shift poles dramatically (a phenomenon called coefficient sensitivity). The solution is to factor the transfer function into a product of second-order sections, or biquads, typically one per complex-conjugate pole pair (real poles are paired up or left as a first-order section), and to implement each biquad in DF2 or transposed DF2. The biquads are connected in cascade.

A general biquad:

$H_i(z) = \frac{b_{0,i} + b_{1,i}\,z^{-1} + b_{2,i}\,z^{-2}}{1 + a_{1,i}\,z^{-1} + a_{2,i}\,z^{-2}}$

An order-8 IIR becomes four biquads in series. Each biquad has only five coefficients, and each pole pair is isolated: coefficient quantization in one biquad does not move the poles of another. This is the preferred IIR structure in production DSP.
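A cascade is simply the output of one section fed into the next. In the sketch below each section is a five-tuple `(b0, b1, b2, a1, a2)` matching the biquad formula above; that layout and the function names are assumptions made for this example, not a library convention.

```python
def biquad_section(section, x):
    """One biquad in transposed Direct Form II; section = (b0, b1, b2, a1, a2)."""
    b0, b1, b2, a1, a2 = section
    s1 = s2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        out.append(yn)
    return out


def cascade(sections, x):
    """Run the signal through each section in turn."""
    signal = list(x)
    for section in sections:
        signal = biquad_section(section, signal)
    return signal


# Two sections with poles at 0.5 and 0.25 (first-order, written as biquads)
sections = [(1.0, 0.0, 0.0, -0.5, 0.0), (1.0, 0.0, 0.0, -0.25, 0.0)]
impulse_response = cascade(sections, [1.0, 0.0, 0.0])
```

Each section keeps its own two state registers, so the coefficients of one section never enter the recursion of another.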

3.6 Parallel form

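Decompose $H(z)$ via partial fractions into a sum of second-order sections (rather than a product) and run them in parallel, summing their outputs. Each section then realizes one pole pair independently, which suits responses made of separable resonances. The form is less common than the cascade, partly because the overall zeros arise from cancellation between sections and are therefore more sensitive to coefficient quantization.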
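As a small worked check, take $H(z) = \frac{2 - 0.75\,z^{-1}}{1 - 0.75\,z^{-1} + 0.125\,z^{-2}}$, whose partial-fraction expansion is $\frac{1}{1 - 0.5\,z^{-1}} + \frac{1}{1 - 0.25\,z^{-1}}$. The sketch below runs the two branches in parallel and sums them; the five-tuple section layout `(b0, b1, b2, a1, a2)` and the function names are assumptions for this example.

```python
def section_tdf2(section, x):
    """One section in transposed Direct Form II; section = (b0, b1, b2, a1, a2)."""
    b0, b1, b2, a1, a2 = section
    s1 = s2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        out.append(yn)
    return out


def parallel(sections, x):
    """Feed the same input to every section and sum the outputs sample by sample."""
    x = list(x)
    outputs = [section_tdf2(section, x) for section in sections]
    return [sum(samples) for samples in zip(*outputs)]


impulse = [1.0, 0.0, 0.0, 0.0]
branches = [(1.0, 0.0, 0.0, -0.5, 0.0), (1.0, 0.0, 0.0, -0.25, 0.0)]
parallel_response = parallel(branches, impulse)
direct_response = section_tdf2((2.0, -0.75, 0.0, -0.75, 0.125), impulse)
```

Both responses equal $0.5^n + 0.25^n$ for the samples computed, which is the sum of the two branch impulse responses.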

3.7 FIR direct form

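For FIR filters, the same direct-form structure applies, but with no feedback. A length-$M$ FIR, with taps $b_0, \dots, b_{M-1}$, is just a tapped delay line. (Here $M$ counts taps; in Section 3.2 it denoted the feed-forward order, which corresponds to $M + 1$ taps.)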


   x[n] ──●──z⁻¹──●──z⁻¹──●──z⁻¹──●─── ... 
          │       │       │       │
          ×b₀     ×b₁     ×b₂     ×b₃
          │       │       │       │
          └───────┴───────┴───────┘
                   │
                   ▼
                  y[n]

A length- $M$ FIR needs $M$ multiplies and $M-1$ adds per output sample. The "MAC" (multiply-accumulate) operation is the workhorse of digital filtering, and DSP processors are built around making MACs as fast as possible.
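The tapped delay line maps directly onto a loop of MACs. The following is a minimal sketch using a fixed-length `collections.deque` as the delay line; the function name `fir_direct` is illustrative.

```python
from collections import deque


def fir_direct(h, x):
    """Direct-form FIR: y[n] = sum_{k=0}^{M-1} h[k] x[n-k]."""
    M = len(h)
    delay = deque([0.0] * M, maxlen=M)   # delay[k] holds x[n-k]
    out = []
    for xn in x:
        delay.appendleft(xn)             # newest sample in, oldest drops off the end
        acc = 0.0
        for k in range(M):
            acc += h[k] * delay[k]       # one multiply-accumulate per tap
        out.append(acc)
    return out


# Two-tap moving average
averaged = fir_direct([0.5, 0.5], [2.0, 4.0, 6.0])
```

The inner loop is the part a DSP processor accelerates: it executes $M$ MACs for every output sample.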

For a symmetric FIR ( $h[n] = h[M-1-n]$ ), exploit the symmetry: add the two samples that share a coefficient, then multiply once. This cuts the multiply count from $M$ to $\lceil M/2 \rceil$.
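The folding can be sketched as below. The function name `fir_symmetric` is illustrative, and the code trusts the caller to pass taps that really are symmetric.

```python
def fir_symmetric(h, x):
    """FIR for symmetric taps h[k] == h[M-1-k], using ceil(M/2) multiplies per output."""
    M = len(h)
    half = M // 2
    delay = [0.0] * M                     # delay[k] holds x[n-k]
    out = []
    for xn in x:
        delay = [xn] + delay[:-1]
        acc = 0.0
        for k in range(half):
            # Pre-add the two samples that share a coefficient, then multiply once.
            acc += h[k] * (delay[k] + delay[M - 1 - k])
        if M % 2:
            acc += h[half] * delay[half]  # unpaired centre tap when M is odd
        out.append(acc)
    return out


odd_length = fir_symmetric([1.0, 2.0, 1.0], [1.0, 0.0, 0.0, 0.0])
even_length = fir_symmetric([1.0, 3.0, 3.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0])
```

In fixed-point hardware the pre-add can need one extra bit of headroom, since the sum of two samples may exceed the range of one.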

3.8 Coefficient quantization effects

When you implement a filter on fixed-point hardware (say a 16-bit DSP), coefficients $b_k$ , $a_k$ are stored to finite precision. The actual implemented filter has coefficients that are quantized versions of the designed ones, and so it has slightly different poles and zeros.
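A small numerical illustration: place a complex pole pair at radius $r = 0.9995$ and angle $\theta = 0.1$ rad, so that $a_1 = -2r\cos\theta$ and $a_2 = r^2$, then round both coefficients to 8 fractional bits. The word length is deliberately coarse, and the helper names `quantize` and `pole_radius` are made up for this sketch.

```python
import math


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def pole_radius(a1, a2):
    """Largest pole magnitude of 1 + a1 z^-1 + a2 z^-2 (roots of z^2 + a1 z + a2)."""
    disc = a1 * a1 - 4.0 * a2
    if disc < 0:                       # complex-conjugate pair: |p|^2 = a2
        return math.sqrt(a2)
    root = math.sqrt(disc)
    return max(abs(-a1 + root), abs(-a1 - root)) / 2.0


r, theta = 0.9995, 0.1                 # designed pole radius and angle (radians)
a1, a2 = -2.0 * r * math.cos(theta), r * r
a1_q, a2_q = quantize(a1, 8), quantize(a2, 8)

designed = pole_radius(a1, a2)         # just inside the unit circle
implemented = pole_radius(a1_q, a2_q)  # a2 rounds up to 1.0, so the poles land on the circle
```

Here $a_2 = 0.99900025$ rounds to $256/256 = 1$, so the implemented poles sit on the unit circle and the filter is no longer strictly stable, even though the design was.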

Effects:

Magnitude response distortion: small ripples appear in passband or stopband.
Pole movement: in a high-order direct form, a tiny coefficient change can shift a pole significantly, sometimes outside the unit circle. Quantization can make a stable filter unstable.
Limit cycles: in IIR filters, rounding inside the recursion can sustain small oscillations even with zero input. They are mitigated by sufficient internal precision and by noise shaping (error feedback).
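The simplest limit cycle is a constant one, sometimes called a deadband. The sketch below iterates $y[n] = Q(0.9\,y[n-1])$ on integers with zero input, once with round-to-nearest and once with truncation toward zero. The truncation variant is included only for comparison and is not one of the mitigations named above; the function name and the choice of coefficient are illustrative.

```python
def zero_input_response(y0, steps, rounding):
    """Iterate y[n] = Q(0.9 * y[n-1]) on integers with zero input.

    The coefficient 0.9 is applied as the exact integer ratio 9/10.
    rounding -- "nearest" (round half up) or "truncate" (toward zero)
    y0 is assumed non-negative, which keeps the integer rounding simple.
    """
    y, out = y0, []
    for _ in range(steps):
        if rounding == "nearest":
            y = (9 * y + 5) // 10
        else:
            y = (9 * y) // 10
        out.append(y)
    return out


rounded = zero_input_response(10, 12, "nearest")
truncated = zero_input_response(10, 12, "truncate")
```

With rounding the state sticks at 5 forever, because $0.9 \times 5 = 4.5$ rounds back up to 5. With truncation it decays to 0, as the ideal infinite-precision filter would.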

The cure: cascaded biquads (low pole sensitivity per section), wider internal accumulators (typically a $2B$-bit accumulator for $B$-bit data, often with a few extra guard bits), and saturating arithmetic so that overflow clips instead of wrapping around.
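To make the last two points concrete, the sketch below accumulates products of Q15 integers (16-bit values scaled by $2^{15}$) in a wide accumulator, then converts back to 16 bits either by saturating or by wrapping. Python integers are unbounded, so the accumulator here only stands in for a 32-bit or 40-bit hardware register; the function names are illustrative.

```python
Q15_MAX, Q15_MIN = 32767, -32768


def saturate_q15(v):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, v))


def wrap_q15(v):
    """Two's-complement wraparound to 16 bits, shown for comparison."""
    return (v + 32768) % 65536 - 32768


def mac_q15(coeffs, samples):
    """Sum of products of Q15 integers, kept at full precision until the end."""
    acc = 0                                # stands in for a wide hardware accumulator
    for c, s in zip(coeffs, samples):
        acc += c * s                       # each product is in Q30
    return (acc + (1 << 14)) >> 15         # round to nearest and rescale Q30 -> Q15


# Two taps of 0.75 applied to two near-full-scale samples: the true sum is about 1.5
raw = mac_q15([24576, 24576], [32767, 32767])
clipped = saturate_q15(raw)
wrapped = wrap_q15(raw)
```

The raw result exceeds the 16-bit range. Saturation clips it to the largest positive value, whereas wraparound turns it into a large negative number, which is far more damaging inside a feedback loop.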