4. Hardware Description Languages: Verilog and VHDL

A modern chip can contain billions of transistors, and nobody can draw a schematic at that scale. Instead you write HDL (Hardware Description Language) code: textual descriptions of digital systems that synthesis tools turn into gate-level netlists.

There are two dominant HDLs.

  • Verilog (and its modern superset SystemVerilog). Born at Gateway Design Automation, 1984. C-like syntax, terse, popular in commercial design (US, Asia).
  • VHDL (VHSIC Hardware Description Language). Born under DoD VHSIC project, 1983. Pascal-like syntax, verbose, strongly typed, popular in defense and aerospace, more common in Europe.

Both can describe the same hardware, and mainstream synthesis tools accept either, so choose based on team familiarity. SystemVerilog (the extension of Verilog standardized as IEEE 1800 in 2005) is the modern industry default for new ASICs.

4.1 Why HDL beats schematics

Imagine writing a novel by drawing each word as a tiny picture. Possible, but absurd. HDL is to schematic editors what typing is to drawing: vastly more efficient at the relevant scale.

Specific benefits:

  • Parameterized. A 32-bit adder is the same description as a 64-bit adder, parameterized by a bit width.
  • Reusable. Modules are imported into bigger designs.
  • Verifiable. Testbenches simulate the design before tape-out.
  • Synthesizable. The same code runs in simulation and gets compiled to silicon.
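
As a sketch of the parameterization and reuse points, here is a width-generic adder in Verilog, instantiated at two widths. The names adder_n, adder_demo, and WIDTH are illustrative choices made for this example, not part of any library.

```verilog
// Width-generic adder: one description covers every bit width.
// Module and parameter names are hypothetical, chosen for illustration.
module adder_n #(parameter WIDTH = 32) (
    input  [WIDTH-1:0] a,
    input  [WIDTH-1:0] b,
    output [WIDTH:0]   sum      // one extra bit holds the carry-out
);
    assign sum = {1'b0, a} + {1'b0, b};
endmodule

// Reuse: the same module instantiated as a 32-bit and a 64-bit adder.
module adder_demo(
    input  [31:0] a32, b32,
    input  [63:0] a64, b64,
    output [32:0] s32,
    output [64:0] s64
);
    adder_n #(.WIDTH(32)) u_add32 (.a(a32), .b(b32), .sum(s32));
    adder_n #(.WIDTH(64)) u_add64 (.a(a64), .b(b64), .sum(s64));
endmodule
```

Changing the width is a one-token edit at the instantiation site; the adder description itself never changes.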

4.2 The synthesis flow

The full ASIC/FPGA flow:

  1. Specification. What does the chip do?
  2. RTL coding. Write Verilog/VHDL describing the design.
  3. Simulation. Run testbenches in ModelSim, Verilator, GHDL, Vivado XSim, Cadence Xcelium. Find bugs.
  4. Synthesis. Run Synopsys Design Compiler, Cadence Genus, Xilinx Vivado, Intel Quartus, or Yosys (open source). Outputs a netlist of gates from a standard-cell library (for ASICs) or of LUTs and flip-flops (for FPGAs).
  5. Place and route. Assign each gate to a physical location and route wires. For ASICs: Cadence Innovus, Synopsys ICC2. For FPGAs: vendor tools or nextpnr.
  6. Static timing analysis. Verify that the longest signal path fits inside one clock cycle for the target frequency.
  7. DRC/LVS (for ASICs). Design-rule checks (DRC) confirm the layout obeys the foundry's manufacturing rules; layout-versus-schematic checks (LVS) ensure the layout matches the netlist.
  8. Bitstream / GDSII. For FPGAs: produce a .bit file. For ASICs: produce GDSII layer data, send to fab.
  9. Test. Boot the silicon. Hope it works.
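
The timing check in step 6 can be made concrete with the usual setup-time inequality. This is a simplified sketch that ignores clock skew and jitter; the symbols are introduced here for illustration, and the numbers in the example are made up.

$$
T_{\text{clk}} \ge t_{\text{cq}} + t_{\text{logic}} + t_{\text{setup}},
\qquad
\text{slack} = T_{\text{clk}} - \left(t_{\text{cq}} + t_{\text{logic}} + t_{\text{setup}}\right)
$$

Here $T_{\text{clk}}$ is the clock period, $t_{\text{cq}}$ the clock-to-output delay of the launching flip-flop, $t_{\text{logic}}$ the delay of the longest combinational path, and $t_{\text{setup}}$ the setup time of the capturing flip-flop. For a 200 MHz target, $T_{\text{clk}} = 5$ ns. With $t_{\text{cq}} = 0.3$ ns, $t_{\text{logic}} = 4.0$ ns, and $t_{\text{setup}} = 0.2$ ns, the slack is $5 - 4.5 = 0.5$ ns, so the path meets timing. A negative slack would mean the path fails at that frequency.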

This flow has been the chip-design standard for thirty years. Every modern CPU, GPU, FPGA, mobile SoC, and microcontroller went through some version of it.

4.3 VHDL: structure of a module

A VHDL module has two parts: the entity (the interface) and the architecture (the implementation).

vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
 
entity adder is
    Port (
        a    : in  STD_LOGIC_VECTOR(7 downto 0);
        b    : in  STD_LOGIC_VECTOR(7 downto 0);
        sum  : out STD_LOGIC_VECTOR(8 downto 0)
    );
end adder;
 
architecture Behavioral of adder is
begin
    sum <= std_logic_vector(
             resize(unsigned(a), 9) + resize(unsigned(b), 9)
           );
end Behavioral;

The entity declares ports. The architecture describes behavior. The <= is a signal assignment; the right side is the expression to assign.

4.4 VHDL data types

  • STD_LOGIC: a single wire. 9-valued logic. Values: '0', '1', 'X' (unknown), 'Z' (high-Z), 'U' (uninitialized), 'L' (weak 0), 'H' (weak 1), 'W' (weak unknown), '-' (don't care). Models real silicon faithfully.
  • STD_LOGIC_VECTOR: an array of std_logic, typically a bus.
  • BIT: a 2-valued logic (0 or 1). Less expressive than std_logic; rarely used.
  • INTEGER: signed integer. Used for counters, indices, parameters.
  • BOOLEAN: true/false. Used in conditions.
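  • SIGNED, UNSIGNED: arithmetic-friendly array types defined in numeric_std. Like std_logic_vector they are arrays of std_logic, but they are distinct types, so moving between them takes an explicit conversion such as unsigned(a). Use these for arithmetic.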

The strong typing is one of VHDL's selling points: you cannot accidentally connect an integer to a std_logic_vector. The compiler catches it before silicon does.

4.5 VHDL: signal vs variable

VHDL has both signals and variables. They look similar but behave differently.

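  • Signal. Declared in the architecture's declarative part (or in a package), never inside a process. A value assigned with <= is scheduled and takes effect only when the process suspends, so later statements in the same run still see the old value. Models a wire or register.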
  • Variable. Declared inside a process. Updates immediately on assignment, like a software variable. Used for intermediate computation. Assigned with :=.

Subtle but important. Consider:

vhdl
process(clk)
    variable temp : std_logic;
begin
    if rising_edge(clk) then
        temp := a xor b;
        c <= temp;
        d <= temp;       -- d gets the NEW xor value
    end if;
end process;

vs:

vhdl
signal temp : std_logic;
...
process(clk)
begin
    if rising_edge(clk) then
        temp <= a xor b;  -- temp updates AFTER the process
        c <= temp;        -- c gets the OLD temp value
        d <= temp;
    end if;
end process;

In the variable version, c and d get the new XOR value computed in this clock cycle. In the signal version, c and d get the value of temp from the previous cycle, because signal updates are deferred.

4.6 VHDL: process and concurrent statements

VHDL has two kinds of statements:

  • Concurrent statements. Outside any process. Execute "in parallel," whenever any input changes. Model combinational logic naturally.
  • Sequential statements (inside a process). Execute in order, top to bottom, when any signal in the process's sensitivity list changes.

A process with sensitivity list (a, b) re-executes whenever a or b changes. A clocked process with an asynchronous reset has sensitivity list (clk, reset) and uses if rising_edge(clk) inside.

The canonical clocked-register process:

vhdl
process(clk, reset)
begin
    if reset = '1' then
        q <= '0';
    elsif rising_edge(clk) then
        q <= d;
    end if;
end process;

Synthesizes to a D flip-flop with asynchronous active-high reset.

4.7 VHDL: behavioral, dataflow, structural styles

You can describe the same module three ways:

Behavioral: describe what it does, in algorithm form.

vhdl
process(a, b)
begin
    if a = b then
        eq <= '1';
    else
        eq <= '0';
    end if;
end process;

Dataflow: describe what it does, in expression form.

vhdl
eq <= '1' when a = b else '0';

Structural: describe how it is built, in component instances.

vhdl
xor_gate : entity work.xor2 port map(a => a, b => b, y => xor_out);
inv_gate : entity work.inv  port map(a => xor_out, y => eq);

Real designs mix all three. Most RTL is dataflow plus clocked processes.

4.8 Verilog comparison

The same 8-bit adder in Verilog:

verilog
module adder(
    input  [7:0] a,
    input  [7:0] b,
    output [8:0] sum
);
    assign sum = {1'b0, a} + {1'b0, b};
endmodule

A clocked register in Verilog:

verilog
always @(posedge clk or posedge reset) begin
    if (reset)
        q <= 1'b0;
    else
        q <= d;
end

Compare with VHDL: shorter, more C-like, less ceremony. Verilog wire corresponds roughly to VHDL signal; reg (despite the misleading name) is simply any variable assigned inside a procedural block (always or initial), whether or not it ends up as a flip-flop. SystemVerilog's logic type unifies them.
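
To illustrate the point about reg, the following sketch drives the same AND function two ways. The module and port names are hypothetical. Both outputs are purely combinational; the reg keyword on y_reg only reflects that it is assigned from an always block.

```verilog
// Hypothetical example: "reg" does not imply a register.
module wire_vs_reg(
    input      a,
    input      b,
    output     y_wire,   // net, driven by a continuous assignment
    output reg y_reg     // declared reg, yet still combinational
);
    // wire: driven by assign
    assign y_wire = a & b;

    // reg: driven from a procedural block; no clock edge, so no flip-flop
    always @* begin
        y_reg = a & b;
    end
endmodule
```

In SystemVerilog both outputs could be declared as logic, which removes the need to pick between the two keywords.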

VHDL's verbosity buys you stronger typing and earlier error detection. Verilog's terseness buys you faster typing and tools that scale to enormous codebases. Both are everywhere; learn whichever your team uses.

4.9 FSM design with HDL

Recall from Chapter 4 that an FSM has a current state, a next-state function, and an output function. The standard HDL idiom uses three processes:

  1. State register (clocked): updates current state on the clock edge.
  2. Next-state logic (combinational): computes next state from current state plus inputs.
  3. Output logic (combinational): produces outputs from current state (Moore) or current state plus inputs (Mealy).

VHDL example for a simple bus-controller FSM:

vhdl
type state_t is (IDLE, ARB, READ, WRITE, DONE);
signal state, next_state : state_t;
 
-- 1. state register
process(clk, rst)
begin
    if rst = '1' then
        state <= IDLE;
    elsif rising_edge(clk) then
        state <= next_state;
    end if;
end process;
 
-- 2. next-state logic
process(state, request, grant, ready)
begin
    next_state <= state;
    case state is
        when IDLE  => if request = '1' then next_state <= ARB; end if;
        when ARB   => if grant   = '1' then next_state <= READ; end if;
        when READ  => if ready   = '1' then next_state <= WRITE; end if;
        when WRITE => if ready   = '1' then next_state <= DONE; end if;
        when DONE  => next_state <= IDLE;
    end case;
end process;
 
-- 3. output logic (Moore)
process(state)
begin
    bus_req  <= '0';
    do_read  <= '0';
    do_write <= '0';
    case state is
        when ARB   => bus_req  <= '1';
        when READ  => bus_req  <= '1'; do_read  <= '1';
        when WRITE => bus_req  <= '1'; do_write <= '1';
        when others => null;
    end case;
end process;

The same pattern in SystemVerilog (typedef enum, always_ff, and always_comb are SystemVerilog constructs):

verilog
typedef enum logic [2:0] {IDLE, ARB, READ, WRITE, DONE} state_t;
state_t state, next_state;
 
always_ff @(posedge clk or posedge rst)
    if (rst) state <= IDLE;
    else     state <= next_state;
 
always_comb begin
    next_state = state;
    case (state)
        IDLE:  if (request) next_state = ARB;
        ARB:   if (grant)   next_state = READ;
        READ:  if (ready)   next_state = WRITE;
        WRITE: if (ready)   next_state = DONE;
        DONE:  next_state = IDLE;
    endcase
end
 
always_comb begin
    bus_req  = 1'b0;
    do_read  = 1'b0;
    do_write = 1'b0;
    case (state)
        ARB:   bus_req  = 1'b1;
        READ:  begin bus_req = 1'b1; do_read = 1'b1; end
        WRITE: begin bus_req = 1'b1; do_write = 1'b1; end
        default: ;
    endcase
end

4.9.1 State encoding

The synthesis tool can pick how to encode states in flip-flops:

  • Binary. $\lceil \log_2 N \rceil$ flip-flops for $N$ states. Most flip-flop-efficient.
  • Gray. Adjacent states differ by one bit. Reduces transition glitches.
  • One-hot. $N$ flip-flops, exactly one high. More flip-flops but next-state logic is trivial: each state's flip-flop is the OR of incoming-transition conditions. Often optimal on FPGAs, where flip-flops are abundant but LUTs are precious.

Most synthesis tools auto-select an encoding based on the target architecture. Vivado, for example, typically chooses one-hot for FSMs with fewer than ~32 states.
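
For a concrete picture of one-hot encoding, here is a sketch of the bus-controller FSM from 4.9 with the encoding written out by hand in plain Verilog. The module name bus_fsm_onehot is hypothetical, and only the bus_req output is kept to stay short. A binary encoding of the same five states would need 3 flip-flops; this version uses 5. Note that a synthesis tool may still re-encode the states unless told otherwise.

```verilog
// Hand-written one-hot encoding: one flip-flop per state.
module bus_fsm_onehot(
    input  clk, rst, request, grant, ready,
    output bus_req
);
    localparam [4:0] IDLE  = 5'b00001,
                     ARB   = 5'b00010,
                     READ  = 5'b00100,
                     WRITE = 5'b01000,
                     DONE  = 5'b10000;

    reg [4:0] state, next_state;

    // State register with asynchronous active-high reset
    always @(posedge clk or posedge rst)
        if (rst) state <= IDLE;
        else     state <= next_state;

    // Next-state logic
    always @* begin
        next_state = state;
        case (state)
            IDLE:    if (request) next_state = ARB;
            ARB:     if (grant)   next_state = READ;
            READ:    if (ready)   next_state = WRITE;
            WRITE:   if (ready)   next_state = DONE;
            DONE:    next_state = IDLE;
            default: next_state = IDLE;  // recover from an illegal state
        endcase
    end

    // Output decode is a single-bit test per state: ARB, READ, or WRITE
    assign bus_req = state[1] | state[2] | state[3];
endmodule
```

The output logic shows the payoff: decoding a state is a test of one bit rather than a comparison against a multi-bit code.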

4.10 Testbenches

A testbench is a non-synthesizable HDL module that drives stimuli into your design and checks outputs. Testbenches use HDL features that have no hardware equivalent (file I/O, real arithmetic, infinite loops, wait statements) because they only run in simulation.

A bare-bones VHDL testbench:

vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
 
entity tb_adder is end tb_adder;
 
architecture sim of tb_adder is
    signal a, b   : std_logic_vector(7 downto 0) := (others => '0');
    signal sum    : std_logic_vector(8 downto 0);
begin
    DUT: entity work.adder port map(a => a, b => b, sum => sum);
 
    stimulus: process
    begin
        a <= "00000001"; b <= "00000010"; wait for 10 ns;
        assert sum = "000000011" report "1+2 failed" severity failure;
        
        a <= "11111111"; b <= "00000001"; wait for 10 ns;
        assert sum = "100000000" report "carry failed" severity failure;
        
        wait;
    end process;
end sim;

The same in Verilog:

verilog
module tb_adder;
    reg  [7:0] a, b;
    wire [8:0] sum;
    adder DUT(.a(a), .b(b), .sum(sum));
 
    initial begin
        a = 8'd1; b = 8'd2; #10;
        // $fatal is a SystemVerilog task; its first argument is the finish number
        if (sum !== 9'd3) $fatal(1, "1+2 failed");
 
        a = 8'd255; b = 8'd1; #10;
        if (sum !== 9'd256) $fatal(1, "carry failed");
 
        $finish;
    end
endmodule

Large verification efforts typically use UVM (Universal Verification Methodology): a SystemVerilog class library built around test sequences, scoreboards, and coverage. Industrial verification teams often write more lines of UVM than RTL.

4.11 Synthesis: the magic step

The synthesis tool ingests RTL plus a target library (a description of the gates available on your FPGA or ASIC process) and outputs a gate-level netlist that implements the same behavior. Modern synthesis tools are extraordinarily sophisticated: they pattern-match common idioms (counters, FIFOs, multipliers, FSMs) and replace them with optimized standard-cell implementations, retime registers across logic to balance pipeline stages, and squeeze gates out wherever possible.

Synthesizable subset. Not all HDL constructs become hardware. Verilog/VHDL constructs that do not synthesize:

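  • wait for <time> and # delays (used in testbenches): simulation delays have no hardware equivalent.
  • File I/O (textio, $fopen, $fwrite): reads and writes files on the simulation host. One partial exception: many FPGA synthesis tools accept $readmemh to set a memory's initial contents at build time.
  • Real arithmetic (real, time types): no native hardware support.
  • Loops without a static bound: synthesis unrolls loops at compile time, so a loop whose iteration count is not known then cannot be turned into gates.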

You write these in testbenches. You do not write them in the design under test.

A common synthesis bug: forgetting to assign a signal in every branch of an if/case inside a combinational process. Synthesis interprets the missing assignment as "hold the previous value" and inserts a latch. Unintended latches are trouble: they are level-sensitive, hard to time, and almost always indicate a design error. Always assign defaults at the top of a combinational process, then override them.

vhdl
process(state, ready)
begin
    enable  <= '0';   -- default
    error   <= '0';   -- default
    case state is
        when ACTIVE => enable <= '1';
        when FAULT  => error  <= '1';
        when others => null;
    end case;
end process;
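
For comparison, here is a sketch in Verilog that puts the bug and the fix side by side. The module name, port names, and the ACTIVE encoding are hypothetical, chosen only for this illustration.

```verilog
// Hypothetical demo of latch inference and its fix.
module latch_demo(
    input  [1:0] state,
    output reg   enable_bad,
    output reg   enable_ok
);
    localparam [1:0] ACTIVE = 2'd1;

    // BUG: enable_bad is not assigned when state != ACTIVE, so it must
    // hold its previous value. Synthesis infers a latch.
    always @* begin
        if (state == ACTIVE)
            enable_bad = 1'b1;
    end

    // FIX: assign a default first, then override it.
    always @* begin
        enable_ok = 1'b0;
        if (state == ACTIVE)
            enable_ok = 1'b1;
    end
endmodule
```

In simulation the buggy output goes high the first time state reaches ACTIVE and never returns low, which is the held value showing through. Most synthesis tools also print an inferred-latch warning for the first block.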

4.12 Putting it together: a small ALU

A 4-bit ALU in Verilog:

verilog
module alu4(
    input  [3:0] a,
    input  [3:0] b,
    input  [2:0] op,
    output reg [4:0] result,
    output reg       zero
);
    always @* begin
        case (op)
            3'b000: result = {1'b0, a} + {1'b0, b};   // ADD
            3'b001: result = {1'b0, a} - {1'b0, b};   // SUB
            3'b010: result = {1'b0, a & b};           // AND
            3'b011: result = {1'b0, a | b};           // OR
            3'b100: result = {1'b0, a ^ b};           // XOR
            3'b101: result = {1'b0, ~a};              // NOT a
            3'b110: result = {a[2:0], 1'b0};          // SHL
            3'b111: result = {1'b0, a[3:1]};          // SHR
            default: result = 5'b00000;
        endcase
        zero = (result == 5'b00000);
    end
endmodule

Eight operations, one always block. Synthesis builds this as a multiplexer selecting among eight combinational functions, one per case. The whole ALU is small: expect it to fit in a few tens of LUTs at most on a Xilinx 7-series FPGA, with the exact count depending on the tool and its settings.
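
A short self-checking testbench for this ALU might look like the sketch below. It assumes the alu4 module above is compiled alongside it; the testbench name, the check task, and the chosen operand values are illustrative.

```verilog
// Hypothetical testbench; requires the alu4 module shown above.
module tb_alu4;
    reg  [3:0] a, b;
    reg  [2:0] op;
    wire [4:0] result;
    wire       zero;
    integer    errors;

    alu4 DUT(.a(a), .b(b), .op(op), .result(result), .zero(zero));

    // Compare result and zero flag against the expected value.
    task check(input [4:0] expected);
        begin
            #1;  // let the combinational logic settle
            if (result !== expected || zero !== (expected == 5'd0)) begin
                $display("FAIL op=%b a=%d b=%d got=%b want=%b",
                         op, a, b, result, expected);
                errors = errors + 1;
            end
        end
    endtask

    initial begin
        errors = 0;
        #1;  // avoid a time-0 race with the DUT's always @* block
        a = 4'd9; b = 4'd8;
        op = 3'b000; check(5'd17);      // ADD: 9 + 8 = 17, carry in bit 4
        op = 3'b010; check(5'b01000);   // AND: 1001 & 1000
        op = 3'b100; check(5'b00001);   // XOR: 1001 ^ 1000
        op = 3'b110; check(5'b00010);   // SHL: a[3] is shifted out
        a = 4'd5; b = 4'd5;
        op = 3'b001; check(5'd0);       // SUB: 5 - 5 = 0, zero flag set
        if (errors == 0) $display("all ALU checks passed");
        $finish;
    end
endmodule
```

The SHL check makes one design choice visible: the ALU as written discards the bit shifted out of a[3] rather than placing it in result[4].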

4.13 Memory module example

A small simple dual-port RAM (one write port, one read port) in Verilog:

verilog
module dpram #(parameter ADDR_W=10, parameter DATA_W=8) (
    input                clk,
    input                we,
    input  [ADDR_W-1:0]  waddr,
    input  [DATA_W-1:0]  wdata,
    input  [ADDR_W-1:0]  raddr,
    output reg [DATA_W-1:0] rdata
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];
 
    always @(posedge clk) begin
        if (we) mem[waddr] <= wdata;
        rdata <= mem[raddr];
    end
endmodule

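Synthesis recognizes the array-with-clocked-access pattern and infers a block RAM. With the default parameters the array holds 1024 × 8 = 8 Kbit, which fits comfortably inside a single block RAM primitive on a Xilinx 7-series device (each 36 Kbit block can also be used as two independent 18 Kbit blocks).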
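
Because the read is registered, data appears on rdata one clock edge after the address is presented. The sketch below exercises that timing with a small instance (16 entries). It assumes the dpram module above is compiled alongside it; the testbench name and values are illustrative.

```verilog
// Hypothetical testbench; requires the dpram module shown above.
module tb_dpram;
    reg        clk = 0;
    reg        we  = 0;
    reg  [3:0] waddr, raddr;
    reg  [7:0] wdata;
    wire [7:0] rdata;

    dpram #(.ADDR_W(4), .DATA_W(8)) DUT(
        .clk(clk), .we(we),
        .waddr(waddr), .wdata(wdata),
        .raddr(raddr), .rdata(rdata)
    );

    always #5 clk = ~clk;   // 10-time-unit clock period

    initial begin
        // Present write and read of address 3 before the same rising edge.
        @(negedge clk);
        we = 1; waddr = 4'd3; wdata = 8'hA5; raddr = 4'd3;

        // First rising edge: mem[3] is written, rdata samples the OLD mem[3].
        @(negedge clk);
        we = 0;

        // Second rising edge: rdata now samples the newly written value.
        @(negedge clk);
        if (rdata !== 8'hA5) $display("FAIL: rdata=%h", rdata);
        else                 $display("read back A5 one clock after the write");
        $finish;
    end
endmodule
```

The first read returning the old contents is a direct consequence of the nonblocking assignments in dpram: both statements sample mem before either update lands.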