A modern chip has a billion transistors. You cannot draw a billion-gate schematic. You write HDL (Hardware Description Language) code: textual descriptions of digital systems that synthesis tools turn into gate-level netlists.
There are two dominant HDLs.
- Verilog (and its modern superset SystemVerilog). Born at Gateway Design Automation, 1984. C-like syntax, terse, popular in commercial design (US, Asia).
- VHDL (VHSIC Hardware Description Language). Born under DoD VHSIC project, 1983. Pascal-like syntax, verbose, strongly typed, popular in defense and aerospace, more common in Europe.
Both can describe the same hardware, and mainstream synthesis tools accept either. Choose based on team familiarity. SystemVerilog (standardized in 2005 as an extension of Verilog) is the modern industry default for new ASICs.
4.1 Why HDL beats schematics
Imagine writing a novel by drawing each word as a tiny picture. Possible, but absurd. HDL is to schematic editors what typing is to drawing: vastly more efficient at the relevant scale.
Specific benefits:
- Parameterized. A 32-bit adder is the same description as a 64-bit adder, parameterized by a bit width.
- Reusable. Modules are imported into bigger designs.
- Verifiable. Testbenches simulate the design before tape-out.
- Synthesizable. The same code runs in simulation and gets compiled to silicon.
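The parameterization point can be made concrete with a short sketch. The entity name adder_n and the generic WIDTH are illustrative assumptions, not names used elsewhere in this chapter; the VHDL syntax itself is introduced in section 4.3.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity: one description covers every operand width.
entity adder_n is
  generic (
    WIDTH : positive := 32                         -- operand width in bits
  );
  port (
    a   : in  std_logic_vector(WIDTH - 1 downto 0);
    b   : in  std_logic_vector(WIDTH - 1 downto 0);
    sum : out std_logic_vector(WIDTH downto 0)     -- one extra bit for the carry
  );
end entity adder_n;

architecture rtl of adder_n is
begin
  -- Widen both operands by one bit so the carry-out is kept.
  sum <= std_logic_vector(
           resize(unsigned(a), WIDTH + 1) + resize(unsigned(b), WIDTH + 1));
end architecture rtl;
```

A 64-bit instance would then differ only in its generic map, for example: u64 : entity work.adder_n generic map (WIDTH => 64) port map (a => x, b => y, sum => s); where x, y and s are placeholder signals of matching widths.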
4.2 The synthesis flow
The full ASIC/FPGA flow:
- Specification. What does the chip do?
- RTL coding. Write Verilog/VHDL describing the design.
- Simulation. Run testbenches in ModelSim, Verilator, GHDL, Vivado XSim, Cadence Xcelium. Find bugs.
- Synthesis. Run Synopsys Design Compiler, Cadence Genus, Xilinx Vivado, Intel Quartus, or Yosys (open source). Outputs a netlist of gates from a standard-cell library (for an ASIC) or of LUTs, flip-flops, and other device primitives (for an FPGA).
- Place and route. Assign each gate to a physical location and route wires. For ASICs: Cadence Innovus, Synopsys ICC2. For FPGAs: vendor tools or Nextpnr.
- Static timing analysis. Verify that the longest signal path fits inside one clock cycle for the target frequency.
- DRC/LVS (for ASICs). Design-rule checks confirm that the layout obeys the foundry's manufacturing rules; layout-versus-schematic checks confirm that the layout matches the netlist.
- Bitstream / GDSII. For FPGAs: produce a .bit file. For ASICs: produce GDSII layer data, send to fab.
- Test. Boot the silicon. Hope it works.
This flow has been the chip-design standard for thirty years. Every modern CPU, GPU, FPGA, mobile SoC, and microcontroller went through some version of it.
4.3 VHDL: structure of a module
A VHDL module has two parts: the entity (the interface) and the architecture (the implementation).
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity adder is
    Port (
        a   : in  STD_LOGIC_VECTOR(7 downto 0);
        b   : in  STD_LOGIC_VECTOR(7 downto 0);
        sum : out STD_LOGIC_VECTOR(8 downto 0)
    );
end adder;

architecture Behavioral of adder is
begin
    sum <= std_logic_vector(
        resize(unsigned(a), 9) + resize(unsigned(b), 9)
    );
end Behavioral;

The entity declares ports. The architecture describes behavior. The <= is a signal assignment; the right side is the expression to assign.
4.4 VHDL data types
- STD_LOGIC: a single wire with 9-valued logic. Values: '0', '1', 'X' (unknown), 'Z' (high-Z), 'U' (uninitialized), 'L' (weak 0), 'H' (weak 1), 'W' (weak unknown), '-' (don't care). Models real silicon faithfully.
- STD_LOGIC_VECTOR: an array of std_logic, typically a bus.
- BIT: 2-valued logic ('0' or '1'). Less expressive than std_logic; rarely used.
- INTEGER: signed integer. Used for counters, indices, parameters.
- BOOLEAN: true/false. Used in conditions.
- SIGNED, UNSIGNED: arrays of std_logic, defined in numeric_std, that carry arithmetic operators. They are distinct types from std_logic_vector, so moving between them takes an explicit type conversion, as in the adder above. Use these for arithmetic.
The strong typing is one of VHDL's selling points: you cannot accidentally connect an integer to a std_logic_vector. The compiler catches it before silicon does.
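A minimal sketch of what the type checker rejects and what it accepts. The entity type_demo and its port names are hypothetical; the conversion functions to_integer and to_unsigned come from numeric_std.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity used only to show the conversions.
entity type_demo is
  port (
    bus_in  : in  std_logic_vector(7 downto 0);
    bus_out : out std_logic_vector(7 downto 0)
  );
end entity type_demo;

architecture rtl of type_demo is
  signal count : integer range 0 to 255;
begin
  -- bus_out <= count;    -- rejected at compile time: integer is not std_logic_vector
  -- count   <= bus_in;   -- rejected for the same reason

  -- std_logic_vector -> unsigned (type conversion) -> integer (to_integer)
  count   <= to_integer(unsigned(bus_in));

  -- integer -> unsigned (to_unsigned with an explicit width) -> std_logic_vector
  bus_out <= std_logic_vector(to_unsigned(count, bus_out'length));
end architecture rtl;
```

Each conversion names the intended interpretation of the bits, which is the information a weakly typed language would leave implicit.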
4.5 VHDL: signal vs variable
VHDL has both signals and variables. They look similar but behave differently.
- Signal. Declared in an architecture (or a package), never inside a process. A value assigned inside a process takes effect only when the process suspends. Models a wire or register. Assigned with <=.
- Variable. Declared inside a process. Updates immediately on assignment, like a software variable. Used for intermediate computation. Assigned with :=.
Subtle but important. Consider:
process(clk)
    variable temp : std_logic;
begin
    if rising_edge(clk) then
        temp := a xor b;
        c <= temp;
        d <= temp;        -- d gets the NEW xor value
    end if;
end process;

vs:

signal temp : std_logic;
...
process(clk)
begin
    if rising_edge(clk) then
        temp <= a xor b;  -- temp updates AFTER the process
        c <= temp;        -- c gets the OLD temp value
        d <= temp;
    end if;
end process;

In the variable version, c and d get the new XOR value computed in this clock cycle. In the signal version, c and d get the value of temp from the previous cycle, because signal updates are deferred.
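The one-cycle lag can be checked in simulation. The following is a sketch of a self-checking testbench that puts both styles in one process; the names tb_sig_var, temp_var, temp_sig, c_var and c_sig are invented for this illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tb_sig_var is end entity tb_sig_var;

architecture sim of tb_sig_var is
  signal clk      : std_logic := '0';
  signal a, b     : std_logic := '0';
  signal temp_sig : std_logic := '0';
  signal c_var    : std_logic := '0';  -- driven from the variable
  signal c_sig    : std_logic := '0';  -- driven from the signal
begin
  process(clk)
    variable temp_var : std_logic;
  begin
    if rising_edge(clk) then
      temp_var := a xor b;   -- visible immediately
      c_var    <= temp_var;  -- receives the new value
      temp_sig <= a xor b;   -- scheduled, not yet visible
      c_sig    <= temp_sig;  -- receives the old value of temp_sig
    end if;
  end process;

  stimulus : process
  begin
    a <= '1'; b <= '0';
    wait for 5 ns; clk <= '1';   -- first rising edge
    wait for 5 ns; clk <= '0';
    assert c_var = '1' report "variable path should see the new xor" severity failure;
    assert c_sig = '0' report "signal path should still see the old temp" severity failure;
    wait for 5 ns; clk <= '1';   -- second rising edge
    wait for 5 ns; clk <= '0';
    assert c_sig = '1' report "signal path should catch up one cycle later" severity failure;
    wait;
  end process;
end architecture sim;
```

In hardware terms, the signal version describes two registers in series, while the variable version describes one register fed directly by the XOR gate.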
4.6 VHDL: process and concurrent statements
VHDL has two kinds of statements:
- Concurrent statements. Outside any process. Execute "in parallel," whenever any input changes. Model combinational logic naturally.
- Sequential statements (inside a process). Execute in order, top to bottom, when any signal in the process's sensitivity list changes.
A process with sensitivity list (a, b) re-executes whenever a or b changes. A clocked process has sensitivity list (clk, reset) and uses if rising_edge(clk) inside.
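To make the two statement kinds concrete, here is a sketch of the same 2:1 multiplexer written both ways. The entity mux2 and its port names are assumptions made for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity: one mux, described twice.
entity mux2 is
  port (
    a, b, sel    : in  std_logic;
    y_concurrent : out std_logic;
    y_process    : out std_logic
  );
end entity mux2;

architecture rtl of mux2 is
begin
  -- Concurrent statement: re-evaluated whenever a, b or sel changes.
  y_concurrent <= a when sel = '0' else b;

  -- Equivalent process: every signal that is read appears in the sensitivity list.
  process(a, b, sel)
  begin
    if sel = '0' then
      y_process <= a;
    else
      y_process <= b;
    end if;
  end process;
end architecture rtl;
```

Leaving a read signal out of the sensitivity list of a combinational process is a classic source of mismatch between simulation and synthesis, which is one reason the concurrent form is often preferred for simple logic.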
The canonical clocked-register process:
process(clk, reset)
begin
    if reset = '1' then
        q <= '0';
    elsif rising_edge(clk) then
        q <= d;
    end if;
end process;

Synthesizes to a D flip-flop with asynchronous active-high reset.
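For contrast, a sketch of the synchronous-reset variant with a clock enable. The entity dff_sync and the en port are assumptions added for illustration; the point is that reset leaves the sensitivity list and moves inside the clock-edge test.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity: flip-flop with synchronous reset and enable.
entity dff_sync is
  port (
    clk, reset, en, d : in  std_logic;
    q                 : out std_logic
  );
end entity dff_sync;

architecture rtl of dff_sync is
begin
  process(clk)              -- only clk: reset is sampled on the clock edge
  begin
    if rising_edge(clk) then
      if reset = '1' then
        q <= '0';
      elsif en = '1' then
        q <= d;
      end if;               -- no else branch: the register holds its value
    end if;
  end process;
end architecture rtl;
```

The missing else is harmless here because the process is clocked: holding a value is exactly what a flip-flop does.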
4.7 VHDL: behavioral, dataflow, structural styles
You can describe the same module three ways:
Behavioral: describe what it does, in algorithm form.

process(a, b)
begin
    if a = b then
        eq <= '1';
    else
        eq <= '0';
    end if;
end process;

Dataflow: describe what it does, in expression form.

eq <= '1' when a = b else '0';

Structural: describe how it is built, in component instances.

xor_gate : entity work.xor2 port map(a => a, b => b, y => xor_out);
inv_gate : entity work.inv port map(a => xor_out, y => eq);

Real designs mix all three. Most RTL is dataflow plus clocked processes.
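The structural example instantiates xor2 and inv without defining them. A minimal sketch of what those two leaf entities might look like, assuming single-bit std_logic ports named as in the port maps above:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Assumed definition of the two-input XOR leaf cell.
entity xor2 is
  port (a, b : in std_logic; y : out std_logic);
end entity xor2;

architecture rtl of xor2 is
begin
  y <= a xor b;
end architecture rtl;

-- Context clauses apply per design unit, so they are repeated here.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Assumed definition of the inverter leaf cell.
entity inv is
  port (a : in std_logic; y : out std_logic);
end entity inv;

architecture rtl of inv is
begin
  y <= not a;
end architecture rtl;
```

With these definitions the structural version computes not (a xor b), which equals the behavioral and dataflow comparison only when a and b are single bits.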
4.8 Verilog comparison
The same 8-bit adder in Verilog:
module adder(
    input  [7:0] a,
    input  [7:0] b,
    output [8:0] sum
);
    assign sum = {1'b0, a} + {1'b0, b};
endmodule

A clocked register in Verilog:

always @(posedge clk or posedge reset) begin
    if (reset)
        q <= 1'b0;
    else
        q <= d;
end

Compare with VHDL: shorter, more C-like, less ceremony. Verilog wire corresponds roughly to VHDL signal; reg (despite the misleading name) is just any signal driven from inside an always block. SystemVerilog's logic type unifies them.
VHDL's verbosity buys you stronger typing and earlier error detection. Verilog's terseness buys you faster typing and tools that scale to enormous codebases. Both are everywhere; learn whichever your team uses.
4.9 FSM design with HDL
Recall from Chapter 4 that an FSM has a current state, a next-state function, and an output function. The standard HDL idiom uses three processes:
- State register (clocked): updates current state on the clock edge.
- Next-state logic (combinational): computes next state from current state plus inputs.
- Output logic (combinational): produces outputs from current state (Moore) or current state plus inputs (Mealy).
VHDL example for a simple bus-controller FSM:
type state_t is (IDLE, ARB, READ, WRITE, DONE);
signal state, next_state : state_t;

-- 1. state register
process(clk, rst)
begin
    if rst = '1' then
        state <= IDLE;
    elsif rising_edge(clk) then
        state <= next_state;
    end if;
end process;

-- 2. next-state logic
process(state, request, grant, ready)
begin
    next_state <= state;
    case state is
        when IDLE  => if request = '1' then next_state <= ARB;   end if;
        when ARB   => if grant = '1'   then next_state <= READ;  end if;
        when READ  => if ready = '1'   then next_state <= WRITE; end if;
        when WRITE => if ready = '1'   then next_state <= DONE;  end if;
        when DONE  => next_state <= IDLE;
    end case;
end process;

-- 3. output logic (Moore)
process(state)
begin
    bus_req  <= '0';
    do_read  <= '0';
    do_write <= '0';
    case state is
        when ARB    => bus_req <= '1';
        when READ   => bus_req <= '1'; do_read  <= '1';
        when WRITE  => bus_req <= '1'; do_write <= '1';
        when others => null;
    end case;
end process;

The same pattern in SystemVerilog:
typedef enum logic [2:0] {IDLE, ARB, READ, WRITE, DONE} state_t;
state_t state, next_state;

always_ff @(posedge clk or posedge rst)
    if (rst) state <= IDLE;
    else     state <= next_state;

always_comb begin
    next_state = state;
    case (state)
        IDLE:  if (request) next_state = ARB;
        ARB:   if (grant)   next_state = READ;
        READ:  if (ready)   next_state = WRITE;
        WRITE: if (ready)   next_state = DONE;
        DONE:  next_state = IDLE;
    endcase
end

always_comb begin
    bus_req  = 1'b0;
    do_read  = 1'b0;
    do_write = 1'b0;
    case (state)
        ARB:     bus_req = 1'b1;
        READ:    begin bus_req = 1'b1; do_read  = 1'b1; end
        WRITE:   begin bus_req = 1'b1; do_write = 1'b1; end
        default: ;
    endcase
end

4.9.1 State encoding
The synthesis tool can pick how to encode states in flip-flops:
- Binary. ⌈log2 N⌉ flip-flops for N states. Most flip-flop-efficient.
- Gray. Adjacent states differ by one bit. Reduces transition glitches.
- One-hot. N flip-flops for N states, exactly one high. More flip-flops but next-state logic is trivial: each state's flip-flop is the OR of incoming-transition conditions. Often optimal on FPGAs, where flip-flops are abundant but LUTs are precious.
Most synthesis tools auto-select based on FPGA architecture. Vivado's default for FSM with fewer than ~32 states is one-hot.
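When an encoding has to be fixed by hand rather than left to the tool, one option is to replace the enumerated type with explicit constants. The sketch below does this for the five states of the bus-controller FSM; the package name bus_fsm_onehot and the ST_ prefixes are assumptions chosen for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical package: hand-written one-hot codes for the five states.
package bus_fsm_onehot is
  -- Five states, five flip-flops, exactly one bit set per state.
  subtype state_t is std_logic_vector(4 downto 0);
  constant ST_IDLE  : state_t := "00001";
  constant ST_ARB   : state_t := "00010";
  constant ST_READ  : state_t := "00100";
  constant ST_WRITE : state_t := "01000";
  constant ST_DONE  : state_t := "10000";
end package bus_fsm_onehot;
```

A binary encoding of the same machine would need only ⌈log2 5⌉ = 3 flip-flops. With a vector-typed state, every case statement needs a when others branch, because most of the 32 possible bit patterns are not legal states. Keeping the enumerated type and letting the tool choose is usually less work.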
4.10 Testbenches
A testbench is a non-synthesizable HDL module that drives stimuli into your design and checks outputs. Testbenches use HDL features that have no hardware equivalent (file I/O, real arithmetic, infinite loops, wait statements) because they only run in simulation.
A bare-bones VHDL testbench:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tb_adder is end tb_adder;

architecture sim of tb_adder is
    signal a, b : std_logic_vector(7 downto 0) := (others => '0');
    signal sum  : std_logic_vector(8 downto 0);
begin
    DUT: entity work.adder port map(a => a, b => b, sum => sum);

    stimulus: process
    begin
        a <= "00000001"; b <= "00000010"; wait for 10 ns;
        assert sum = "000000011" report "1+2 failed" severity failure;
        a <= "11111111"; b <= "00000001"; wait for 10 ns;
        assert sum = "100000000" report "carry failed" severity failure;
        wait;
    end process;
end sim;

The same in Verilog:
module tb_adder;
    reg  [7:0] a, b;
    wire [8:0] sum;

    adder DUT(.a(a), .b(b), .sum(sum));

    initial begin
        a = 8'd1; b = 8'd2; #10;
        // $fatal is a SystemVerilog task; its first argument is the finish number
        if (sum !== 9'd3) $fatal(1, "1+2 failed");
        a = 8'd255; b = 8'd1; #10;
        if (sum !== 9'd256) $fatal(1, "carry failed");
        $finish;
    end
endmodule

Modern testbenches for big chips use UVM (Universal Verification Methodology): a SystemVerilog class library with test sequences, scoreboards, and coverage. Industrial verification teams often write more lines of UVM than RTL.
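Short of a full methodology, the directed tests above can be extended to an exhaustive sweep, which is practical for an 8-bit adder. The following is a sketch, assuming the adder entity from section 4.3 is compiled into the work library; the testbench name tb_adder_sweep is invented for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity tb_adder_sweep is end entity tb_adder_sweep;

architecture sim of tb_adder_sweep is
  signal a, b : std_logic_vector(7 downto 0) := (others => '0');
  signal sum  : std_logic_vector(8 downto 0);
begin
  DUT : entity work.adder port map (a => a, b => b, sum => sum);

  stimulus : process
  begin
    -- Try every pair of 8-bit operands: 256 * 256 = 65536 cases.
    for i in 0 to 255 loop
      for j in 0 to 255 loop
        a <= std_logic_vector(to_unsigned(i, 8));
        b <= std_logic_vector(to_unsigned(j, 8));
        wait for 10 ns;
        -- Compare against integer addition as the reference model.
        assert to_integer(unsigned(sum)) = i + j
          report "mismatch for " & integer'image(i) & " + " & integer'image(j)
          severity failure;
      end loop;
    end loop;
    report "all 65536 cases passed";
    wait;
  end process;
end architecture sim;
```

Exhaustive sweeps stop scaling quickly (two 32-bit operands give 2^64 cases), which is where randomized stimulus and coverage metrics take over.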
4.11 Synthesis: the magic step
The synthesis tool ingests RTL plus a target library (a description of the gates available on your FPGA or ASIC process) and outputs a gate-level netlist that implements the same behavior. Modern synthesis tools are extraordinarily sophisticated: they pattern-match common idioms (counters, FIFOs, multipliers, FSMs) and replace them with optimized standard-cell implementations, retime registers across logic to balance pipeline stages, and squeeze gates out wherever possible.
Synthesizable subset. Not all HDL constructs become hardware. Verilog/VHDL constructs that do not synthesize:
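- wait for <time> and # delays (used in testbenches): simulation delays have no direct hardware equivalent.
- File I/O (textio, $fopen, $fdisplay): reads and writes disk files, which do not exist on the chip. Many tools do accept $readmemh, but only to load initial memory contents at build time.
- Real arithmetic (real, time types): no native hardware support.
- Loops whose iteration count is not known at synthesis time: the tool cannot unroll them into a finite circuit.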
You write these in testbenches. You do not write them in the design under test.
A common synthesis bug: forgetting to assign a signal in every branch of an if/case inside a combinational process. Synthesis interprets the missing assignment as "hold the previous value," and inserts a latch. Latches are evil: they are level-sensitive, hard to time, and almost always indicate a design error. Always assign defaults at the top of a combinational process, then override them.
process(state, ready)
begin
    enable <= '0';  -- default
    error  <= '0';  -- default
    case state is
        when ACTIVE => enable <= '1';
        when FAULT  => error <= '1';
        when others => null;
    end case;
end process;

4.12 Putting it together: a small ALU
A 4-bit ALU in Verilog:
module alu4(
    input      [3:0] a,
    input      [3:0] b,
    input      [2:0] op,
    output reg [4:0] result,
    output reg       zero
);
    always @* begin
        case (op)
            3'b000:  result = {1'b0, a} + {1'b0, b};   // ADD (result[4] = carry)
            3'b001:  result = {1'b0, a} - {1'b0, b};   // SUB (result[4] = borrow)
            3'b010:  result = {1'b0, a & b};           // AND
            3'b011:  result = {1'b0, a | b};           // OR
            3'b100:  result = {1'b0, a ^ b};           // XOR
            3'b101:  result = {1'b0, ~a};              // NOT a
            3'b110:  result = {1'b0, a[2:0], 1'b0};    // SHL (width made explicit)
            3'b111:  result = {2'b00, a[3:1]};         // SHR (width made explicit)
            default: result = 5'b00000;
        endcase
        zero = (result == 5'b00000);
    end
endmodule

Eight operations, one always block. Synthesis builds this from a multiplexer feeding the result, with each case implementing a different combinational function. Modern tools pack the whole ALU into ~10 LUTs on a Xilinx 7-series FPGA.
4.13 Memory module example
A small simple dual-port RAM (one write port, one read port) in Verilog, written so that synthesis can infer a block RAM:
module dpram #(parameter ADDR_W=10, parameter DATA_W=8) (
    input                   clk,
    input                   we,
    input      [ADDR_W-1:0] waddr,
    input      [DATA_W-1:0] wdata,
    input      [ADDR_W-1:0] raddr,
    output reg [DATA_W-1:0] rdata
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    always @(posedge clk) begin
        if (we) mem[waddr] <= wdata;
        rdata <= mem[raddr];
    end
endmodule

Synthesis recognizes the array-with-clocked-access pattern and infers a block RAM. With the default parameters the memory holds 1024 × 8 = 8 Kbit, so without further hints Xilinx Vivado can map it to a single block RAM primitive; whether that is an 18 Kbit or a 36 Kbit block depends on the device and tool settings.
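The same inference pattern can be written in VHDL. This is a sketch that mirrors the Verilog module port for port; whether a particular tool maps it to block RAM is an assumption that should be checked in the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity dpram is
  generic (
    ADDR_W : positive := 10;
    DATA_W : positive := 8
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(ADDR_W - 1 downto 0);
    wdata : in  std_logic_vector(DATA_W - 1 downto 0);
    raddr : in  std_logic_vector(ADDR_W - 1 downto 0);
    rdata : out std_logic_vector(DATA_W - 1 downto 0)
  );
end entity dpram;

architecture rtl of dpram is
  -- 2**ADDR_W words of DATA_W bits each.
  type mem_t is array (0 to 2**ADDR_W - 1) of std_logic_vector(DATA_W - 1 downto 0);
  signal mem : mem_t;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Registered read: on a same-address write, the old word is returned.
      rdata <= mem(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture rtl;
```

The registered read is the important part of the pattern: block RAMs have synchronous read ports, so a combinational read of the array would typically push the tool toward distributed (LUT-based) memory instead.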