
5. Cryptographic Hardware: Speed, Isolation, and Randomness

5.1 AES in silicon

AES-NI (Intel/AMD) and the ARMv8 Cryptography Extensions add instructions that implement AES round and key-schedule steps directly in hardware, bringing AES down to roughly 1-10 cycles per block. Compared to software AES on the same core (50-200 cycles per block), this is roughly a 10-100x speedup. Crucially, the hardware uses no in-memory lookup tables, which removes the cache-timing side channel that affects T-table-based software implementations. The instructions run in data-independent time, so the data flow is constant-time by construction.
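
To put the cycle counts above in more familiar units, the following sketch converts cycles per block into single-core throughput. The 3 GHz clock and the function name `throughput_gbps` are assumptions chosen for illustration; real throughput also depends on the mode of operation, pipelining, and memory bandwidth.

```python
BLOCK_BYTES = 16  # the AES block size is 128 bits for every key length


def throughput_gbps(cycles_per_block: float, clock_hz: float = 3.0e9) -> float:
    """Idealized single-core throughput in gigabytes per second.

    clock_hz defaults to an assumed 3 GHz core; it is not a measured value.
    """
    blocks_per_second = clock_hz / cycles_per_block
    return blocks_per_second * BLOCK_BYTES / 1e9


if __name__ == "__main__":
    cases = [
        ("hardware, low end of range", 1),
        ("hardware, high end of range", 10),
        ("software, low end of range", 50),
        ("software, high end of range", 200),
    ]
    for label, cycles in cases:
        print(f"{label:28s} {cycles:4d} cycles/block -> "
              f"{throughput_gbps(cycles):6.2f} GB/s")
```

At the assumed 3 GHz, 1 cycle per block works out to 48 GB/s and 200 cycles per block to 0.24 GB/s, which is the gap the text describes.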

Embedded AES coprocessors take a different form. Atmel XMEGA chips (a common ChipWhisperer target), ARM Cortex-M crypto blocks, and dedicated chips like the Microchip (formerly Atmel) ATECC608 implement AES in hardware with claimed side-channel resistance. The "claimed" matters: these blocks have been broken repeatedly by sufficiently determined side-channel attackers. ChipWhisperer's training labs include real captures from real production chips.

5.2 RSA and ECC engines

Modular exponentiation of 2048-bit numbers in hardware uses Montgomery multiplication arrays, big-integer datapaths, and Karatsuba/Toom-Cook decomposition for the largest multiplies. Side-channel-protected RSA engines add fault detection to defend against the Bellcore attack on RSA-CRT, and use Montgomery-ladder exponentiation, which performs the same sequence of operations for every exponent bit and so can run in constant time. ECC over P-256 or Curve25519 is preferred in modern designs because the operands are smaller, the constants are simpler, and the side-channel countermeasures are easier to audit.
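
The Montgomery ladder mentioned above can be sketched in a few lines. This is an illustration of the operation pattern only: Python's big integers and the `if` on the key bit are not constant-time, whereas a hardware engine would replace the branch with a conditional swap. The function name and the `bits` parameter are assumptions made for this example.

```python
def montgomery_ladder_pow(base: int, exponent: int, modulus: int, bits: int) -> int:
    """Compute base**exponent mod modulus with a Montgomery ladder.

    Every iteration does exactly one multiplication and one squaring,
    whatever the exponent bit is. The loop runs for a fixed `bits`
    iterations so that the operation count does not reveal the
    exponent's bit length.
    """
    # Invariant maintained throughout: r1 == r0 * base (mod modulus).
    r0, r1 = 1 % modulus, base % modulus
    for i in range(bits - 1, -1, -1):
        bit = (exponent >> i) & 1
        if bit:
            r0, r1 = (r0 * r1) % modulus, (r1 * r1) % modulus
        else:
            r0, r1 = (r0 * r0) % modulus, (r0 * r1) % modulus
    return r0


if __name__ == "__main__":
    # Cross-check against Python's built-in modular exponentiation.
    print(montgomery_ladder_pow(7, 65537, 1000003, 17) == pow(7, 65537, 1000003))
```

Contrast this with plain square-and-multiply, which multiplies only when the bit is 1 and therefore leaks the exponent's Hamming weight through timing and power.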

5.3 Hardware true random number generators

Software RNGs are deterministic: same seed, same sequence. Cryptography needs unpredictable randomness, which means physical entropy. Hardware TRNGs harvest one of a few sources:

  • Ring-oscillator jitter. Two free-running ring oscillators differ slightly in frequency because of process variation, and thermal noise makes their edges jitter; sampling one with the other turns that jitter into random bits.
  • Metastable flip-flops. A flip-flop driven into metastability resolves to 0 or 1 unpredictably; many cells in parallel give an entropy stream.
  • Thermal-noise resistor. Amplified Johnson noise from a resistor.
  • Photon shot noise. Reverse-biased photodiode counts random photon arrivals.
  • Quantum well or tunneling diode. Used in higher-end commercial TRNGs.

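Intel's RDRAND and RDSEED instructions expose an on-die TRNG (code-named Bull Mountain at its introduction, and carried through multiple generations since). Raw entropy is first conditioned. RDSEED returns conditioned entropy intended for seeding other generators, while RDRAND returns the output of a CTR_DRBG (a deterministic generator) that is reseeded from that conditioned entropy. The design targets the NIST SP 800-90 series and FIPS 140-2 validation, with continuous health tests that detect entropy loss and stop the source from delivering output.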
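
As an illustration of what a continuous health test looks like, here is a minimal sketch in the spirit of the repetition count test from NIST SP 800-90B: it flags the source once the same sample repeats more often than the claimed entropy makes plausible. This does not claim to be the test Intel's hardware runs; the class and function names, and the 2^-20 false-positive target, are assumptions for the example.

```python
import math


def rct_cutoff(min_entropy_per_sample: float, false_positive_exp: int = 20) -> int:
    """Cutoff C = 1 + ceil(-log2(alpha) / H), with alpha = 2**-false_positive_exp."""
    return 1 + math.ceil(false_positive_exp / min_entropy_per_sample)


class RepetitionCountTest:
    """Tracks the current run of identical samples from a noise source."""

    def __init__(self, cutoff: int):
        self.cutoff = cutoff
        self.last = None
        self.run = 0

    def feed(self, sample: int) -> bool:
        """Return True while the source looks healthy, False on failure."""
        if sample == self.last:
            self.run += 1
        else:
            self.last = sample
            self.run = 1
        return self.run < self.cutoff


if __name__ == "__main__":
    test = RepetitionCountTest(rct_cutoff(1.0))
    stuck_source = [1] * 30  # a source that has failed and is stuck at 1
    for count, sample in enumerate(stuck_source, start=1):
        if not test.feed(sample):
            print(f"entropy failure flagged after {count} samples")
            break
```

A real design would respond to the failure by withholding output, which matches the behavior the paragraph above describes.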

Bad RNG = catastrophic. Sony's PS3 ECDSA implementation reused the same nonce across signatures (the supposedly random k was a constant), which lets anyone with two signatures over different messages solve for the private key directly. The flaw was presented publicly at the end of 2010 and the signing key was published soon after, jailbreaking the entire platform. The fix is "always use cryptographically secure randomness for k". Alternatively, RFC 6979 deterministic ECDSA derives k from the private key and the message hash, which removes the dependence on an RNG at signing time.
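
The key-recovery algebra behind a reused nonce fits in a few lines. In the sketch below, only the arithmetic modulo the group order is real ECDSA: to stay self-contained it skips elliptic-curve point arithmetic and uses an arbitrary stand-in for r (in real ECDSA, r is the x-coordinate of k·G reduced mod n). The attack only needs both signatures to share the same r, so the algebra is unchanged. All function names are illustrative.

```python
import hashlib

# Order of the NIST P-256 base point (a prime).
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def digest(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N


def sign_with_fixed_k(d: int, k: int, r: int, msg: bytes):
    """ECDSA's s = k^-1 * (z + r*d) mod n, with k (and hence r) held constant."""
    z = digest(msg)
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s


def recover_private_key(sig1, msg1: bytes, sig2, msg2: bytes) -> int:
    """Solve for d given two signatures that share the same nonce."""
    r, s1 = sig1
    _, s2 = sig2
    z1, z2 = digest(msg1), digest(msg2)
    # s1 - s2 = k^-1 * (z1 - z2)  =>  k = (z1 - z2) / (s1 - s2)
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    # s1 = k^-1 * (z1 + r*d)      =>  d = (s1*k - z1) / r
    return (s1 * k - z1) * pow(r, -1, N) % N


if __name__ == "__main__":
    d = 0x1234567890ABCDEF1234567890ABCDEF  # hypothetical private key
    k = 0x0F0F0F0F0F0F0F0F                  # the "random" nonce, reused
    r = 0xCAFEBABE                          # stand-in for x(k*G) mod n
    sig_a = sign_with_fixed_k(d, k, r, b"firmware v1")
    sig_b = sign_with_fixed_k(d, k, r, b"firmware v2")
    print(recover_private_key(sig_a, b"firmware v1", sig_b, b"firmware v2") == d)
```

Two linear equations in the two unknowns k and d are all an attacker needs, which is why a single repeated nonce is fatal.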

5.4 PUFs: secrets from manufacturing variation

A Physically Unclonable Function uses unavoidable manufacturing-process variation as a source of stable per-chip randomness that is extractable by the chip itself but cannot be cloned even by the original manufacturer. The key is never stored anywhere; it is reconstructed on every boot.

PUF challenge-response:
 
  +-------------+     challenge     +----------+
  |  Verifier   |  ---------------->|   PUF    |
  | (knows CRPs |                   |  silicon |
  |  enrolled   |  <----------------|          |
  |  earlier)   |     response      +----------+
  +-------------+
          ^
          |
   The verifier checks that response matches the enrolled response
   for that challenge, within an error-correction tolerance.
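
The verifier's side of the exchange in the diagram might look like the following sketch. The `Verifier` class, the 15% tolerance, and the choice to consume each challenge-response pair after one use (to prevent replay) are assumptions for illustration, not details given in the text.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


class Verifier:
    """Holds challenge-response pairs (CRPs) recorded at enrollment."""

    def __init__(self, max_error_fraction: float = 0.15):
        self.max_error_fraction = max_error_fraction
        self.crps = {}

    def enroll(self, challenge: bytes, response: bytes) -> None:
        self.crps[challenge] = response

    def verify(self, challenge: bytes, response: bytes) -> bool:
        # Each enrolled pair is removed on use so that an eavesdropper
        # cannot replay an observed response later.
        enrolled = self.crps.pop(challenge, None)
        if enrolled is None or len(enrolled) != len(response):
            return False
        allowed_bit_errors = self.max_error_fraction * 8 * len(enrolled)
        return hamming_distance(enrolled, response) <= allowed_bit_errors


if __name__ == "__main__":
    verifier = Verifier()
    verifier.enroll(b"\x01", bytes([0xAA] * 16))
    noisy = bytes([0xAB, 0xA8] + [0xAA] * 14)  # two bits flipped by noise
    print(verifier.verify(b"\x01", noisy))      # genuine chip, within tolerance
    print(verifier.verify(b"\x01", noisy))      # same challenge again: rejected
```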

The leading PUF families:

  • SRAM PUF. At power-up, each SRAM cell settles to 0 or 1 based on tiny mismatches between the two cross-coupled inverters. The startup pattern is unique per chip, stable across power cycles up to noise. Used commercially by Intrinsic ID and licensed into many secure-element products.
  • Ring-oscillator PUF. Many ring oscillators on the same chip have slightly different free-running frequencies due to gate-delay variation. Comparing pairs gives random bits.
  • Arbiter PUF. Two symmetric paths through configurable multiplexers, racing a signal from input to output. The challenge selects which path-segments are used; the response is which path won the race. Compact but broken by machine-learning modeling attacks. Fed enough challenge-response pairs, an SVM or neural network learns the path delays and clones the PUF in software.
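
The modeling attack on arbiter PUFs can be sketched with nothing more than a perceptron. The sketch assumes the standard additive linear delay model, in which the response is the sign of a weighted sum over parity features of the challenge. The simulated "chip" is just a random weight vector, and the stage count, training-set size, and function names are illustrative choices.

```python
import random


def features(challenge):
    """Parity features: phi[i] = product of (1 - 2*c[j]) for j >= i, plus a bias term."""
    n = len(challenge)
    phi = [1] * (n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def arbiter_response(delays, challenge):
    """Which path wins the race under the additive delay model."""
    total = sum(w * x for w, x in zip(delays, features(challenge)))
    return 1 if total > 0 else 0


def train_perceptron(crps, n_stages, epochs=30):
    """Learn a weight vector that reproduces the observed responses."""
    weights = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        mistakes = 0
        for challenge, response in crps:
            x = features(challenge)
            guess = 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0
            if guess != response:
                step = 1 if response == 1 else -1
                weights = [w + step * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights


def clone_accuracy(n_stages=16, n_train=2000, n_test=500, seed=1):
    """Train on observed CRPs, then score the software clone on unseen challenges."""
    rng = random.Random(seed)
    secret_delays = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

    def collect(count):
        pairs = []
        for _ in range(count):
            c = [rng.getrandbits(1) for _ in range(n_stages)]
            pairs.append((c, arbiter_response(secret_delays, c)))
        return pairs

    train, test = collect(n_train), collect(n_test)
    model = train_perceptron(train, n_stages)
    hits = sum(arbiter_response(model, c) == r for c, r in test)
    return hits / n_test


if __name__ == "__main__":
    print(f"clone accuracy on unseen challenges: {clone_accuracy():.3f}")
```

The attacker never learns the physical delays themselves, only a weight vector that predicts the same race outcomes, which is all a software clone needs.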

Three properties matter:

  1. Uniqueness. Two different chips give different responses for the same challenge. Hamming distance between their response strings should be near 50%.
  2. Reliability. The same chip gives the same response across temperature, voltage, and aging. Raw bit error rates of roughly 1-15%, depending on PUF type and operating conditions, are corrected by error-correcting codes (BCH or Reed-Muller) plus helper data.
  3. Unclonability. No one (including the foundry) can produce another chip with the same response. Strong for SRAM PUFs, weaker for arbiter PUFs, whose behavior can be modeled in software.
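
The first two properties are usually reported as Hamming-distance statistics. The sketch below computes them on simulated data: each "chip" is a random bit string and each readout flips bits independently with a fixed probability. The simulation model, the 5% flip probability, and the function names are assumptions for illustration; real measurements come from silicon.

```python
import random
from itertools import combinations


def fractional_hd(a, b):
    """Fraction of bit positions in which two responses differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniqueness(chip_responses):
    """Mean inter-chip distance over all pairs; the ideal is 0.5."""
    pairs = list(combinations(chip_responses, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


def bit_error_rate(reference, readouts):
    """Mean intra-chip distance between the enrolled reference and later readouts."""
    return sum(fractional_hd(reference, r) for r in readouts) / len(readouts)


def simulate_chip(rng, n_bits):
    return [rng.getrandbits(1) for _ in range(n_bits)]


def noisy_readout(rng, reference, flip_prob):
    return [bit ^ int(rng.random() < flip_prob) for bit in reference]


if __name__ == "__main__":
    rng = random.Random(0)
    chips = [simulate_chip(rng, 1024) for _ in range(20)]
    readouts = [noisy_readout(rng, chips[0], 0.05) for _ in range(10)]
    print(f"uniqueness (inter-chip HD): {uniqueness(chips):.3f}")
    print(f"bit error rate (intra-chip HD): {bit_error_rate(chips[0], readouts):.3f}")
```

A usable PUF needs a wide gap between the two numbers, so that the error-correction tolerance can sit above the intra-chip noise and well below the inter-chip distance.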

PUFs solve the secret-provisioning problem elegantly: the chip leaves the fab with its secret already inside, with no need for OEM key injection or key storage. The hard part is the integration. The helper data, ECC, fuzzy extractor, and key-derivation chain are subtle, and many real PUF deployments have been weakened by helper data that leaks information about the key or by arbiter designs that are vulnerable to modeling attacks.
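
To make the role of helper data concrete, here is a minimal sketch of the code-offset construction using a 5x repetition code in place of the BCH or Reed-Muller codes mentioned earlier. The repetition code and all names are simplifying assumptions. A real design would use a stronger code and pass the result through a key-derivation step, and it is known that repetition-code helper data leaks key information when the PUF bits are biased.

```python
REPEAT = 5  # each key bit is spread over five PUF response bits


def encode(key_bits):
    """Repetition-code encoder: 1 -> 11111, 0 -> 00000."""
    return [bit for bit in key_bits for _ in range(REPEAT)]


def decode(code_bits):
    """Majority vote over each group of REPEAT bits."""
    return [
        int(sum(code_bits[i:i + REPEAT]) > REPEAT // 2)
        for i in range(0, len(code_bits), REPEAT)
    ]


def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]


def enroll(puf_response, key_bits):
    """Helper data = response XOR codeword. Stored in the clear; the key is not stored."""
    return xor_bits(puf_response, encode(key_bits))


def reconstruct(noisy_response, helper):
    """Noisy response XOR helper = noisy codeword, which the decoder cleans up."""
    return decode(xor_bits(noisy_response, helper))


if __name__ == "__main__":
    key = [1, 0, 1, 1]
    response = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
    helper = enroll(response, key)

    noisy = list(response)
    for position in (0, 7, 8):  # three bit errors at the next power-up
        noisy[position] ^= 1
    print(reconstruct(noisy, helper) == key)
```

Up to two errors per five-bit group are corrected; a third error in the same group flips that key bit, which is why the code has to be sized to the measured bit error rate.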

5.5 Smart cards and secure elements

Smart cards descend from the ISO/IEC 7816 standard for contact cards (the gold contact pad on credit cards) and ISO/IEC 14443 for contactless cards (the basis for NFC payments). The chip inside is a complete secure microcontroller: an 8-bit core (more recently a 32-bit ARM SecurCore), mask ROM containing the operating system (Java Card, MULTOS, or proprietary), EEPROM or flash for applications and keys, RAM for working state, and a hardware crypto block. Modern high-end smart cards layer many of the defenses we have discussed: dual-rail logic, masked AES, voltage and light sensors, active mesh, randomized clock, redundant computation, anti-rollback fuses, secure boot of the card OS, and in some designs PUF-based attestation hardened against modeling attacks.

Smart card targets in the wild:

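  • SIM cards. Hold the IMSI and Ki (subscriber authentication key) for cellular networks. Karsten Nohl's 2013 attack on legacy DES SIMs sent an improperly signed OTA SMS; some cards answered with a DES-signed error message, from which the OTA key could be cracked offline. With that key an attacker could push applets to the card and, on some cards, extract enough to clone it.
  • EMV bank cards. Dynamic-data-authentication cards run public-key signing on every transaction. Murdoch et al.'s 2010 paper "Chip and PIN is Broken" showed that a man-in-the-middle device could make the terminal believe the PIN had been verified while the card believed a different cardholder verification method was in use, so any PIN was accepted. EMV deployments have since tightened their checks.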
  • Public-transit cards. Mifare Classic was infamously broken (next section); successor cards like Mifare DESFire and Calypso use AES-based protocols.
  • ePassports. Biometric data signed by the issuing country, stored on a contactless smart card. Privacy and clone-resistance properties have been studied extensively: passports that rely on passive authentication alone prove the data is genuine but can be copied bit for bit, while chips with active authentication hold a private key that resists cloning.

5.6 HSMs

Hardware Security Modules are the heavy artillery of cryptographic infrastructure. They are tamper-responsive boxes (active mesh, pressure sensors, temperature monitors, battery-backed zeroize) that hold root keys for banks, certificate authorities, payment networks, and cloud KMS services. Vendors include Thales (the Luna line, acquired with Gemalto/SafeNet), Entrust (nShield, formerly nCipher), Utimaco (which also acquired the Atalla line), and cloud offerings such as AWS CloudHSM, Google Cloud HSM, and Azure Dedicated HSM.

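FIPS 140-2 defines four security levels, distinguished largely by physical assurance:

  • Level 1. Production-grade components and approved algorithms, with no specific physical security mechanisms required. Basically just "uses approved algorithms".
  • Level 2. Tamper-evident. Stickers, seals, and role-based authentication for operators.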
  • Level 3. Tamper-resistant. Active sensors, automatic zeroization on intrusion, identity-based authentication for crypto officers.
  • Level 4. Tamper-active in the strongest sense. Resists environmental attacks (extreme temperature, voltage, irradiation) by detecting and zeroizing.

FIPS 140-3, the successor standard, aligns with ISO/IEC 19790 and adds modern requirements. Banks and public certificate authorities typically require Level 3 devices; Level 4 validations are rare and are reserved for the most physically exposed or highest-assurance deployments.