7. Interrupts and ISRs, Properly // Embedded Systems // bhaswanth

An interrupt is the chip's way of saying "drop what you are doing, something happened." Without interrupts, every event would have to be polled, which wastes CPU cycles and adds latency. With them, the CPU can sleep at microwatts and wake as soon as an edge appears on a pin.

7.1 What happens on an interrupt

CPU finishes the current instruction.
State (PC, status register, sometimes scratch registers) is pushed to the stack. On Cortex-M this is a hardware-managed eight-register stack frame.
The PC is loaded from the vector table entry for that interrupt source.
The CPU executes the ISR.
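The ISR returns. Many architectures use a dedicated return-from-interrupt instruction; on Cortex-M an ordinary function return works, because LR holds a special EXC_RETURN value on entry. The hardware pops the state and execution resumes.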

The vector table is the array of ISR addresses, one per source. On Cortex-M it lives at the base of flash (or wherever VTOR points). When you use a peripheral, you either write the function pointer for its ISR into the right slot or, more commonly, define a function with the name the startup file expects (EXTI0_IRQHandler below); the startup code and linker script then place its address in the table.

void EXTI0_IRQHandler(void) {
    if (EXTI->PR & (1u << 0)) {     // line 0 is pending
        EXTI->PR = (1u << 0);       // clear the pending bit (write 1 to clear)
        // handle the event
        button_pressed_flag = 1;    // must be declared volatile; main polls it
    }
}
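
To picture how a handler like the one above ends up being called, here is a rough host-side model of a vector table as an array of function pointers indexed by interrupt number. It is a sketch, not startup code: `vector_table`, `dispatch_irq`, the handler names, and the IRQ numbers are all made up for illustration. On a real Cortex-M part the table comes from the vendor's startup file and the hardware performs the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a vector table: one handler address per source.
   All names here are illustrative, not taken from any vendor header. */
typedef void (*isr_t)(void);

enum { IRQ_EXTI0 = 0, IRQ_TIM2 = 1, IRQ_COUNT = 2 };

static volatile int exti0_events;      /* bumped by the model EXTI0 handler */
static volatile int unhandled_events;  /* bumped by the default handler     */

void default_handler(void) { unhandled_events++; }
void exti0_handler(void)   { exti0_events++; }

/* Slots without a dedicated ISR point at the default handler, which is
   roughly what weak aliases in a typical startup file achieve. */
static isr_t vector_table[IRQ_COUNT] = {
    [IRQ_EXTI0] = exti0_handler,
    [IRQ_TIM2]  = default_handler,
};

/* What the hardware does on entry, minus the stacking: fetch the handler
   address from the slot for this source and jump to it. */
void dispatch_irq(int irq)
{
    assert(irq >= 0 && irq < IRQ_COUNT);
    vector_table[irq]();
}
```

Calling `dispatch_irq(IRQ_EXTI0)` runs `exti0_handler`, while `dispatch_irq(IRQ_TIM2)` falls through to the default handler because nothing overrode that slot.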

7.2 Sources

External pin (button, GPS PPS, sensor data-ready).
Timer overflow or compare match.
ADC conversion done.
UART byte received (or transmit empty).
SPI/I2C transfer complete.
DMA done or half-done.
USB packet received.
Brown-out, wakeup, fault (memory, bus).
Software interrupts (SVC on Cortex-M for syscalls; PendSV for context switches).

Each source has a number. The NVIC uses that number to enable, pend, and prioritize the source; the vector table maps it to an ISR.

7.3 ISR best practices

Keep them short. Microseconds, not milliseconds. Long ISRs starve everything else.
No blocking calls. No printf, no delay(), no mutex_lock: an ISR has no task context to block in. Under an RTOS, call only the API variants documented as ISR-safe.
No malloc. Allocators may not be reentrant.
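Use volatile for shared variables read in main and written in an ISR; otherwise the compiler may cache the value in a register and never see the update. volatile does not make an access atomic.
For multi-byte shared state, use atomic access or mask interrupts briefly. An aligned 32-bit load or store on Cortex-M is atomic; a 64-bit access is not, and neither is a read-modify-write such as counter++.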
Defer work to a task. Set a flag, push to a queue, give a semaphore. The main loop or RTOS task does the heavy lifting.
Acknowledge the interrupt source before returning, or it will fire again immediately.
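
The "mask interrupts briefly" advice can be sketched as follows for a 64-bit counter shared between an ISR and main code. This is a host-buildable sketch under assumptions: `irq_save`, `irq_restore`, `fake_primask`, `systick_isr`, and `ticks_now` are hypothetical names, and the two helpers stand in for what CMSIS `__get_PRIMASK()`, `__disable_irq()`, and `__set_PRIMASK()` would do on a Cortex-M target.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch builds on a host. On a Cortex-M target these
   would wrap CMSIS __get_PRIMASK(), __disable_irq() and __set_PRIMASK(). */
static uint32_t fake_primask;                 /* 1 = interrupts masked */

uint32_t irq_save(void)
{
    uint32_t old = fake_primask;
    fake_primask = 1u;                        /* mask interrupts */
    return old;
}

void irq_restore(uint32_t old)
{
    assert(old <= 1u);                        /* PRIMASK is a single bit */
    fake_primask = old;
}

static volatile uint64_t tick_count;          /* written in ISR, read in main */

/* ISR side: assuming no other ISR touches tick_count, a plain increment
   is enough here. */
void systick_isr(void) { tick_count++; }

/* Main side: a 64-bit read takes two loads on a 32-bit core, so mask
   interrupts for the few cycles it takes to copy the value out. */
uint64_t ticks_now(void)
{
    uint32_t state = irq_save();
    uint64_t snapshot = tick_count;
    irq_restore(state);                       /* restore, do not blindly enable */
    return snapshot;
}
```

Restoring the saved state, rather than unconditionally re-enabling, keeps the helper safe to call from code that already runs with interrupts masked.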

Bad ISR:

void UART_RX_IRQHandler(void) {
    char c = UART->DR;
    printf("got %c\n", c);    // BAD: printf can take ms, blocks until TX empty
}

Good ISR:

void UART_RX_IRQHandler(void) {
    char c = UART->DR;
    ring_buffer_push(&rx_buf, c);
}

The main loop or task pulls bytes out of rx_buf at its leisure.
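
The text does not show what `ring_buffer_push` looks like, so here is one possible single-producer, single-consumer version. Treat it as a sketch: the struct layout, `RB_SIZE`, and `ring_buffer_pop` are assumptions, and it relies on there being exactly one writer (the ISR) and one reader (main) on a single core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 64u   /* power of two so the index wrap is a cheap mask */

static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0u, "RB_SIZE must be a power of two");

typedef struct {
    volatile uint32_t head;        /* written only by the producer (ISR)  */
    volatile uint32_t tail;        /* written only by the consumer (main) */
    uint8_t data[RB_SIZE];
} ring_buffer_t;

/* Called from the ISR. Returns false and drops the byte if the buffer is full. */
bool ring_buffer_push(ring_buffer_t *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    uint32_t next = (head + 1u) & (RB_SIZE - 1u);
    if (next == rb->tail) {
        return false;              /* full: one slot is always kept empty */
    }
    rb->data[head] = byte;
    rb->head = next;               /* publish only after the data is stored */
    return true;
}

/* Called from the main loop or a task. Returns false if nothing is waiting. */
bool ring_buffer_pop(ring_buffer_t *rb, uint8_t *out)
{
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;              /* empty */
    }
    *out = rb->data[tail];
    rb->tail = (tail + 1u) & (RB_SIZE - 1u);
    return true;
}
```

Because each index has a single writer and is an aligned 32-bit value, neither side needs to mask interrupts. On a multi-core part or with more than one producer, this would need barriers or a lock.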

7.4 Multiple interrupts and priorities

Without priorities, all interrupts are equal: when one is being serviced, all others wait. With priorities, a higher-priority interrupt can preempt a lower one (nested interrupts).

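The NVIC on Cortex-M3/M4/M7 supports up to 240 external interrupts and, depending on how many priority bits the vendor implements, 8 to 256 priority levels (Cortex-M0/M0+ parts have up to 32 interrupts and 4 levels). You configure each: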

NVIC_SetPriority(USART2_IRQn, 5);
NVIC_SetPriority(TIM2_IRQn,   2);   // higher priority
NVIC_EnableIRQ(USART2_IRQn);
NVIC_EnableIRQ(TIM2_IRQn);

Lower number = higher priority (counterintuitive at first). The TIM2 ISR will preempt the USART2 ISR but not vice versa. On Cortex-M3 and later, the priority value is split into a preempt field and a subpriority field, with the split set by NVIC_SetPriorityGrouping. Two interrupts with the same preempt priority never preempt each other; the subpriority only decides which one runs first when both are pending.

Anti-pattern: setting all peripheral ISRs to the same priority. You lose the benefits of NVIC. Set safety-critical things (motor over-current, fault, brake event) at the highest priority; UART RX at low; logging at lowest.
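
The preempt/subpriority rules can be modelled on a host in a few lines. This is only a sketch of the decision logic, not the NVIC register interface: `PRIO_BITS = 4`, `irq_prio_t`, `prio_encode`, `can_preempt`, and `served_before` are assumptions made for illustration (the real bit count is device-specific and appears as `__NVIC_PRIO_BITS` in CMSIS device headers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u   /* implemented priority bits; assumed for this sketch */

typedef struct {
    uint32_t preempt;   /* decides whether one ISR may interrupt another */
    uint32_t sub;       /* only orders ISRs that are pending together    */
} irq_prio_t;

/* Pack the two fields, given how many of the PRIO_BITS go to the preempt field. */
uint32_t prio_encode(irq_prio_t p, uint32_t preempt_bits)
{
    assert(preempt_bits <= PRIO_BITS);
    uint32_t sub_bits = PRIO_BITS - preempt_bits;
    assert(p.preempt < (1u << preempt_bits) && p.sub < (1u << sub_bits));
    return (p.preempt << sub_bits) | p.sub;
}

/* A newly arrived interrupt preempts the running ISR only if its preempt
   field is numerically lower. */
bool can_preempt(irq_prio_t incoming, irq_prio_t running)
{
    return incoming.preempt < running.preempt;
}

/* Among interrupts pending at the same time, the lowest (preempt, sub)
   pair is served first. Full ties go to the lower IRQ number in hardware;
   that last step is not modelled here. */
bool served_before(irq_prio_t a, irq_prio_t b)
{
    if (a.preempt != b.preempt) {
        return a.preempt < b.preempt;
    }
    return a.sub < b.sub;
}
```

With TIM2 at preempt 2 and USART2 at preempt 5, `can_preempt(tim2, usart2)` is true and the reverse is false, matching the configuration above. Two sources that share preempt 5 and differ only in subpriority never preempt each other; `served_before` just orders them.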

7.5 DMA: when the CPU steps aside

DMA (Direct Memory Access) uses a dedicated controller, a separate bus master, to move data from peripheral to memory, memory to peripheral, or memory to memory without the CPU touching each word. The CPU sets up a transfer (source, destination, length, mode) and lets it run. When it is done, an interrupt fires.

Why bother? Consider scanning 8 ADC channels at a combined rate of 100 kHz. Without DMA, the CPU takes a conversion-complete interrupt every 10 microseconds, reads ADC->DR, and stores it. At, say, 500 ns per ISR (entry, read, store, exit), that is 500 ns out of every 10,000 ns, or 5 % of the CPU, just for one peripheral. With DMA, the conversion-complete signal goes straight to the DMA controller, which writes to a memory buffer; the CPU is interrupted only when the buffer is full.
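
The arithmetic behind that 5 % figure, and what changes once DMA batches the samples, can be written out as a small calculation. The function names and the 1000-sample buffer in the usage note are assumptions for illustration, and the model counts only the fixed per-interrupt cost, not the time spent processing a full buffer.

```c
#include <assert.h>
#include <stdint.h>

/* CPU time spent in an ISR, in parts per million, given the cost of one
   ISR in nanoseconds and how often it fires. */
uint32_t isr_load_ppm(uint32_t isr_ns, uint32_t rate_hz)
{
    /* busy nanoseconds per second; 1e9 ns per second, so /1000 gives ppm */
    uint64_t busy_ns_per_s = (uint64_t)isr_ns * rate_hz;
    return (uint32_t)(busy_ns_per_s / 1000u);
}

/* With DMA the CPU is interrupted once per filled buffer instead of once
   per sample. */
uint32_t dma_irq_rate_hz(uint32_t sample_rate_hz, uint32_t samples_per_irq)
{
    assert(samples_per_irq > 0u);
    return sample_rate_hz / samples_per_irq;
}
```

`isr_load_ppm(500, 100000)` gives 50000 ppm, which is the 5 % from the text. With a hypothetical 1000-sample buffer, `dma_irq_rate_hz(100000, 1000)` is 100 interrupts per second, and the same 500 ns overhead drops to 50 ppm.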

Modes:

Peripheral-to-memory. ADC, UART RX, SPI RX. Most common.
Memory-to-peripheral. UART TX, SPI TX, DAC waveform.
Memory-to-memory. Big copies, frame buffer fills.
Circular. Wrap around at end of buffer; useful for continuous sampling. Combined with half-transfer interrupts, you get double-buffered streaming with zero copies.
Scatter-gather. A list of descriptors lets the DMA chain transfers across non-contiguous memory regions. Used in Ethernet, USB, large image processing.
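
Circular mode with half-transfer interrupts is easier to see in a small host-side simulation. Everything below is a sketch under assumptions: `fake_dma_write` stands in for the hardware, and `on_half_transfer`, `on_transfer_complete`, `service_dma_buffer`, and the 8-sample buffer are illustrative names and sizes. The two handlers only record which half is ready; the main loop does the processing in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8u                      /* total samples; illustrative */
#define HALF    (BUF_LEN / 2u)

static uint16_t dma_buf[BUF_LEN];       /* the DMA writes here in a circle  */
static volatile int ready_half = -1;    /* -1 none, 0 first half, 1 second  */
static uint32_t sum_of_processed;       /* stand-in for real processing     */

/* Half-transfer event: the DMA is now filling the second half, so the
   first half is safe to read. */
void on_half_transfer(void)     { ready_half = 0; }

/* Transfer-complete event: the DMA wrapped to the start, so the second
   half is safe to read. */
void on_transfer_complete(void) { ready_half = 1; }

/* Main loop side: process the finished half in place, without copying it.
   On a target, reading and clearing ready_half would need the usual care
   for ISR-shared state. */
bool service_dma_buffer(void)
{
    int half = ready_half;
    if (half < 0) {
        return false;
    }
    ready_half = -1;
    const uint16_t *p = &dma_buf[(size_t)half * HALF];
    for (size_t i = 0; i < HALF; i++) {
        sum_of_processed += p[i];
    }
    return true;
}

/* Host-side stand-in for the hardware: store one sample, advance the write
   index with wrap-around, and raise the two events at the right moments. */
void fake_dma_write(uint16_t sample)
{
    static size_t idx;
    assert(idx < BUF_LEN);
    dma_buf[idx++] = sample;
    if (idx == HALF)    { on_half_transfer(); }
    if (idx == BUF_LEN) { idx = 0; on_transfer_complete(); }
}
```

The main loop has one half-buffer period to finish with a half before the DMA comes back around and overwrites it, which is what sets the buffer size in practice.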

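// STM32F4 ADC1 + DMA setup (skeleton; check the DMA request mapping for your part)
// On STM32F4, ADC1 is served by DMA2 (stream 0 or 4, channel 0), not DMA1.
DMA2_Stream0->CR  &= ~DMA_SxCR_EN;             // stream must be disabled to configure
while (DMA2_Stream0->CR & DMA_SxCR_EN) { }     // wait until it really is
DMA2_Stream0->PAR  = (uint32_t)&ADC1->DR;      // source: ADC data register
DMA2_Stream0->M0AR = (uint32_t)adc_buffer;     // destination: array of 16-bit samples
DMA2_Stream0->NDTR = ADC_BUF_LEN;              // number of transfers
DMA2_Stream0->CR   = DMA_SxCR_PL_1             // priority: high
                   | DMA_SxCR_MSIZE_0          // memory size: 16-bit
                   | DMA_SxCR_PSIZE_0          // peripheral size: 16-bit
                   | DMA_SxCR_MINC             // increment memory address
                   | DMA_SxCR_CIRC             // circular mode
                   | DMA_SxCR_TCIE             // interrupt on transfer complete
                   | DMA_SxCR_EN;              // CHSEL = 0, DIR = peripheral-to-memory

ADC1->CR2 |= ADC_CR2_DMA | ADC_CR2_DDS;        // issue DMA requests, keep issuing them in circular mode
ADC1->CR2 |= ADC_CR2_SWSTART;                  // start conversion (ADC assumed already enabled)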

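DMA has subtle traps: cache coherency on cores with a data cache (Cortex-M7 and up), MPU regions, alignment. On a cached core, clean or invalidate the buffer's cache lines around each transfer, or place the buffer in non-cacheable memory. A buffer that crosses a 4 KB boundary on some ARM IPs causes the DMA to wrap wrongly. Read the reference manual.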
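
Two of those traps can be checked with plain address arithmetic before a buffer is handed to the DMA. The helpers below are a sketch with hypothetical names (`crosses_boundary`, `cache_line_clean`); the boundary and cache-line size are parameters because they depend on the part (32-byte lines on Cortex-M7 are assumed in the usage note).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) spans more than one boundary-sized block.
   boundary must be a power of two (for example 1024 or 4096). */
bool crosses_boundary(uintptr_t addr, size_t len, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    if (len == 0) {
        return false;
    }
    uintptr_t first = addr & ~(boundary - 1);
    uintptr_t last  = (addr + len - 1) & ~(boundary - 1);
    return first != last;
}

/* True if the buffer starts on a cache-line boundary and covers whole
   lines, so cache maintenance on it cannot touch neighbouring variables. */
bool cache_line_clean(uintptr_t addr, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (addr % line == 0) && (len % line == 0);
}
```

For example, a 32-byte buffer at 0x0FF0 crosses a 4 KB boundary, while a 64-byte buffer at 0x20000000 is clean for 32-byte cache lines. Checks like these fit naturally in an assert next to the DMA setup code.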