Week 3 — CPU Model: Fetch, Decode, Execute, and a Tiny Emulator

Course 4 syllabus

Overview

This week you build a tiny CPU emulator and, in doing so, internalize how a processor actually runs a program. The combinational blocks of Week 2 (adder, mux, decoder) plus registers and memory compose into a machine that repeats one loop forever: fetch the instruction at the program counter, decode it, execute it, repeat. Implementing that loop for a small instruction set — even 8 registers, 64 KB of memory, and ~8 instructions — makes the fetch/decode/execute cycle, instruction encoding, and the role of the program counter and flags concrete in a way no diagram can.

The emulator is the platform the next weeks compare against: real ARM64 assembly (Week 4) is this same model with a richer instruction set and ABI. Keeping it small is the point — small enough to understand every line, big enough to run Fibonacci and sum loops.

Readings

  • ARM (Plantz) Ch. 8–9: memory and the CPU. Extract: registers, the program counter, the fetch/execute cycle, and buses.
  • CA (Fox) Ch. 3: basic CPU-based architecture. Extract: the datapath and control unit.
  • CA Ch. 6–8 skim: simple machines, digital CPU design, advanced CPU design. Extract: how the simple model scales to pipelining (preview for Week 10).

Key Concepts

The fetch/decode/execute cycle

The CPU’s core loop: fetch the instruction at the address in the program counter (PC), decode it into an operation + operands, execute it (ALU op, memory access, or branch), and update the PC. State lives in registers (fast, few) and memory (large, slower). This loop is the irreducible essence of a stored-program computer.

Instruction encoding

An instruction is just bits (Week 1): an opcode field plus operand fields (register indices, immediates, addresses). Designing your ISA’s encoding — fixed vs variable width, how many bits for opcode vs operands — teaches the tradeoffs real ISAs make. This is exactly the RISC/CISC split you’ll meet in Week 4: ARM64 (RISC) uses fixed 32-bit instructions, which makes fetch/decode trivial and uniform, while x86-64 (CISC) uses variable-length 1–15-byte instructions, which packs code densely but needs a far more complex decoder. Decoding is the inverse of encoding: extract fields by masking/shifting (Week 1 bit-fiddling) — simple for a fixed-width ISA, a small state machine for a variable-width one.

Registers, memory, flags, and control flow

A small register file, a flat memory, and a flags/status register (zero, negative, carry, overflow — set by the ALU, Week 2). Branches read the flags to decide whether to change the PC, which is how if/loops work at the machine level (Week 5 expands this). Load/store move data between registers and memory.

From simple to real

Your emulator is a single-cycle interpreter; real CPUs pipeline (overlap fetch/decode/execute of consecutive instructions), predict branches, and reorder — the advanced topics CA previews and Week 10 revisits as performance. Understanding the simple model first makes those optimizations legible rather than magical.

Theory Exercises

  1. Design an instruction encoding for ~8 instructions over 8 registers; justify your opcode/operand field widths.
  2. Hand-decode a few of your encoded instructions by masking/shifting (Week 1); show the extracted fields.
  3. Trace the fetch/decode/execute cycle for a 3-instruction program, showing PC, registers, and flags each step.
  4. Show how a conditional branch uses the flags register to implement an if; write the flag conditions for ==, <, <=.
  5. Explain how pipelining overlaps the cycle for consecutive instructions and one hazard it introduces (preview Week 10).

Implementation

Build the emulator: a register file, flat memory, PC, flags, an instruction set (load/store, add/sub, compare, conditional/unconditional branch, halt), an encoder/assembler for your ISA, and the fetch/decode/execute loop. Run nontrivial programs (Fibonacci, sum-to-n, a simple loop). Add a single-step debug mode that prints state.

Measurement / Inspection

Single-step through Fibonacci and verify register/flag state matches a hand trace. Count instructions executed (a proxy for “cycles”) for several inputs and relate to algorithmic complexity. Inspect the encoded bytes of a small program with the Week 1 dump_bytes.

Expected baselines: programs produce correct results; the single-step trace matches your hand analysis exactly; instruction counts scale as the algorithm predicts (e.g. linear for sum-to-n). The encoded program is human-decodable with the Week 1 tool.

Connections

This model is the reference point for real ARM64 (Week 4) — same loop, richer ISA. Its ALU and flags are Week 2’s circuits; its encoding is Week 1’s bits. Pipelining/branch prediction previewed here return as performance topics in Week 10. Building an interpreter also foreshadows how higher-level languages are executed (Course 3’s tooling, Course 1’s frameworks).

Further Reading

  • ARM (Plantz) Ch. 8–9; CA (Fox) Ch. 3, 6–8.
  • Patterson & Hennessy, Computer Organization and Design (reference).
  • nand2tetris (project-based CPU construction).