Week 3 — CPU Model: Fetch, Decode, Execute, and a Tiny Emulator
Overview
This week you build a tiny CPU emulator and, in doing so, internalize how a processor actually runs a program. The combinational blocks of Week 2 (adder, mux, decoder) plus registers and memory compose into a machine that repeats one loop forever: fetch the instruction at the program counter, decode it, execute it, repeat. Implementing that loop for a small instruction set — even 8 registers, 64 KB of memory, and ~8 instructions — makes the fetch/decode/execute cycle, instruction encoding, and the role of the program counter and flags concrete in a way no diagram can.
The emulator is the platform the next weeks compare against: real ARM64 assembly (Week 4) is this same model with a richer instruction set and ABI. Keeping it small is the point — small enough to understand every line, big enough to run Fibonacci and sum loops.
Readings
- ARM (Plantz) Ch. 8–9: memory and the CPU. Extract: registers, the program counter, the fetch/execute cycle, and buses.
- CA (Fox) Ch. 3: basic CPU-based architecture. Extract: the datapath and control unit.
- CA Ch. 6–8 skim: simple machines, digital CPU design, advanced CPU design. Extract: how the simple model scales to pipelining (preview for Week 10).
Key Concepts
The fetch/decode/execute cycle
The CPU’s core loop: fetch the instruction at the address in the program counter (PC), decode it into an operation + operands, execute it (ALU op, memory access, or branch), and update the PC. State lives in registers (fast, few) and memory (large, slower). This loop is the irreducible essence of a stored-program computer.
Instruction encoding
An instruction is just bits (Week 1): an opcode field plus operand fields (register indices, immediates, addresses). Designing your ISA’s encoding — fixed vs variable width, how many bits for opcode vs operands — teaches the tradeoffs real ISAs make. This is exactly the RISC/CISC split you’ll meet in Week 4: ARM64 (RISC) uses fixed 32-bit instructions, which makes fetch/decode trivial and uniform, while x86-64 (CISC) uses variable-length 1–15-byte instructions, which packs code densely but needs a far more complex decoder. Decoding is the inverse of encoding: extract fields by masking/shifting (Week 1 bit-fiddling) — simple for a fixed-width ISA, a small state machine for a variable-width one.
Registers, memory, flags, and control flow
A small register file, a flat memory, and a flags/status register (zero, negative, carry, overflow — set by the ALU, Week 2). Branches read the flags to decide whether to change the PC, which is how if/loops work at the machine level (Week 5 expands this). Load/store move data between registers and memory.
From simple to real
Your emulator is a single-cycle interpreter; real CPUs pipeline (overlap fetch/decode/execute of consecutive instructions), predict branches, and reorder — the advanced topics CA previews and Week 10 revisits as performance. Understanding the simple model first makes those optimizations legible rather than magical.
Theory Exercises
- Design an instruction encoding for ~8 instructions over 8 registers; justify your opcode/operand field widths.
- Hand-decode a few of your encoded instructions by masking/shifting (Week 1); show the extracted fields.
- Trace the fetch/decode/execute cycle for a 3-instruction program, showing PC, registers, and flags each step.
- Show how a conditional branch uses the flags register to implement an
if; write the flag conditions for==,<,<=. - Explain how pipelining overlaps the cycle for consecutive instructions and one hazard it introduces (preview Week 10).
Implementation
Build the emulator: a register file, flat memory, PC, flags, an instruction set (load/store, add/sub, compare, conditional/unconditional branch, halt), an encoder/assembler for your ISA, and the fetch/decode/execute loop. Run nontrivial programs (Fibonacci, sum-to-n, a simple loop). Add a single-step debug mode that prints state.
Measurement / Inspection
Single-step through Fibonacci and verify register/flag state matches a hand trace. Count instructions executed (a proxy for “cycles”) for several inputs and relate to algorithmic complexity. Inspect the encoded bytes of a small program with the Week 1 dump_bytes.
Expected baselines: programs produce correct results; the single-step trace matches your hand analysis exactly; instruction counts scale as the algorithm predicts (e.g. linear for sum-to-n). The encoded program is human-decodable with the Week 1 tool.
Connections
This model is the reference point for real ARM64 (Week 4) — same loop, richer ISA. Its ALU and flags are Week 2’s circuits; its encoding is Week 1’s bits. Pipelining/branch prediction previewed here return as performance topics in Week 10. Building an interpreter also foreshadows how higher-level languages are executed (Course 3’s tooling, Course 1’s frameworks).
Further Reading
- ARM (Plantz) Ch. 8–9; CA (Fox) Ch. 3, 6–8.
- Patterson & Hennessy, Computer Organization and Design (reference).
- nand2tetris (project-based CPU construction).