Building a Z80 Emulator from Scratch: Step-by-Step Tutorial
The Zilog Z80 is a classic 8-bit CPU that powered many microcomputers and game consoles in the late 1970s and 1980s. Building an emulator for the Z80 is an excellent way to learn about CPU architecture, instruction sets, memory management, and low-level debugging. This tutorial walks you through creating a clean, accurate, and reasonably fast Z80 emulator from scratch in a modern language (examples use pseudocode and C-like snippets). It covers design, instruction decoding, timing, memory, I/O, interrupts, and testing, with practical tips to avoid common pitfalls.
Overview and goals
An emulator translates the behavior of original hardware into software. For a Z80 emulator the main goals are:
- Correctness: instructions should produce the same observable results (register state, memory, I/O, flags) as the original CPU.
- Completeness: support the full instruction set (including undocumented opcodes if desired), register pairs, interrupts, and addressing modes.
- Timing accuracy: optionally match real-cycle timing for software that relies on it (games, demos).
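- Performance: be fast enough for real-time use. Most Z80 systems clocked the CPU at a few megahertz (the ZX Spectrum at about 3.5 MHz), so the emulator must sustain at least that many T-states per second of wall-clock time, with headroom left for peripherals; a straightforward interpreter on a modern host usually manages this easily.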
This guide builds a cycle-accurate, feature-complete emulator with a clear structure you can implement in C, C++, Rust, Go, or even Python for learning.
Prerequisites
- Familiarity with programming in a systems language (C/C++/Rust recommended).
- Basic understanding of CPU concepts: registers, flags, program counter (PC), stack pointer (SP), memory, interrupts.
- Z80 reference material (instruction set description, timing charts). Useful references: Z80 CPU User Manual, official opcode tables, and community-maintained opcode lists.
High-level design
A Z80 emulator typically contains these subsystems:
- CPU core: registers, flags, instruction decoder, ALU operations.
- Memory subsystem: read/write handlers, RAM/ROM mapping, optional memory-mapped I/O.
- I/O subsystem: port reads/writes.
- Interrupt controller: handling IRQ/NMI and the three Z80 interrupt modes (IM0/IM1/IM2).
- Timing and cycle accounting: track machine cycles for instructions, memory, and I/O.
- Debugging hooks: logging, breakpoints, single-step.
- Optional peripherals: video controllers, sound chips, timers — depending on target machine.
Start with a minimal core (the fetch-decode-execute loop), then add features incrementally: registers ➜ instructions ➜ memory ➜ interrupts ➜ timing.
CPU state and core structures
Define a compact CPU state. Important Z80 registers:
- 8-bit: A, F, B, C, D, E, H, L
- 16-bit pairs: AF, BC, DE, HL
- Alternate register set: A’, F’, B’, C’, D’, E’, H’, L’
- Index registers: IX, IY (16-bit)
- Stack Pointer: SP (16-bit)
- Program Counter: PC (16-bit)
- Interrupt vector register: I (8-bit)
- Refresh register: R (8-bit)
- Interrupt flip-flops: IFF1, IFF2
- Interrupt mode: IM (0,1,2)
Suggested structure in C-like pseudocode:
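```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t A, F;                    // main accumulator and flags
    uint8_t B, C, D, E, H, L;        // main general-purpose registers
    uint8_t A2, F2;                  // alternate AF'
    uint8_t B2, C2, D2, E2, H2, L2;  // alternate BC', DE', HL'
    uint16_t IX, IY;                 // index registers
    uint16_t SP, PC;
    uint8_t I, R;                    // interrupt vector, refresh
    bool IFF1, IFF2;                 // interrupt flip-flops
    uint8_t IM;                      // interrupt mode: 0, 1 or 2
    uint64_t cycles;                 // T-state counter
} Z80;
```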
Implement convenient helpers for 16-bit pairs:
```c
uint16_t getBC(const Z80 *z) { return (uint16_t)((z->B << 8) | z->C); }

void setBC(Z80 *z, uint16_t v) {
    z->B = (uint8_t)(v >> 8);
    z->C = (uint8_t)(v & 0xFF);
}
```
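Keep the bit positions of the flag register F consistent with the documentation: from bit 7 down to bit 0 they are S, Z, -, H, -, P/V, N, C. The two unmarked bits (5 and 3) are undocumented; they usually mirror bits 5 and 3 of the result, and some test suites check them. Provide helpers to set, clear, and test flags.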
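A minimal sketch of those masks and helpers follows. The FLAG_* names and the flag_put/flag_test functions are illustrative choices, not taken from any library; FLAG_Y and FLAG_X are common community names for the two undocumented bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the F register, bit 7 down to bit 0. */
enum {
    FLAG_S  = 0x80, /* sign: copy of result bit 7 */
    FLAG_Z  = 0x40, /* zero */
    FLAG_Y  = 0x20, /* undocumented: usually result bit 5 */
    FLAG_H  = 0x10, /* half-carry out of bit 3 */
    FLAG_X  = 0x08, /* undocumented: usually result bit 3 */
    FLAG_PV = 0x04, /* parity or signed overflow, depending on the op */
    FLAG_N  = 0x02, /* set by subtractions (used by DAA) */
    FLAG_C  = 0x01  /* carry */
};

/* Return f with mask set if cond is true, cleared otherwise. */
static inline uint8_t flag_put(uint8_t f, uint8_t mask, bool cond) {
    return cond ? (uint8_t)(f | mask) : (uint8_t)(f & ~mask);
}

/* True if any bit of mask is set in f. */
static inline bool flag_test(uint8_t f, uint8_t mask) {
    return (f & mask) != 0;
}
```

Using an enum keeps the masks as compile-time constants that can be OR-ed together in expressions like `FLAG_S | FLAG_Z`.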
Memory and I/O model
Implement a memory map with read/write callbacks for different regions (ROM, RAM, memory-mapped I/O). A simple memory model:
```c
#include <stdint.h>

uint8_t memory[65536];  // flat 64 KB address space

uint8_t mem_read(uint16_t addr) { return memory[addr]; }
void mem_write(uint16_t addr, uint8_t val) { memory[addr] = val; }
```
For hardware with memory-mapped I/O or bank switching, allow per-region handlers:
```c
typedef uint8_t (*mem_read_cb)(uint16_t addr, void *ctx);
typedef void (*mem_write_cb)(uint16_t addr, uint8_t val, void *ctx);
```
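One common way to get ROM protection and bank switching is to split the 64 KB address space into fixed-size pages, each pointing into a backing buffer. The sketch below assumes four 16 KB pages (a granularity several Z80 machines use, but not all); the MemMap type and the map_page/map_read/map_write functions are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 14                 /* 16 KB pages */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES (65536u / PAGE_SIZE)

typedef struct {
    uint8_t *page[NUM_PAGES];        /* backing storage for each CPU-visible page */
    bool writable[NUM_PAGES];        /* false for ROM */
} MemMap;

/* Map a 16 KB buffer into one of the four CPU-visible pages. */
void map_page(MemMap *m, unsigned index, uint8_t *buf, bool writable) {
    m->page[index] = buf;
    m->writable[index] = writable;
}

uint8_t map_read(const MemMap *m, uint16_t addr) {
    return m->page[addr >> PAGE_BITS][addr & (PAGE_SIZE - 1)];
}

void map_write(MemMap *m, uint16_t addr, uint8_t val) {
    unsigned p = addr >> PAGE_BITS;
    if (m->writable[p])              /* writes to ROM are silently ignored */
        m->page[p][addr & (PAGE_SIZE - 1)] = val;
}
```

With this layout, bank switching reduces to calling map_page from whichever port-write handler controls the banking hardware.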
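Z80 I/O uses a separate port space accessed via IN/OUT instructions. It is documented as 8-bit (ports 0..255), but the CPU drives all 16 address lines during an I/O cycle: IN A,(n) and OUT (n),A put A on the upper byte, and the (C) forms such as IN r,(C) put B there. Some machines decode those upper bits (the ZX Spectrum keyboard, for example, uses them to select key rows), so it is safer to give the port callbacks the full 16-bit address:
```c
uint8_t port_read(uint16_t port);              // port: full 16-bit I/O address
void port_write(uint16_t port, uint8_t value);
```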
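Here is how IN A,(n) and OUT (n),A might use those callbacks. Because the Z80 places A on address lines A8-A15 for these two instructions, this sketch passes a 16-bit port address. The cut-down Z80 struct, the flat memory array, and the toy port backend are assumptions made only to keep the example self-contained; the 11 T-state counts come from the standard timing tables.

```c
#include <stdint.h>

typedef struct {
    uint8_t A;
    uint16_t PC;
    uint64_t cycles;
} Z80;

static uint8_t memory[65536];

/* Toy I/O backend for this sketch only: it records the last access and
   returns 0xFF on reads (what an undriven bus often reads as). */
static uint16_t last_port;
static uint8_t last_value;

uint8_t port_read(uint16_t port) { last_port = port; return 0xFF; }
void port_write(uint16_t port, uint8_t value) { last_port = port; last_value = value; }

/* IN A,(n) (DB n): A supplies address lines A8-A15; flags are not affected. */
void op_IN_A_n(Z80 *z) {
    uint8_t n = memory[z->PC++];
    z->A = port_read((uint16_t)((z->A << 8) | n));
    z->cycles += 11;
}

/* OUT (n),A (D3 n): A is both the data byte and the high address byte. */
void op_OUT_n_A(Z80 *z) {
    uint8_t n = memory[z->PC++];
    port_write((uint16_t)((z->A << 8) | n), z->A);
    z->cycles += 11;
}
```

A machine that only decodes the low byte can simply mask the port with 0xFF inside its own callback, so passing the wider address costs nothing.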
Fetch-decode-execute loop
At the simplest, the main loop repeatedly fetches a byte at PC, decodes the opcode, executes it (modifying registers, memory, flags), and advances PC and cycles.
Key points:
- Some Z80 instructions are multi-byte (prefixes 0xCB, 0xDD, 0xED, 0xFD). Handle prefixed opcodes by reading additional opcode bytes and dispatching to specialized handlers.
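- Update the R (refresh) register on each opcode fetch: its low 7 bits increment on every M1 (opcode-fetch) cycle, so a prefixed instruction such as CB xx bumps R twice, and interrupt acknowledge cycles bump it as well. Bit 7 never changes on its own; only LD R,A sets it. See the Z80 manual for the exact details.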
- Many instructions affect cycles differently depending on addressing mode; track cycles carefully.
Simple loop:
```c
while (running) {
    uint8_t opcode = mem_read(z->PC++);
    // R: low 7 bits count opcode fetches, bit 7 is preserved
    z->R = (uint8_t)(((z->R + 1) & 0x7F) | (z->R & 0x80));
    switch (opcode) {
    case 0x00:                          // NOP
        z->cycles += 4;
        break;
    case 0x3E:                          // LD A,n
        z->A = mem_read(z->PC++);
        z->cycles += 7;
        break;
    // ... other opcodes ...
    default:
        handle_prefix_or_unimplemented(z, opcode);
    }
}
```
For maintainability, organize opcode handlers:
- Base table for 0x00..0xFF
- CB-table (bit/rotate ops)
- ED-table (extended ops)
- DD/FD prefix handling to target IX/IY-based instructions
A dispatch table using function pointers can speed up decoding.
Implementing instructions
Tackle instructions in logical groups:
- Data transfer: LD r,r’; LD r,(HL); LD (HL),r; LD A,(nn); LD (nn),A
- 16-bit loads and arithmetic: LD HL,nn; ADD HL,BC; INC/DEC on 16-bit regs
- ALU 8-bit: ADD, ADC, SUB, SBC, AND, OR, XOR, CP — implement flags precisely
- Rotation and shifts: RLCA, RLA, RRCA, RRA, and CB-prefixed rotates
- Bit operations: BIT, SET, RES (CB-prefixed)
- Jumps and calls: JP, JR, CALL, RET — include conditional variants
- Stack operations: PUSH, POP, EX (AF, AF’), EXX
- I/O instructions: IN A,(n), OUT (n),A — use port callbacks
- Interrupts and special registers: EI, DI, IM instructions, R and I behavior
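Stack handling underlies PUSH, POP, CALL, and RET, and a pair of helpers keeps those handlers short. This is a sketch assuming a flat 64 KB memory array, a cut-down Z80 struct, and handlers that are called after the opcode byte has been fetched (so PC already points past it); push16, pop16, and the op_* names are illustrative. The stack grows downward and 16-bit values are stored little-endian (low byte at the lower address). T-state counts (11, 10, 17, 10) are from the standard tables.

```c
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t B, C;
    uint16_t SP, PC;
    uint64_t cycles;
} Z80;

/* Push a 16-bit value: SP is decremented first, high byte stored first. */
static void push16(Z80 *z, uint16_t v) {
    memory[--z->SP] = (uint8_t)(v >> 8);
    memory[--z->SP] = (uint8_t)(v & 0xFF);
}

static uint16_t pop16(Z80 *z) {
    uint8_t lo = memory[z->SP++];
    uint8_t hi = memory[z->SP++];
    return (uint16_t)((hi << 8) | lo);
}

void op_PUSH_BC(Z80 *z) { push16(z, (uint16_t)((z->B << 8) | z->C)); z->cycles += 11; }

void op_POP_BC(Z80 *z) {
    uint16_t v = pop16(z);
    z->B = (uint8_t)(v >> 8);
    z->C = (uint8_t)(v & 0xFF);
    z->cycles += 10;
}

/* CALL nn (CD lo hi): push the address of the next instruction, then jump. */
void op_CALL_nn(Z80 *z) {
    uint16_t target = (uint16_t)(memory[z->PC] | (memory[(uint16_t)(z->PC + 1)] << 8));
    z->PC += 2;
    push16(z, z->PC);
    z->PC = target;
    z->cycles += 17;
}

void op_RET(Z80 *z) { z->PC = pop16(z); z->cycles += 10; }
```

Because SP is a uint16_t, the decrements and increments wrap around the 64 KB space exactly as the hardware does.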
Implementing flags correctly is the trickiest part — many instructions set half-carry and parity flags with subtle rules. Use reference algorithms (or existing implementations) for consistent behavior. For example, half-carry for addition uses:
H = ((a & 0x0F) + (b & 0x0F) + carry_in) & 0x10
Parity can be computed with a precomputed table for 0..255.
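A sketch of that parity table, plus XOR r as one instruction that consumes it. The table name, init function, and cut-down struct are illustrative; per the documented flag behavior, XOR sets S and Z from the result, sets P/V for even parity, and clears H, N, and C. The undocumented Y/X bits are copied from the result here.

```c
#include <stdint.h>

enum { FLAG_S = 0x80, FLAG_Z = 0x40, FLAG_Y = 0x20, FLAG_H = 0x10,
       FLAG_X = 0x08, FLAG_PV = 0x04, FLAG_N = 0x02, FLAG_C = 0x01 };

static uint8_t parity_even[256];    /* 1 if the byte has an even number of set bits */

void init_parity_table(void) {
    for (int i = 0; i < 256; i++) {
        int v = i;
        v ^= v >> 4;                 /* fold the byte so bit 0 ends up as the XOR of all 8 bits */
        v ^= v >> 2;
        v ^= v >> 1;
        parity_even[i] = (uint8_t)!(v & 1);
    }
}

typedef struct { uint8_t A, F; } Z80;

/* XOR r: A ^= value. H, N, C cleared; P/V holds the parity of the result. */
void op_XOR_A(Z80 *z, uint8_t value) {
    uint8_t r = (uint8_t)(z->A ^ value);
    uint8_t f = (uint8_t)(r & (FLAG_S | FLAG_Y | FLAG_X));  /* bits 7, 5, 3 copied */
    if (r == 0) f |= FLAG_Z;
    if (parity_even[r]) f |= FLAG_PV;
    z->A = r;
    z->F = f;
}
```

The same table serves AND, OR, and the rotate/shift group, which all report parity in P/V.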
Provide unit tests for each instruction group and compare against known test ROMs (see Debugging and testing).
Opcode dispatch strategy
Options:
- Big switch statement (simple, clear, easy to debug).
- Lookup table of 256 function pointers for the main opcodes, plus separate tables for CB/ED/DD/FD. This is faster and cleaner once handlers are built.
- JIT (advanced): translate hot opcode sequences to native host code — much faster but complex.
Start with switch or lookup tables. Example dispatch table skeleton:
```c
typedef void (*opcode_fn)(Z80 *z);

opcode_fn table[256];

void init_table(void) {
    table[0x00] = op_NOP;
    table[0x3E] = op_LD_A_n;
    // ...
}
```
For prefixed opcodes, the handler reads the next byte and indexes into the relevant subtable.
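The prefix step can look like the sketch below: the main-table entries for 0xCB and 0xED fetch one more opcode byte and index a second 256-entry table. The table names, op_* handlers, and cut-down struct are hypothetical. Each extra opcode fetch bumps R, and unfilled ED entries fall back to a handler that costs 8 T-states, since undefined ED opcodes are generally documented as behaving like two NOPs.

```c
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t B, R, IM;
    uint16_t PC;
    uint64_t cycles;
} Z80;

typedef void (*opcode_fn)(Z80 *z);

/* Hypothetical tables; a complete core fills every base and CB entry. */
static opcode_fn base_table[256], cb_table[256], ed_table[256];

/* One opcode fetch (M1 cycle): read a byte and bump the low 7 bits of R. */
static uint8_t fetch_opcode(Z80 *z) {
    z->R = (uint8_t)(((z->R + 1) & 0x7F) | (z->R & 0x80));
    return memory[z->PC++];
}

static void op_NOP(Z80 *z) { z->cycles += 4; }
static void op_SET_0_B(Z80 *z) { z->B |= 0x01; z->cycles += 8; }  /* CB C0 */
static void op_IM_1(Z80 *z) { z->IM = 1; z->cycles += 8; }         /* ED 56 */
static void op_ED_undefined(Z80 *z) { z->cycles += 8; }            /* behaves like two NOPs */

/* Prefix handlers fetch the second opcode byte and dispatch again;
   the sub-handler adds the T-states for the whole instruction. */
static void prefix_CB(Z80 *z) { cb_table[fetch_opcode(z)](z); }

static void prefix_ED(Z80 *z) {
    opcode_fn fn = ed_table[fetch_opcode(z)];
    (fn ? fn : op_ED_undefined)(z);
}

void init_tables(void) {
    base_table[0x00] = op_NOP;
    base_table[0xCB] = prefix_CB;
    base_table[0xED] = prefix_ED;
    cb_table[0xC0] = op_SET_0_B;
    ed_table[0x56] = op_IM_1;
}

/* Execute one instruction. */
void step(Z80 *z) { base_table[fetch_opcode(z)](z); }
```

DD and FD can follow the same pattern, except that their handlers usually reuse the base and CB tables with IX or IY substituted for HL.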
Timing and cycle accuracy
If your target software depends on timing, implement cycle accounting:
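- Every instruction adds its base T-states (clock cycles; machine cycles, or M-cycles, are groups of several T-states) to z->cycles.
- Memory reads/writes and I/O also consume cycles; reflect the extra T-states for prefixed and indexed addressing, and remember that conditional jumps, calls, and returns take longer when taken (for example, JR cc,e takes 12 T-states when taken and 7 when not).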
- For peripherals (video, sound) that rely on CPU timing, run them in lockstep with the CPU by advancing their internal counters by the number of cycles executed since last update.
Cycle-accurate emulation is more complex but necessary for many games and demos. A looser approach is “approximate timing” where only instruction-level cycles are counted and peripheral timing is driven by wall-clock scaled to CPU speed.
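Conditional branches are the most common place where instruction-level timing varies. A sketch of JR NZ,e and DJNZ e with their taken/not-taken T-state counts from the standard timing tables (12/7 and 13/8); the struct layout and handler names are assumptions. Both handlers assume PC already points at the displacement byte, which is a signed offset relative to the address of the next instruction.

```c
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t B, F;
    uint16_t PC;
    uint64_t cycles;
} Z80;

enum { FLAG_Z = 0x40 };

/* JR NZ,e (20 e): 12 T-states if the jump is taken, 7 if not. */
void op_JR_NZ_e(Z80 *z) {
    int8_t e = (int8_t)memory[z->PC++];   /* signed offset from the next instruction */
    if (!(z->F & FLAG_Z)) {
        z->PC = (uint16_t)(z->PC + e);
        z->cycles += 12;
    } else {
        z->cycles += 7;
    }
}

/* DJNZ e (10 e): decrement B and jump if it is not zero; no flags change.
   13 T-states if taken, 8 if not. */
void op_DJNZ_e(Z80 *z) {
    int8_t e = (int8_t)memory[z->PC++];
    if (--z->B != 0) {
        z->PC = (uint16_t)(z->PC + e);
        z->cycles += 13;
    } else {
        z->cycles += 8;
    }
}
```

Keeping the taken/not-taken split inside each handler avoids a separate table of conditional timings.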
Interrupts
Z80 supports non-maskable interrupt (NMI) and maskable IRQ with three modes (IM0, IM1, IM2). Key behaviors:
- EI enables interrupts after the next instruction; DI disables immediately.
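- IFF1 gates maskable interrupts and IFF2 holds a copy of it. An NMI always jumps to 0x0066; it clears IFF1 but leaves IFF2 alone, and RETN copies IFF2 back into IFF1 so the previous state is restored. LD A,I and LD A,R also expose IFF2 through the P/V flag.
- IM0: the interrupting device places an instruction on the data bus, usually an RST (0xFF encodes RST 38h and is often what a system reads when no device drives the bus).
- IM1: CPU executes RST 0x38 on IRQ.
- IM2: CPU forms a table pointer from I (high byte) and the byte the device places on the data bus (low byte), reads a 16-bit handler address from that location in memory, and jumps to it.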
Emulate these behaviors:
- Provide functions to assert NMI and IRQ lines.
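- On interrupt, push PC, update the IFFs, set PC appropriately, and add the T-states consumed by the interrupt acknowledge sequence to the cycle counter.
- For IM2, combine I and the device-supplied low byte into a table address, then load the jump target from the two bytes stored there.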
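Edge cases such as an interrupt arriving during HALT must be handled: after HALT the CPU keeps executing NOP-like cycles (4 T-states each, still incrementing R) until an interrupt or reset arrives. When the interrupt is accepted, the return address pushed must be that of the instruction after HALT, so the handler's return resumes past it.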
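Putting those rules together gives something like the sketch below, meant to be called between instructions. It assumes a cut-down Z80 struct with two hypothetical fields: halted (set by HALT, with PC already past the HALT opcode) and ei_delay (set by EI and cleared after the following instruction). The data_bus parameter stands in for the byte the interrupting device supplies. The 11, 13, and 19 T-state figures are the commonly cited acknowledge timings; IM0 is approximated as RST 38h, which is only right when the device supplies 0xFF.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t I, R, IM;
    bool IFF1, IFF2;
    bool halted;     /* set by HALT; PC already points past the HALT opcode */
    bool ei_delay;   /* set by EI: no maskable interrupt before the next instruction */
    uint16_t SP, PC;
    uint64_t cycles;
} Z80;

static void push16(Z80 *z, uint16_t v) {
    memory[--z->SP] = (uint8_t)(v >> 8);
    memory[--z->SP] = (uint8_t)v;
}

/* Interrupt acknowledge is an M1 cycle, so R increments here too. */
static void ack(Z80 *z) {
    z->R = (uint8_t)(((z->R + 1) & 0x7F) | (z->R & 0x80));
    z->halted = false;           /* leave HALT; the pushed PC is the next instruction */
}

/* Non-maskable interrupt: IFF2 keeps the old IFF1 so RETN can restore it. */
void z80_nmi(Z80 *z) {
    ack(z);
    z->IFF1 = false;
    push16(z, z->PC);
    z->PC = 0x0066;
    z->cycles += 11;
}

/* Maskable interrupt. Returns true if it was accepted. */
bool z80_irq(Z80 *z, uint8_t data_bus) {
    if (!z->IFF1 || z->ei_delay)
        return false;
    ack(z);
    z->IFF1 = z->IFF2 = false;
    push16(z, z->PC);
    if (z->IM == 2) {
        /* Read the handler address from the vector table at I:data_bus. */
        uint16_t ptr = (uint16_t)((z->I << 8) | data_bus);
        z->PC = (uint16_t)(memory[ptr] | (memory[(uint16_t)(ptr + 1)] << 8));
        z->cycles += 19;
    } else {
        /* IM1, and IM0 approximated as RST 38h. */
        z->PC = 0x0038;
        z->cycles += 13;
    }
    return true;
}
```

Checking ei_delay inside z80_irq is what gives EI its one-instruction delay: the common `EI` / `RETI` sequence at the end of a handler returns before the next interrupt can be taken.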
Handling IX/IY and DD/FD prefixes
DD (0xDD) and FD (0xFD) prefixes modify many instructions to operate on IX or IY instead of HL. Implementation patterns:
- When a prefix is detected, read the next opcode. Many opcodes map directly with HL -> IX or IY substitution.
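- Some opcodes form sequences like DD CB d op, where a signed displacement byte d comes before the final opcode byte and addresses (IX+d). This requires reading three bytes (prefix, CB, displacement) and then the CB sub-opcode.
- Be careful with chained prefixes: in a sequence such as DD FD, only the last DD/FD prefix takes effect, and each earlier one behaves like a NOP costing 4 T-states. A DD or FD directly followed by ED is likewise ignored, and the ED instruction executes normally.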
Implement helper functions that accept a base register pointer for HL vs IX/IY to avoid duplicating handlers.
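A sketch of that approach: one helper returns a pointer to the 16-bit register the current prefix selects, and another computes the (HL) or (IX+d)/(IY+d) address with a signed displacement. It assumes HL is stored as a single uint16_t (unlike the byte-wise struct earlier) so one pointer can stand in for all three registers; the function names are hypothetical. The T-state counts (7 for LD A,(HL), 19 for LD A,(IX+d)) are from the standard tables.

```c
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t A;
    uint16_t HL, IX, IY, PC;
    uint64_t cycles;
} Z80;

/* Select the register a prefix refers to: 0 = none (HL), 0xDD = IX, 0xFD = IY. */
static uint16_t *index_reg(Z80 *z, uint8_t prefix) {
    return prefix == 0xDD ? &z->IX : prefix == 0xFD ? &z->IY : &z->HL;
}

/* Effective address for (HL), or (IX+d)/(IY+d) with d read from PC as a signed byte. */
static uint16_t indexed_addr(Z80 *z, uint8_t prefix) {
    uint16_t base = *index_reg(z, prefix);
    if (prefix == 0)
        return base;
    int8_t d = (int8_t)memory[z->PC++];
    return (uint16_t)(base + d);
}

/* LD A,(HL) / LD A,(IX+d) / LD A,(IY+d): opcode 7E with an optional prefix. */
void op_LD_A_mem(Z80 *z, uint8_t prefix) {
    z->A = memory[indexed_addr(z, prefix)];
    z->cycles += prefix ? 19 : 7;
}
```

One handler now covers three opcodes; the only per-prefix differences are the displacement fetch and the timing.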
Debugging and testing
- Start by executing simple programs (an LED blinker, memory writes) to validate basic behavior.
- Use known Z80 test ROMs (e.g., ZEXALL) to verify instruction correctness. Run subsets if you have partial support.
- Add logging with conditional breakpoints to inspect state on invalid opcodes.
- Create unit tests for ALU operations, flag results, and edge cases (carry/half-carry, signed arithmetic).
- Compare against a reference emulator for instruction-by-instruction trace matching.
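ZEXALL (and its sibling ZEXDOC) is a CP/M program, so running it needs a tiny slice of CP/M: load the .COM image at 0x0100, trap calls to the BDOS entry point at 0x0005 to print output, and stop when the program jumps to 0x0000 (warm boot). The sketch below handles the two BDOS console functions used for output (C=2 prints the character in E; C=9 prints the '$'-terminated string at DE). The buffer-based output, the function names, and the stack top of 0xF000 are assumptions; bdos_trap is meant to be called before each instruction fetch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t memory[65536];

typedef struct {
    uint8_t C, D, E;
    uint16_t SP, PC;
} Z80;

/* Hypothetical output sink: a fixed buffer instead of stdout, to keep the sketch testable. */
static char out[4096];
static size_t out_len;

static void emit(char c) {
    if (out_len < sizeof out - 1) out[out_len++] = c;
}

/* Call before each fetch. Returns false when the program exits (PC == 0). */
bool bdos_trap(Z80 *z) {
    if (z->PC == 0x0000)
        return false;                        /* warm boot: test finished */
    if (z->PC != 0x0005)
        return true;
    if (z->C == 2) {                         /* console output: char in E */
        emit((char)z->E);
    } else if (z->C == 9) {                  /* print string at DE up to '$' */
        uint16_t p = (uint16_t)((z->D << 8) | z->E);
        while (memory[p] != '$')
            emit((char)memory[p++]);
    }
    /* Emulate the RET that ends the BDOS call. */
    z->PC = (uint16_t)(memory[z->SP] | (memory[(uint16_t)(z->SP + 1)] << 8));
    z->SP += 2;
    return true;
}

/* Set up a minimal CP/M environment for a .COM image of len bytes. */
void cpm_load(Z80 *z, const uint8_t *image, size_t len) {
    for (size_t i = 0; i < len && 0x100 + i < sizeof memory; i++)
        memory[0x100 + i] = image[i];
    memory[0x0006] = 0x00;                   /* BDOS address word: many CP/M programs */
    memory[0x0007] = 0xF0;                   /* use it as their top of memory */
    z->PC = 0x0100;
    z->SP = 0xF000;
}
```

Because the trap runs before the fetch, nothing needs to be stored at 0x0005 itself; the emulated RET returns straight to the caller.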
Some practical tips:
- Implement a “trace” mode that prints PC, opcode bytes, and register state each instruction — invaluable when bugs appear.
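- Use assertions for invariants (for example, that the cycle counter only moves forward and that every instruction consumes at least 4 T-states). Note that the Z80 has no stack alignment requirement: an odd SP is legal.
- When a bug appears, reduce the test case to a minimal reproducible sequence.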
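A trace line can be as simple as one snprintf per instruction. This sketch uses a trimmed-down version of the register struct and a hypothetical trace_line function; the format itself is a free choice, but keeping it identical between your emulator and a reference emulator makes the two traces diffable line by line.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t A, F, B, C, D, E, H, L;
    uint16_t IX, IY, SP, PC;
} Z80;

/* Format one trace line: PC, the 4 bytes at PC, then the registers.
   Returns the number of characters written (as snprintf does). */
int trace_line(const Z80 *z, const uint8_t *mem, char *buf, size_t size) {
    return snprintf(buf, size,
                    "%04X  %02X %02X %02X %02X  AF=%02X%02X BC=%02X%02X DE=%02X%02X "
                    "HL=%02X%02X IX=%04X IY=%04X SP=%04X",
                    z->PC,
                    mem[z->PC], mem[(uint16_t)(z->PC + 1)],
                    mem[(uint16_t)(z->PC + 2)], mem[(uint16_t)(z->PC + 3)],
                    z->A, z->F, z->B, z->C, z->D, z->E, z->H, z->L,
                    z->IX, z->IY, z->SP);
}
```

Printing four bytes at PC regardless of instruction length keeps the function independent of the decoder, at the cost of some noise for short instructions.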
Performance optimizations
Once correct, profile and optimize:
- Consider lookup tables of function pointers instead of one large switch statement, but measure first: compilers usually turn a dense switch into a jump table already.
- Inline hot ALU operations.
- Precompute flag tables for operations (e.g., add/sub result → flags).
- Minimize memory reads: fetch multi-byte operands into local variables.
- Host-specific tricks rarely pay off; SIMD in particular is unlikely to help, because emulation is a serial chain of dependent byte operations.
- JIT-compile hot inner loops if extreme speed is required.
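As an example of a precomputed flag table: the flags produced by 8-bit INC depend only on the result byte (carry is left untouched), so they can come from a 256-entry table. This is a sketch; the table and function names are illustrative, and the rules follow the documented behavior of INC r (H when the low nibble wraps to 0, P/V when the result is 0x80, N cleared, Y/X copied from the result).

```c
#include <stdint.h>

enum { FLAG_S = 0x80, FLAG_Z = 0x40, FLAG_Y = 0x20, FLAG_H = 0x10,
       FLAG_X = 0x08, FLAG_PV = 0x04, FLAG_N = 0x02, FLAG_C = 0x01 };

static uint8_t inc_flags[256];   /* indexed by the result of the increment */

void init_inc_flags(void) {
    for (int r = 0; r < 256; r++) {
        uint8_t f = (uint8_t)(r & (FLAG_S | FLAG_Y | FLAG_X));
        if (r == 0)          f |= FLAG_Z;
        if ((r & 0x0F) == 0) f |= FLAG_H;   /* carry out of bit 3 */
        if (r == 0x80)       f |= FLAG_PV;  /* 0x7F + 1 overflows */
        inc_flags[r] = f;                    /* N is always cleared */
    }
}

typedef struct { uint8_t A, F; } Z80;

/* INC A: one table lookup replaces the per-flag tests; C is preserved. */
void op_INC_A(Z80 *z) {
    z->A++;
    z->F = (uint8_t)((z->F & FLAG_C) | inc_flags[z->A]);
}
```

A matching table for DEC follows the same pattern with N set, H on a low nibble of 0x0F, and P/V on a result of 0x7F.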
Often, well-structured C or Rust with careful inlining and tables is sufficient for real-time emulation of Z80-based systems.
Example: Implementing ADD A, r and flags
A concise, correct implementation of ADD A, r must update S, Z, H, P/V, N (reset), and C. Use helpers and precomputed tables for parity.
Pseudocode:
```c
// FLAG_* are the bit masks of F: S=0x80, Z=0x40, H=0x10, P/V=0x04, N=0x02, C=0x01.
void op_ADD_A_r(Z80 *z, uint8_t value) {
    uint8_t a = z->A;
    uint16_t result = (uint16_t)(a + value);
    uint8_t r = (uint8_t)result;

    z->F = 0;                                        // also clears N
    if (r & 0x80) z->F |= FLAG_S;
    if (r == 0) z->F |= FLAG_Z;
    if (((a & 0x0F) + (value & 0x0F)) & 0x10) z->F |= FLAG_H;
    // Overflow: operands have the same sign and the result's sign differs.
    if (~(a ^ value) & (a ^ r) & 0x80) z->F |= FLAG_PV;
    if (result & 0x100) z->F |= FLAG_C;
    z->A = r;
}
```
Verify this against a test suite; half-carry and overflow rules are common sources of bugs.
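Subtraction mirrors ADD A,r with a few changes: N is set, H and C signal borrows, and overflow occurs when the operands have different signs and the result's sign differs from A's. A sketch of SUB r and CP r sharing one routine (the names are hypothetical). Per the documented behavior, CP computes the same flags without storing the result, and takes the undocumented Y/X bits from the operand rather than from the result.

```c
#include <stdbool.h>
#include <stdint.h>

enum { FLAG_S = 0x80, FLAG_Z = 0x40, FLAG_Y = 0x20, FLAG_H = 0x10,
       FLAG_X = 0x08, FLAG_PV = 0x04, FLAG_N = 0x02, FLAG_C = 0x01 };

typedef struct { uint8_t A, F; } Z80;

/* Shared by SUB and CP: computes A - value and the flags; stores A only for SUB. */
static void sub8(Z80 *z, uint8_t value, bool store) {
    uint8_t a = z->A;
    uint8_t r = (uint8_t)(a - value);
    uint8_t f = FLAG_N;
    if (r & 0x80) f |= FLAG_S;
    if (r == 0) f |= FLAG_Z;
    if ((a & 0x0F) < (value & 0x0F)) f |= FLAG_H;       /* borrow from bit 4 */
    if ((a ^ value) & (a ^ r) & 0x80) f |= FLAG_PV;     /* signed overflow */
    if (a < value) f |= FLAG_C;                         /* borrow out of bit 7 */
    /* Undocumented bits 5 and 3: from the result for SUB, from the operand for CP. */
    f |= (uint8_t)((store ? r : value) & (FLAG_Y | FLAG_X));
    z->F = f;
    if (store) z->A = r;
}

void op_SUB_r(Z80 *z, uint8_t value) { sub8(z, value, true); }
void op_CP_r(Z80 *z, uint8_t value)  { sub8(z, value, false); }
```

SBC and the 16-bit ADC/SBC forms extend the same routine with an incoming carry, which is where most half-carry bugs tend to hide.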
Integrating peripherals
To run real software, emulate the target machine’s peripherals:
- Video: implement scanline timing, VRAM, and I/O registers. Sync to CPU cycles for accurate display.
- Sound: emulate PSG or FM chips with cycle-synced envelope/timer updates.
- Timers: emulate hardware timers and interrupts.
- Disk, cassette, serial ports as needed for the platform.
Design peripheral interfaces that receive advance(cycles) calls so they progress in lockstep with CPU execution.
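A minimal shape for that interface is a struct holding an advance function pointer plus the device's own state. The sketch below assumes a hypothetical video device that counts T-states per scanline and reports when a frame ends, so the caller can raise the CPU's interrupt line. The Peripheral and VideoState types are illustrative; 224 T-states per line and 312 lines per frame match the 48K ZX Spectrum, but treat them as parameters.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct Peripheral Peripheral;
struct Peripheral {
    /* Advance the device by the given number of T-states.
       Returns true if it wants to raise an interrupt. */
    bool (*advance)(Peripheral *self, uint32_t cycles);
    void *state;
};

typedef struct {
    uint32_t cycles_per_line;  /* e.g. 224 on a 48K ZX Spectrum */
    uint32_t lines_per_frame;  /* e.g. 312 on a 48K ZX Spectrum */
    uint32_t line_cycles;      /* T-states into the current line */
    uint32_t line;             /* current scanline */
} VideoState;

static bool video_advance(Peripheral *self, uint32_t cycles) {
    VideoState *v = self->state;
    bool frame_done = false;
    v->line_cycles += cycles;
    while (v->line_cycles >= v->cycles_per_line) {
        v->line_cycles -= v->cycles_per_line;
        /* A real device would render scanline v->line here. */
        if (++v->line == v->lines_per_frame) {
            v->line = 0;
            frame_done = true;   /* e.g. raise the CPU's IRQ line */
        }
    }
    return frame_done;
}

Peripheral make_video(VideoState *v) {
    Peripheral p = { video_advance, v };
    return p;
}
```

Because the device keeps its own leftover T-states in line_cycles, the CPU loop can call advance with whatever count it just executed, whether that is one instruction or a whole frame.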
Putting it together: initialization and main loop
- Initialize CPU state and memory (load ROMs).
- Install memory and I/O callbacks for peripherals.
- Optionally enable tracing or breakpoint hooks.
- Enter main loop:
  - Run CPU for a number of cycles or until next event (e.g., VBlank).
  - Call peripherals to advance by the same cycle count.
  - Handle input, timers, audio buffer generation, and rendering.
  - Sleep or throttle to match real-time if running faster than real hardware.
Example simplified main loop:
```c
while (running) {
    uint64_t start_cycles = z->cycles;
    run_cycles(z, target_cycles_per_frame);  // executes instructions and advances z->cycles
    uint64_t executed = z->cycles - start_cycles;
    video_advance(executed);
    sound_advance(executed);
    handle_input();
    render_frame_if_needed();
}
```
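A function like run_cycles will usually overshoot its budget, because it can only stop between instructions, which take from 4 to 23 T-states. One common remedy is to carry the overshoot into the next frame so the long-run rate stays exact. This sketch assumes a step function that executes one instruction and returns its T-states; the FrameClock type and run_frame name are hypothetical, and the stub z80_step (a fixed 7 T-states) exists only to make the example runnable.

```c
#include <stdint.h>

typedef struct { uint64_t cycles; } Z80;

/* Stub: a real core decodes and executes one instruction here.
   This one pretends every instruction takes 7 T-states. */
static uint32_t z80_step(Z80 *z) {
    z->cycles += 7;
    return 7;
}

typedef struct {
    uint32_t cycles_per_frame;  /* e.g. 69888 on a 48K ZX Spectrum */
    int64_t debt;               /* T-states already run past the previous budget */
} FrameClock;

/* Run one frame's worth of instructions; returns the T-states actually executed. */
uint64_t run_frame(Z80 *z, FrameClock *fc) {
    int64_t budget = (int64_t)fc->cycles_per_frame - fc->debt;
    int64_t done = 0;
    while (done < budget)
        done += z80_step(z);
    fc->debt = done - budget;   /* overshoot is subtracted from the next frame */
    return (uint64_t)done;
}
```

Individual frames then vary by a few T-states, but the average matches cycles_per_frame, which is what audio buffering and throttling care about.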
Additional resources and ROMs for testing
- Z80 CPU User Manual and opcode tables
- ZEXALL test ROM
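- Open-source Z80 emulator implementations (for study): MAME’s Z80 core and small hobby emulators on GitHub. SjASM and its successors are assemblers rather than emulators, but they are handy for building your own test programs.
- Community forums and documentation for specific machines (ZX Spectrum, MSX). The Game Boy’s LR35902 uses only a related instruction set, so its documentation does not carry over directly.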
Common pitfalls and tips
- Flags are subtle — write unit tests for each ALU instruction.
- DD/FD prefix handling and CB with displacement are tricky; implement small test cases for them.
- Interrupt timing and EI semantics (takes effect after next instruction) often cause bugs.
- R register wrapping and bit-7 behaviors should match specification; some software depends on exact behavior.
- Keep CPU core separate from system-level logic (video/audio) for reusability.
Example minimal emulator repository layout
- src/
  - cpu.c / cpu.h
  - memory.c / memory.h
  - io.c / io.h
  - peripherals/
    - video.c
    - sound.c
- tests/
  - zex_tests.c
- tools/
  - disassembler.c
- docs/
  - opcode_tables.md
Closing notes
Building a Z80 emulator is a rewarding project that teaches low-level computing concepts and software architecture. Start small — implement basic instructions and memory, verify with simple programs, then expand to full instruction coverage, interrupts, and peripherals. Use test ROMs and incremental unit testing to validate correctness, and profile to improve performance. With careful attention to flags, prefixes, and timing, you can create an emulator accurate enough to run vintage software faithfully.