(This page might be a bit more readable if you enable 1st-party CSS.)

So far we have sketched an architecture for our CPU – it’s a 32-bit, register-based RISC machine. Now let’s design a set of instructions for our machine!

The registers

It’s common for RISC machines to provide a large number of registers (you can run out quickly when simple operations require many instructions manipulating many registers). So let’s give our machine, say, 64 registersr0r63. When we actually implement this thing we’ll be careful not to hardcode this number, so we can spin up a machine with twice that number of registers, or more, with ease.

We’ll also need an instruction pointer, so we know which instruction our machine should be executing. We could use on of our general purpose registers, but for convenience when talking about the machine and its state let’s add a new register, ip, for this purpose. (We won’t mess around with ip explicitly though, so we won’t use it with any of our instructions – instead our machine will handle incrementing ip after each instruction).


Now let’s toss in some instructions. For arithmetic, we’ll include add, sub, mul, and div. And mod, too, what the heck:

add r0 r4 r5
sub r2 r0 r3
mul r7 r7 r3
div r3 r4 r8
mod r2 r0 r3

Notice that instructions share a common form:

instruction dr sr0 sr1

Where dr is the destination register, and stands in for any of our general purpose registers, and both sr0 and sr1 are source registers, and also stand in for any general purpose register. The idea is that the instruction performs the operation dr := sr0 <op> sr1, updating the destination register with the result of performing the operation on the source registers. We’ll use this notation, called three address code, going forward.

Now we can add and subtract values in registers, but how do we get values into registers in the first place? Let’s allow ourselves to simply set a register to a constant value: constant dr 0x123, which sets dr to be the literal (hexadecimal) value 0x123dr := 0x123. (Sometimes these literals are referred to as immediates). We also will want to be able to move these] values freely between registers: mov dr sr0, which sets dr to be the value that is in srdr := sr.

With just these instructions we can write simple programs to calculate mathematical functions. For instance, we can calculate 2 * 3 + 4:

constant r0 2
constant r1 3
constant r2 4
mul r0 r0 r1
add r0 r0 r2

Now r0 will have the value 10.


We aren’t even close to being able to display output – where would we even display it? – but we’d like to be able to get some kind of output from our machine. For now, we’ll introduce a simple instruction, halt, that stops our machine. And we’ll follow the convention that whatever value is in r0 when the machine halts should be taken to be the “result” of running the machine.

Memory access

Next we’ll add some instructions for manipulating memory: load dr sr0 and store dr sr0. The load instruction reads from memory (at the address indicated by sr0) and writes to the destination register: dr := memory[sr0]. Store operates similarly, reading from a register and writing to memory: memory[dr] := sr0.

These operations implement a very simple addressing mode, one that only supports direct access to a specific, “absolute” memory location. Other architectures provide for a variety of addressing modes, for instance “relative” to the value of a particular register, but we’ll keep things simple.


We can’t yet actually do anything really useful, though, because we can’t yet loop! Let’s add jz sr0 sr1 and jnz sr0 sr1 as conditional jumps; jz will set the instruction pointer to sr1 when sr0 is 0, otherwise it will advance to the next instruction as usual. jnz will do the opposite, checking that sr0 is not 0. And jmp will jump unconditionally:

In some sense we only really need a single, conditional looping instruction; for instance, instead of jmp r0 we could write:

constant r1 0
jz r1 r0

And if mov were allowed to apply to ip, we could skip our conditional branching instructions entirely! But explicit branching instructions are nicer to read and write by hand, and later will be more convenient when compiling a higher level language to assembly.


Our conditional jumps compare with zero, a common pattern in real architectures. But how do we “get” a 0 in the first place? Well, let’s say we want to determine whether r0 is equal to r1, and if so branch to r2. We could implement it thusly:

sub r3 r0 r1
jz r3 r2

That is, we subtract r0 from r1 – if they are equal, then we get 0 and we will branch, otherwise we get non-zero and we will not branch. Getting to greater than or less than with the instructions we’ve added thus far is a bit trickier, but can be done.

These cute hacks to avoid explicit comparison instructions play fast and loose with overflow, too!

It’s annoying, though, so we’ll add eq, neq, lt, and gt operations, all with the effect dr := sr0 <op> sr1 ? 0 : 1. So if we compare two registers for equality, and they are equal, then the result will be 0. Then we can branch based on their equality using jz.

Bitwise operations

Most instructions support a variety of bitwise operations, like and, or, and so on. We could require users of our machine to implement these out of the previously describe primitive instructions, but that’ll be a pain. So add and, or, and not as well. (Later we’ll probably even want to add bitwise shift instructions, shl and shr, but we’ll leave them out for now).

Phew! This was supposed to be a RISC machine, and already we’re nearing 20 instructions? Considering x86 sports well over 1000 instructions, I think we’re doing ok.

Next up: A simple interpreter »