Quinix

We can finally get more than a single number out of our machine – we can print (short) strings, too. But getting strings into our machine is a pain. Let’s improve our assembler!

Data instruction

We’ll support a new instruction, data, that lets us add arbitrary strings to the body of our program and refer to the addresses of those strings by label. The same form also accepts immediate values and “reference” values (which refer to the addresses of other labels), for readability and convenience:

data @message 'Hello, world!\n' ; Output 15 bytes -- size included.
data @identifier 0x0 0x1 0x2    ; Output 3 bytes.
data @pointer @identifier       ; Output 1 byte -- the address of @identifier.
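
To make the three forms concrete, here is a minimal sketch of how a single data line might be parsed. The names parseDataLine, DataDirective and ESCAPES are hypothetical, the escape set is an assumption, and the sketch assumes any trailing ; comment has already been stripped. Label names are stored without the leading @.

```typescript
// Hypothetical parsed form of a `data` line.
interface DataDirective {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

// Escape sequences recognised inside strings (an assumption; extend as needed).
const ESCAPES: { [c: string]: string } = { n: '\n', t: '\t', '\\': '\\', "'": "'" };

// Parse one `data @label ...` line whose trailing comment was already removed.
function parseDataLine(line: string): DataDirective {
  const match = /^data\s+@(\S+)\s+(.+)$/.exec(line.trim());
  if (!match) {
    throw new Error(`Not a data directive: ${line}`);
  }
  const label = match[1];
  const rest = match[2];

  // String form: the length, then one word per code point.
  if (rest.startsWith("'")) {
    if (rest.length < 2 || !rest.endsWith("'")) {
      throw new Error(`Unterminated string: ${rest}`);
    }
    const body = rest.slice(1, -1).replace(/\\(.)/g, (_: string, c: string) => ESCAPES[c] ?? c);
    const codes = Array.from(body).map((ch) => ch.codePointAt(0) ?? 0);
    return { type: 'data', label, data: [codes.length, ...codes] };
  }

  const tokens = rest.split(/\s+/);
  // Reference form: a single @label, resolved to an address later.
  if (tokens.length === 1 && tokens[0].startsWith('@')) {
    return { type: 'data', label, reference: tokens[0].slice(1) };
  }

  // Immediate form: hex (0x...) or decimal integers, one word each.
  const data = tokens.map((t) => {
    const n = Number(t);
    if (!Number.isInteger(n)) {
      throw new Error(`Invalid immediate: ${t}`);
    }
    return n;
  });
  return { type: 'data', label, data };
}
```

With this sketch, the first line above yields 15 words: the length 14, followed by the 14 code points of Hello, world! and the newline.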

Each of these forms introduces a new label that can be used in the body of the program, just like “regular” label definitions. So we can use them like this:

constant r0
@message

This new “instruction”, in all its forms, is not a real instruction added to our instruction set. Only our assembler understands it, and it translates it into directives it already supports: a label definition followed by immediates. For instance, we can translate data @message 'Hi!' to the following:

@message:
0x03
0x48
0x69
0x21

Recall that strings are prefixed with their length, hence the first byte is 0x3.
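
The translation above can be sketched as a small function that produces the equivalent assembly text. expandStringData is a hypothetical helper used only for illustration; it writes each word as a hex immediate, as in the listing above.

```typescript
// Expand a string `data` directive into a label definition plus immediates.
function expandStringData(label: string, s: string): string[] {
  // One word per code point, prefixed with the length.
  const codes = Array.from(s).map((ch) => ch.codePointAt(0) ?? 0);
  const words = [codes.length, ...codes];
  // Format as 0x-prefixed hex, padded to at least two digits.
  const hex = (n: number) => '0x' + n.toString(16).padStart(2, '0');
  return [`@${label}:`, ...words.map(hex)];
}

console.log(expandStringData('message', 'Hi!').join('\n'));
```

Running it prints @message: followed by 0x03, 0x48, 0x69 and 0x21, one per line, matching the listing.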

Assembling

So far we have three kinds of directives: instructions, label definitions, and label references. We’ll add a fourth, which holds the data to output as a list of numbers. For “reference” data like data @pointer @message we can’t compute the actual value until our first pass has collected the address of every label, so we simply store the referenced label itself:

interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}
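
The first pass below advances the address by each directive’s size, so a Data directive needs a size too. A minimal sketch, assuming a reference occupies a single word (the address) and any other data occupies one word per value; dataSize is a hypothetical helper name.

```typescript
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

// Number of words a data directive will occupy in the assembled program.
function dataSize(d: Data): number {
  // A reference is replaced by one address in the second pass.
  if (d.reference !== undefined) {
    return 1;
  }
  // Strings (already length-prefixed) and immediates: one word per value.
  return d.data?.length ?? 0;
}
```

Because data and reference are both optional, the interface can represent a directive with neither or both; a discriminated union of two variants would rule that out, at the cost of a slightly more verbose type.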

For strings, the data corresponds to the code points of the string. Since we’re working with a 32-bit machine, every code point fits in a single word, so we get Unicode support “for free”! We can convert a parsed string into a list of its code points, prefixed with its length, as follows:

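function stringToData(s: string): number[] {
  // Array.from iterates by code point, so astral characters stay whole.
  const data = Array.from(s).map((ch) => ch.codePointAt(0) ?? 0);
  // Prefix the length, counted in code points.
  data.unshift(data.length);
  return data;
}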
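
To see the “for free” Unicode support in action, here is the same function applied to a string containing a non-ASCII character and an emoji. The emoji is two UTF-16 code units in JavaScript, so its .length is 2, but Array.from yields it as one code point and therefore one word.

```typescript
function stringToData(s: string): number[] {
  const data = Array.from(s).map((ch) => ch.codePointAt(0) ?? 0);
  data.unshift(data.length);
  return data;
}

// U+00E9 (é) and U+1F600 (an emoji outside the Basic Multilingual Plane).
const s = '\u00e9\u{1F600}';
console.log(s.length);         // 3 UTF-16 code units
console.log(stringToData(s));  // [2, 233, 128512]
```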

Now we have two different ways to define new labels – the existing @label: form, and the new data @label ... form – so we must update our first pass that collects label definitions:

// First pass: collect the addresses.
let address = 0;
const labelAddresses: {[label: string]: number} = {};
directives.forEach((directive) => {
  if(directive.type === 'label' || directive.type === 'data'){
    labelAddresses[directive.label] = address;
  }
  // Each directive advances the address by its encoded size.
  address += directive.size;
});
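
Here is the first pass as a self-contained function, so the addresses can be checked by hand. The Directive union is a simplified assumption (the real assembler’s types carry more fields), and label definitions are assumed to have size 0.

```typescript
// Simplified directive shapes, for this sketch only.
type Directive =
  | { type: 'instruction'; size: number }
  | { type: 'label'; label: string; size: number }
  | { type: 'data'; label: string; size: number; data?: number[]; reference?: string };

function collectLabelAddresses(directives: Directive[]): { [label: string]: number } {
  let address = 0;
  const labelAddresses: { [label: string]: number } = {};
  for (const directive of directives) {
    // Both label definitions and data directives name the current address.
    if (directive.type === 'label' || directive.type === 'data') {
      labelAddresses[directive.label] = address;
    }
    address += directive.size;
  }
  return labelAddresses;
}

const example: Directive[] = [
  { type: 'instruction', size: 1 },
  { type: 'label', label: 'loop', size: 0 },
  { type: 'instruction', size: 1 },
  { type: 'data', label: 'message', size: 4, data: [3, 72, 105, 33] },
  { type: 'data', label: 'pointer', size: 1, reference: 'message' },
];
console.log(collectLabelAddresses(example)); // { loop: 1, message: 2, pointer: 6 }
```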

Finally, we update our second pass to actually emit the data we’ve defined. For strings and immediates we can simply emit the data itself, whereas for references we’ll need to look up the relevant address in the address table we collected in the first pass:

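switch(directive.type){
  // ...
  case 'data': {
    if(directive.reference){
      // References resolve to the address collected in the first pass.
      const address = labelAddresses[directive.reference];
      if(address === undefined){
        throw new Error(`Undefined label: ${directive.reference}`);
      }
      encoded.push(address);
    }
    else {
      // Strings and immediates are emitted as-is.
      encoded.push(...(directive.data ?? []));
    }
    break;
  }
}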
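
A self-contained sketch of the second pass for data directives, with the label table supplied directly rather than computed. The Directive union and the emit name are assumptions for illustration; an unknown reference is reported as an error rather than silently emitting undefined.

```typescript
// Simplified directive shapes, for this sketch only.
type Directive =
  | { type: 'immediate'; value: number }
  | { type: 'data'; label: string; data?: number[]; reference?: string };

function emit(directives: Directive[], labelAddresses: { [label: string]: number }): number[] {
  const encoded: number[] = [];
  for (const directive of directives) {
    switch (directive.type) {
      case 'immediate':
        encoded.push(directive.value);
        break;
      case 'data': {
        if (directive.reference !== undefined) {
          // Look up the address the first pass recorded for this label.
          const address: number | undefined = labelAddresses[directive.reference];
          if (address === undefined) {
            throw new Error(`Undefined label: ${directive.reference}`);
          }
          encoded.push(address);
        } else {
          encoded.push(...(directive.data ?? []));
        }
        break;
      }
    }
  }
  return encoded;
}

const labels = { message: 0, pointer: 3 };
console.log(emit([
  { type: 'data', label: 'message', data: [2, 72, 105] },
  { type: 'data', label: 'pointer', reference: 'message' },
], labels)); // [2, 72, 105, 0]
```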

Relocation

One annoyance is that, while data instructions read as though they are outside of the flow of the program, in fact they are not. For instance, despite appearances, code like the following is incorrect:

add r2 r3 r4
data @message 'Hello!'
constant r4 @message

That’s because the machine will first fetch, decode, and execute the add instruction, and then fetch and attempt to decode the first byte of our data: the length prefix, the immediate value 0x00000006. This decodes to halt!

It can therefore be convenient for our assembler to relocate the data instructions to the end of the code (or, alternatively, to the beginning of the program, with an unconditional jump over the data) before we assemble a program:

function relocate(directives: Directive[]){
  const nonData = directives.filter((directive) => {
    return directive.type !== 'data';
  });
  const data = directives.filter((directive) => {
    return directive.type === 'data';
  });
  return [
    ...nonData,
    ...data,
  ];
}
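
Here is relocate applied to the broken example above, using a simplified Directive type that is an assumption of this sketch. Because Array.prototype.filter preserves order, instructions keep their order and so do the data directives; only the data moves to the end.

```typescript
// Simplified directive shapes, for this sketch only.
type Directive =
  | { type: 'instruction'; text: string }
  | { type: 'data'; label: string; data: number[] };

function relocate(directives: Directive[]): Directive[] {
  const nonData = directives.filter((d) => d.type !== 'data');
  const data = directives.filter((d) => d.type === 'data');
  return [...nonData, ...data];
}

const program: Directive[] = [
  { type: 'instruction', text: 'add r2 r3 r4' },
  { type: 'data', label: 'message', data: [6, 72, 101, 108, 108, 111, 33] },
  { type: 'instruction', text: 'constant r4' },
];
console.log(relocate(program).map((d) => (d.type === 'data' ? `@${d.label}` : d.text)));
// ['add r2 r3 r4', 'constant r4', '@message']
```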

Once we’ve done that, we can update our “Hello, world!” program to print any string we care to! Try changing the value of @message below:

data @message 'Hello!'
data @io 0x100
data @buffer 0x101

constant r1         ; One.
0x1
constant r2         ; Input buffer.
@message
constant r3         ; Output buffer.
@buffer
load r3 r3
constant r4         ; Counter.
0x0
load r5 r2          ; Number of bytes to write: first byte of @message.
add r5 r5 r1        ; (Add 1 for length.)

@loop:
load r6 r2          ; Copy 1 byte from message to our peripheral.
store r3 r6
add r2 r2 r1        ; Increment our pointers.
add r3 r3 r1
add r4 r4 r1        ; Increment our counter.
neq r6 r4 r5        ; Check whether bytes remain to be copied.
constant r7
@loop
jnz r6 r7           ; If we haven’t yet copied enough bytes, loop again.

constant r0         ; Write!
@io
load r0 r0
store r0 r1

mov r0 r5           ; Return the number of bytes written.
halt
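
To follow what the copy loop does, here is a TypeScript model of it. It is a hypothetical sketch: memory is modelled as an array of words, the buffer address is passed in directly (standing in for load r3 r3), and the write to @io is not modelled. Each variable is annotated with the register it mirrors. The loop copies the length prefix plus every character, and keeps going until the counter reaches length + 1, as the comments in the listing describe.

```typescript
// Copy a length-prefixed string from `message` to `buffer`; return words copied.
function copyMessage(memory: number[], message: number, buffer: number): number {
  const one = 1;                       // r1
  let src = message;                   // r2: read pointer
  let dst = buffer;                    // r3: write pointer
  let counter = 0;                     // r4
  const total = memory[message] + one; // r5: length word + 1
  do {
    memory[dst] = memory[src];         // load r6 r2; store r3 r6
    src += one;                        // advance both pointers
    dst += one;
    counter += one;
  } while (counter !== total);         // loop until every word is copied
  return total;                        // mov r0 r5
}

const memory: number[] = new Array(16).fill(0);
memory.splice(0, 4, 3, 72, 105, 33);   // 'Hi!' at address 0
console.log(copyMessage(memory, 0, 8), memory.slice(8, 12)); // 4 [3, 72, 105, 33]
```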
