Quinix


We can finally get more than a single number out of our machine – we can print (short) strings, too. But getting strings into our machine is a pain. Let’s improve our assembler!

Data instruction

We’ll support a new instruction, data, that allows us to add arbitrary strings to the body of our program and to reference the address of those strings by label. For readability and convenience, the same form also supports immediate values and “reference” values (which refer to the addresses of other labels):

data @message 'Hello, world!\n' ; Output 15 bytes -- length included.
data @identifier 0x0 0x1 0x2    ; Output 3 bytes.
data @pointer @identifier       ; Output 1 byte -- the address of @identifier.

Each of these forms introduces a new label that can be used in the body of the program, just like “regular” label definitions. So we can use them thusly:

constant r0 @message
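To make the three forms concrete, here is a minimal sketch of how an assembler might parse a single data line into a directive. It is not Quinix’s actual parser: the hypothetical parseData and unescapeString helpers assume whitespace-separated tokens, ‘;’ comments, labels that keep their leading @, and string literals supporting only the \n, \\ and \' escapes.

```typescript
// Hypothetical sketch: parse one `data` line into a Data directive.
// Assumptions: whitespace-separated tokens, ';' starts a comment,
// labels keep their '@', and strings support only \n, \\ and \' escapes.
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

// Replace each backslash escape with the character it denotes.
function unescapeString(body: string): string {
  return body.replace(/\\(.)/g, (_, c: string) => (c === 'n' ? '\n' : c));
}

function parseData(line: string): Data {
  const match = /^data\s+(@\S+)\s+(.*)$/.exec(line.trim());
  if (!match) {
    throw new Error(`not a data directive: ${line}`);
  }
  const label = match[1];
  const rest = match[2];

  // String form: 'text', emitted as its length followed by its code points.
  const str = /^'((?:[^'\\]|\\.)*)'/.exec(rest);
  if (str) {
    const chars = Array.from(unescapeString(str[1])).map((c) => c.codePointAt(0) ?? 0);
    return { type: 'data', label, data: [chars.length, ...chars] };
  }

  // Non-string forms: drop a trailing comment, then split into tokens.
  const tokens = rest.split(';')[0].trim().split(/\s+/).filter((t) => t !== '');
  if (tokens.length === 0) {
    throw new Error(`data directive has no values: ${line}`);
  }

  // Reference form: a single @label whose address is resolved later.
  if (tokens.length === 1 && tokens[0].startsWith('@')) {
    return { type: 'data', label, reference: tokens[0] };
  }

  // Immediate form: one or more numbers (Number() accepts 0x-prefixed hex).
  const data = tokens.map((t) => {
    const n = Number(t);
    if (!Number.isInteger(n)) {
      throw new Error(`bad immediate: ${t}`);
    }
    return n;
  });
  return { type: 'data', label, data };
}

console.log(JSON.stringify(parseData('data @pointer @identifier')));
// {"type":"data","label":"@pointer","reference":"@identifier"}
```

Handling the string form first means a ‘;’ inside a quoted string is never mistaken for the start of a comment.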

This new “instruction”, in its various forms, will not be a real instruction added to our instruction set; instead it will only be understood by our assembler, which will translate it into directives it already supports – namely label definitions and immediates. For instance, we can translate data @message 'Hi!' into the following:

@message:
0x03
0x48
0x69
0x21

Recall that strings are prefixed with their length, hence the first byte is 0x03.
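The translation above is mechanical enough to sketch in a few lines. The hypothetical expandString helper below (not part of the assembler as written) produces exactly that listing: a label definition, the length prefix, then one immediate per code point, formatted as two-digit lowercase hex.

```typescript
// Hypothetical sketch: render `data @label 'text'` as a label definition
// followed by one immediate per word, matching the listing above.
function toHex(n: number): string {
  return '0x' + n.toString(16).padStart(2, '0');
}

function expandString(label: string, s: string): string[] {
  const codePoints = Array.from(s).map((c) => c.codePointAt(0) ?? 0);
  // Strings are prefixed with their length, so it is emitted first.
  const words = [codePoints.length, ...codePoints];
  return [`${label}:`, ...words.map(toHex)];
}

// Prints the five lines of the listing above, one per line.
console.log(expandString('@message', 'Hi!').join('\n'));
```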

Assembling

So far we have three kinds of directives: instructions, label definitions, and label references. We’ll add a fourth, which holds the data to output as a list of numbers. For “reference” data like data @pointer @message we can’t compute the actual value until our first pass has collected the addresses of all labels, so we simply store the referenced label itself:

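interface Data {
  type: 'data';
  label: string;       // The label this directive defines.
  data?: number[];     // Words to emit as-is (strings and immediates).
  reference?: string;  // Label whose address to emit, resolved in the second pass.
}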
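The first pass will need to know how much space each data directive occupies. A minimal sketch, assuming a hypothetical dataSize helper: literal data takes one word per number, while a reference always takes exactly one word (the address it resolves to), which is why its size is known before the address is.

```typescript
// Hypothetical sketch: the three `data` forms as Data directives, plus a
// helper computing how many words each occupies.
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

// A reference always occupies a single word: the address it resolves to.
function dataSize(d: Data): number {
  return d.reference !== undefined ? 1 : (d.data ?? []).length;
}

const hi: Data = { type: 'data', label: '@message', data: [0x03, 0x48, 0x69, 0x21] };
const identifier: Data = { type: 'data', label: '@identifier', data: [0x0, 0x1, 0x2] };
const pointer: Data = { type: 'data', label: '@pointer', reference: '@identifier' };

console.log([hi, identifier, pointer].map(dataSize).join(', ')); // 4, 3, 1
```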

For strings, the data corresponds to the code points of the string. Since we’re working with a 32-bit machine, every Unicode code point fits in a single word, so we get Unicode support “for free”! We can convert a parsed string into a list of its code points, prefixed with its length, as follows:

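function stringToData(s: string): number[] {
  // Array.from iterates by code point, not by UTF-16 code unit.
  const data = Array.from(s).map((c) => c.codePointAt(0) ?? 0);
  // Prefix the string with its length.
  data.unshift(data.length);
  return data;
}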
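A quick demonstration of why Array.from matters here: JavaScript strings are UTF-16, so a character outside the Basic Multilingual Plane (such as an emoji) counts as two units in String.prototype.length, but Array.from yields it as one code point and therefore one 32-bit word. The test string is just an illustrative choice.

```typescript
// stringToData as above, used to show code-point (not UTF-16) counting.
function stringToData(s: string): number[] {
  const data = Array.from(s).map((c) => c.codePointAt(0) ?? 0);
  data.unshift(data.length); // Length prefix.
  return data;
}

const s = 'Hi 😀';
console.log(s.length); // 5 (UTF-16 code units: the emoji is a surrogate pair)
console.log(stringToData(s).join(' ')); // 4 72 105 32 128512
```

The length prefix is 4, matching the number of words that follow, because U+1F600 (128512) fits in a single word.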

Now we have two different ways to define new labels – the existing @label: form, and the new data @label ... form – so we must update our first pass that collects label definitions:

// First pass: collect the address of every label.
let address = 0;
const labelAddresses: {[label: string]: number} = {};
directives.forEach((directive) => {
  // Both `@label:` and `data @label ...` define a label at the current address.
  if(directive.type === 'label' || directive.type === 'data'){
    labelAddresses[directive.label] = address;
  }
  address += directive.size;
});
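Here is a self-contained version of this first pass run on a tiny program. The Directive union and its size fields are simplified assumptions for illustration (the real assembler’s directives carry more information), and label definitions are assumed to occupy no space.

```typescript
// Hypothetical, simplified directive types; only what the first pass needs.
type Directive =
  | { type: 'instruction'; size: number }
  | { type: 'label'; label: string; size: 0 }
  | { type: 'data'; label: string; size: number; data?: number[]; reference?: string };

function collectAddresses(directives: Directive[]): { [label: string]: number } {
  let address = 0;
  const labelAddresses: { [label: string]: number } = {};
  for (const directive of directives) {
    // Both `@label:` and `data @label ...` define a label at the current address.
    if (directive.type === 'label' || directive.type === 'data') {
      labelAddresses[directive.label] = address;
    }
    // Label definitions have size 0; everything else advances the address.
    address += directive.size;
  }
  return labelAddresses;
}

const addresses = collectAddresses([
  { type: 'instruction', size: 1 },
  { type: 'label', label: '@loop', size: 0 },
  { type: 'instruction', size: 2 },
  { type: 'data', label: '@message', size: 4, data: [0x03, 0x48, 0x69, 0x21] },
  { type: 'data', label: '@pointer', size: 1, reference: '@message' },
]);
console.log(JSON.stringify(addresses)); // {"@loop":1,"@message":3,"@pointer":7}
```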

Finally, we update our second pass to actually emit the data we’ve defined. For strings and immediates we can simply emit the data itself, whereas for references we’ll need to look up the relevant address in the address table we collected in the first pass:

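switch(directive.type){
  // ...
  case 'data': {
    if(directive.reference){
      // Reference data: emit the address of the referenced label.
      encoded.push(labelAddresses[directive.reference]);
    }
    else {
      // Strings and immediates: emit the words as-is.
      encoded.push(...(directive.data ?? []));
    }
    break;
  }
}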
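A self-contained sketch of just the data half of the second pass, using a hypothetical emitData helper. One design choice worth noting: rather than silently emitting undefined for a label that was never defined, this version throws, which surfaces typos in reference data at assembly time.

```typescript
// Hypothetical sketch: emit the words for a list of data directives,
// given the label addresses collected by the first pass.
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

function emitData(directives: Data[], labelAddresses: { [label: string]: number }): number[] {
  const encoded: number[] = [];
  for (const directive of directives) {
    if (directive.reference !== undefined) {
      // Reference data: look up the address collected in the first pass.
      const address = labelAddresses[directive.reference];
      if (address === undefined) {
        throw new Error(`undefined label: ${directive.reference}`);
      }
      encoded.push(address);
    } else {
      // Strings and immediates: emit the words unchanged.
      encoded.push(...(directive.data ?? []));
    }
  }
  return encoded;
}

const encoded = emitData(
  [
    { type: 'data', label: '@identifier', data: [0x0, 0x1, 0x2] },
    { type: 'data', label: '@pointer', reference: '@identifier' },
  ],
  { '@identifier': 0, '@pointer': 3 },
);
console.log(encoded.join(' ')); // 0 1 2 0
```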

Relocation

One annoyance is that, while data instructions read as though they are outside of the flow of the program, in fact they are not. For instance, despite appearances, code like the following is incorrect:

add r2 r3 r4
data @message 'Hello!'
constant r4 @message

That’s because the machine will first fetch, decode, and execute the add instruction. Then it will fetch and attempt to decode the first byte of our data: the length prefix of 'Hello!', the immediate value 0x00000006. This decodes to halt!

It can therefore be convenient for our assembler to relocate the data instructions to the end of the code (or alternatively to the beginning of the program, with an unconditional jump over the data) before we assemble a program:

function relocate(directives: Directive[]){
  const nonData = directives.filter((directive) => {
    return directive.type !== 'data';
  });
  const data = directives.filter((directive) => {
    return directive.type === 'data';
  });
  return [
    ...nonData,
    ...data,
  ];
}
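Here is relocate applied to the problematic program from above, over a simplified, hypothetical Directive type that records only a type and the source text. Because Array.prototype.filter preserves order, the code keeps its original order and the data directives keep their relative order; only their position changes.

```typescript
// Hypothetical, simplified directive type for illustration.
type Directive = { type: 'instruction' | 'label' | 'data'; name: string };

function relocate(directives: Directive[]): Directive[] {
  const nonData = directives.filter((d) => d.type !== 'data');
  const data = directives.filter((d) => d.type === 'data');
  return [...nonData, ...data];
}

const program: Directive[] = [
  { type: 'instruction', name: 'add r2 r3 r4' },
  { type: 'data', name: "data @message 'Hello!'" },
  { type: 'instruction', name: 'constant r4 @message' },
  { type: 'instruction', name: 'halt' },
];

console.log(relocate(program).map((d) => d.name).join(', '));
// add r2 r3 r4, constant r4 @message, halt, data @message 'Hello!'
```

Relocation only helps if the code halts (or jumps away) before reaching the end; otherwise the machine would still run into the data. And because relocation happens before the first pass, the addresses collected for data labels reflect their new positions.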

Once we’ve done that, we can update our “Hello, world!” program to print any string we care to!
