We can finally get more than a single number out of our machine – we can print (short) strings, too. But getting strings into our machine is a pain. Let’s improve our assembler!
Data instruction
We’ll support a new instruction, data, that allows us to add arbitrary strings to the body of our program, and to reference the addresses of those strings by label. We’ll also support immediate values and “reference” values (that refer to the addresses of other labels) in the same form, for improved readability and convenience:
data @message 'Hello, world!\n' ; Output 15 bytes -- 14 characters plus the size.
data @identifier 0x0 0x1 0x2 ; Output 3 bytes.
data @pointer @identifier ; Output 1 byte -- the address of @identifier.
Each of these forms introduces a new label that can be used in the body of the program, just like “regular” label definitions. So we can use them like this:
constant r0 @message
This new “instruction”, in its various forms, will not be a real instruction added to our instruction set; instead it will only be understood by our assembler, which will translate it to existing constructs – namely label definitions and immediates. For instance, we can translate data @message 'Hi!' to the following:
@message:
0x03
0x48
0x69
0x21
Recall that strings are prefixed with their length, hence the first byte is 0x03.
Assembling
So far we have three kinds of directives: instructions, label definitions, and label references. We’ll add a fourth, which includes the data to output as a list of numbers. For “reference” data like data @pointer @message we can’t compute the actual value until our first pass has collected all label addresses, so we simply store the referenced label itself:
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}
For strings the data will correspond to the code points of the string – since we’re working with a 32-bit machine we get Unicode support “for free”! We can convert a parsed string into a list of its code points, prefixed with its length, as follows:
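function stringToData(s: string): number[] {
  // Array.from iterates by code point, so surrogate pairs are not split.
  const data = Array.from(s).map((char) => char.codePointAt(0) || 0);
  // Prefix the data with the number of code points.
  data.unshift(data.length);
  return data;
}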
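As a quick check of the length-prefix convention, here is a small sketch that runs the conversion on 'Hi!' and on a string containing a character outside the Basic Multilingual Plane. The function is repeated so the example runs on its own; the variable names hi and emoji are just for illustration.

```typescript
// Repeated from above so this sketch is self-contained.
function stringToData(s: string): number[] {
  const data = Array.from(s).map((char) => char.codePointAt(0) || 0);
  data.unshift(data.length);
  return data;
}

// 'Hi!' becomes its length followed by one value per character.
const hi = stringToData('Hi!');
// [3, 0x48, 0x69, 0x21]

// '😀' is a single code point (U+1F600), even though its
// JavaScript .length is 2, so the length prefix is 1.
const emoji = stringToData('😀');
// [1, 0x1f600]

console.log(hi, emoji);
```

The first result matches the hand-translated @message example earlier: a length of 3 followed by the three character values.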
Now we have two different ways to define new labels – the existing @label: form, and the new data @label ... form – so we must update our first pass that collects label definitions:
// First pass: collect the addresses.
let address = 0;
const labelAddresses: {[label: string]: number} = {};
directives.forEach((directive) => {
  // Both label definitions and data directives introduce a label.
  if (directive.type === 'label' || directive.type === 'data') {
    labelAddresses[directive.label] = address;
  }
  address += directive.size;
});
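To see the address bookkeeping in isolation, the following is a minimal, self-contained sketch of the first pass. The simplified Directive shapes, the sizeOf helper, and the collectLabels wrapper are assumptions made for this example, not the assembler’s actual types; in particular, it assumes labels occupy no space and a reference occupies a single value.

```typescript
// Hypothetical, simplified directive shapes for illustration only.
type Directive =
  | { type: 'instruction'; size: number }
  | { type: 'label'; label: string }
  | { type: 'data'; label: string; data?: number[]; reference?: string };

// Assumed sizing rule: labels emit nothing, a reference emits one value,
// and literal data emits one value per element.
function sizeOf(directive: Directive): number {
  switch (directive.type) {
    case 'instruction':
      return directive.size;
    case 'label':
      return 0;
    case 'data':
      return directive.reference !== undefined
        ? 1
        : (directive.data ?? []).length;
  }
}

function collectLabels(directives: Directive[]): { [label: string]: number } {
  let address = 0;
  const labelAddresses: { [label: string]: number } = {};
  for (const directive of directives) {
    if (directive.type === 'label' || directive.type === 'data') {
      labelAddresses[directive.label] = address;
    }
    address += sizeOf(directive);
  }
  return labelAddresses;
}

const program: Directive[] = [
  { type: 'label', label: '@start' },
  { type: 'instruction', size: 2 },
  { type: 'data', label: '@message', data: [3, 0x48, 0x69, 0x21] },
  { type: 'data', label: '@pointer', reference: '@message' },
];

const addresses = collectLabels(program);
// { '@start': 0, '@message': 2, '@pointer': 6 }
console.log(addresses);
```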
Finally, we update our second pass to actually emit the data we’ve defined. For strings and immediates we can simply emit the data itself, whereas for references we’ll need to look up the relevant address in the address table we collected in the first pass:
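switch (directive.type) {
  // ...
  case 'data': {
    if (directive.reference) {
      // Reference data: emit the address collected in the first pass.
      encoded.push(labelAddresses[directive.reference]);
    } else {
      // String or immediate data: emit the values as-is.
      encoded.push(...(directive.data ?? []));
    }
    break;
  }
}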
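The reason references need two passes is easiest to see with a forward reference. Below is a hedged, self-contained sketch that handles only data directives; assembleData is a hypothetical helper written for this example, and it assumes a reference always occupies exactly one value.

```typescript
interface Data {
  type: 'data';
  label: string;
  data?: number[];
  reference?: string;
}

// Hypothetical helper: assembles a program made up solely of data directives.
function assembleData(directives: Data[]): number[] {
  // First pass: record the address at which each label's data starts.
  const labelAddresses: { [label: string]: number } = {};
  let address = 0;
  for (const directive of directives) {
    labelAddresses[directive.label] = address;
    address += directive.reference !== undefined
      ? 1
      : (directive.data ?? []).length;
  }

  // Second pass: emit literal data, or the looked-up address for references.
  const encoded: number[] = [];
  for (const directive of directives) {
    if (directive.reference !== undefined) {
      const target = labelAddresses[directive.reference];
      if (target === undefined) {
        throw new Error(`Unknown label ${directive.reference}`);
      }
      encoded.push(target);
    } else {
      encoded.push(...(directive.data ?? []));
    }
  }
  return encoded;
}

// @pointer refers to @identifier before @identifier has been defined.
const image = assembleData([
  { type: 'data', label: '@pointer', reference: '@identifier' },
  { type: 'data', label: '@identifier', data: [0x0, 0x1, 0x2] },
]);
// [1, 0, 1, 2]: the forward reference resolves to address 1.
console.log(image);
```

A single pass would reach @pointer before knowing where @identifier lives, which is why the reference is stored as a label and resolved later.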
Relocation
One annoyance is that, while data instructions read as though they are outside of the flow of the program, in fact they are not. For instance, despite appearances, code like the following is incorrect:
add r2 r3 r4
data @message 'Hello!'
constant r4 @message
That’s because the machine will first fetch, decode, and execute the add instruction. It will then fetch and attempt to decode the first byte of our data, which will be the immediate value 0x00000006 (the length prefix of the string). This decodes to halt!
It can therefore be convenient for our assembler to relocate the data instructions to the end of the code (or alternatively to the beginning of the program, with an unconditional jump over the data) before we assemble a program:
function relocate(directives: Directive[]): Directive[] {
  // Keep the relative order within each group; only move data after the code.
  const nonData = directives.filter((directive) => directive.type !== 'data');
  const data = directives.filter((directive) => directive.type === 'data');
  return [...nonData, ...data];
}
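Applied to the incorrect snippet above, relocation moves the data after both instructions. The sketch below uses a simplified, hypothetical Directive type (with a text field standing in for a parsed instruction) so that it runs on its own.

```typescript
// Hypothetical, simplified directive shapes for illustration only.
type Directive =
  | { type: 'instruction'; text: string }
  | { type: 'data'; label: string; data: number[] };

function relocate(directives: Directive[]): Directive[] {
  const nonData = directives.filter((directive) => directive.type !== 'data');
  const data = directives.filter((directive) => directive.type === 'data');
  return [...nonData, ...data];
}

const relocated = relocate([
  { type: 'instruction', text: 'add r2 r3 r4' },
  { type: 'data', label: '@message', data: [2, 0x48, 0x69] },
  { type: 'instruction', text: 'constant r4 @message' },
]);

// Summarize the new order: instruction text, or the label for data.
const order = relocated.map((d) => (d.type === 'data' ? d.label : d.text));
// ['add r2 r3 r4', 'constant r4 @message', '@message']
console.log(order);
```

Relocation has to happen before the first pass, since moving the data changes the address of every label that follows it. It also only helps if the code halts or jumps away before execution reaches the end; otherwise the machine would still run into the relocated data.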
Once we’ve done that, we can update our “Hello, world!” program to print any string we care to, just by changing the value of @message.