Nand2Tetris: Building a Jack Language Virtual Machine Translator

February 24, 2022

Summary

To recap: all the From Nand to Tetris blog posts cover aspects of the computer science course From Nand to Tetris, also called Nand2Tetris, which teaches how to build a computer from first principles. These posts explore and recap the journey from high-level language to machine code.

This post is about the Virtual Machine translator from the Nand2Tetris course. It covers the definition and purpose of a virtual machine, where it fits into the Nand2Tetris course, and some of the lessons learned during project development.

The VM Translator is not a direct "line to line" translation like the Hack Assembler project in the last post. Each Virtual Machine command expands into several assembly commands. Translating VM commands into assembly is quite a minor complication (a footnote really). This is the project where the real fun begins: thinking deeply about, organizing and categorizing the different aspects of the Hack architecture / API.

About the Hack Virtual Machine Translator

When compiling high-level code down to machine language it's common to take intermediary steps along the way. The larger problem of translating the high-level language into machine code is simplified by creating a lower-level language that sits between assembly code and the high-level language (in this case Jack code). BTW, Java and C# compile into virtual machine languages before further compilation. The Jack high-level language to machine code model looks like this:

graph LR
    subgraph High-Level Language
        A[High-Level Language Code]
    end

    subgraph Virtual Machine Code
        B[Virtual Machine Code]
    end

    subgraph Assembly Code
        C[Assembly Code]
    end

    subgraph Machine Code
        D[Machine Code]
    end

    A -->|Compilation| B
    B -->|Translation| C
    C -->|Translation| D

What is Virtual Machine code?

A virtual machine language is an intermediary between a high-level language (think Java or whichever high-level language works for you) and low-level assembly language, which is the thinnest possible layer between machine code and human-writable code.

Virtual machine code allows the developer to specify important operations like:

pushing and popping values between the stack and memory segments

creating named symbols for operations like loops and jumps

branching

All these same things are possible in assembly, just not very easily. This layer of abstraction is easier to get to from a higher level language as well. This is the whole point:

layer of abstraction	name of abstraction	handler	description
0	machine code	CPU	processes the binary code
1	assembly language	assembler	assembles symbolic language to binary
2	virtual machine language	translator	translates VM language to assembly code
3	high level language	compiler	compiles from high level to virtual machine code

It's possible to compile a high level language directly to assembly and bypass a virtual machine language, just much harder.
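To make the three kinds of operations listed earlier (pushing values, named labels, branching) more concrete, here is a minimal sketch of a stack machine in Python. It is not the course's VM emulator: the function name `run_vm` and the `max_steps` guard are assumptions made for illustration, and only a handful of commands are handled.

```python
def run_vm(lines, max_steps=1000):
    """Interpret a tiny subset of VM commands and return the final stack.

    Hypothetical helper for illustration only. Values are plain Python ints,
    not 16-bit words, and only the 'constant' segment is supported.
    """
    commands = [line.split() for line in lines if line.strip()]
    # Map each label name to the index of the command that declares it.
    labels = {cmd[1]: i for i, cmd in enumerate(commands) if cmd[0] == "label"}
    stack = []
    pc = 0      # index of the next command to execute
    steps = 0   # guard so an accidental infinite loop still terminates
    while pc < len(commands) and steps < max_steps:
        cmd = commands[pc]
        op = cmd[0]
        pc += 1
        steps += 1
        if op == "push" and cmd[1] == "constant":
            stack.append(int(cmd[2]))
        elif op == "add":
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif op == "sub":
            y, x = stack.pop(), stack.pop()
            stack.append(x - y)
        elif op == "label":
            pass  # labels only mark a position
        elif op == "goto":
            pc = labels[cmd[1]]
        elif op == "if-goto":
            # Jump only when the popped value is non-zero.
            if stack.pop() != 0:
                pc = labels[cmd[1]]
        else:
            raise ValueError(f"unsupported command: {' '.join(cmd)}")
    return stack


print(run_vm(["push constant 7", "push constant 8", "add"]))  # [15]
print(run_vm([
    "push constant 5",
    "if-goto SKIP",
    "push constant 1",
    "label SKIP",
    "push constant 2",
]))  # [2]
```

The second program jumps over `push constant 1` because the value popped by `if-goto` is non-zero, which is the whole idea behind branching at this level.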

How does VM Code work / get translated?

This is an exciting phase of Nand2Tetris: for the first time, the VM Translator uncovers the elegance and beauty of the Hack architecture. In the architecture, 15-bit addresses span 32,768 16-bit memory words (addresses 0 through 32767). Running an application on computer hardware is a process of writing, moving, deleting, and manipulating data in memory. It's a process of simply telling the computer to "jump here, save this", "jump there, replace that with this", "jump, replace that with 0 (delete)". When this is done according to a defined specification you can start to imagine how things like arrays and objects and programs might work. To make all of that possible, the computer's memory must be divided into regions according to a specification:

RAM addresses	Usage
0-15	sixteen virtual registers
16 - 255	Static variables
256 - 2047	Stack
2048 - 16383	Heap
16384 - 24575	Memory I/O
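As a small illustration of what this table buys us, the sketch below looks up which region a given RAM address belongs to. The names `MEMORY_REGIONS` and `region_of` are made up for this example; the ranges are copied straight from the table.

```python
# (first address, last address, usage) taken from the table above.
MEMORY_REGIONS = [
    (0, 15, "sixteen virtual registers"),
    (16, 255, "Static variables"),
    (256, 2047, "Stack"),
    (2048, 16383, "Heap"),
    (16384, 24575, "Memory I/O"),
]


def region_of(address):
    """Return the usage label for a RAM address (hypothetical helper)."""
    for low, high, usage in MEMORY_REGIONS:
        if low <= address <= high:
            return usage
    raise ValueError(f"address {address} is outside the documented ranges")


print(region_of(0))      # sixteen virtual registers
print(region_of(256))    # Stack
print(region_of(16383))  # Heap
```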

Organizing and categorizing in this way creates meaning in the commands of the VM implementation. Without the above structure there is no meaning in a simple command like:

push constant 10

The structure outlined above lets developers assign a fixed, agreed-upon location to each group of registers. Like this:

Register Location	Name	Usage
RAM[0]	SP	Stack Pointer
RAM[1]	LCL	local segment
RAM[2]	ARG	current argument segment
RAM[3]	THIS	this segment
RAM[4]	THAT	that segment
RAM[5-12]	--	temp segment
RAM[13-15]	--	general purpose registers
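One way to read this table: LCL, ARG, THIS and THAT each hold the base address of a segment, while the temp segment sits at a fixed location. The sketch below resolves a segment and index to a RAM address under that reading. `POINTER_REGISTERS`, `TEMP_BASE` and `resolve_address` are illustrative names, and the `ram` dictionary stands in for real memory.

```python
# Registers that hold the base address of a segment (from the table above).
POINTER_REGISTERS = {"local": 1, "argument": 2, "this": 3, "that": 4}
TEMP_BASE = 5  # temp segment occupies RAM[5] through RAM[12]


def resolve_address(ram, segment, index):
    """Return the RAM address that `segment index` refers to (a sketch)."""
    if segment in POINTER_REGISTERS:
        base = ram[POINTER_REGISTERS[segment]]  # e.g. RAM[1] for local
        return base + index
    if segment == "temp":
        if not 0 <= index <= 7:
            raise ValueError("temp index must be between 0 and 7")
        return TEMP_BASE + index
    raise ValueError(f"unsupported segment: {segment}")


# Example memory state: the base addresses here are arbitrary sample values.
ram = {1: 300, 2: 400, 3: 3000, 4: 3010}
print(resolve_address(ram, "local", 2))  # 302
print(resolve_address(ram, "temp", 3))   # 8
```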

In the context of the virtual machine, push constant 10 means to take the constant 10 and put it on the stack. Without a defined stack location it would be impossible to push a number onto the stack, but by "agreeing" that the stack pointer lives at RAM[0], this command can be predictably translated to assembly language: the address of the next free stack slot is always the value stored in RAM[0]. Follow along with the assembly code below:

// Remember: Hack assembly has two 16-bit registers, A and D. Every time the A register is loaded with the @ symbol, its previous value is overwritten. The D register keeps its value until it is explicitly reassigned.
// M refers to RAM[A], so assigning to M writes to the memory location that A currently points at.

// Pseudocode for push constant 10 is: RAM[SP] = 10, then SP = SP + 1

// store the value 10:
@10 // To get the constant 10: set the A register to 10,
D=A // then store the value of the A register (constant 10) in the D register

// go to the location the stack pointer refers to and assign that location the value 10.
@0 // To reach the stack pointer, first select address 0. This sets the A register to 0. Why 0? According to the chart above, RAM[0] holds the stack pointer,
A=M // Then set the A register to the value of RAM[0]. So, A = SP
M=D // Finally, set RAM[SP] to D (10)

// increment stack pointer
@0
M=M+1

Now the code without comments:

@10
D=A
@0
A=M
M=D
@0
M=M+1
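Since the seven instructions above only differ in the constant being pushed, a translator can generate them from a template. The Python sketch below does exactly that; `push_constant` is a hypothetical helper name, not necessarily how the original translator is structured.

```python
def push_constant(value):
    """Return the Hack assembly lines for `push constant <value>` (a sketch)."""
    # An @ instruction can only load a non-negative 15-bit constant.
    if not 0 <= value <= 32767:
        raise ValueError("constant must be between 0 and 32767")
    return [
        f"@{value}",  # A = value
        "D=A",        # D = value
        "@0",         # A = 0, the address of the stack pointer
        "A=M",        # A = SP
        "M=D",        # RAM[SP] = value
        "@0",         # back to the stack pointer
        "M=M+1",      # SP = SP + 1
    ]


print("\n".join(push_constant(10)))
```

Calling `push_constant(10)` reproduces the uncommented listing above line for line.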

The VM Translator:

[Interactive demo: buttons load starter VM code from the Nand2Tetris course into a text area, and the translator's assembly output is displayed below it.]

So what's happening here?

We enter some VM commands and out the other side comes some assembly code.

The translator takes in simple commands like: push constant 2 or push local 0. With a computer system specification, these small commands are easily translated into assembly language. What I think is truly beautiful about this system is that there are really not that many different operations. Most of the time only one of two things is happening:

pushing or popping data - push/pop [segment] [index or literal value]
running a CPU operation - add, subtract, comparison

When these seemingly "simple" operations are coupled with the architecture definition and the organization of the memory, complex applications and high-level language features become possible.
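Because there are so few kinds of commands, the first step of a translator can be a very small parser that sorts each line into one of the two categories above. The sketch below is one possible shape for that step, assuming the nine arithmetic and logical command names from the Hack VM language; `parse_command` and its tuple return format are illustrative choices, not the original implementation.

```python
# The arithmetic and logical commands of the Hack VM language.
ARITHMETIC = {"add", "sub", "neg", "eq", "gt", "lt", "and", "or", "not"}


def parse_command(line):
    """Split one VM line into (kind, arg1, arg2), or None for blank lines.

    Hypothetical helper: the tuple layout is an assumption for illustration.
    """
    code = line.split("//")[0].strip()  # drop trailing comments
    if not code:
        return None
    parts = code.split()
    if parts[0] in ("push", "pop"):
        if len(parts) != 3:
            raise ValueError(f"expected '{parts[0]} segment index': {code}")
        return (parts[0], parts[1], int(parts[2]))
    if parts[0] in ARITHMETIC:
        return ("arithmetic", parts[0], None)
    raise ValueError(f"unsupported command: {code}")


print(parse_command("push constant 2"))   # ('push', 'constant', 2)
print(parse_command("add // sum them"))   # ('arithmetic', 'add', None)
print(parse_command("// just a comment")) # None
```

From here, each parsed tuple can be handed to a function that emits the matching assembly template.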

How does the translator (this program) work?

To me it's much less important how the translator was written. But, since we're down here: it was actually the second application I wrote in Python, the first being the assembler reviewed in the last blog post. There is a backend server that runs the VM Translator when requests are sent through the API. The backend is a simple Node Express application that spawns the Python application.

Written by Michael Barakat, a front end developer living in Seattle.
