Nand2Tetris: Building a Jack Language Assembler

February 21, 2022
This is the beginning of a series of posts about my experiences taking the From Nand to Tetris course. It's a computer science course about how to build a computer from first principles. These posts explore and recap the high level language to machine code journey.

Summary

This post is about building an assembler for the nand2Tetris class from the Hebrew University of Jerusalem. An assembler is a program that translates human-readable assembly code (human-readable with great effort) into machine code. Each line of assembly code will have a direct binary representation, so building one is a fairly direct project that serves as a good entry point to the journey of building a compiler, especially if you've never had to write a well-organized application.

Other benefits of building an assembler include gaining a better understanding of how the CPU responds to the given binary. It also gave me the added benefit of learning some basics of Python: open files, reading, writing and creating new files.

Because this is the beginning of the series we'll also cover a bit about the architecture covered in nand2Tetris computer.

Learn more about the From Nand to Tetris course here.

Background of the Hack assembler:

The hack assembler was created as part of completing the work for From Nand to Tetris course from the Hebrew University of Jerusalem.

What's this about "Hack" assembler and "Hack" computer? The "hack" computer is how the nand2Tetris computer system is referred to. So when "hack" assembly or "hack" computer is mentioned it's a reference to the nand2Tetris software curriculum.

The course covers many aspects of hardware, software and algorithms involved with building a computer and abstracts away each layer of how a computer works starting with the simplest concepts and building to more complex topics as you go:

Computer Hardware: Layer of abstraction

1. Boolean Logic Gates

2. Combinatorial Chips

3. Sequential Logic Chips

4. Machine Language

5. Computer Architecture

6. Assembler

7. Stack Arithmetic

8. Program Control

9. High-Level Programming

10. Compiler: Syntax Analysis

11. Compiler: Code Generation

12. Operating System

The class starts with the most basic element: the nand gate. It can be used to build all the components that serve as the basis for a modern computer using Von Neumann architecture.

About the Hack hardware platform

As mentioned above the Hack computer starts with using Nand gates to construct more complex gates and eventually chips which perform more complex operations.

What's a nand gate?

The nand gate is a circuit that can be used to make other logic gates (and, or, nor, xor, etc.)—hence the name of the course nand2tetris. Nand is short for "not and." If you are familiar with how "and" and "or" work in software then you will easily understand that "Nand" is just the inverse of and: Nand 1 and 1 and the result is 0.

A logic gate is a circuit used to build combinatorial chips (half-adder, full-adder, multiplexors, de-multiplexors) and sequential logic chips (flip-flop, program counter, arithmetic logic unit)

Here is the truth table for a nand gate:

input 1input 2output

0

0

1

0

1

1

1

0

1

1

1

o

You can make an "and" gate with two nand gates by splitting the output of one nand gate into both sides of another nand gate:

graph LR; A -->|Input A| NAND1 B -->|Input B| NAND1 NAND1 -->|Output 1 goes to Input A| NAND2 NAND1 -->|Output 1 goes to Input B| NAND2 NAND2 -->|Output| Q

Truth table for the two nand gates wired up as an and gate:

ABNAND1NAND2AND Gate Output

0

0

1

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

1

The first three parts of the nand2Tetris course are all about combining different chips in clever ways to build more complex kinds of chips. Eventually, the suite of chips will allow you to create the Arithmetic Logic Unit, RAM, Program Counter, and others to form the full computer architecture.

Von Neumann architecture

Since it was mentioned above, Here is what von Neumann architecture looks like (it should look at least somewhat familiar if you know the basics of what's under the hood of a computer):

graph LR subgraph Output F[Output] end subgraph Computer System subgraph CPU B[Control Unit] C{Arithmetic Logic Unit - ALU} end subgraph Memory D{Memory} end end subgraph Input A[Input] end C ---> D D ---> C A --> D C ---> F C --> B B --> C

This is the architecture that the "Hack" computer (and all personal computers) use.

How does the "Hack" assembly work?

At this point you understand that each line of assembly code translates to a line of 16 bit binary. You may wonder what those lines of binary mean?

There are only two kinds of instructions:

  1. A instructions - which correspond to a value that's stored in the A register

  2. C instructions - which correspond to either storing and manipulating a value jumping to a destination.

For the sake of brevity I'm not going to dive into how the instructions are decoded into 16 bit binary. There are plenty of resources online for that already:

Textbook chapter on Hack assembly language here

Another site which gives a tidy break down here

Below is a piece of sample code to give you a better idea of what the assembler is translating into machine code:

@0 // A instruction. Accesses RAM at location 0
D=M // C instruction. D (destination) is assigned the value of RAM @0
@1 // Access RAM at location 1
M=1 // Assign RAM at location 1 the value of 1

How was the Assembler built?

As the title of this post suggests, the assembler itself was written in Python. The python code is running on the backend server for Hellodeary. This site was written with javascript and sends the request to my server where the hack assembly is processed and sent back to this web page.

In terms of software architecture, the assembler scans text input file. The input file is scanned twice. Why twice? Each line of assembly code is either a 16 bit instruction or a symbol (labels or variables). On the first scan the symbols are added to a table and tracked. On the second scan each line is translated to 16 bit binary. When a label or variable is encountered it's looked up in the symbol table and it's value is referenced and converted to binary.

My assembler is quite simple. It uses Dicionaries to store all the different possibilities that could be encountered in the assembly language application. Then when the line is processed, my application builds the binary from each symbol in the line of assembly code.

Consider the code below:

@3
M=A

@3 Sets the A register to integer constant 3. This translates to 16bit binary of 3 or 0000 0000 0000 0011. That's pretty straightforward. The next line M=A is a C Instruction which is made up of three parts: Destination, Comp, and Jump. There is no jump statement in the code above, it's just an assignment. So, the parser looks at each symbol, retrieves the value and then joins the values together into one 16 bit number.

// Base = 1110 0000 0000 0000 The base is always the same - the last three bits are 1.
// Jump = 0000 0000 0000 0000 Jump is bits 0, 1, 2
// Dest = 0000 0000 0000 1000 Dest is bits 3, 4, and 5 (read the bits right to left)
// Comp = 0000 1100 0000 0000 Comp is bits 6 - 12
// val = 1110 1100 0000 1000 _Or_ them all together to get the instruction

Let's see the nand2Tetris assembler in action:

Get started by preloading the assembler with some code that has already been written by using the buttons below:

Input hack assembly:

If everything worked correctly, in the box you will see some machine code. As long as what's typed into the box adheres to the language specification it will compile into beautiful machine code. 🤟


Written by Michael Barakat, a front end developer living in Seattle.