The word compiler is so ubiquitous, it needs almost no introduction. It's a suite of classes that loop through a text file and magically allow the text input to manipulate memory and program counters to create meaning.
To recap: all the From Nand to Tetris blog posts cover aspects of the computer science course From Nand to Tetris also called Nand2Tetris: how to build a computer from first principles. These work posts explore and recap the high level language to machine code journey.
Retrospective
What makes the nand2Tetris compiler special is how small yet mighty it is. The Hack computer from nand2Tetris runs software using a language called Jack. Jack has many of the features that come to be expected in a high-level language: it's a typed language (in this case statically typed), C-like syntax, primitive data types (boolean, integer, character, array), class structure, functions, procedures, memory management, and a standard library.
Valuable concepts covered
This project was valuable because it touches on so many important programming concepts: Lexical Analysis (Scanning), Parsing, Semantic analysis, intermediate code generation, optimization, code generation, creating symbol tables to manage variables, testing and debugging, integrating the compiler with the rest of the hack system.
Major omissions
If you've read carefully you may have noticed that a major aspect of programming was completely omitted from above: error handling
Why was error handling omitted? Because nand2Tetris covers so many concepts, the authors of the class have in some cases intentionally made some assumptions to give the course more value and bypass tangential/not directly related topics. In this case, as compiler programmers for the class we have made the assumption that the input code being compiled is "error free". I smile inside every time I contemplate this. Too bad we as developers don't always have this luxury.
The compiler pipeline in nand2Tetris
The compiler to machine code process: High level language goes through syntax analysis, After analysis the high level language is translated to virtual machine code, The virtual machine code is translated into assembly language, Assembly language is converted to machine code.
The compilation process:
The compiler in action
Below you can enter jack code into the textarea and compile it. Notice that the compiler outputs VM Code, not machine code. The process of going from VM code to machine code is much more trivial. Another interesting discovery for me was that VM Programs can and will look different from different compilers, but can create programs that create identical results.
Use the buttons below to preload the textarea with some Jack code. I think you will also agree that the Jack programming language is quite simple to read, I think it will look very familiar to anyone that's worked with java, c or javascript before.
Input Jack code:
Takaways from writing this compiler
Completing this part of Nand2Tetris was quite a gratifying process. As a developer with a front end background, I never had to think so carefully about building the foundation, categorizing and organizing my code. That being said, if I were to do it all over again it would be quite different. I wrote the compiler in python (as i did with the assembler, translator, and syntax analyzer). I wrote all four programs without really taking the time to learn the python language very well. It has features (like dictionaries) that helped greatly with categorizing and organizing my classes. It also has features that I didn't find out about until after the course, that I wish I had known about, like pattern matching (which would have been a game changer). If you don't know what I'm talking about let me explain.
Some languages like javascript have switch statements which are designed for "value based" matching:
function matchValue(value) {switch (value) {case 1:console.log("Value is 1");break;case 2:console.log("Value is 2");break;default:console.log("Value is not 1 or 2");}}matchValue(1); // Output: Value is 1matchValue(2); // Output: Value is 2matchValue(3); // Output: Value is not 1 or 2
Some languages like python or rust have what are called match statements, which allow pattern based matching which is more comprehensive and exhaustive (checks for more posibilities more easily and directly). What can be done in a handful of lines of code in a match statement would take many more in a language that only has branching like JavaScript. More importantly, branching statements are much harder to read than match statements.
Consider the Rust match statement below. It compares two coordinates and changes the output based on variation of both variables:
// Rust match statementlet coordinates = (3, 4);match coordinates {(0, 0) => println!("Coordinates are at the origin."),(x, 0) => println!("Coordinates are on the x-axis at x = {}.", x),(0, y) => println!("Coordinates are on the y-axis at y = {}.", y),(x, y) => println!("Coordinates are at (x = {}, y = {}).", x, y),}
The above code uses a tuple of 3 and 4. When the coordinates match the conditions set in the branch (branch is the name of each function line in the match statement), a different message is printed to the console.
Look at how the same thing is done in JavaScript. It's not as neat and readable as the Rust code:
function matchCoordinates(coordinates) {const [x, y] = coordinates;if (x === 0 && y === 0) {console.log("Coordinates are at the origin.");} else if (x !== 0 && y === 0) {console.log(`Coordinates are on the x-axis at x = ${x}.`);} else if (x === 0 && y !== 0) {console.log(`Coordinates are on the y-axis at y = ${y}.`);} else {console.log(`Coordinates are at (x = ${x}, y = ${y}).`);}}const coordinates = [3, 4];matchCoordinates(coordinates);
That last block of code is basically how my entire compiler looks, except way more complex and messy. I don't dare touch the application again, because it's so darn hard to understand. Lessons learned! 😬
Compilers are filled with scenarios where match statements work perfectly. They work well in any program where there are many kinds of conditions to process. Emulators for example are another place.