Yorkville High School Computer Science Department
Yorkville High School Computer Science Department on Facebook  Yorkville High School Computer Science Department Twitter Feed  Yorkville High School Computer Science Department on Instagram

Yorkville High School Computer Science

ASSIGNMENTS: Mobile App - May 10, 2019 :: UI Project - May 10, 2019 :: Web App - May 15, 2019 >>

Operating System Design :: Projects :: Compiler Part 1

Due: Nov 19


The files for this project are in the projects directory under 10 when you downloaded the class software.

You will need two tools: the programming language with which you will implement your Syntax Analyzer, and the supplied TextComparer utility (available in your tools directory), which allows comparing the output files generated by your analyzer to the compare files supplied by us. You may also want to inspect the generated and supplied output files visually, using some XML viewer (any standard web browser should do the job).


Chapter 10 includes a proposed, language-independent Syntax Analyzer API, which can serve as your implementation's blueprint. We propose implementing the Syntax Analyzer in two stages. First, write and test a Tokenizer module. Next, write and test a Parser module.

Stage I: Tokenizer: A tokenizer is a program that breaks a given textual input into a list of separate tokens. And while it is at it, it can also classify the tokens into lexical categories. This is the first task that syntax analyzers normally do. Your first task it to implement the JackTokenizer module specified in chapter 10. When applied to a text file containing Jack code, the tokenizer should produce a list of tokens, each printed in a separate line along with its classification: symbol, keyword, identifier, integer constant or string constant. Here is an example:

Source Code (Input)

if (x < 0) 
	{let state = "negative";}

Tokenizer Output

   <keyword> if </keyword>
   <symbol> ( </symbol>
   <identifier> x </identifier>
   <symbol> &lt; </symbol>
   <integerConstant> 0 </integerConstant>
   <symbol> ) </symbol>
   <symbol> { </symbol>
   <keyword> let </keyword>
   <identifier> state </identifier>
   <symbol> = </symbol>
   <stringConstant> negative </stringConstant>
   <symbol> ; </symbol>
   <symbol> } </symbol>

Note that in the case of string constants, the tokenizer throws away the double-quote characters. This behavior is intended, and is part of our tokenizer specification.

Note that four of the symbols used in the Jack language (<, >, ", and &) are also used for XML markup, and thus they cannot appear verbatim as XML data. To solve the problem, we require the tokenizer to output these tokens as &lt;, &gt;, &quot;, and &amp;, respectively. For example, in order for the text "<symbol> < </symbol>" to be displayed properly in a web browser, it should be generated as "<symbol>&lt;</symbol>".

Stage II: Parser: Once you are done implementing the tokenizer for the Jack language, you can proceed implementing a Jack parser. A parser goes over the tokenized text and emits output indicating that it "understood" the text's grammatical structure. In order to do so, the parser must include methods that look for canonic structures in a certain language - in our case Jack - and then emit these structures in some agreed upon formalism - in our case the XML described in Chapter 10.

In both Project 10 and Project 11, the Jack parser is implemented as a module called CompilationEngine. We recommend following the compilation engine's API, and writing each one of its methods, making sure that it emits the correct XML output. We also recommend to first write a compilation engine that handles any given Jack code except for expressions, and unit-testing it as explained below.


A natural way to test your analyzer it is to have it parse some representative Jack programs. We supply two such test programs: Square Dance and Array Test. The former includes all the syntactic features of the Jack language except for array processing, which appears in the latter. We also provide a simpler version of the SquareDance program that does not include expressions.

Tokenizer Testing:

Parser Testing:

Yorkville High School Computer Science Department on Facebook Yorkville High School Computer Science Department Twitter Feed Yorkville High School Computer Science Department on Instagram