
#  Building a minC (minimum C) compiler


Enter your name and student ID.

 * Name:
 * Student ID:



<a name="intro"> </a>
# 1. Introduction
* These are the instructions for those who choose Option A for the term report
* You will build a compiler for a minimal subset of the C language, dubbed 'minC'
* This page explains how you should work on it, but <font color="red">your actual work is supposed to be done mostly in a terminal and a text editor</font>


# 2. Prepare AI Tutor
* execute the following cell to set up your tutor

In [1]:
from heytutor import *
config(default_lang="C")  # choose one of Go/Julia/OCaml/Rust

* you may want to ask a few basics

In [2]:
hey("""what is csel instruction of ARM64 isa?""")

The `csel` (Conditional Select) instruction in the ARM64 (AArch64) instruction set architecture is used to select one of two source registers based on a condition code. It's a predicated instruction that's particularly useful for avoiding conditional branches and achieving more efficient code execution in certain scenarios.

### Explanation:
The `csel` instruction performs the following operation:
1. **Inputs**: It takes three registers (`Rn`, `Rm`, `Rd`) and a condition (`cond`).
   - `Rn` and `Rm` are source registers.
   - `Rd` is the destination register.
   - `cond` is the condition code (e.g., EQ for equal, NE for not equal, etc.).

2. **Output**: It writes the value from `Rn` if the condition is true, and the value from `Rm` if the condition is false. The result is stored in `Rd`.

### Format:
```
CSEL <Xd>, <Xn>, <Xm>, <cond>
```
Or for 32-bit registers:
```
CSEL <Wd>, <Wn>, <Wm>, <cond>
```
Where:
- `Xd` / `Wd`: Destination register (64-bit or 32-bit).
- `Xn` / `Wn`: First source register.
- `Xm` / `Wm`: Second source register.
- `cond`: Condition code.

### Example:

Suppose you want to select between two registers `x1` and `x2`, and store the result in `x0`. Use the condition code `EQ` (Equal) to choose `x1` when the condition flag indicates equality and `x2` otherwise:

```assembly
CMP x3, x4      // Compare values in x3 and x4
CSEL x0, x1, x2, EQ  // Select x1 if x3 == x4, else select x2
```

Here:
- The `CMP` instruction sets the condition codes based on the comparison of `x3` and `x4`.
- The `CSEL` instruction uses the condition code set by `CMP`. If `x3 == x4` (condition EQ is true), it sets `x0 = x1`. Otherwise, it sets `x0 = x2`.

### Use Case:
The `csel` instruction is useful when you want to avoid branching (e.g., `if` statements), reducing potential pipeline stalls or branch mispredictions. It is faster in situations where both source registers are ready, as the compiler avoids generating a branch instruction.

For more detailed documentation, refer to the ARM Architecture Reference Manual: [ARM 64 Reference](https://developer.arm.com/documentation).
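
To make the semantics described above concrete, here is a minimal Python model of `CMP` followed by `CSEL` with the `EQ` condition. The function names `csel` and `cmp_eq` are made up for this illustration; they only mimic what the instructions compute and are not part of any toolchain.

```python
def cmp_eq(a: int, b: int) -> bool:
    """Model of `CMP a, b` followed by testing the EQ condition."""
    return a == b


def csel(cond_holds: bool, xn: int, xm: int) -> int:
    """Model of `CSEL Xd, Xn, Xm, cond`: the value that ends up in Xd."""
    return xn if cond_holds else xm


# x0 = (x3 == x4) ? x1 : x2, with x1 = 10 and x2 = 20
x1, x2 = 10, 20
print(csel(cmp_eq(5, 5), x1, x2))  # prints: 10
print(csel(cmp_eq(5, 6), x1, x2))  # prints: 20
```

Both source values are always available to the instruction, and no branch is taken either way; only the choice of which one is copied depends on the flags.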


# 3. Files
* `parser/` : a C -> XML parser written in Python + tatsu (parser generator), which converts a C source into an equivalent XML
* `{ml,jl,go,rs}/minc` : a directory for each language
* `test/` : test programs

# 4. Parser (C -> XML)
* `minc_grammar.y` : grammar definition of the minimum C language we work on, which is converted into a working python program (`minc_grammar.py`) using [tatsu](https://github.com/neogeny/TatSu) parser generator library

In [4]:
cd ~/notebooks/pl08_minc/parser
tatsu minc_grammar.y > minc_grammar.py

------------------------------------------------------------------------
         409  lines in grammar
          21  rules in grammar
         209  nodes in AST


* `minc_to_xml.py` : the main parser program that drives minc_grammar.py to convert a C source into an equivalent XML

In [6]:
python minc_to_xml.py example/ex.c > ex.xml

* examine the result and understand how each C construct is converted into XML

In [7]:
cat example/ex.c 

long f(long x, long y) {
  long u;
  u = x + y;
  return u * 3;
}


In [8]:
cat ex.xml

<program xmlns="https://program.com">
 <fun_def>
  <name>f</name>
  <params>
   <param>
    <type>
     <primitive_type>long</primitive_type>
    </type>
    <name>x</name>
   </param>
   <param>
    <type>
     <primitive_type>long</primitive_type>
    </type>
    <name>y</name>
   </param>
  </params>
  <return_type>
   <primitive_type>long</primitive_type>
  </return_type>
  <body>
   <compound>
    <decls>
     <decl>
      <type>
       <primitive_type>long</primitive_type>
      </type>
      <name>u</name>
     </decl>
    </decls>
    <stmts>
     <expr_stmt>
      <bin_op>
       <op>=</op>
       <left>
        <var>u</var>
       </left>
       <right>
        <bin_op>
         <op>+</op>
         <left>
          <var>x</var>
         </left>
         <right>
          <var>y</var>
         </right>
        </bin_op>
       </right>
      </bin_op>
     </expr_stmt>
     <return>
      <bin_op>
       <op>*</op>
       <left>
        <var>u</var>
       </left>
       <righ

* you don't have to modify `minc_grammar.y` or `minc_to_xml.py` unless you extend the grammar for extra points
* still, you are encouraged to see how simple tatsu (or any parser generator, for that matter) makes it to write a parser; just take a look at `minc_grammar.y`
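
To get a feel for the XML before reading the skeleton's XML -> AST converter, it can help to poke at it from a scripting language. The sketch below uses Python's standard `xml.etree.ElementTree` on a hand-written sample that imitates the shape of `ex.xml` (it is trimmed and is not actual parser output; the function name `signatures` is made up for this example). Note that every element lives in the default namespace `https://program.com` declared on `<program>`, so lookups must be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the same shape as ex.xml (trimmed, no <body>).
SAMPLE = """<program xmlns="https://program.com">
 <fun_def>
  <name>f</name>
  <params>
   <param>
    <type><primitive_type>long</primitive_type></type>
    <name>x</name>
   </param>
   <param>
    <type><primitive_type>long</primitive_type></type>
    <name>y</name>
   </param>
  </params>
  <return_type><primitive_type>long</primitive_type></return_type>
 </fun_def>
</program>"""

# Prefix "m" is arbitrary; it maps to the default namespace of the document.
NS = {"m": "https://program.com"}


def signatures(xml_text):
    """Return (return_type, name, [(param_type, param_name), ...]) per fun_def."""
    root = ET.fromstring(xml_text)
    result = []
    for fun in root.findall("m:fun_def", NS):
        name = fun.find("m:name", NS).text
        ret = fun.find("m:return_type/m:primitive_type", NS).text
        params = [
            (p.find("m:type/m:primitive_type", NS).text, p.find("m:name", NS).text)
            for p in fun.findall("m:params/m:param", NS)
        ]
        result.append((ret, name, params))
    return result


for ret, name, params in signatures(SAMPLE):
    args = ", ".join(f"{t} {n}" for t, n in params)
    print(f"{ret} {name}({args})")  # prints: long f(long x, long y)
```

The XML -> AST converters in the skeleton do essentially this kind of traversal, only for every construct of the grammar.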

##  Note: why are we using XML?
* it is unnecessary and unusual to convert the source program first into XML and then to the abstract syntax tree
* more commonly, you use a parser generator for the language you write your compiler with (OCaml, Julia, Go, or Rust), which allows you to directly build the abstract syntax tree you can manipulate in that language
* for example, C/C++ has the [flex](https://en.wikipedia.org/wiki/Flex)/[bison](https://en.wikipedia.org/wiki/GNU_Bison) parser generators, whose grammar description file (analogous to `minc_grammar.y`) lets you build any C/C++ data structure directly; OCaml has [ocamllex/ocamlyacc](https://v2.ocaml.org/manual/lexyacc.html) ([Menhir](http://gallium.inria.fr/~fpottier/menhir/) is a newer alternative to ocamlyacc).  there are also parser generators that support multiple languages, most notably [ANTLR](https://www.antlr.org/), which supports Java and Python code generation.  when such a tool is available for your language, it is much more straightforward and convenient to use it without going through XML
* the reasons why we go through XML are
  * I could not find a popular parser generator for some of the languages (Go and Julia)
  * even if one exists in each language, there will be differences between them that make it difficult/tricky/cumbersome to explain them
* so I arrived at a parser generator (tatsu) for Python as a common middle ground and XML as a common data structure all languages can easily read


# 5. {go,jl,ml,rs}/minc
* in each language-specific directory (`go`, `jl`, `ml`, `rs`), there is a top-level directory `minc`
* the code given there is a skeleton of a compiler that reads an XML file, builds its abstract syntax tree, and finally calls the code generator
* the code generator is currently almost empty and raises an exception when called
* your main job in the assignment is to complete the code generator
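
To give a rough idea of what "completing the code generator" involves, here is a minimal sketch of an expression code generator. It is written in Python purely for illustration (your actual work is in Go, Julia, OCaml, or Rust), and it makes several assumptions that are not taken from the skeleton: the AST is modeled as hypothetical tuples such as `("bin_op", op, left, right)`, the target is assumed to be x86-64 with AT&T syntax, and every variable is assumed to live in a stack slot given by `env`. Treat it as one possible strategy, not as the expected design.

```python
def gen_expr(e, env, out):
    """Append instructions to `out` that leave the value of `e` in %rax.

    e   : hypothetical tuple AST, one of
          ("num", 3), ("var", "x"), ("bin_op", "+", left, right)
    env : maps a variable name to its (assumed) stack slot, e.g. "-8(%rbp)"
    """
    kind = e[0]
    if kind == "num":
        out.append(f"movq ${e[1]}, %rax")
    elif kind == "var":
        out.append(f"movq {env[e[1]]}, %rax")
    elif kind == "bin_op":
        _, op, left, right = e
        gen_expr(right, env, out)      # right operand -> %rax
        out.append("pushq %rax")       # save it on the machine stack
        gen_expr(left, env, out)       # left operand -> %rax
        out.append("popq %rcx")        # right operand -> %rcx
        insn = {"+": "addq", "-": "subq", "*": "imulq"}[op]
        out.append(f"{insn} %rcx, %rax")  # %rax = left op right
    else:
        raise NotImplementedError(kind)


# u * 3, with u assumed to be stored at -8(%rbp)
code = []
gen_expr(("bin_op", "*", ("var", "u"), ("num", 3)), {"u": "-8(%rbp)"}, code)
print("\n".join(code))
```

Saving intermediate values with `pushq`/`popq` keeps the generator simple because it never runs out of registers, at the cost of slow code. A real generator also has to handle things this sketch ignores, such as constants that do not fit in a 32-bit immediate, stack alignment around function calls, comparisons, and statements.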

## 5-1. Go
###  files
* `go/`
  * `minc/`
    * `minc_ast.go` --- abstract syntax tree (AST) definition
    * `minc_parse.go` --- XML -> AST converter
    * `minc_cogen.go` --- AST -> assembly code
    * `minc.go` --- the main file

###  build

In [9]:
export PATH=~/go/bin:$PATH
cd ~/notebooks/pl08_minc/go/minc
ls -lR

.:
total 36
-rw-r--r-- 1 tau tau   156 Jun 23 00:39 go.mod
-rw-r--r-- 1 tau tau   398 Jun 23 00:39 go.sum
-rw-r--r-- 1 tau tau  6058 Jun 23 00:39 minc_ast.go
-rw-r--r-- 1 tau tau   217 Jun 23 00:39 minc_cogen.go
-rw-r--r-- 1 tau tau   698 Jun 23 00:39 minc.go
-rw-r--r-- 1 tau tau 10653 Jun 23 00:39 minc_parse.go


In [10]:
go build

go: downloading github.com/subchen/go-xmldom v1.1.2
go: downloading github.com/antchfx/xpath v0.0.0-20170515025933-1f3266e77307


###  run
* be sure you have generated ex.xml by `minc_to_xml.py`
* try to compile a small program and see that the code generator raises an exception


In [11]:
./minc ../../parser/ex.xml ex.s

panic: YOU MUST IMPLEMENT go/minc/minc_cogen.go:ast_to_asm_program

goroutine 1 [running]:
main.ast_to_asm_program(...)
	/home/tau/notebooks/pl08_minc/go/minc/minc_cogen.go:4
main.file_xml_to_asm({0x7fff8cd6f69a?, 0x412451?})
	/home/tau/notebooks/pl08_minc/go/minc/minc.go:7 +0x2c
main.file_xml_to_file_asm({0x7fff8cd6f69a?, 0x0?}, {0x7fff8cd6f6ae, 0x4})
	/home/tau/notebooks/pl08_minc/go/minc/minc.go:14 +0x25
main.main()
	/home/tau/notebooks/pl08_minc/go/minc/minc.go:25 +0x45


: 2

* take a look at the source code that caused the exception

In [12]:
cat minc_cogen.go

package main
func ast_to_asm_program(program * Program) string {
	asm := "this is an assembly code generated by minc compiler ...\n"
	panic("YOU MUST IMPLEMENT go/minc/minc_cogen.go:ast_to_asm_program")
	return asm
}


* your job is to implement `ast_to_asm_program` function

## 5-2. Julia
###  files
* `jl/`
  * `minc/`
    * `minc_ast.jl` --- abstract syntax tree (AST) definition
    * `minc_parse.jl` --- XML -> AST converter
    * `minc_cogen.jl` --- AST -> assembly code
    * `minc.jl` --- the main file

###  build

In [19]:
export PATH=~/.juliaup/bin:$PATH
cd ~/notebooks/pl08_minc/jl/minc
ls -lR

.:
total 28
-rw-r--r-- 1 tau tau  5352 Jun 23 00:39 minc_ast.jl
-rw-r--r-- 1 tau tau   220 Jun 23 00:39 minc_cogen.jl
-rwxr-xr-x 1 tau tau   765 Jun 23 00:39 minc.jl*
-rw-r--r-- 1 tau tau 10455 Jun 23 00:39 minc_parse.jl


In [20]:
chmod +x minc.jl

###  run
* be sure you have generated ex.xml by `minc_to_xml.py`
* try to compile a small program and see that the code generator raises an exception


In [21]:
./minc.jl ../../parser/ex.xml ex.s

The latest version of Julia in the `release` channel is 1.11.5+0.x64.linux.gnu. You currently have `1.11.4+0.x64.linux.gnu` installed. Run:

  juliaup update

in your terminal shell to install Julia 1.11.5+0.x64.linux.gnu and update the `release` channel to that version.
ERROR: LoadError: YOU MUST IMPLEMENT jl/minc/minc_cogen.jl:ast_to_asm_program
Stacktrace:
 [1] ast_to_asm_program(program::Program)
   @ Main ~/lectures/programming-languages/jupyter/notebooks/source/pl08_minc/jl/minc/minc_cogen.jl:4
 [2] file_xml_to_asm(file_xml::String)
   @ Main ~/lectures/programming-languages/jupyter/notebooks/source/pl08_minc/jl/minc/minc.jl:9
 [3] file_xml_to_file_asm(file_xml::String, file_asm::String

: 1

* take a look at the source code that caused the exception

In [22]:
cat minc_cogen.jl


function ast_to_asm_program(program :: Program)
    asm = "this is an assembly code generated by minc compiler ...\n"
    throw(ErrorException("YOU MUST IMPLEMENT jl/minc/minc_cogen.jl:ast_to_asm_program"))
    asm
end


* your job is to implement `ast_to_asm_program` function

## 5-3. OCaml
###  files
* `ml/`
  * `minc/`
    * `lib/`
      * `minc_ast.ml` --- abstract syntax tree (AST) definition
      * `minc_parse.ml` --- XML -> AST converter
      * `minc_cogen.ml` --- AST -> assembly code
      * `dune` --- describes dependencies between them
    * `bin/`
      * `main.ml` --- the main file

###  build

In [23]:
eval $(opam env)
cd ~/notebooks/pl08_minc/ml/minc
ls -lR

.:
total 20
drwxr-xr-x 2 tau tau 4096 Jun 23 00:39 bin/
-rw-r--r-- 1 tau tau  481 Jun 23 00:39 dune-project
drwxr-xr-x 2 tau tau 4096 Jun 23 00:39 lib/
-rw-r--r-- 1 tau tau  700 Jun 23 00:39 minc.opam
drwxr-xr-x 2 tau tau 4096 Jun 23 00:39 test/

./bin:
total 8
-rw-r--r-- 1 tau tau 156 Jun 23 00:39 dune
-rw-r--r-- 1 tau tau 745 Jun 23 00:39 main.ml

./lib:
total 28
-rw-r--r-- 1 tau tau  203 Jun 23 00:39 dune
-rw-r--r-- 1 tau tau 4747 Jun 23 00:39 minc_ast.ml
-rw-r--r-- 1 tau tau  447 Jun 23 00:39 minc_cogen.ml
-rw-r--r-- 1 tau tau 8323 Jun 23 00:39 minc_parse.ml

./test:
total 4
-rw-r--r-- 1 tau tau 20 Jun 23 00:39 dune
-rw-r--r-- 1 tau tau  0 Jun 23 00:39 minc.ml


In [25]:
dune build

                                    

###  run
* be sure you have generated ex.xml by `minc_to_xml.py`
* try to compile a small program and see that the code generator raises an exception


In [26]:
_build/default/bin/main.exe ../../parser/ex.xml ex.s

Fatal error: exception Minc_cogen.NotImplemented("YOU MUST IMPLEMENT ml/minc/lib/minc_cogen.ml:ast_to_asm_program")


: 2

* take a look at the source code that caused the exception

In [None]:
cat lib/minc_cogen.ml

* your job is to implement `ast_to_asm_program` function

## 5-4. Rust
###  files
* `rs/`
  * `minc/`
    * `src/`
      * `minc_ast.rs` --- abstract syntax tree (AST) definition
      * `minc_parse.rs` --- XML -> AST converter
      * `minc_cogen.rs` --- AST -> assembly code
      * `main.rs` --- the main file

###  build

In [27]:
. ~/.cargo/env
cd ~/notebooks/pl08_minc/rs/minc
ls -lR

.:
total 12
-rw-r--r-- 1 tau tau 2353 Jun 23 00:39 Cargo.lock
-rw-r--r-- 1 tau tau  187 Jun 23 00:39 Cargo.toml
drwxr-xr-x 2 tau tau 4096 Jun 23 00:39 src/

./src:
total 32
-rw-r--r-- 1 tau tau   963 Jun 23 00:39 main.rs
-rw-r--r-- 1 tau tau  5561 Jun 23 00:39 minc_ast.rs
-rw-r--r-- 1 tau tau   375 Jun 23 00:39 minc_cogen.rs
-rw-r--r-- 1 tau tau 13184 Jun 23 00:39 minc_parse.rs


In [28]:
cargo build

    Updating crates.io index
  Downloaded autocfg v1.1.0
  Downloaded bytes v1.1.0
  Downloaded memchr v2.5.0
  Downloaded static_assertions v1.1.0
  Downloaded pin-project-lite v0.2.9
  Downloaded tokio v1.20.0
  Downloaded rxml_validation v0.8.1
  Downloaded minidom v0.15.0
  Downloaded smartstring v0.2.10
  Downloaded rxml v0.8.1
  Downloaded 10 crates

###  run
* be sure you have generated ex.xml by `minc_to_xml.py`
* try to compile a small program and see that the code generator raises an exception


In [29]:
./target/debug/minc ../../parser/ex.xml ex.s


thread 'main' panicked at src/minc_cogen.rs:10:5:
YOU MUST IMPLEMENT rs/minc/src/minc_cogen.go:ast_to_asm_program
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


: 101

* take a look at the source code that caused the exception

In [None]:
cat src/minc_cogen.rs

* your job is to implement `ast_to_asm_program` function


# 6. Test
* your primary goal is to pass all the tests

##  files
* `test/`
  * `src/` 
    * `f00.c`, `f01.c`, `f02.c`, ... --- test programs, each defining a function `f`
  * `main.c` --- a file that calls function `f`
  * `Makefile` --- executes all the tests

the `Makefile` does the following for each test program (`src/f00.c, src/f01.c,` ...)

1. convert `src/fXX.c` to `xml/fXX.xml` with `parser/minc_to_xml.py`
1. compile `xml/fXX.xml` to `asm/fXX.s` with the minC compiler you are supposed to write
1. compile `main.c` and `asm/fXX.s` into an executable
1. compile `main.c` and `src/fXX.c` into an executable with gcc
1. execute the two executables and compare their outputs

* in the `Makefile`
 * <font color=red>you must set the variable `minc` to the path to your compiler (relative to the test directory)</font>
```
minc := ../ml/minc/_build/default/bin/main.exe
```
 * you can set the variable `srcs` to change the functions tested. the default is all the 75 files in `src/`
```
srcs := $(wildcard src/f*.c)
```
for example, if you set this to
```
srcs := $(wildcard src/f00.c)
```
only `src/f00.c` is tested

 * `make -n` will show you what will be executed without actually executing it
 

In [None]:
# make sure you set minc variable in Makefile
cd ~/notebooks/pl08_minc/test
make -n

to test a single file

In [None]:
make -n srcs=src/f00.c


* if the test fails, identify which file it failed for and how

* use
```
make srcs=src/fXX.c
```
to run the test on the particular file that failed

* on each file (`src/fXX.c`), it first converts the C source file into XML using `minc_to_xml.py`; if the test fails here, it's not your fault, unless you modified the grammar

```
echo "# convert src/f00.c to xml/f00.xml"
../parser/minc_to_xml.py src/f00.c > xml/f00.xml
```

* next it calls your compiler to convert the XML file into asm (the command line depends on the language you chose)

```
echo "# compile xml/f00.xml to asm with your minC compiler"
../ml/minc/_build/default/bin/main.exe xml/f00.xml asm/f00.s
```

if this fails, examine the original C source file (`f00.c` in the case above) and the XML file to find out what kind of source code makes it fail

* then it calls gcc to compile the assembly you generated into an executable

```
echo "# generate the executable that calls f with your minC compiler"
gcc -o minc/f00.exe -DTEST_NO=0 main.c asm/f00.s -O0 -g
```

if you generate syntactically invalid assembly code, gcc will complain here.  read the gcc error message, examine the assembly you generated (`asm/f00.s` in the case above), and understand why it failed

* then it runs the executables generated by gcc and by your minC compiler

```
echo "# run the executable generated by gcc"
./gcc/f00.exe | tee out/f00.gcc
echo "# run the executable generated by your minC compiler"
./minc/f00.exe | tee out/f00.minc
```
the gcc-generated executable is unlikely to fail.  if your compiler-generated executable fails, you might be able to see the reason just by looking at the assembly code you generated; otherwise, run the executable with a debugger. GDB, for example, can step-execute an assembly program and allow you to examine the value of registers.

* it finally compares the output of the two executables

```
echo "# take the diff of the two"
diff out/f00.gcc out/f00.minc
```

if it fails, the debugging strategy is the same as above: first inspect the assembly code you generated; if that does not reveal the cause, step through the executable with a debugger such as GDB and examine the register values.

## 6-1. How to add new test programs
* if you work on any of the extensions, you will likely need to add test programs too
* `test/src/src/fun.c` contains all the C functions to test, each one being guarded by
```
#if TEST_NO == ???

#endif
```
add your test case at the end of the file, using the next number as your TEST_NO
* modify the following line in `test/src/src/Makefile`
```
tests := $(shell seq -f "%02.0f" 0 74)
```
to reflect the tests you added.  for example, if you add two test cases (TEST_NO = 75 and 76), you should change it to 
```
tests := $(shell seq -f "%02.0f" 0 76)
```
and run make in `test/src/src`

In [None]:
cd ~/notebooks/pl08_minc/test/src/src
make

* do a similar thing for `test/main.c`; as long as all functions take only `long` arguments and return a `long` value, you can use the same `main` function. in that case you just change
```
#if 0 <= TEST_NO && TEST_NO <= 74
...
#endif
```
to, say, 
```
#if 0 <= TEST_NO && TEST_NO <= 76
...
#endif
```

* if you support types other than `long`, you are likely to add a different main function. modify `main.c` accordingly



# 7. Format of the report and how to submit your work
## 7-1. Baseline requirements (Level 1)
* implement the compiler for minC (you will mainly write `minc_cogen.{go,jl,ml,rs}`)
* your code generator must be heavily commented (explain how it works, as if you are writing a report, except you write it in the source code)
* pass all the tests, that is, the following command executes without an error

In [None]:
cd ~/notebooks/pl08_minc/test
make -B

* by default, the entire test stops as soon as any test fails. you may want to try the `-k` option of make to skip files that fail and continue with the others

In [None]:
cd ~/notebooks/pl08_minc/test
make -B -k

* write the summary of tests that passed/failed, in the online Excel `teamXX/pl08_minc/results.xlsx`
 * P : passed
 * C : cannot assemble (your compiler fails to produce assembly)
 * I : invalid assembly (your compiler produced an assembly code but it does not compile with gcc)
 * R : runtime error (your compiler produced an executable, but it does not terminate successfully)
 * W : wrong result (your compiler produced an executable that terminates successfully, but the output result is different from that of gcc-generated executable)
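
The five result codes follow the order of the test stages, so the first stage that fails determines the code. A minimal sketch of that rule follows; the function name `classify` and its boolean parameters are made up for this illustration and are not part of the test `Makefile`.

```python
def classify(produced_asm, assembled, ran_ok, same_output):
    """Map the outcome of each stage to the code used in results.xlsx.

    produced_asm : your compiler wrote an assembly file
    assembled    : gcc accepted that assembly and built an executable
    ran_ok       : the executable terminated successfully
    same_output  : its output equals that of the gcc-generated executable
    """
    if not produced_asm:
        return "C"
    if not assembled:
        return "I"
    if not ran_ok:
        return "R"
    if not same_output:
        return "W"
    return "P"


print(classify(True, True, True, True))    # prints: P
print(classify(True, True, False, False))  # prints: R
```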

* members of each team must get together at least once to show their test results to each other
* you are encouraged to get together with team members working on this option on other occasions to discuss progress and help each other (team members working on other options are not required to participate)


* <font color=red>before submitting your work, make sure you clean up your working directory,</font>  in the manner described in [Writing standalone programs using libraries](https://taura.github.io/programming-languages/html/errata/pl04_standalone.sos.html)

* Go

In [None]:
cd ~/notebooks/pl08_minc/go/minc
go clean

* OCaml
  * ignore `Error: rmdir(_build): Directory not empty` if you see it (I don't know why it happens)
  

In [None]:
cd ~/notebooks/pl08_minc/ml/minc
dune clean

* Rust

In [None]:
cd ~/notebooks/pl08_minc/rs/minc
cargo clean

* <font color="red">also make sure you execute `make clean` in the test directory</font>

In [None]:
cd ~/notebooks/pl08_minc/test
make clean

* submit your work through Jupyter (modify and add files in place under `~/notebooks/pl08_minc`); make sure you execute the above cell (`make -B -k`)
* submit "Term Report Option A (pl08_minc; build a compiler)" through UTOL
* all the essential work is submitted through Jupyter and Excel; you only have to submit a brief report to UTOL

## 7-2. Extra points [Level 2+]
* you get extra points by doing more than required above

* you can extend the minC language in any way; possible extensions include (but are not limited to):
  * syntactic extensions
    * [difficulty level 2] for loop
    * [difficulty level 2.5] initializing declaration (e.g., int x = y + 2)
  * type extensions and type checks
    * note that supporting a type other than long requires the compiler to know the type of every expression, which is heavy lifting
    * [difficulty level 3] pointers (long*, long**, etc.)
    * [difficulty level 4] floating point numbers (double)
    * [difficulty level 5] types of different sizes (int, float)
    * [difficulty level 6] structures and typedefs
  * optimization
    * use registers more aggressively
    * inline-expand function calls

* if you have done extra work beyond the requirements, describe what you did in `extra.docx`. clearly indicate the author of each section (who did what)