# An Optimizing Compiler

Compilers are used to generate efficient code for target architectures. Optimizing compilers are multiphase language processors that, in addition to the standard syntax analysis and program analysis phases, have explicit optimization and code generation phases. Here we discuss an optimizing compiler for our Cuppa1 language. The compiler performs constant folding and peephole optimizations and generates code for the Exp1bytecode abstract machine.


```python
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")
```

# The Basic Compiler

At a fundamental level, compilers can be understood as processors that match AST patterns of the source language and translate them into patterns of the target language. Recall the AST construction in the [Cuppa1 frontend](code/cuppa1_frontend_gram.py) and consider, for example, the AST pattern for the assignment statement in Cuppa1,
```
('assign', name, exp)
```
We could easily envision translating this AST pattern into a pattern in [Exp1bytecode](code/exp1bytecode_gram.py) as follows,
```
store <name> <exp>;
```
where `<name>` and `<exp>` are the appropriate translations of the variable name and the assignment expression from Cuppa1 into Exp1bytecode. Here the target pattern is just a single instruction. However, the more complicated the AST pattern in the source language, the more complicated the target pattern.

It turns out that in our case it is not that difficult to come up with pattern translations for the remaining non-structured statements and for the expressions in Cuppa1. For the non-structured statements we have the pattern translations,
```
('assign', name, exp) => store <name> <exp>;
('put', exp) => print <exp>;
('get', name) => input <name>;
```
And for the expressions we have,
```
('+', c1, c2) => ('+' <c1> <c2>)
('-', c1, c2) => ('-' <c1> <c2>)
('*', c1, c2) => ('*' <c1> <c2>)
('/', c1, c2) => ('/' <c1> <c2>)
('==', c1, c2) => ('==' <c1> <c2>)
('<=', c1, c2) => ('<=' <c1> <c2>)
('id', name) => <name>
('integer', value) => <value>
('uminus', value) => - <value>
('not', value) => ! <value>
```
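The expression rules are recursive: each operand is translated first and its result is spliced into the parent's pattern. The sketch below assumes the table's `('+' <c1> <c2>)` notation stands for a parenthesized prefix form such as `(+ x 1)`; that reading, and the name `translate_exp`, are assumptions for illustration.

```python
# Hypothetical sketch; assumes binary operations render as "(op c1 c2)".
BINARY_OPS = {'+', '-', '*', '/', '==', '<='}


def translate_exp(exp):
    """Recursively translate a Cuppa1 expression AST into an Exp1bytecode string."""
    op = exp[0]
    if op in BINARY_OPS:
        _, c1, c2 = exp
        # Translate both operands first, then splice them into the parent pattern.
        return f"({op} {translate_exp(c1)} {translate_exp(c2)})"
    if op == 'id':
        return exp[1]
    if op == 'integer':
        return str(exp[1])
    if op == 'uminus':
        return f"- {translate_exp(exp[1])}"
    if op == 'not':
        return f"! {translate_exp(exp[1])}"
    raise ValueError(f"unsupported expression: {exp!r}")


# x * (y + 1)
ast = ('*', ('id', 'x'), ('+', ('id', 'y'), ('integer', 1)))
print(translate_exp(ast))  # (* x (+ y 1))
```

In this sketch every binary operation is wrapped in its own parentheses, so a translated subexpression can be dropped into any operand position without its pieces running together.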
In order to come up with a pattern translation for structured statements such as the while statement, we have to be clear about the behavior of the while statement `while (cond) body`: Evaluate the condition `cond` and execute the `body` as long as the condition evaluates to a value not equal to zero. As soon as the condition evaluates to zero, skip the body of the loop and continue execution right after the loop. We can simulate this behavior in Exp1bytecode with jump instructions. Here is one way to translate the AST pattern for the while loop into a code pattern in Exp1bytecode:
```
('while', cond, body) => Ltop:
                             jumpF <cond> Lbottom;
                             <body>
                             jump Ltop; 
                         Lbottom:
                             noop;
```
Here `Ltop` and `Lbottom` are labels. We can easily verify that the Exp1bytecode pattern simulates exactly the behavior of the Cuppa1 while statement: If the condition evaluates to something other than zero then execute the body of the loop and then jump to the top of the loop in order to reevaluate the condition.  If the condition ever evaluates to zero then the `jumpF` instruction will transfer control to the `Lbottom` label. That is, if the condition evaluates to zero then we skip the loop body and continue executing the statement after the loop.  We have to have the `noop` instruction here because label definitions have to be attached to instructions.
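One way to check this claim mechanically is to run the pattern on a toy machine. The sketch below is hypothetical and much simpler than the real Exp1bytecode machine: `run` executes a list of instruction tuples, label definitions are one-element tuples ending in `:`, and, as a simplifying assumption, expressions are Python functions of the variable store rather than Exp1bytecode expressions.

```python
def run(program):
    """Hypothetical toy machine: execute instruction tuples, return the variable store."""
    # First pass: record the position of every label definition ('Lname:',).
    labels = {instr[0][:-1]: pc for pc, instr in enumerate(program)
              if instr[0].endswith(':')}
    env, pc = {}, 0
    while pc < len(program):
        instr = program[pc]
        op = instr[0]
        if op == 'store':
            env[instr[1]] = instr[2](env)
        elif op == 'jumpF':
            # Transfer control only when the condition evaluates to zero.
            if instr[1](env) == 0:
                pc = labels[instr[2]]
                continue
        elif op == 'jump':
            pc = labels[instr[1]]
            continue
        # Label definitions and 'noop' simply fall through.
        pc += 1
    return env


# i = 0; while (i <= 3) i = i + 1;
program = [
    ('store', 'i', lambda env: 0),
    ('Ltop:',),
    ('jumpF', lambda env: int(env['i'] <= 3), 'Lbottom'),
    ('store', 'i', lambda env: env['i'] + 1),
    ('jump', 'Ltop'),
    ('Lbottom:',),
    ('noop',),
]
print(run(program))  # {'i': 4}
```

If the initial store instruction sets `i` to 10 instead, the first `jumpF` goes straight to `Lbottom` and the body never runs, which is exactly the skip behavior described above.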

We can do something similar with if-then statements,
```
('if', cond, then_stmt, ('nil',)) =>     jumpF <cond> Lbottom;
                                         <then_stmt>
                                     Lbottom:
                                         noop;
```
A closer look at the Exp1bytecode pattern shows that it simulates the behavior of the Cuppa1 if-then statement: If the condition evaluates to a non-zero value, we execute the then-statement. Otherwise, if the condition evaluates to zero, we skip the then-statement by jumping to the `Lbottom` label and continue execution after the if-then statement. Finally, adding the else-statement to the if-then statement we have,
```
('if', cond, then_stmt, else_stmt) =>     jumpF <cond> Lelse;
                                          <then_stmt>
                                          jump Lbottom;
                                      Lelse:
                                          <else_stmt>
                                      Lbottom:
                                          noop;
```
It is not difficult to see that the Exp1bytecode pattern simulates the behavior of the Cuppa1 if-then-else statement: If the condition evaluates to a non-zero value, execute the then-statement and skip the else-statement. If the condition evaluates to zero, skip the then-statement and execute the else-statement.
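The two if translations differ only in whether an else branch exists, so one generator can pick the right pattern. The sketch below is hypothetical: instead of checking for `('nil',)` it takes `None` for a missing else branch, it receives the condition as an already-translated string and the branches as instruction lists, and `new_label` is an invented helper that hands out fresh label names.

```python
import itertools

_counter = itertools.count()


def new_label():
    """Hypothetical helper: return a fresh label name such as 'L0', 'L1', ..."""
    return f"L{next(_counter)}"


def if_code(cond_code, then_code, else_code=None):
    """Hypothetical: instruction tuples for if-then (else_code is None) or if-then-else."""
    if else_code is None:
        bottom = new_label()
        # jumpF <cond> Lbottom; <then_stmt> Lbottom: noop;
        return ([('jumpF', cond_code, bottom)] + then_code
                + [(bottom + ':',), ('noop',)])
    else_label = new_label()
    bottom = new_label()
    # jumpF <cond> Lelse; <then_stmt> jump Lbottom; Lelse: <else_stmt> Lbottom: noop;
    return ([('jumpF', cond_code, else_label)] + then_code
            + [('jump', bottom), (else_label + ':',)] + else_code
            + [(bottom + ':',), ('noop',)])


for instr in if_code('x', [('print', '1')], [('print', '2')]):
    print(instr)
```

On a fresh counter this prints the jumpF to `L0`, the then-branch, a jump to `L1`, the `L0:` label, the else-branch, and finally `L1:` followed by `noop`.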

One thing to keep in mind is the notion of *target pattern compositionality*. By that we mean that any target patterns generated from the same class of AST patterns can be composed with each other. Consider, for example, the statements in Cuppa1: any Exp1bytecode pattern generated from a Cuppa1 statement should be composable with any other Exp1bytecode pattern generated from a statement without ever producing incorrect target code. The same is true for Exp1bytecode patterns generated from Cuppa1 expressions: any Exp1bytecode pattern generated from a Cuppa1 expression should be composable with any other Exp1bytecode pattern generated from a Cuppa1 expression and always give rise to a valid Exp1bytecode expression.
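One concrete way statement compositionality can fail is label reuse: if every while loop were translated with the literal names `Ltop` and `Lbottom`, two loops in sequence would define each label twice, and the jumps would no longer have a single target. The hypothetical sketch below contrasts fixed label names with freshly generated ones; `while_code` and `duplicate_labels` are invented names, and the condition and body are passed in already translated.

```python
import itertools
from collections import Counter


def while_code(cond_code, body_code, labels):
    """Hypothetical: translate one while loop, drawing its two labels from `labels`."""
    top, bottom = next(labels), next(labels)
    return ([(top + ':',), ('jumpF', cond_code, bottom)] + body_code
            + [('jump', top), (bottom + ':',), ('noop',)])


def duplicate_labels(code):
    """Return the label definitions that occur more than once in `code`."""
    defs = Counter(instr[0] for instr in code if instr[0].endswith(':'))
    return sorted(name for name, n in defs.items() if n > 1)


body = [('print', 'x')]

# Fixed names: every loop reuses Ltop/Lbottom.
fixed = itertools.cycle(['Ltop', 'Lbottom'])
bad = while_code('x', body, fixed) + while_code('y', body, fixed)
print(duplicate_labels(bad))   # ['Lbottom:', 'Ltop:']

# Fresh names: one counter shared by all loops.
fresh = (f"L{i}" for i in itertools.count())
good = while_code('x', body, fresh) + while_code('y', body, fresh)
print(duplicate_labels(good))  # []
```

Sharing one label source across the whole translation is what lets any two statement patterns be concatenated safely.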



## A Code Generation Tree Walker

We will implement our basic compiler as a tree walker that walks the Cuppa1 AST and, for each of the AST patterns above, generates the corresponding target code pattern. We make two design choices:

>Cuppa1 statement patterns generate lists of Exp1bytecode instructions, and Cuppa1 expression patterns generate Exp1bytecode expressions returned as strings.

The reason for this will become clear later when we look at optimizations in this compiler. Let's take a look at some of the actual pattern translations.  A good place to start is the assignment statement,

```python
from cuppa1_cc_codegen import *
```

```python
# %load -s assign_stmt code/cuppa1_cc_codegen.py
def assign_stmt(node):

    (ASSIGN, name, exp) = node
    assert_match(ASSIGN, 'assign')

    exp_code = walk(exp)

    code = [('store', name, exp_code)]

    return code
```


Since our code generator is a tree walker, what we are looking at is the node function for assignment nodes in the AST. The function first pattern matches the assignment node. Then it walks the expression AST, which returns a string containing the Exp1bytecode for the Cuppa1 expression. We now have all the pieces we need to assemble the target pattern according to our translation rule,
```
('assign', name, exp) => store <name> <exp>;
```
Here `<name>` is the variable name stored in `name`, and `<exp>` is the Exp1bytecode string stored in `exp_code` in the Python code above. Now it is straightforward to store the instruction as a `'store'` tuple in a list, which we return as the result of this translation. Note that we do not generate the semicolon yet; we do that later, when we turn the list of instruction tuples into the actual output.
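The output step that adds the semicolons is not shown in this section. A minimal sketch, assuming instruction tuples shaped like the ones above (a one-element tuple ending in `:` is a label definition, anything else is an opcode followed by string operands); the function name `emit` and the four-space indentation are hypothetical choices.

```python
def emit(code):
    """Hypothetical: turn a list of instruction tuples into Exp1bytecode source text."""
    lines = []
    for instr in code:
        if len(instr) == 1 and instr[0].endswith(':'):
            # Label definition: printed as-is, no semicolon.
            lines.append(instr[0])
        else:
            # Opcode and operands separated by spaces, terminated by ';'.
            lines.append('    ' + ' '.join(instr) + ';')
    return '\n'.join(lines)


print(emit([('store', 'x', '3'), ('L0:',), ('noop',)]))
```

Keeping instructions as tuples until this last step means later phases can inspect and rewrite opcodes and operands without re-parsing text.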

It is easy to see that the node function for the put statement,

```python
# %load -s put_stmt code/cuppa1_cc_codegen.py
def put_stmt(node):

    (PUT, exp) = node
    assert_match(PUT, 'put')

    exp_code = walk(exp)

    code = [('print', exp_code)]

    return code
```


implements the translation rule,
```
('put', exp) => print <exp>;
```
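The node functions rely on `walk` and `assert_match`, which are defined elsewhere in `cuppa1_cc_codegen.py` and not shown here. The following is a hypothetical, self-contained stand-in showing how such a dispatcher can work: a dictionary maps each AST node type to its node function, and `walk` looks the function up by the node's first element. It also sketches a node function for the get statement following the rule `('get', name) => input <name>;`; none of this is the module's actual code.

```python
def assert_match(actual, expected):
    """Hypothetical stand-in: fail loudly if a node has the wrong type tag."""
    if actual != expected:
        raise ValueError(f"expected {expected!r}, got {actual!r}")


def get_stmt(node):
    (GET, name) = node
    assert_match(GET, 'get')
    return [('input', name)]


def put_stmt(node):
    (PUT, exp) = node
    assert_match(PUT, 'put')
    return [('print', walk(exp))]


def id_exp(node):
    (ID, name) = node
    assert_match(ID, 'id')
    return name


# Hypothetical dispatch table: AST node type -> node function (a few entries only).
dispatch = {
    'get': get_stmt,
    'put': put_stmt,
    'id': id_exp,
}


def walk(node):
    """Call the node function registered for this node's type."""
    node_type = node[0]
    if node_type not in dispatch:
        raise ValueError(f"no node function for {node_type!r}")
    return dispatch[node_type](node)


print(walk(('get', 'x')))          # [('input', 'x')]
print(walk(('put', ('id', 'x'))))  # [('print', 'x')]
```

With a table like this, adding a new AST pattern to the compiler amounts to writing one node function and registering it under its type tag.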
The node function for the structured while statement looks like this,

```python
# %load -s while_stmt code/cuppa1_cc_codegen.py
def while_stmt(node):

    (WHILE, cond, body) = node
    assert_match(WHILE, 'while')

    # labels for the top and the bottom of the loop
    top_label = label()
    bottom_label = label()
    cond_code = walk(cond)   # expression -> Exp1bytecode string
    body_code = walk(body)   # statement -> list of instruction tuples

    # Ltop: jumpF <cond> Lbottom; <body> jump Ltop; Lbottom: noop;
    code = [(top_label + ':',)]
    code += [('jumpF', cond_code, bottom_label)]
    code += body_code
    code += [('jump', top_label)]
    code += [(bottom_label + ':',)]
    code += [('noop',)]

    return code
```
