# An Optimizing Compiler

Compilers are programs that translate a program written in one language into the same program written in another, most often lower-level, language.  We refer to these languages as the source and target language, respectively.  Optimizing compilers are multiphase language processors with explicit code optimization phases in addition to the code generation, syntax analysis, and program analysis phases mentioned in Chapter 1.  Here we discuss an optimizing compiler whose source language is our Cuppa1 language and whose target language is our Exp1bytecode.  It is an optimizing compiler because it performs constant folding and peephole optimizations in order to improve the generated code.


In [1]:
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")

# The Basic Compiler

At a fundamental level compilers can be understood as processors that match AST patterns of the source language and translate them into patterns in the target language.  Recall the AST construction in the [Cuppa1 frontend](code/cuppa1_frontend_gram.py) and consider for example the AST pattern for the assignment statement in Cuppa1,
```
('assign', name, exp)
```
We could easily envision translating this AST pattern into a pattern in [Exp1bytecode](code/exp1bytecode_gram.py) as follows,
```
store <name> <exp>;
```
where `<name>` and `<exp>` are the appropriate translations of the variable name and the assignment expression from Cuppa1 into Exp1bytecode.  Here the target pattern is just a single instruction.  However, the more complicated the AST patterns in the source language, the more complicated the target patterns become, as we will see when we look at programming constructs such as the while loop.

In our case it is not that difficult to come up with pattern translations for all the non-structured statements and expressions in Cuppa1. For all the non-structured statements we have the pattern translations,
```
('assign', name, exp) => store <name> <exp>;
('put', exp) => print <exp>;
('get', name) => input <name>;
```
And for the expressions we have,
```
('+', c1, c2) => ('+' <c1> <c2>)
('-', c1, c2) => ('-' <c1> <c2>)
('*', c1, c2) => ('*' <c1> <c2>)
('/', c1, c2) => ('/' <c1> <c2>)
('==', c1, c2) => ('==' <c1> <c2>)
('<=', c1, c2) => ('<=' <c1> <c2>)
('id', name) => <name>
('integer', value) => <value>
('uminus', value) => - <value>
('not', value) => ! <value>
```
In order to come up with a pattern translation for structured statements such as the while statement we have to be clear what the behavior
of the while statement `while (cond) body` is:  Evaluate the condition `cond` and execute the `body` as long as the condition evaluates to a value not equal to zero.  If the condition ever evaluates to zero then ignore the body of the loop and continue execution right after the loop.  We can simulate this behavior in Exp1bytecode with jump instructions. One way to translate the AST pattern for the while loop into a code pattern in Exp1bytecode is,
```
('while', cond, body) => Ltop:
                             jumpF <cond> Lbottom;
                             <body>
                             jump Ltop; 
                         Lbottom:
                             noop;
```
Here `Ltop` and `Lbottom` are labels. We can easily verify that the Exp1bytecode pattern simulates exactly the behavior of the Cuppa1 while statement: If the condition evaluates to something other than zero then execute the body of the loop and then jump to the top of the loop in order to reevaluate the condition.  If the condition ever evaluates to zero then the `jumpF` instruction will transfer control to the `Lbottom` label. That is, if the condition evaluates to zero then we skip the loop body and continue executing the statement after the loop.  We have to have the `noop` instruction here because label definitions have to be attached to instructions.

We can do something similar with if-then statements,
```
('if', cond, then_stmt, ('nil',)) =>     jumpF <cond> Lbottom;
                                         <then_stmt>
                                     Lbottom:
                                         noop;
```
A closer look at the Exp1bytecode pattern shows that it simulates the behavior of the Cuppa1 if-then statement: If the condition evaluates to a non-zero value then we execute the then-statement.  Otherwise, if the condition evaluates to zero, we skip the then-statement by jumping to the `Lbottom` label and continue execution after the if-then statement.  Finally, adding the else-statement to the if-then statement we have,
```
('if', cond, then_stmt, else_stmt) =>     jumpF <cond> Lelse;
                                          <then_stmt>
                                          jump Lbottom;
                                      Lelse: 
                                          <else_stmt>
                                      Lbottom:
                                         noop;
```
It is not difficult to see that the Exp1bytecode pattern simulates the behavior of the Cuppa1 if-then-else statement:  If the condition evaluates to a non-zero value then execute the then-statement and ignore the else-statement.  If the condition evaluates to zero, ignore the then-statement and execute the else-statement.

One thing to keep in mind is the notion of *target pattern compositionality*.  By that we mean that any patterns generated from the same class of AST patterns should be composable.  Consider for example the statements in Cuppa1.  Any one of the Exp1bytecode patterns due to statements in Cuppa1 should be composable with any other Exp1bytecode pattern due to a statement without ever generating incorrect target code.  The same is true for Exp1bytecode patterns generated from Cuppa1 expressions: Any Exp1bytecode pattern generated from a Cuppa1 expression should be composable with any other Exp1bytecode pattern due to a Cuppa1 expression and always give rise to valid Exp1bytecode expressions.



## A Code Generation Tree Walker

Recall that the Cuppa1 frontend generates an AST for a source program.  According to what we said above, we need to find patterns in this AST that match the patterns in our translation rules in order to generate target code.  Consider the following Cuppa1 program for which the frontend generates a corresponding AST,

In [2]:
from cuppa1_lex import lexer
from cuppa1_frontend_gram import parser
from cuppa1_state import state
from grammar_stuff import dump_AST

In [3]:
program = \
'''
get x
x = x + 1
put x
'''
parser.parse(program, lexer=lexer)

In [4]:
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


It is easy to see that the left side patterns for our three pattern translation rules,
```
('get', name) => input <name>;
('assign', name, exp) => store <name> <exp>;
('put', exp) => print <exp>;
```
are present in the AST above.  In order to generate code we need to find those left side patterns in the AST and then apply the rules. Therefore,

> The code generator for our compiler is a tree walker that walks the Cuppa1 AST and, for each node that matches the left side of a pattern translation rule, generates the corresponding target code.

There are two additional design choices that we make: 

>Cuppa1 statement patterns will generate Exp1bytecode instructions on a *list* and Cuppa1 expression patterns will generate Exp1bytecode expressions returned as *strings*.

The reason for this will become clear later when we look at optimizations in this compiler. Let's take a look at some of the actual pattern translations.  A good place to start is the get statement,

In [5]:
from cuppa1_cc_codegen import *

In [6]:
# %load -s get_stmt code/cuppa1_cc_codegen.py
def get_stmt(node):

    (GET, name) = node
    assert_match(GET, 'get')

    code = [('input', name)]

    return code


Since our code generator is a tree walker, what we are looking at is the node function for get nodes in the AST.  The function first pattern matches the get node.  It then generates the target code pattern in the form of an instruction tuple in a list.  The instruction tuple we are generating is not unlike the instruction tuples we used to represent programs in the Exp1bytecode abstract machine in Chapter 4.  The list with the single Exp1bytecode instruction is then returned in order to be combined with other lists of instructions.  Even though the translation rule for the get statement demands that we also generate the semicolon as part of the translation,
```
('get', name) => input <name>;
```
we delay this until we generate the actual machine instructions.

In [7]:
# %load -s assign_stmt code/cuppa1_cc_codegen.py
def assign_stmt(node):

    (ASSIGN, name, exp) = node
    assert_match(ASSIGN, 'assign')
    
    exp_code = walk(exp)

    code = [('store', name, exp_code)]
    
    return code


The node function for assignment statements works in a very similar fashion.  It first pattern matches the node. Then it walks the expression AST which will return a string containing the Exp1bytecode for the Cuppa1 expression.  We now have all the pieces we need to assemble the target pattern according to our translation rule,
```
('assign', name, exp) => store <name> <exp>;
```
Here `<name>` is the name of the variable stored in variable `name` and `<exp>` is the Exp1bytecode expression string stored in the variable `exp_code` in the Python code above.  Now it is straightforward to create a tuple that represents the store instruction, where the first component is the string `'store'`.  We put this tuple in a list which we return as the result of this translation.

It is easy to see that the node function for the put statement,

In [8]:
# %load -s put_stmt code/cuppa1_cc_codegen.py
def put_stmt(node):

    (PUT, exp) = node
    assert_match(PUT, 'put')
    
    exp_code = walk(exp)

    code = [('print', exp_code)]

    return code


implements the translation rule,
```
('put', exp) => print <exp>;
```
The node function for the structured while statement looks like this,

In [9]:
# %load -s while_stmt code/cuppa1_cc_codegen.py
def while_stmt(node):
    
    (WHILE, cond, body) = node
    assert_match(WHILE, 'while')
    
    top_label = label()
    bottom_label = label()
    cond_code = walk(cond)
    body_code = walk(body)

    code = [(top_label + ':',)]
    code += [('jumpF', cond_code, bottom_label)]
    code += body_code
    code += [('jump', top_label)]
    code += [(bottom_label + ':',)]
    code += [('noop',)]

    return code


It is probably not that difficult to figure out that this node function implements the pattern translation rule for while loops,
```
('while', cond, body) => Ltop:
                             jumpF <cond> Lbottom;
                             <body>
                             jump Ltop; 
                         Lbottom:
                             noop;
```
Here the function `label()` generates a unique label name every time it is called.  A closer look at the [tree walker code](code/cuppa1_cc_codegen.py) will reveal that the node function for both variants of the if statement also generates the appropriate target code patterns.
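As a rough illustration of these two points, the following sketch shows one way a label generator and the two if patterns could be written.  It is not the code from `cuppa1_cc_codegen.py`: the counter-based `label()` and the helper `if_code`, which takes already translated pieces rather than an AST node, are assumptions made to keep the example self-contained.

```python
from itertools import count

# Hypothetical module-level counter; the real label() may be implemented differently.
_label_ids = count()

def label():
    """Return a fresh label name: 'L0', 'L1', 'L2', ..."""
    return 'L' + str(next(_label_ids))

def if_code(cond_code, then_code, else_code):
    """Assemble the if pattern from already translated pieces.

    cond_code -- expression string, e.g. '(<= 1 x)'
    then_code -- list of instruction tuples for the then-statement
    else_code -- list of instruction tuples for the else-statement;
                 an empty list selects the if-then pattern (an assumption
                 of this sketch)
    """
    if not else_code:
        bottom = label()
        return ([('jumpF', cond_code, bottom)]
                + then_code
                + [(bottom + ':',), ('noop',)])

    else_label = label()
    bottom = label()
    return ([('jumpF', cond_code, else_label)]
            + then_code
            + [('jump', bottom), (else_label + ':',)]
            + else_code
            + [(bottom + ':',), ('noop',)])

# Example: if (x <= 0) put 0 else put x
for instr in if_code('(<= x 0)', [('print', '0')], [('print', 'x')]):
    print(instr)
```

Because every call to `label()` draws a new name, nested or repeated statements never share a label, which is what makes the generated statement patterns safe to compose.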

What remains to be looked at is how the tree walker deals with `seq` nodes since they act as the glue between the statements in the AST we saw above.  Related to this is how the walker deals with `nil` nodes in a statement sequence since `seq` sequences are `nil` terminated.  Here are the two respective node functions,

In [10]:
# %load -s seq,nil code/cuppa1_cc_codegen.py
def seq(node):

    (SEQ, s1, s2) = node
    assert_match(SEQ, 'seq')
    
    stmt = walk(s1)
    lst = walk(s2)

    return stmt + lst

#########################################################################

def nil(node):
    
    (NIL,) = node
    assert_match(NIL, 'nil')
    
    return []
    
#########################################################################


After matching the node, the `seq` node function first walks the statement, which according to our design decision will return a list of Exp1bytecode instructions.  Then the node function walks the rest of the program, which will return a list of all the Exp1bytecode instructions the rest of the program generated.  The instructions of the current statement are then *prepended* to the list of instructions generated by the rest of the program, giving us a list of instructions from the current statement all the way to the end of the program.  This list is returned as the result of the `seq` node function.

The `nil` node function simply returns an empty list, which is exactly what we want since `seq` sequences are `nil` terminated.

Consider our AST from above,

In [11]:
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


As the tree walker walks the tree from top to bottom it will generate the following list(s),
```
[('input', 'x')] + 
    [('store', 'x', '(+ x 1)')] +
        [('print', 'x')] +
            []
```
which is the same as the concatenated list,
```
[('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]
```
That is, the result of the tree walk is a list of instructions that represent the original program in Exp1bytecode.

In order to fully understand the generation of the `store` and `print` instructions we need to take a look at the translation of expressions.  Here are the main node functions,

In [12]:
# %load -s binop_exp,id_exp,integer_exp code/cuppa1_cc_codegen.py
def binop_exp(node):

    (OP, c1, c2) = node
    if OP not in ['+', '-', '*', '/', '==', '<=']:
        raise ValueError("pattern match failed on " + OP)
    
    lcode = walk(c1)
    rcode = walk(c2)

    code = '(' + OP + ' ' + lcode + ' ' + rcode + ')'

    return code

#########################################################################

def id_exp(node):
    
    (ID, name) = node
    assert_match(ID, 'id')
    
    return name

#########################################################################

def integer_exp(node):

    (INTEGER, value) = node
    assert_match(INTEGER, 'integer')

    return str(value)


As we said earlier, expressions generate strings.  We clearly see this when we look at `binop_exp`, the node function for binary operators.  The function walks the two children, which in turn generates an expression string for each child, and then synthesizes a string that represents the current binary operation.  This string is the return value of the node function.  The node function `id_exp`, which handles variable names that appear in expressions, returns the name of the variable, and the node function `integer_exp` returns the value of the integer as a string.  In this way we can explain the expression string `'(+ x 1)'` that appears in the `store` instruction in our generated list of instructions.
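The node functions above all rely on `walk` to select the right function for a node, but the dispatcher itself is not shown in this section.  The following self-contained sketch assumes a dictionary-based dispatch keyed on the first tuple component; the actual `walk` in `cuppa1_cc_codegen.py` may be organized differently, and the node functions here are trimmed-down copies without `assert_match`.

```python
def walk(node):
    """Dispatch on the node type stored in the first tuple component."""
    node_type = node[0]
    if node_type not in dispatch:
        raise ValueError("walk: unknown tree node type: " + node_type)
    return dispatch[node_type](node)

# statements return lists of instruction tuples
def seq(node):
    (_, stmt, rest) = node
    return walk(stmt) + walk(rest)

def nil(node):
    return []

def get_stmt(node):
    (_, name) = node
    return [('input', name)]

def assign_stmt(node):
    (_, name, exp) = node
    return [('store', name, walk(exp))]

def put_stmt(node):
    (_, exp) = node
    return [('print', walk(exp))]

# expressions return strings
def binop_exp(node):
    (op, c1, c2) = node
    return '(' + op + ' ' + walk(c1) + ' ' + walk(c2) + ')'

def id_exp(node):
    (_, name) = node
    return name

def integer_exp(node):
    (_, value) = node
    return str(value)

# Assumed dispatch table: node type -> node function
dispatch = {
    'seq': seq, 'nil': nil,
    'get': get_stmt, 'assign': assign_stmt, 'put': put_stmt,
    '+': binop_exp, '-': binop_exp, '*': binop_exp, '/': binop_exp,
    '==': binop_exp, '<=': binop_exp,
    'id': id_exp, 'integer': integer_exp,
}

# The AST of: get x  x = x + 1  put x
ast = ('seq', ('get', 'x'),
       ('seq', ('assign', 'x', ('+', ('id', 'x'), ('integer', 1))),
        ('seq', ('put', ('id', 'x')),
         ('nil',))))

print(walk(ast))
```

Walking this AST yields the same three instruction tuples as the concatenated list shown earlier: statements contribute list elements, while the expression subtree collapses into the single string `'(+ x 1)'`.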

The code for the whole tree walker is available in [cuppa1_cc_codegen.py](code/cuppa1_cc_codegen.py).


## Formatting the Output

Our code generator generates a list of Exp1bytecode instruction tuples for a Cuppa1 program.  In order to generate code that can be executed by the Exp1bytecode abstract machine we need to turn this list of instruction tuples into actual Exp1bytecode instructions.  This is easily accomplished by traversing the list and printing out the tuples in a nicely formatted way.  The following function takes a list of tuples and returns a string with the nicely formatted Exp1bytecode instructions,

In [13]:
# %load -s output,label_def code/cuppa1_cc_output.py
def output(instr_stream):

    output_stream = ''
    
    for instr in instr_stream:

        if label_def(instr):  # label def - print without preceding '\t' or trailing ';'
            output_stream += instr[0] + '\n'

        else:                 # regular instruction - indent and put a ';' at the end
            output_stream += '\t'
                
            for component in instr:
                output_stream += component + ' '

            output_stream += ';\n'

    return output_stream

#########################################################################
def label_def(instr_tuple):

    instr_name = instr_tuple[0]
    
    if instr_name[-1] == ':':
        return True
    else:
        return False


The function iterates over the instruction tuples in the `instr_stream` and if the tuple is a label definition it will output it flush to the left margin, otherwise it will put a tab character in the output stream and then output the components of the current instruction tuple.  Consider our instruction list from before with a label to mark its start,

In [14]:
instr_stream = \
[('Lstart:',),
 ('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]

bytecode = output(instr_stream)

print(bytecode)

Lstart:
	input x ;
	store x (+ x 1) ;
	print x ;



This looks like legal Exp1bytecode.  Let's run it!

In [15]:
from exp1bytecode_interp import interp

In [16]:
interp(bytecode)

Please enter a value for x: 3
> 4


OK! Our output function generates legal code from a list of instruction tuples.  Let's put this all together into an actual compiler and then test it again.

## Architecture

Figure 1 shows the architecture of our basic Cuppa1 compiler.  We can identify the frontend that generates the AST, the tree walker that generates the instruction tuples, and the output function that converts the instruction tuples into legal Exp1bytecode.

![alt text](figures/chap06/1/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 1: Architecture of our basic Cuppa1 compiler.
</p>


### Examples

Let's put this compiler through its paces.  We do this by running the individual components separately so that we can see what's going on between each phase.  We start with the simple Cuppa1 program we looked at above.  But first we have to load our prerequisite modules.

In [17]:
from cuppa1_lex import lexer
from cuppa1_frontend_gram import parser
from cuppa1_state import state
from grammar_stuff import dump_AST
from cuppa1_cc_codegen import walk as codegen
from cuppa1_cc_output import output
from pprint import pprint

In [18]:
program = \
'''
get x
x = x + 1
put x
'''

Running the frontend,

In [19]:
parser.parse(program, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


Running the code generator,

In [20]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width = 40)

[('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]


Running the output formatter,

In [21]:
bytecode = output(instr_tuples)
print(bytecode)

	input x ;
	store x (+ x 1) ;
	print x ;



Now, let's try something that has a structured programming statement in it.  How about the program that loops forever and does nothing?


In [22]:
program = 'while (1) {}'

Running the frontend.

In [23]:
parser.parse(program, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(while 
  |  |(integer 1) 
  |  |(block 
  |  |  |(nil))) 
  |(nil))


Let's see what our code generator does with this AST,

In [24]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width = 40)

[('L0:',),
 ('jumpF', '1', 'L1'),
 ('jump', 'L0'),
 ('L1:',),
 ('noop',)]


It's probably easier to read when it is properly formatted as Exp1bytecode,

In [25]:
bytecode = output(instr_tuples)
print(bytecode)

L0:
	jumpF 1 L1 ;
	jump L0 ;
L1:
	noop ;



Here we see that the loop will only terminate if the expression 1 ever evaluates to 0 -- which of course will never happen -- so it loops forever: jumping to `L0`, testing whether the expression 1 evaluates to 0, jumping to `L0`, testing...

For our final example we'll look at something more complicated, the `fact` program from the Cuppa1 examples,

In [27]:
from cuppa1_examples import fact
print(fact)


get x;
y = 1;
while (1 <= x)
{
      y = y * x;
      x = x - 1;
}
put y;



Running the frontend,

In [28]:
parser.parse(fact, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign y 
  |  |  |(integer 1)) 
  |  |(seq 
  |  |  |(while 
  |  |  |  |(<= 
  |  |  |  |  |(integer 1) 
  |  |  |  |  |(id x)) 
  |  |  |  |(block 
  |  |  |  |  |(seq 
  |  |  |  |  |  |(assign y 
  |  |  |  |  |  |  |(* 
  |  |  |  |  |  |  |  |(id y) 
  |  |  |  |  |  |  |  |(id x))) 
  |  |  |  |  |  |(seq 
  |  |  |  |  |  |  |(assign x 
  |  |  |  |  |  |  |  |(- 
  |  |  |  |  |  |  |  |  |(id x) 
  |  |  |  |  |  |  |  |  |(integer 1))) 
  |  |  |  |  |  |  |(nil))))) 
  |  |  |(seq 
  |  |  |  |(put 
  |  |  |  |  |(id y)) 
  |  |  |  |(nil)))))


Time for the code generator,

In [29]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width=40)

[('input', 'x'),
 ('store', 'y', '1'),
 ('L2:',),
 ('jumpF', '(<= 1 x)', 'L3'),
 ('store', 'y', '(* y x)'),
 ('store', 'x', '(- x 1)'),
 ('jump', 'L2'),
 ('L3:',),
 ('noop',),
 ('print', 'y')]


Formatting the Exp1bytecode,

In [30]:
bytecode = output(instr_tuples)
print(bytecode)

	input x ;
	store y 1 ;
L2:
	jumpF (<= 1 x) L3 ;
	store y (* y x) ;
	store x (- x 1) ;
	jump L2 ;
L3:
	noop ;
	print y ;



The generated code is starting to look interesting.  Here is the Cuppa1 source program again,

In [31]:
print(fact)


get x;
y = 1;
while (1 <= x)
{
      y = y * x;
      x = x - 1;
}
put y;



See if you can trace through the two programs simultaneously and make sense of the generated Exp1bytecode.
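One way to check such a hand trace is to execute the instruction tuples directly.  The sketch below is a hypothetical tuple-level simulator written only for this purpose; it is not the Exp1bytecode abstract machine from Chapter 4.  The names `eval_exp` and `run` are made up, unary operators are not handled, and treating `/` as integer division is an assumption.

```python
# Binary operators of the expression strings; comparisons yield 1 or 0.
OPS = {
    '+': lambda a, b: a + b,
    '-': lambda a, b: a - b,
    '*': lambda a, b: a * b,
    '/': lambda a, b: a // b,       # assumption: integer division
    '==': lambda a, b: int(a == b),
    '<=': lambda a, b: int(a <= b),
}

def eval_exp(exp, store):
    """Evaluate a prefix expression string such as '(* y x)'."""
    tokens = exp.replace('(', ' ( ').replace(')', ' ) ').split()

    def parse(pos):
        tok = tokens[pos]
        if tok == '(':
            op = tokens[pos + 1]
            left, pos = parse(pos + 2)
            right, pos = parse(pos)
            return OPS[op](left, right), pos + 1   # skip the closing ')'
        if tok.lstrip('-').isdigit():
            return int(tok), pos + 1
        return store[tok], pos + 1                 # variable lookup

    value, _ = parse(0)
    return value

def run(instr_tuples, inputs):
    """Execute a list of instruction tuples and return the printed values."""
    labels = {instr[0][:-1]: i
              for i, instr in enumerate(instr_tuples)
              if instr[0].endswith(':')}
    store, outputs, pc = {}, [], 0
    inputs = list(inputs)
    while pc < len(instr_tuples):
        instr = instr_tuples[pc]
        pc += 1
        op = instr[0]
        if op == 'input':
            store[instr[1]] = inputs.pop(0)
        elif op == 'store':
            store[instr[1]] = eval_exp(instr[2], store)
        elif op == 'print':
            outputs.append(eval_exp(instr[1], store))
        elif op == 'jumpF':
            if eval_exp(instr[1], store) == 0:
                pc = labels[instr[2]]
        elif op == 'jump':
            pc = labels[instr[1]]
        # label definitions and 'noop' do nothing
    return outputs

# The instruction tuples generated for the fact program above
fact_tuples = [
    ('input', 'x'), ('store', 'y', '1'),
    ('L2:',), ('jumpF', '(<= 1 x)', 'L3'),
    ('store', 'y', '(* y x)'), ('store', 'x', '(- x 1)'),
    ('jump', 'L2'),
    ('L3:',), ('noop',), ('print', 'y'),
]

print(run(fact_tuples, [3]))   # 3! is 6, so this prints [6]
```

Stepping through `run` with the input 3 mirrors the loop in the Cuppa1 source: the `jumpF` plays the role of the loop condition and the `jump L2` closes the loop.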

## Compiler Correctness

We now have two ways to execute a Cuppa1 program:

1. We can interpret the program directly with the Cuppa1 interpreter from Chapter 5.
2. We can first translate the Cuppa1 program into Exp1bytecode and then execute the bytecode in the abstract bytecode machine.

Typically we view the interpretation of a programming language as the *reference implementation* for that programming language because interpreters are usually easier to construct. We then say that,

> A compiler is *correct* if the translated program, when executed, gives the same results as the interpreted program.

Visually we can show this relationship between compilers and interpreters as shown in Figure 2.

![alt text](figures/chap06/2/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 2: A visual representation of the compiler correctness problem.
</p>


Although the idea of compiler correctness is straightforward, establishing it is a very challenging problem.  In order to show that a compiler is correct we have to show that the relationship in Figure 2 holds for *all* Cuppa1 programs.  Since most programming languages allow you to write an infinite number of programs, the brute force approach of checking every program does not work.  For most practical compilers we therefore use only a sample of all possible programs to gain confidence that the compiler is correct.

If we were to define the interpreters for both the Cuppa1 language and Exp1bytecode in terms of mathematical constructs such as first order logic, then it would be possible to prove the correctness of the compiler.  However, a mathematical description of real programming languages such as Java or Python is almost impossible, so we are back to approximating compiler correctness using a *test suite* of source programs.
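The test-suite approach can be sketched as a small differential-testing harness.  Everything here is an assumption made for illustration: the function `check_compiler`, the `(program, inputs)` test-case format, and the two stand-in runner functions, which fake the behavior of the interpreter and of the compile-then-execute path instead of calling the real Cuppa1 tools.

```python
def check_compiler(test_suite, run_interpreted, run_compiled):
    """Return the test cases on which the two execution paths disagree.

    test_suite      -- list of (program, inputs) pairs
    run_interpreted -- callable(program, inputs) -> list of outputs (reference)
    run_compiled    -- callable(program, inputs) -> list of outputs
    """
    failures = []
    for program, inputs in test_suite:
        expected = run_interpreted(program, inputs)
        actual = run_compiled(program, inputs)
        if actual != expected:
            failures.append((program, inputs, expected, actual))
    return failures

# Stand-ins for the real tools: both pretend to run the increment program.
INCREMENT = 'get x; x = x + 1; put x;'

def fake_interpreter(program, inputs):
    return [inputs[0] + 1]

def fake_buggy_compiler(program, inputs):
    x = inputs[0]
    return [x + 1] if x >= 0 else [x - 1]   # deliberately wrong for x < 0

suite = [(INCREMENT, [3]), (INCREMENT, [0]), (INCREMENT, [-2])]
failures = check_compiler(suite, fake_interpreter, fake_buggy_compiler)

for program, inputs, expected, actual in failures:
    print('mismatch for inputs', inputs, 'expected', expected, 'got', actual)
```

An empty result only says that the compiler agrees with the reference implementation on the sampled programs and inputs; the stand-in bug above would go unnoticed if the suite contained no negative input.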

