# An Optimizing Compiler

Compilers are programs that translate a program written in one language into an equivalent program written in another, most often lower-level, language. We refer to these languages as the source and target language, respectively. Optimizing compilers are multiphase language processors with explicit code optimization phases in addition to the code generation, syntax analysis, and program analysis phases mentioned in Chapter 1. Here we discuss an optimizing compiler for our Cuppa1 language: the source language is Cuppa1 and the target language is our Exp1bytecode. It is an optimizing compiler because it performs constant folding and peephole optimizations in order to improve the generated code.


In [1]:
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")

# The Basic Compiler

At a fundamental level compilers can be understood as processors that match AST patterns of the source language and translate them into patterns in the target language.  Recall the AST construction in the [Cuppa1 frontend](code/cuppa1_frontend_gram.py) and consider for example the AST pattern for the assignment statement in Cuppa1,
```
('assign', name, exp)
```
We could easily envision translating this AST pattern into a pattern in [Exp1bytecode](code/exp1bytecode_gram.py) as follows,
```
store <name> <exp>;
```
where `<name>` and `<exp>` are the appropriate translations of the variable name and the assignment expression from Cuppa1 into Exp1bytecode. Here the target pattern is just a single instruction. However, the more complicated the AST patterns in the source language, the more complicated the target patterns, as we will see when we look at programming constructs such as the while loop.

In our case it is not that difficult to come up with pattern translations for all the non-structured statements and expressions in Cuppa1. For all the non-structured statements we have the pattern translations,
```
('assign', name, exp) => store <name> <exp>;
('put', exp) => print <exp>;
('get', name) => input <name>;
```
And for the expressions we have,
```
('+', c1, c2) => ('+' <c1> <c2>)
('-', c1, c2) => ('-' <c1> <c2>)
('*', c1, c2) => ('*' <c1> <c2>)
('/', c1, c2) => ('/' <c1> <c2>)
('==', c1, c2) => ('==' <c1> <c2>)
('<=', c1, c2) => ('<=' <c1> <c2>)
('id', name) => <name>
('integer', value) => <value>
('uminus', value) => - <value>
('not', value) => ! <value>
```
In order to come up with a pattern translation for structured statements such as the while statement we have to be clear what the behavior
of the while statement `while (cond) body` is:  Evaluate the condition `cond` and execute the `body` as long as the condition evaluates to a value not equal to zero.  If the condition ever evaluates to zero then ignore the body of the loop and continue execution right after the loop.  We can simulate this behavior in Exp1bytecode with jump instructions. One way to translate the AST pattern for the while loop into a code pattern in Exp1bytecode is,
```
('while', cond, body) => Ltop:
                             jumpF <cond> Lbottom;
                             <body>
                             jump Ltop; 
                         Lbottom:
                             noop;
```
Here `Ltop` and `Lbottom` are labels. We can easily verify that the Exp1bytecode pattern simulates exactly the behavior of the Cuppa1 while statement: If the condition evaluates to something other than zero then execute the body of the loop and then jump to the top of the loop in order to reevaluate the condition.  If the condition ever evaluates to zero then the `jumpF` instruction will transfer control to the `Lbottom` label. That is, if the condition evaluates to zero then we skip the loop body and continue executing the statement after the loop.  We have to have the `noop` instruction here because label definitions have to be attached to instructions.

We can do something similar with if-then statements,
```
('if', cond, then_stmt, ('nil',)) =>     jumpF <cond> Lbottom;
                                         <then_stmt>
                                     Lbottom:
                                         noop;
```
A closer look at the Exp1bytecode pattern shows that it simulates the behavior of the Cuppa1 if-then statement: If the condition evaluates to a non-zero value then we execute the then-statement. Otherwise, if the condition evaluates to zero, we ignore the then-statement by jumping to the `Lbottom` label and continue execution after the if-then statement. Finally, adding the else-statement to the if-then statement we have,
```
('if', cond, then_stmt, else_stmt) =>     jumpF <cond> Lelse;
                                          <then_stmt>
                                          jump Lbottom;
                                      Lelse: 
                                          <else_stmt>
                                      Lbottom:
                                         noop;
```
It is not difficult to see that the Exp1bytecode pattern simulates the behavior of the Cuppa1 if-then-else statement: If the condition evaluates to a non-zero value then execute the then-statement and ignore the else-statement. If the condition evaluates to zero, ignore the then-statement and execute the else-statement.

One thing to keep in mind is the notion of *target pattern compositionality*. By that we mean that any patterns generated from the same class of AST patterns should be composable. Consider for example the statements in Cuppa1. Any one of the Exp1bytecode patterns due to statements in Cuppa1 should be composable with any other Exp1bytecode pattern due to a statement without ever generating incorrect target code. The same is true for Exp1bytecode patterns generated from Cuppa1 expressions: Any Exp1bytecode pattern generated from a Cuppa1 expression should be composable with any other Exp1bytecode pattern due to a Cuppa1 expression and always give rise to valid Exp1bytecode expressions.



## A Code Generation Tree Walker

Recall that the Cuppa1 frontend generates an AST for a source program. According to what we said above, we need to find patterns in this AST that match the patterns in our translation rules in order to generate target code. Consider the following Cuppa1 program for which the frontend generates a corresponding AST,

In [2]:
from cuppa1_lex import lexer
from cuppa1_frontend_gram import parser
from cuppa1_state import state
from grammar_stuff import dump_AST

In [3]:
program = \
'''
get x
x = x + 1
put x
'''
parser.parse(program, lexer=lexer)

In [4]:
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


It is easy to see that the left side patterns for our three pattern translation rules,
```
('get', name) => input <name>;
('assign', name, exp) => store <name> <exp>;
('put', exp) => print <exp>;
```
are present in the AST above.  In order to generate code we need to find those left side patterns in the AST and then apply the rules. Therefore,

> The code generator for our compiler is a tree walker that walks the Cuppa1 AST and for each AST pattern that appears in a pattern translation rule and matches the AST it will generate the corresponding target code.

There are two additional design choices that we make: 

>Cuppa1 statement patterns will generate Exp1bytecode instructions on a *list* and Cuppa1 expression patterns will generate Exp1bytecode expressions returned as *strings*.

The reason for this will become clear later when we look at optimizations in this compiler. Let's take a look at some of the actual pattern translations.  A good place to start is the get statement,

In [5]:
from cuppa1_cc_codegen import *

In [6]:
# %load -s get_stmt code/cuppa1_cc_codegen.py
def get_stmt(node):

    (GET, name) = node
    assert_match(GET, 'get')

    code = [('input', name)]

    return code


Since our code generator is a tree walker, what we are looking at is the node function for get nodes in the AST. The function first pattern matches the get node. It then generates the target code pattern in the form of an instruction tuple in a list. The instruction tuple we are generating is not unlike the instruction tuples we used to represent programs in the Exp1bytecode abstract machine in Chapter 4. The list with the single Exp1bytecode instruction is then returned in order to be combined with other lists of instructions. Even though the translation rule for the get statement demands that we also generate the semicolon as part of the translation,
```
('get', name) => input <name>;
```
we delay this until we generate the actual machine instructions.

In [7]:
# %load -s assign_stmt code/cuppa1_cc_codegen.py
def assign_stmt(node):

    (ASSIGN, name, exp) = node
    assert_match(ASSIGN, 'assign')
    
    exp_code = walk(exp)

    code = [('store', name, exp_code)]
    
    return code


The node function for assignment statements works in a very similar fashion.  It first pattern matches the node. Then it walks the expression AST which will return a string containing the Exp1bytecode for the Cuppa1 expression.  We now have all the pieces we need to assemble the target pattern according to our translation rule,
```
('assign', name, exp) => store <name> <exp>;
```
Here `<name>` is the name of the variable stored in variable `name` and `<exp>` is the Exp1bytecode expression string stored in the variable `exp_code` in the Python code above. Now it is straightforward to create a tuple to represent the store instruction where the first component is the string `'store'`. We put this tuple in a list which we return as the result of this translation.

It is easy to see that the node function for the put statement,

In [8]:
# %load -s put_stmt code/cuppa1_cc_codegen.py
def put_stmt(node):

    (PUT, exp) = node
    assert_match(PUT, 'put')
    
    exp_code = walk(exp)

    code = [('print', exp_code)]

    return code


implements the translation rule,
```
('put', exp) => print <exp>;
```
The node function for the structured while statement looks like this,

In [9]:
# %load -s while_stmt code/cuppa1_cc_codegen.py
def while_stmt(node):
    
    (WHILE, cond, body) = node
    assert_match(WHILE, 'while')
    
    top_label = label()
    bottom_label = label()
    cond_code = walk(cond)
    body_code = walk(body)

    code = [(top_label + ':',)]
    code += [('jumpF', cond_code, bottom_label)]
    code += body_code
    code += [('jump', top_label)]
    code += [(bottom_label + ':',)]
    code += [('noop',)]

    return code


It is probably not that difficult to figure out that this node function implements the pattern translation rule for while loops,
```
('while', cond, body) => Ltop:
                             jumpF <cond> Lbottom;
                             <body>
                             jump Ltop; 
                         Lbottom:
                             noop;
```
Here the function `label()` generates a unique label name every time it is called.  A closer look at the [tree walker code](code/cuppa1_cc_codegen.py) will reveal that the node function for both variants of the if statement also generates the appropriate target code patterns.
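
The if statement node function is not reproduced here. As a rough sketch of how the two if translation rules could be realized, consider the following self-contained code. It assumes the AST shape `('if', cond, then_stmt, else_stmt)` from the rules above; the `label` and `walk` functions are simplified stand-ins written for this illustration and are not taken from `cuppa1_cc_codegen.py`.

```python
# Hypothetical stand-ins for illustration only; the real definitions
# live in code/cuppa1_cc_codegen.py and may differ.
import itertools

_label_ids = itertools.count()

def label():
    # return a fresh label name on every call: L0, L1, L2, ...
    return 'L' + str(next(_label_ids))

def walk(node):
    # minimal walker: just enough node types to exercise if_stmt
    kind = node[0]
    if kind == 'integer':
        return str(node[1])
    elif kind == 'id':
        return node[1]
    elif kind == 'put':
        return [('print', walk(node[1]))]
    elif kind == 'nil':
        return []
    elif kind == 'if':
        return if_stmt(node)
    else:
        raise ValueError("unknown node type: " + kind)

def if_stmt(node):
    (IF, cond, then_stmt, else_stmt) = node
    assert IF == 'if'

    cond_code = walk(cond)
    then_code = walk(then_stmt)

    if else_stmt[0] == 'nil':
        # if-then: skip the then-part when the condition is zero
        bottom_label = label()
        code = [('jumpF', cond_code, bottom_label)]
        code += then_code
        code += [(bottom_label + ':',)]
        code += [('noop',)]
    else:
        # if-then-else: jump to the else-part when the condition is zero
        else_label = label()
        bottom_label = label()
        else_code = walk(else_stmt)
        code = [('jumpF', cond_code, else_label)]
        code += then_code
        code += [('jump', bottom_label)]
        code += [(else_label + ':',)]
        code += else_code
        code += [(bottom_label + ':',)]
        code += [('noop',)]

    return code
```

Because every call to `label()` yields a new name, nested or consecutive if statements never share labels, which is what keeps the generated statement patterns composable.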

What remains to be looked at is how the tree walker deals with `seq` nodes since they act as the glue between the statements in the AST we saw above.  Related to this is how the walker deals with `nil` nodes in a statement sequence since `seq` sequences are `nil` terminated.  Here are the two respective node functions,

In [10]:
# %load -s seq,nil code/cuppa1_cc_codegen.py
def seq(node):

    (SEQ, s1, s2) = node
    assert_match(SEQ, 'seq')
    
    stmt = walk(s1)
    lst = walk(s2)

    return stmt + lst

#########################################################################

def nil(node):
    
    (NIL,) = node
    assert_match(NIL, 'nil')
    
    return []
    
#########################################################################


After matching the node, the `seq` node function first walks the statement, which according to our design decision will return a list of Exp1bytecode instructions. Then the node function walks the rest of the program, which returns a list of all the Exp1bytecode instructions the rest of the program generated. The instructions of the current statement are then *prepended* to the list of instructions generated by the rest of the program, giving us a list of instructions from the current statement all the way to the end of the program. This list is returned as the result of the `seq` node function.

The `nil` node function simply returns an empty list, which is exactly what we want since `seq` sequences are `nil` terminated.
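
All of the node functions call `walk`, which has to select the right node function for a given node. One plausible way to do that is a dictionary keyed by the node type, that is, the first component of the AST tuple. The sketch below is an assumption about how such a dispatch could look, with trimmed-down node functions so that it runs on its own; it is not the code from `cuppa1_cc_codegen.py`.

```python
# A sketch of dictionary-based dispatch. The node functions are
# trimmed-down versions of the ones in this chapter (no assert_match).

def get_stmt(node):
    (GET, name) = node
    return [('input', name)]

def put_stmt(node):
    (PUT, exp) = node
    return [('print', walk(exp))]

def id_exp(node):
    (ID, name) = node
    return name

def seq(node):
    (SEQ, s1, s2) = node
    return walk(s1) + walk(s2)

def nil(node):
    return []

# map each node type (the first tuple component) to its node function
dispatch = {
    'get': get_stmt,
    'put': put_stmt,
    'id':  id_exp,
    'seq': seq,
    'nil': nil,
}

def walk(node):
    node_type = node[0]
    if node_type not in dispatch:
        raise ValueError("walk: unknown tree node type: " + node_type)
    return dispatch[node_type](node)

# get x; put x
ast = ('seq', ('get', 'x'), ('seq', ('put', ('id', 'x')), ('nil',)))
instrs = walk(ast)
```

Statement nodes return lists and expression nodes return strings, so the same `walk` function serves both kinds of nodes.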

Consider our AST from above,

In [11]:
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


As the tree walker walks the tree from top to bottom it will generate the following list(s),
```
[('input', 'x')] + 
    [('store', 'x', '(+ x 1)')] +
        [('print', 'x')] +
            []
```
which is the same as the concatenated list,
```
[('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]
```
That is, the result of the tree walk is a list of instructions that represent the original program in Exp1bytecode.

In order to fully understand the generation of the `store` and `print` instructions we need to take a look at the translation of expressions. Here are the main node functions,

In [12]:
# %load -s binop_exp,id_exp,integer_exp code/cuppa1_cc_codegen.py
def binop_exp(node):

    (OP, c1, c2) = node
    if OP not in ['+', '-', '*', '/', '==', '<=']:
        raise ValueError("pattern match failed on " + OP)
    
    lcode = walk(c1)
    rcode = walk(c2)

    code = '(' + OP + ' ' + lcode + ' ' + rcode + ')'

    return code

#########################################################################

def id_exp(node):
    
    (ID, name) = node
    assert_match(ID, 'id')
    
    return name

#########################################################################

def integer_exp(node):

    (INTEGER, value) = node
    assert_match(INTEGER, 'integer')

    return str(value)


As we said earlier, expressions generate strings. We clearly see this when we look at the node function for binary operators, `binop_exp`. The function walks the two children, which in turn generates an expression string for each child, and then synthesizes a string that represents the current binary operation. This string is the return value of this node function. The node function `id_exp`, which handles variable names that appear in expressions, returns the name of the variable, and the node function `integer_exp` returns the value of the integer as a string. In this way we can explain the expression string `'(+ x 1)'` that appears in the `store` instruction in our generated list of instructions.
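
The unary operators follow the same idea. The sketch below shows how the rules `('uminus', value) => - <value>` and `('not', value) => ! <value>` could be written as node functions. The function names, the minimal `walk`, and the exact spacing of the generated strings are assumptions made for this illustration.

```python
def walk(node):
    # minimal stand-in walker covering only the node types used below
    kind = node[0]
    if kind == 'integer':
        return str(node[1])
    elif kind == 'id':
        return node[1]
    elif kind == 'uminus':
        return uminus_exp(node)
    elif kind == 'not':
        return not_exp(node)
    raise ValueError("unknown node type: " + kind)

def uminus_exp(node):
    (UMINUS, exp) = node
    assert UMINUS == 'uminus'
    # ('uminus', value) => - <value>
    return '- ' + walk(exp)

def not_exp(node):
    (NOT, exp) = node
    assert NOT == 'not'
    # ('not', value) => ! <value>
    return '! ' + walk(exp)
```

As with `binop_exp`, each function returns a string, so the result can be embedded in any enclosing expression string.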

The code for the whole tree walker can be found in [cuppa1_cc_codegen.py](code/cuppa1_cc_codegen.py).

<!-- TODO: movie for the codegen tree walker -->

## Formatting the Output

Our code generator generates a list of Exp1bytecode instruction tuples for a Cuppa1 program. In order to generate code that can be executed by the Exp1bytecode abstract machine we need to turn this list of instruction tuples into actual Exp1bytecode instructions. This is easily accomplished by traversing the list and printing out the tuples in a nicely formatted way. The following function takes a list of tuples and returns a string with the nicely formatted Exp1bytecode instructions,

In [13]:
# %load -s output,label_def code/cuppa1_cc_output.py
def output(instr_stream):

    output_stream = ''
    
    for instr in instr_stream:

        if label_def(instr):  # label def - print without preceding '\t' or trailing ';'
            output_stream += instr[0] + '\n'

        else:                 # regular instruction - indent and put a ';' at the end
            output_stream += '\t'
                
            for component in instr:
                output_stream += component + ' '

            output_stream += ';\n'

    return output_stream

#########################################################################
def label_def(instr_tuple):

    instr_name = instr_tuple[0]
    
    if instr_name[-1] == ':':
        return True
    else:
        return False


The function iterates over the instruction tuples in the `instr_stream` and if the tuple is a label definition it will output it flush to the left margin, otherwise it will put a tab character in the output stream and then output the components of the current instruction tuple.  Consider our instruction list from before with a label to mark its start,

In [14]:
instr_stream = \
[('Lstart:',),
 ('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]

bytecode = output(instr_stream)

print(bytecode)

Lstart:
	input x ;
	store x (+ x 1) ;
	print x ;



This looks like legal Exp1bytecode, let's run it!

In [15]:
from exp1bytecode_interp import interp

In [16]:
interp(bytecode)

Please enter a value for x: 3
> 4


OK! Our output function generates legal code from a list of instruction tuples.  Let's put this all together into an actual compiler and then test it again.

## Architecture

Figure 1 shows the architecture of our basic Cuppa1 compiler.  We can identify the frontend that generates the AST, the tree walker that generates the instruction tuples, and the output function that converts the instruction tuples into legal Exp1bytecode.

![alt text](figures/chap06/1/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 1: Architecture of our basic Cuppa1 compiler.
</p>


### Examples

Let's put this compiler through its paces. We do this by running the individual components separately so that we can see what's going on between each phase. We start with the simple Cuppa1 program we looked at above. But first we have to load our prerequisite modules.

In [17]:
from cuppa1_lex import lexer
from cuppa1_frontend_gram import parser
from cuppa1_state import state
from grammar_stuff import dump_AST
from cuppa1_cc_codegen import walk as codegen
from cuppa1_cc_output import output
from pprint import pprint

In [18]:
program = \
'''
get x
x = x + 1
put x
'''

Running the frontend,

In [19]:
parser.parse(program, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign x 
  |  |  |(+ 
  |  |  |  |(id x) 
  |  |  |  |(integer 1))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


Running the code generator,

In [20]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width = 40)

[('input', 'x'),
 ('store', 'x', '(+ x 1)'),
 ('print', 'x')]


Running the output formatter,

In [21]:
bytecode = output(instr_tuples)
print(bytecode)

	input x ;
	store x (+ x 1) ;
	print x ;



Now, let's try something that has a structured programming statement in it.  How about the program that loops forever and does nothing?


In [22]:
program = 'while (1) {}'

Running the frontend.

In [23]:
parser.parse(program, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(while 
  |  |(integer 1) 
  |  |(block 
  |  |  |(nil))) 
  |(nil))


Let's see what our code generator does with this AST,

In [24]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width = 40)

[('L0:',),
 ('jumpF', '1', 'L1'),
 ('jump', 'L0'),
 ('L1:',),
 ('noop',)]


It's probably easier to read when it is properly formatted as Exp1bytecode,

In [25]:
bytecode = output(instr_tuples)
print(bytecode)

L0:
	jumpF 1 L1 ;
	jump L0 ;
L1:
	noop ;



Here we see that the loop will only terminate if the expression 1 ever evaluates to 0 -- which of course will never happen -- so it loops forever: jumping to `L0`, testing whether the expression 1 evaluates to 0, jumping to `L0`, testing...

For our final example we'll look at something more complicated, the `fact` program from the Cuppa1 examples,

In [26]:
from cuppa1_examples import fact
print(fact)


get x;
y = 1;
while (1 <= x)
{
      y = y * x;
      x = x - 1;
}
put y;



Running the frontend,

In [27]:
parser.parse(fact, lexer=lexer)
dump_AST(state.AST)


(seq 
  |(get x) 
  |(seq 
  |  |(assign y 
  |  |  |(integer 1)) 
  |  |(seq 
  |  |  |(while 
  |  |  |  |(<= 
  |  |  |  |  |(integer 1) 
  |  |  |  |  |(id x)) 
  |  |  |  |(block 
  |  |  |  |  |(seq 
  |  |  |  |  |  |(assign y 
  |  |  |  |  |  |  |(* 
  |  |  |  |  |  |  |  |(id y) 
  |  |  |  |  |  |  |  |(id x))) 
  |  |  |  |  |  |(seq 
  |  |  |  |  |  |  |(assign x 
  |  |  |  |  |  |  |  |(- 
  |  |  |  |  |  |  |  |  |(id x) 
  |  |  |  |  |  |  |  |  |(integer 1))) 
  |  |  |  |  |  |  |(nil))))) 
  |  |  |(seq 
  |  |  |  |(put 
  |  |  |  |  |(id y)) 
  |  |  |  |(nil)))))


Time for the code generator,

In [28]:
instr_tuples = codegen(state.AST)
pprint(instr_tuples, width=40)

[('input', 'x'),
 ('store', 'y', '1'),
 ('L2:',),
 ('jumpF', '(<= 1 x)', 'L3'),
 ('store', 'y', '(* y x)'),
 ('store', 'x', '(- x 1)'),
 ('jump', 'L2'),
 ('L3:',),
 ('noop',),
 ('print', 'y')]


Formatting the Exp1bytecode,

In [29]:
bytecode = output(instr_tuples)
print(bytecode)

	input x ;
	store y 1 ;
L2:
	jumpF (<= 1 x) L3 ;
	store y (* y x) ;
	store x (- x 1) ;
	jump L2 ;
L3:
	noop ;
	print y ;



The generated code is starting to look interesting.  Here is the Cuppa1 source program again,

In [30]:
print(fact)


get x;
y = 1;
while (1 <= x)
{
      y = y * x;
      x = x - 1;
}
put y;



See if you can trace through the two programs simultaneously and make sense of the generated Exp1bytecode.
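
One way to check your trace is to transcribe the generated bytecode by hand into Python, one instruction per line. The function below is such a transcription, written for this illustration: the `input x` instruction becomes the parameter `x`, and values that would be printed are collected in a list instead.

```python
def fact_bytecode_trace(x):
    # Hand transcription of the generated Exp1bytecode for `fact`;
    # each comment names the bytecode instruction being mirrored.
    outputs = []
    y = 1                      # store y 1 ;
    while True:                # L2:
        if not (1 <= x):       # jumpF (<= 1 x) L3 ;
            break
        y = y * x              # store y (* y x) ;
        x = x - 1              # store x (- x 1) ;
        # jump L2 ;  (falls through to the top of the Python loop)
    # L3: noop ;
    outputs.append(y)          # print y ;
    return outputs
```

The `jumpF` at the top together with the `jump` at the bottom plays exactly the role of the `while (1 <= x)` header in the source program.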

## Compiler Correctness

We now have two ways to execute a Cuppa1 program:

1. We can interpret the program directly with the Cuppa1 interpreter from Chapter 5.
2. We can first translate the Cuppa1 program into Exp1bytecode and then execute the bytecode in the abstract bytecode machine.

Typically we view the interpretation of a programming language as the *reference implementation* for that programming language because interpreters are usually easier to construct. We then say that,

> A compiler is *correct* if the translated program, when executed, gives the same results as the interpreted program.

Visually we can show this relationship between our Cuppa1 interpreter and our Cuppa1 compiler as in Figure 2.

![alt text](figures/chap06/2/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 2: A visual representation of the compiler correctness problem.
</p>


Although the idea of compiler correctness is straightforward, it is a very challenging problem. In order to show that a compiler is correct we have to show that the relationship shown in Figure 2 holds for *all* Cuppa1 programs. Since most programming languages allow you to write an infinite number of programs, the brute force approach to compiler correctness does not work. For most practical compilers we only use a sampling of all possible programs to gain confidence that a compiler is correct.

If we were to define the interpreters for both the Cuppa1 language and the Exp1bytecode in terms of mathematical constructs such as first order logic, then it would be possible to prove the correctness of the compiler. However, a complete mathematical description of real programming languages such as Java or Python is almost impossible, so we are back to approximating compiler correctness using a *test suite* of source programs.

It's an interesting question to see how our Cuppa1 compiler fares in terms of correctness.  In order to take a look at that we define a function `cc` (for Cuppa1 Compiler) that embodies the compiler architecture from above. We will also load the Cuppa1 interpreter and the Exp1bytecode abstract machine.



In [31]:
from cuppa1_lex import lexer
from cuppa1_frontend_gram import parser
from cuppa1_state import state
from cuppa1_cc_codegen import walk as codegen
from cuppa1_cc_output import output

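def cc(input_stream):
    # frontend: parse the Cuppa1 source text; the AST ends up in state.AST
    parser.parse(input_stream, lexer=lexer)
    # code generation: instruction tuples followed by a final 'stop'
    instr_tuples = codegen(state.AST)  + [('stop',)]
    # output: format the tuples as Exp1bytecode text
    bytecode = output(instr_tuples)
    return bytecode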

from cuppa1_interp import interp as cuppa1_interp
from exp1bytecode_interp import interp as bytecode_interp

Our test suite is composed of the example programs in the file [`cuppa1_examples.py`](code/cuppa1_examples.py),

In [32]:
from cuppa1_examples import fact, list, ifex, nested, logical_and, logical_or

We will begin with the `fact` program that will print out the factorial of the number given by the user.  First we will run the interpreter on the program and then we will translate it into Exp1bytecode and execute the bytecode. According to what we said above, in both cases we should see the same result.

In [33]:
cuppa1_interp(fact)

Value for x? 3
> 6


In [34]:
bytecode = cc(fact)
bytecode_interp(bytecode)

Please enter a value for x: 3
> 6


That is encouraging, in both cases we got the same result as is required for compiler correctness.  Let's try the program `list`,

In [35]:
cuppa1_interp(list)

Value for x? 5
> 5
> 4
> 3
> 2
> 1


In [36]:
bytecode = cc(list)
bytecode_interp(bytecode)

Please enter a value for x: 5
> 5
> 4
> 3
> 2
> 1


Next up is the `ifex` program,

In [37]:
cuppa1_interp(ifex)

Value for x? 1
> 0


In [38]:
bytecode = cc(ifex)
bytecode_interp(bytecode)

Please enter a value for x: 1
> 0


The `nested` program is next,

In [39]:
cuppa1_interp(nested)

> 2


In [40]:
bytecode = cc(nested)
bytecode_interp(bytecode)

> 2


And finally we'll test the two programs `logical_and` and `logical_or`,

In [41]:
cuppa1_interp(logical_and)

> 0
> 0
> 0
> 1
> 0
> 0
> 0
> 1
> 0
> 1
> 1
> 1


In [42]:
bytecode = cc(logical_and)
bytecode_interp(bytecode)

> 0
> 0
> 0
> 1
> 0
> 0
> 0
> 1
> 0
> 1
> 1
> 1


In [43]:
cuppa1_interp(logical_or)

> 0
> 0
> 0
> 1
> 0
> 1
> 0
> 1
> 1
> 1
> 1
> 1


In [44]:
bytecode = cc(logical_or)
bytecode_interp(bytecode)

> 0
> 0
> 0
> 1
> 0
> 1
> 0
> 1
> 1
> 1
> 1
> 1


Our compiler produced correct code for all the programs in our test suite and if you look carefully at the programs you will notice that together they cover all the major features of the Cuppa1 programming language.  That is, our test suite has *good coverage*.  Even though we were not able to test every possible program in the Cuppa1 language the fact that our compiler produced correct results for every program in our test suite increases our confidence that the compiler is correct.
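
Comparing the two outputs by eye does not scale to a larger test suite. The following is a minimal sketch of how such a comparison could be automated. It assumes that both execution paths read user input through Python's `input()` and write results with `print()`, which may not match the actual interpreters; `reference_run` and `compiled_run` are hypothetical stand-ins that mimic the two paths for a single program.

```python
import builtins
import contextlib
import io

def run_captured(run, program, inputs):
    # Run `run(program)` with canned answers for input() and
    # return everything it printed as a list of lines.
    answers = iter(inputs)
    original_input = builtins.input
    builtins.input = lambda prompt='': next(answers)
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            run(program)
    finally:
        builtins.input = original_input
    return buffer.getvalue().splitlines()

def same_behavior(reference, candidate, program, inputs):
    # the candidate is accepted if it prints what the reference prints
    return (run_captured(reference, program, inputs) ==
            run_captured(candidate, program, inputs))

# Hypothetical stand-ins for the two execution paths; each one
# hard-codes the behavior of 'get x; x = x + 1; put x;'.
def reference_run(program):
    x = int(input("Value for x? "))
    print("> " + str(x + 1))

def compiled_run(program):
    x = int(input("Please enter a value for x: "))
    print("> " + str(x + 1))

ok = same_behavior(reference_run, compiled_run,
                   "get x; x = x + 1; put x;", ["3"])
```

Under the stated assumption, the stand-ins could be replaced by `cuppa1_interp` and by a function that first calls `cc` and then `bytecode_interp`, and `same_behavior` could be run over every program in the test suite.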

# Optimization

The big difference between interpreters and compilers is that compilers have the ability to think about how to translate a source program into target code in the most effective way. Usually that means trying to translate the program in such a way that it executes as fast as possible on the target machine. Interpreters, on the other hand, are typically machine-agnostic in the sense that they do not try to optimize the program in any way but just execute it, as we saw in Chapter 5 when we constructed the Cuppa1 interpreter. Here we take a look at two optimizations our Cuppa1 compiler performs in order to improve the generated code.

## Constant Folding

Constant folding is an optimization that tries to find arithmetic operations in the source program that can be performed at *compile time* rather than runtime.  Consider the assignment statement,
```
x = 10 + 5
```
There is no reason to generate code to perform the addition at runtime since both operands of the addition are constants. Here the compiler can replace the addition with the constant `15`,
```
x = 15
```
without changing the meaning of the program.  Constant folding is considered an optimization because it eliminates the need to perform operations at runtime and therefore improves the runtime performance of the program.  

Constant folding by itself is kind of a silly optimization because typically nobody writes code like the assignment above. However, in conjunction with *constant propagation* and *dead code elimination*, constant folding can become quite powerful and useful. Constant propagation is a program analysis technique that looks at variables in expressions to see if in fact they act like constants. In dead code elimination a piece of code is defined as dead if it has no impact on the execution of the program. Since dead code has no impact on the execution of the program, the optimization removes this code to make target programs smaller and more efficient. Consider the following program,
```
factor = 2
x = factor * 10
print x
```
Constant propagation will realize that the variable `factor` in the assignment expression acts like a constant and will replace the variable with the appropriate constant value,
```
factor = 2
x = 2 * 10
print x
```
Now, our constant folding optimization can compute the value of the expression `2 * 10` and replace the operation with the value `20`. Our program becomes,
```
factor = 2
x = 20
print x
```
Now we can apply constant propagation again to variable `x` in the print statement and we obtain,
```
factor = 2
x = 20
print 20
```
Now, both assignment statements in our program can be considered dead code because they no longer have any kind of impact on our program.  We can eliminate them without changing the behavior of the program!  We obtain,
```
print 20
```
We reduced our program to the printing of a constant!  We will look at constant propagation and dead code elimination in Chapter 13.
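
The steps of this worked example can be sketched in a few lines of Python. The code below is only an illustration under strong assumptions: it handles straight-line sequences of `('assign', name, exp)` and `('put', exp)` tuples with the operators `+`, `-`, and `*`, and nothing else. The function names are made up for this sketch and it is not the compiler's implementation.

```python
def fold(exp, env):
    # constant propagation + folding on one expression;
    # env maps variable names to known integer constants
    kind = exp[0]
    if kind == 'integer':
        return exp
    if kind == 'id':
        name = exp[1]
        return ('integer', env[name]) if name in env else exp
    (op, c1, c2) = exp
    l, r = fold(c1, env), fold(c2, env)
    if l[0] == 'integer' and r[0] == 'integer' and op in ('+', '-', '*'):
        value = {'+': l[1] + r[1], '-': l[1] - r[1], '*': l[1] * r[1]}[op]
        return ('integer', value)
    return (op, l, r)

def variables(exp):
    # set of variable names that occur in an expression
    if exp[0] == 'id':
        return {exp[1]}
    if exp[0] == 'integer':
        return set()
    return variables(exp[1]) | variables(exp[2])

def optimize(stmts):
    # pass 1 (forward): propagate constants and fold
    env = {}
    propagated = []
    for stmt in stmts:
        if stmt[0] == 'assign':
            (_, name, exp) = stmt
            new_exp = fold(exp, env)
            if new_exp[0] == 'integer':
                env[name] = new_exp[1]
            else:
                env.pop(name, None)   # value no longer known
            propagated.append(('assign', name, new_exp))
        else:  # 'put'
            propagated.append(('put', fold(stmt[1], env)))

    # pass 2 (backward): drop assignments whose value is never read
    live = set()
    result = []
    for stmt in reversed(propagated):
        if stmt[0] == 'assign':
            (_, name, exp) = stmt
            if name in live:
                live.discard(name)
                live |= variables(exp)
                result.append(stmt)
        else:
            live |= variables(stmt[1])
            result.append(stmt)
    result.reverse()
    return result

program = [('assign', 'factor', ('integer', 2)),
           ('assign', 'x', ('*', ('id', 'factor'), ('integer', 10))),
           ('put', ('id', 'x'))]
optimized = optimize(program)
```

For the example program the result is the single statement `('put', ('integer', 20))`. Loops and conditionals need a more careful analysis because a variable may receive different values along different paths.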



![alt text](figures/chap06/3/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 3: Constant folding as a tree rewriting process.
</p>


Now, back to the constant folding optimization in our Cuppa1 compiler. One way to view constant folding is as an AST rewriting process as shown in Figure 3. Here the AST for the expression `10 + 5` is replaced by an AST node for the constant `15`. In order to accomplish this we need to walk the AST for a Cuppa1 program and look for patterns that allow us to rewrite the tree. This is very similar to the code generation tree walker where we walked the tree and looked for AST patterns that we could translate into Exp1bytecode. The big difference is that the constant folder returns the rewritten tree from the tree walker rather than bytecode as in the code generator.

Perhaps the easiest way to explain this is by looking at the code of the [constant folder](code/cuppa1_cc_fold.py).  In particular, the node function for addition,


In [45]:
from grammar_stuff import assert_match, dump_AST
from cuppa1_cc_fold import *

In [46]:
# %load -s plus_exp code/cuppa1_cc_fold.py
def plus_exp(node):

    (OP, c1, c2) = node
    assert_match(OP, '+')
    
    ltree = walk(c1)
    rtree = walk(c2)

    # if the children are constants -- fold!
    if ltree[0] == 'integer' and rtree[0] == 'integer':
        return ('integer', ltree[1] + rtree[1])
    
    else:
        return ('+', ltree, rtree)


The first thing the node function for addition does is walk the children of the addition AST node. Remember that in this walker, walking a tree rewrites it, so we capture the rewritten child trees. If both children are constants we fold them by replacing the plus AST node with a constant node. If it is not possible to fold, we take the two new child trees, create a new plus AST node, and return that. Let's try it.

In [47]:
plus_node = ('+', ('integer', 10), ('integer', 1))
dump_AST(plus_node)


(+ 
  |(integer 10) 
  |(integer 1))


In [48]:
plus_exp(plus_node)

('integer', 11)

So the node function for plus expressions folded the addition of the constants 10 and 1 into the constant 11.  If you look at the code for the [constant folder](code/cuppa1_cc_fold.py) you'll see a similar behavior for all the binary arithmetic operations.  It is worthwhile to take a peek at a node function for the binary relational operators to see how they map constants into our truth values of 0 and 1.  Here is the node function for the `==` operation,

In [49]:
# %load -s eq_exp code/cuppa1_cc_fold.py
def eq_exp(node):

    (OP, c1, c2) = node
    assert_match(OP, '==')
    
    ltree = walk(c1)
    rtree = walk(c2)

    # if the children are constants -- fold!
    if ltree[0] == 'integer' and rtree[0] == 'integer':
        return ('integer', 1 if ltree[1] == rtree[1] else 0)
    
    else:
        return ('==', ltree, rtree)


Upon folding the node function returns 0 or 1 depending on the result of the Python `==` operation.  This extra step is necessary because the Python `==` operator returns the Python Boolean values `True` or `False`, but these do not exist in our language and therefore need to be mapped into 1 and 0, respectively.  The `<=` operator is handled in the same way.

The remaining node functions of the tree walker traverse the AST in search of the patterns above.  Notice that all the node functions are very careful to always return trees built from the rewritten child trees.
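The traversal described above can be condensed into one generic function.  The following is only a minimal sketch, not the code in `cuppa1_cc_fold.py`: the names `FOLDABLE` and `fold_tree` are hypothetical, it assumes the tuple-based AST nodes shown above, and it leaves out division because its folding behavior is not discussed here.

```python
# Hypothetical sketch of a generic constant folder over tuple-based ASTs.
import operator

# Binary operators this sketch can fold, mapped to Python functions.
# Relational operators map Python Booleans to the truth values 1 and 0.
FOLDABLE = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '==': lambda a, b: 1 if a == b else 0,
    '<=': lambda a, b: 1 if a <= b else 0,
}

def is_constant(tree):
    return isinstance(tree, tuple) and len(tree) == 2 and tree[0] == 'integer'

def fold_tree(node):
    # anything that is not a non-empty tuple (names, numbers) is a leaf
    if not isinstance(node, tuple) or not node:
        return node
    # rewrite the children first, keeping the node tag unchanged
    tag, *children = node
    new_children = [fold_tree(child) for child in children]
    # fold a binary operator whose rewritten children are both constants
    if (tag in FOLDABLE and len(new_children) == 2
            and all(is_constant(child) for child in new_children)):
        left, right = new_children
        return ('integer', FOLDABLE[tag](left[1], right[1]))
    # otherwise return a tree built from the rewritten child trees
    return (tag, *new_children)
```

Because the children are rewritten before the parent is inspected, nested constant expressions such as `(2 * 3) + 4` collapse from the bottom up in a single walk.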

Let's try our walker on our assignment statement example to see if it does what we claim it does,

In [50]:
stmt = ('assign', 'x', ('+', ('integer', 10), ('integer', 5)))
dump_AST(stmt)


(assign x 
  |(+ 
  |  |(integer 10) 
  |  |(integer 5)))


In [51]:
from cuppa1_cc_fold import walk as fold

In [52]:
new_stmt = fold(stmt)
dump_AST(new_stmt)


(assign x 
  |(integer 15))


Good, we got exactly the result that we expected from the constant folder.  Let's take a look at the next optimization phase for our Cuppa1 compiler.

## Peephole Optimization

Recall that the code generator for our Cuppa1 compiler translates Cuppa1 AST patterns into Exp1bytecode patterns and simply composes the generated bytecode patterns into a list of instructions.  That can lead to very silly looking code.  Consider the `fact` program from our Cuppa1 example programs,

In [53]:
from cuppa1_examples import fact

In [54]:
print(fact)


get x;
y = 1;
while (1 <= x)
{
      y = y * x;
      x = x - 1;
}
put y;



Applying the compiler function we developed above to this program gives us,

In [55]:
bytecode = cc(fact)

In [56]:
print(bytecode)

	input x ;
	store y 1 ;
L13:
	jumpF (<= 1 x) L14 ;
	store y (* y x) ;
	store x (- x 1) ;
	jump L13 ;
L14:
	noop ;
	print y ;
	stop ;



Take a look at the two lines of code following the label definition for `L14`.  Because we are just composing target patterns we wind up with a strange and useless `noop` instruction right at the label definition.  We would like our compiler to output code that looks like this,

In [57]:
new_bytecode = \
'''
    input x ;
    store y 1 ;
L13:
    jumpF (<= 1 x) L14 ;
    store y (* y x) ;
    store x (- x 1) ;
    jump L13 ;
L14:
    print y ;
    stop ;
'''

The situation gets even worse when we start nesting structured programming constructs.  Here is a silly program that only prints even numbers that are less than or equal to 10,

In [58]:
print_even = \
'''
get x
r = x - 2*(x/2)
if (not r)
  if (x <= 10)
    put x
'''

In [59]:
cuppa1_interp(print_even)

Value for x? 2
> 2


In [60]:
bytecode = cc(print_even)

In [61]:
print(bytecode)

	input x ;
	store r (- x (* 2 (/ x 2))) ;
	jumpF !r L15 ;
	jumpF (<= x 10) L16 ;
	print x ;
L16:
	noop ;
L15:
	noop ;
	stop ;



Here we see that the generated code has a cascade of `noop` instructions that serve as placeholders for various label definitions.  What we would like to see as generated code from our compiler in this instance is,

In [63]:
new_bytecode = \
'''
    input x ;
    store r (- x (* 2 (/ x 2))) ;
    jumpF !r L15 ;
    jumpF (<= x 10) L15 ;
    print x ;
L15:
    stop ;
'''

The code changes we performed in these examples can be accomplished by applying rewrite rules to the generated code.  For example, in the first instance where we had a label definition on a `noop` instruction followed by other unlabeled instructions we can just delete the `noop` instruction.  This gives rise to the following rewrite rule:
```
L:
    noop
    <other instruction>

=>

L:
    <other instruction>
```
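As a rough illustration, this first rewrite rule can be written as a function over a list of instructions.  This is a sketch under assumptions: the tuple representation used here, with `('label', name)` for label definitions and `('noop',)` for the `noop` instruction, is hypothetical and not necessarily the one our code generator uses, and `drop_labeled_noop` is a made-up name.

```python
# Hypothetical sketch of the rule:  L: noop <other>  =>  L: <other>
def drop_labeled_noop(instrs):
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]        # the peephole: three instructions
        if (len(window) == 3
                and window[0][0] == 'label'
                and window[1] == ('noop',)
                and window[2][0] != 'label'):
            out.append(window[0])       # keep the label definition
            i += 2                      # skip the noop
        else:
            out.append(instrs[i])
            i += 1
    return out

# the tail of the generated code for the fact program
tail = [('jump', 'L13'), ('label', 'L14'), ('noop',), ('print', 'y'), ('stop',)]
print(drop_labeled_noop(tail))
```

A `noop` that is followed by another label definition is deliberately left alone here, since that case needs the backpatching rule.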
In the second example we had a cascade of label definitions and `noop` instructions.  The last `noop` in the cascade will be taken care of by our previous pattern if there are instructions following that `noop`.  However, in order to get rid of the first `noop` we need a new rewrite rule,
```
L1:
    noop
L2:
    <other instruction>
=>

L2:  -- with L1 backpatched to L2
    <other instruction>
```
The pattern refers to *backpatching*, which means that all references to the label `L1` are rewritten as references to the label `L2`.  That is exactly what happens in the second example above: all label references to `L16` are rewritten as label references to `L15`, and the label definition of `L16` is deleted from the code together with its `noop` instruction.  Now it might dawn on you why our code generator generates code as a list of instructions: a list of instructions is easy to rewrite.
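The backpatching rule can be sketched in the same spirit.  The code below is illustrative only: it assumes a hypothetical tuple representation of instructions with `('label', name)` for label definitions, it assumes that the target label is the last field of every instruction whose opcode starts with `jump`, and the names `backpatch` and `merge_label_cascade` are made up.  It applies the rule once, at the first place where it matches.

```python
# Hypothetical sketch of the rule:  L1: noop L2:  =>  L2:  (L1 backpatched to L2)
def backpatch(instrs, old, new):
    patched = []
    for ins in instrs:
        # rewrite jump targets that refer to the deleted label
        if ins[0].startswith('jump') and ins[-1] == old:
            ins = ins[:-1] + (new,)
        patched.append(ins)
    return patched

def merge_label_cascade(instrs):
    for i in range(len(instrs) - 2):
        first, second, third = instrs[i:i + 3]
        if first[0] == 'label' and second == ('noop',) and third[0] == 'label':
            # delete the first label definition together with its noop
            remaining = instrs[:i] + instrs[i + 2:]
            return backpatch(remaining, first[1], third[1])
    return list(instrs)

# the generated code for the print_even program
code = [
    ('input', 'x'),
    ('store', 'r', '(- x (* 2 (/ x 2)))'),
    ('jumpF', '!r', 'L15'),
    ('jumpF', '(<= x 10)', 'L16'),
    ('print', 'x'),
    ('label', 'L16'), ('noop',),
    ('label', 'L15'), ('noop',),
    ('stop',),
]
for ins in merge_label_cascade(code):
    print(ins)
```

After this rewrite the `noop` behind `L15` is still present; it is followed by an unlabeled instruction, so the first rewrite rule can remove it.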

### The Design of a Peephole Optimizer

One way to think of a peephole optimizer is as a window (the peephole) which we slide across the generated code repeatedly, applying rewrite rules like the ones we developed above to the code within the window.  The peephole optimizer terminates once a complete pass over the code no longer rewrites anything.  Figure 4 shows the basic architecture of this.

![alt text](figures/chap06/4/figure/Slide1.jpg)
<p style="text-align: center;">
Fig. 4: The design of a peephole optimizer.
</p>
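The sliding window with repeated passes can be sketched as a small driver loop.  This is a sketch under assumptions rather than the optimizer developed in this chapter: instructions are hypothetical tuples with `('label', name)` for label definitions, jump targets are assumed to be the last field of `jump` instructions, and the names `peephole`, `rule_noop_after_label` and `rule_label_cascade` are made up.  Each rule looks at a window of three instructions starting at position `i` and returns either a rewritten instruction list or `None`.

```python
# Hypothetical sketch of a peephole driver that runs until nothing changes.
NOOP = ('noop',)

def is_label(ins):
    return ins[0] == 'label'

def rule_noop_after_label(instrs, i):
    # L: noop <other>  =>  L: <other>
    w = instrs[i:i + 3]
    if len(w) == 3 and is_label(w[0]) and w[1] == NOOP and not is_label(w[2]):
        return instrs[:i + 1] + instrs[i + 2:]
    return None

def rule_label_cascade(instrs, i):
    # L1: noop L2:  =>  L2:  with jumps to L1 backpatched to L2
    w = instrs[i:i + 3]
    if len(w) == 3 and is_label(w[0]) and w[1] == NOOP and is_label(w[2]):
        old, new = w[0][1], w[2][1]
        rest = instrs[:i] + instrs[i + 2:]
        return [ins[:-1] + (new,)
                if ins[0].startswith('jump') and ins[-1] == old else ins
                for ins in rest]
    return None

def peephole(instrs, rules=(rule_noop_after_label, rule_label_cascade)):
    instrs = list(instrs)
    changed = True
    while changed:                  # repeat passes until a pass rewrites nothing
        changed = False
        i = 0
        while i < len(instrs):      # slide the window across the code
            for rule in rules:
                rewritten = rule(instrs, i)
                if rewritten is not None:
                    instrs = rewritten
                    changed = True
                    break           # look at the same position again
            else:
                i += 1              # no rule matched here, move the window
    return instrs
```

Both rules delete at least one instruction whenever they fire, so the loop is guaranteed to terminate.  Passing the rules as a parameter keeps the driver independent of any particular set of rewrite rules.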
