# A little Procrastination goes a Long Way: Program Analysis with Intermediate Representations


<!-- TODO: incorporate the new grammar (input) into the chapter -->
<!-- TODO: incorporate 'not' operator into the chapter -->

<!--
\index{intermediate representation}
\index{IR}
\index{bytecode interpreter}
\index{virtual machine}
-->

In this chapter we show that the simple, syntax directed scheme of processing programming
languages shown in Chapter 2 is not powerful enough to handle certain
programming constructs, such as the `goto` statement or the `jump to label` instruction.  We show that such
constructs can be processed by first constructing an intermediate representation (IR) of the program
and then using this IR during the actual processing of the program.  We illustrate these ideas by constructing an
IR for a simple bytecode language (virtual machine).


<!--
We continue our discussion with the fact that the *ad hoc* IR design
we used for the bytecode interpreter has its limitations when designing processors for more complex
languages.  We then introduce the idea of the Abstract Syntax Tree (AST) as an intermediate representation
and show that this intermediate representation can be directly derived from the grammar itself giving us
a more principled way of constructing intermediate representations.
We illustrate these ideas with a pretty printer program for a simple high-level language.
-->


In [1]:
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")

# Limits of Syntax-Directed Processing

In Chapter 3 we introduced syntax directed interpretation as a way to add semantics to programming
languages.
However, this scheme fails when a language construct needs access to information that cannot be computed from the local syntactic structure alone.
Classic examples of this are the `goto` statement in C and the `jump to label` machine code instruction.
In order to examine this a little bit closer we extend our Exp1 language with conditional and unconditional jump instructions
and call the new language Exp1bytecode.

## Introducing the Exp1bytecode Language

The Exp1bytecode language is based on our Exp1 language but introduces five new statements we now call instructions (the reasons will become obvious when we look at the interpreter for this language): 

* `noop` - an instruction that does nothing.
* `stop` - an instruction that halts the execution.
* `jumpT exp label` - an instruction that evaluates `exp` and then jumps to the `label` if the expression evaluates to true.
* `jumpF exp label` - an instruction that evaluates `exp` and then jumps to the `label` if the expression evaluates to false.
* `jump label` - an unconditional jump to the `label`.

Recall that in Exp1 expressions are based on integer values.
Therefore, in order to compute the truth values necessary for the conditional jump instructions we adopt the following convention: an expression value of zero represents the boolean value false and a non-zero expression value represents the boolean value true.

Our Exp1bytecode language also introduces the idea of labeled instructions as targets for jump statements.
Labels are names followed by a colon that precede an instruction.
For example,
```
      store x 5;
L1:   store x (- x 1);
      jumpT x L1;
```
This program loops while `x` is non-zero.

<!--
\index{relational operator}
-->

In order to write some interesting programs in this new language we also introduce two new operators:

* `=` - the equality relational operator.
* `=<` - the less-equal relational operator. 

Both operators return zero for the boolean value false and one for the boolean value true.
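
The truth-value convention and the two relational operators can be summarized in a few lines of Python. This is only a sketch of the convention: the helper names `is_true`, `op_eq`, and `op_le` are made up for illustration and are not part of the interpreter developed later in this chapter.

```python
# Sketch of the Exp1bytecode truth convention (helper names are hypothetical).

def is_true(value):
    # zero means false, any non-zero integer means true
    return value != 0

def op_eq(a, b):
    # '=' yields 1 for true and 0 for false
    return 1 if a == b else 0

def op_le(a, b):
    # '=<' yields 1 for true and 0 for false
    return 1 if a <= b else 0

print(is_true(0), is_true(-7))    # False True
print(op_eq(3, 3), op_le(5, 2))   # 1 0
```

Because the operators produce ordinary integers, their results can be fed directly into `jumpT` and `jumpF`.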

## The Exp1bytecode Grammar

The grammar for our Exp1bytecode language,

```Python
# %load code/exp1bytecode_gram.py
from ply import yacc
from exp1bytecode_lex import tokens, lexer

def p_grammar(_):
    '''
    prog : instr_list

    instr_list : labeled_instr instr_list
               | empty

    labeled_instr : label_def instr

    label_def : NAME ':' 
              | empty

    instr : PRINT exp ';'
          | STORE NAME exp ';'
          | JUMPT exp label ';'
          | JUMPF exp label ';'
          | JUMP label ';'
          | STOP ';'
          | NOOP ';'

    exp : '+' exp exp
        | '-' exp exp
        | '-' exp
        | '*' exp exp
        | '/' exp exp
        | EQ exp exp
        | LE exp exp
        | '(' exp ')'
        | var
        | NUMBER
        
    label : NAME
    var : NAME
    '''
    pass

def p_empty(p):
    'empty :'
    pass

def p_error(t):
    print("Syntax error at '%s'" % t.value)

parser = yacc.yacc()
```

You can still clearly see the Exp1 lineage shining through.  However, our original lists of statements in Exp1 are now lists of labeled instructions where a label definition is a name followed by a colon.  Instructions now include jump instructions as well as the `noop` and `stop` instructions.
We have also enriched expressions to include all the standard arithmetic operators including the unary minus.
The unary minus introduces a shift/reduce conflict into our parser because the unary minus expression, `'-' exp`, is a prefix of the binary subtraction expression, `'-' exp exp`.  However, the default resolution of shift/reduce conflicts in Yacc-style parser generators such as PLY is to shift whenever possible.  This behavior is exactly what we want, so we can leave the grammar the way it is.

## The Lexer for Exp1bytecode

The corresponding lexical analyzer for Exp1bytecode,

```Python
# %load code/exp1bytecode_lex.py
# Lexer for Exp1bytecode

from ply import lex

reserved = {
    'store' : 'STORE',
    'print' : 'PRINT',
    'jumpT' : 'JUMPT',
    'jumpF' : 'JUMPF',
    'jump'  : 'JUMP',
    'stop'  : 'STOP',
    'noop'  : 'NOOP'
}

literals = [':',';','+','-','*','/','(',')']

tokens = ['NAME','NUMBER','EQ','LE'] + list(reserved.values())

t_EQ = '='
t_LE = '=<'
t_ignore = ' \t'

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    t.type = reserved.get(t.value,'NAME')    # Check for reserved words
    return t

def t_NUMBER(t):
    r'[0-9]+'
    t.value = int(t.value)
    return t

def t_NEWLINE(t):
    r'\n'
    pass

def t_COMMENT(t):
    r'\#.*'
    pass
    
def t_error(t):
    print("Illegal character %s" % t.value[0])
    t.lexer.skip(1)

# build the lexer
lexer = lex.lex()
```

There should be no real surprises in this definition. The obvious change is the set of additional keywords.  The other real change is that we added comments to our language in the form of `#` comments.  As usual with these kinds of comments, once you start a comment it spans the rest of the line. The regular expression for this is `\#.*`, which matches the hash symbol followed by zero or more characters, not including the newline character.
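
To see that the comment pattern stops at the end of the line, we can try it directly with Python's `re` module. This is a small sketch outside of PLY; the sample text is made up, and the pattern is compiled with `re.VERBOSE` on the assumption that this mirrors PLY's default flags, under which an unescaped `#` would start a regex comment.

```python
import re

# the same pattern the lexer uses for comments
comment = re.compile(r'\#.*', re.VERBOSE)

# hypothetical two-line input: a comment on the first line only
text = "store x 5; # initialize x\nprint x;"

m = comment.search(text)
print(repr(m.group()))   # '# initialize x'
```

The match ends just before the newline because `.` does not match newline characters unless the `re.DOTALL` flag is given.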

## Testing our Exp1bytecode Parser

Let's exercise this parser.

In [2]:
from exp1bytecode_gram import parser

As input program we will use our example program from above that decrements `x` until `x` is zero.

In [3]:
input_stream = \
'''
      store x 5;
L1:   store x (- x 1);
      jumpT x L1;

'''

In [4]:
parser.parse(input_stream)

Good! No syntax or other errors were reported.  That means our parser works.

## Trying our Hand at Syntax Directed Processing...

<!--
\index{label!definition}
\index{label!reference}
-->

Now, back to our problem at hand: the syntax directed interpretation of this language.
As long as we are dealing with expressions in Exp1bytecode things are fine.
We could easily envision providing syntax directed interpretation for expressions similar to what we
did in Exp1,
```Python
   ...

def p_plus_exp(p):
    """
    exp : '+' exp exp
    """
    p[0] = p[2] + p[3]

def p_minus_exp(p):
    """
    exp : '-' exp exp
    """
    p[0] = p[2] - p[3]

def p_paren_exp(p):
    """
    exp : '(' exp ')'
    """
    p[0] = p[2]

def p_var_exp(p):
    "exp : var"
    p[0] = p[1]
    
def p_num_exp(p):
    "exp : NUMBER"
    p[0] = p[1]

   ...
```
All information is available at the point when we recognize a syntactic structure and we are able to evaluate the embedded rules.
The same is true for the `print` and `store` instructions.  All the information required to interpret these instructions
is available at the time when their syntax is recognized by the parser.
```Python
def p_print_instr(p):
    "instr : PRINT exp ';'"
    print("> {}".format(p[2]))
    
def p_store_instr(p):
    "instr : STORE NAME exp ';'"
    symbol_table[p[2]] = p[3]
```
This is of course not surprising, because these syntactic structures of Exp1bytecode are part of our Exp1 language and we have shown that we can use syntax directed processing for that language.

## Syntax Directed Interpretation Fails!

<!--
\index{forward jump}
-->

Trouble arises when we try to perform syntax directed interpretation of jump instructions. Consider,
```Python
def p_instr(p):
    '''
    instr :
         ...
         | JUMP label ';'
         ...
    '''
    if p[1] == 'jump':
        target = p[2]
        # and we are in deep trouble - the jump target is not local!?!
```
Trying to do this in a syntax directed fashion gets us into deep trouble because the target of the jump instruction
is not local to the syntactic unit of the jump instruction.  As a matter of fact, if the jump is a forward jump
in the code then we will not have actually seen the target yet!

Consider the following Exp1bytecode snippet,
```
      store x 10;
      jumpT (= x 10) L1;
      print 0;
      stop;
L1:   print 1;
      stop;
```
This program stores the value ten in `x`, then checks if `x` has the value ten.  
If so, it jumps forward to the label `L1` and prints out the value one and stops the execution.
Otherwise it prints out the value zero and stops the execution.
It is a silly program but it illustrates the point quite nicely that the syntax directed processing of the `jumpT` statement
will fail because at the point of interpreting the jump statement we have not seen the label definition yet.
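
The problem can be made concrete with a deliberately naive one-pass scan over the program. This is a sketch only: the `(label, instruction)` encoding and the names `seen_labels` and `unresolved` are assumptions made for illustration, and the jump condition is simplified to the variable name.

```python
# A naive single pass that tries to resolve jump targets as it goes.
# Each entry is (label or None, instruction tuple); the encoding is hypothetical.
program = [
    (None, ('store', 'x', 10)),
    (None, ('jumpT', 'x', 'L1')),   # forward reference to L1
    (None, ('print', 0)),
    (None, ('stop',)),
    ('L1', ('print', 1)),
    (None, ('stop',)),
]

JUMPS = {'jump', 'jumpT', 'jumpF'}

seen_labels = {}    # labels defined so far, in source order
unresolved = []     # jumps whose target was unknown when we reached them

for position, (label, instr) in enumerate(program):
    if label is not None:
        seen_labels[label] = position
    if instr[0] in JUMPS and instr[-1] not in seen_labels:
        unresolved.append((position, instr[-1]))

print(unresolved)   # [(1, 'L1')]
```

When the scan reaches the `jumpT` at position 1, the label `L1` has not been recorded yet, so there is nothing to jump to.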

# Decoupling Syntax Analysis and Semantic Processing

<!--
\index{interpreter}
\index{syntax analysis}
\index{semantic analysis}
\index{intermediate representation}
\index{IR}
-->

In order to interpret languages like Exp1bytecode we need to decouple the syntax analysis from the actual interpretation, that is,
we need to procrastinate with our interpretation of the program by first constructing an abstract representation of it, the *intermediate representation*, and then in turn evaluate or interpret this abstract representation once it has been completely constructed.
This fits nicely with our architecture of an interpreter of Figure 6 in Chapter 1.
There our interpreter has two phases that are coupled with an intermediate representation of the program.
The first phase builds the intermediate representation and the second phase interprets the intermediate representation.

We can say the following about any kind of intermediate representation:
> An intermediate representation (IR) is an abstract representation of the original program.

### An Abstract Machine based IR Design

<!--
\index{IR design}
-->

Since the IR is at the core of our interpreter, a good IR design is paramount:

> A good IR should be easy to construct and easy to process.

Here we take an approach to IR design that is driven by particular features of our language at hand.
In our case we can view Exp1bytecode as representing an abstract machine.

## The Exp1bytecode Interpreter

---
<center>
<img src="figures/chap04/1/figure/Slide1.jpg" alt="">
Fig 1. IR design for the Exp1bytecode interpreter.
</center>

---


If we look at the Exp1bytecode syntax we can identify three major characteristics of this language:

* We have variables that hold values and these values can be changed and referenced by instructions.
* We have conditional and unconditional jumps which use label definitions and references to specify the range of the jumps.
* Programs in this language consist of a sequence of instructions.

Given these features of Exp1bytecode and given the fact that the
language looks like very abstract machine code, one design choice is to make our IR resemble a virtual machine that consists of three
major components:

* A symbol table to hold variable definitions.
* A label table to hold label definitions.
* A list of instructions.
 
Figure 1 shows our IR design.
Our abstract machine is shown with the program,
```
   store x 10 ;
L1:
   print x ;
   store x (- x 1) ;
   jumpT x L1 ;
   stop ;
```
loaded and ready to be interpreted.
Given that programs in the IR representation still look very much like the programs in the original textual representation the IR is easy to construct.
Also, given that programs are represented as a list of instructions they are easy to interpret -- we simply 
walk down the list of instructions and execute each one in turn.
So it seems that our IR design fulfills the two key points of IR design we made above: easy to construct and easy to process.

## This Solves Our Jump Problem!

Now, let us just think through the issue with labels that we had before when we attempted a syntax directed approach to the interpretation of Exp1bytecode.
In our new IR design labels behave much like variables in the sense that we have a definition point and we have label references.
The definition points of labels are the labeled instructions and the reference points are the labels in the jump
instructions.
In order to deal with this effectively our IR uses a label table that records the addresses of the instructions that act as definition points for particular labels.
In our example in Figure 1 we see that the label table holds the label `L1` and the entry 
for this label points to the definition point of this label, namely the print statement in the program.
Label references point back to the label table and therefore we can find and resolve the targets for any jumps that occur in a program.
Also note that forward references are no longer a problem because during the syntax analysis phase we will have seen all label
definition points and entered them into the label table before the semantic phase started.
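
The two-phase idea can be sketched on the forward-jump program from earlier. The `(label, instruction)` pairs below are a hypothetical stand-in for what the parser delivers, and the jump condition is simplified to the variable name; the point is only that the label table is complete before any instruction is interpreted.

```python
# Phase 1: record every label definition while collecting instructions.
# The (label, instruction) pairs stand in for the output of a parser.
source = [
    (None, ('store', 'x', 10)),
    (None, ('jumpT', 'x', 'L1')),   # forward reference
    (None, ('print', 0)),
    (None, ('stop',)),
    ('L1', ('print', 1)),
    (None, ('stop',)),
]

program = []
label_table = {}
for label, instr in source:
    if label:
        # the label names the address of the instruction about to be appended
        label_table[label] = len(program)
    program.append(instr)

# Phase 2 would start only now, so every label is already known.
target = label_table[program[1][2]]
print(label_table)        # {'L1': 4}
print(program[target])    # ('print', 1)
```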

Here is an animation of the abstract machine executing our program.

<!-- chap04 q1 -->

<a href="http://www.youtube.com/watch?feature=player_embedded&v=7oY-FS0jHvo" target="_blank">
<img style='border:1px solid #000000' src="movie.jpg" width="120" height="90" />
</a>


## IR Implementation

In order to model our interpreter design we create a separate class to hold the machine state.

In [5]:
# %load code/exp1bytecode_interp_state
# define and initialize the structures of our abstract machine

class State:

    def __init__(self):
        self.initialize()
    
    def initialize(self):
        self.program = []
        self.symbol_table = dict()
        self.label_table = dict()
        self.instr_ix = 0

state = State()




The module makes a `state` object available that will serve to hold the state of our abstract machine both during the construction of the IR as well as during the interpretation.

### The Parser

The IR construction is embedded in the grammar specification for our parser.  Let's take a look.

```python
# %load code/exp1bytecode_interp_gram
from ply import yacc
from exp1bytecode_lex import tokens, lexer
from exp1bytecode_interp_state import state

def p_prog(_):
    '''
    prog : instr_list
    '''
    pass

def p_instr_list(_):
    '''
    instr_list : labeled_instr instr_list
              | empty
    '''
    pass

def p_labeled_instr(p):
    '''
    labeled_instr : label_def instr
    '''
    # if label exists record it in the label table
    if p[1]:
        state.label_table[p[1]] = state.instr_ix
    # append instr to program
    state.program.append(p[2])
    state.instr_ix += 1

def p_label_def(p):
    '''
    label_def : NAME ':' 
              | empty
    '''
    p[0] = p[1]

def p_instr(p):
    '''
    instr : PRINT exp ';'
          | INPUT NAME ';'
          | STORE NAME exp ';'
          | JUMPT exp label ';'
          | JUMPF exp label ';'
          | JUMP label ';'
          | STOP ';'
          | NOOP ';'
    '''
    # for each instr assemble the appropriate tuple
    if p[1] == 'print':
        p[0] = ('print', p[2])
    elif p[1] == 'input':
        p[0] = ('input', p[2])
    elif p[1] == 'store':
        p[0] = ('store', p[2], p[3])
    elif p[1] == 'jumpT':
        p[0] = ('jumpT', p[2], p[3])
    elif p[1] == 'jumpF':
        p[0] = ('jumpF', p[2], p[3])
    elif p[1] == 'jump':
        p[0] = ('jump', p[2])
    elif p[1] == 'stop':
        p[0] = ('stop',)
    elif p[1] == 'noop':
        p[0] = ('noop',)
    else:
        raise ValueError("Unexpected instr value: %s" % p[1])

def p_label(p):
    '''
        label : NAME
        '''
    p[0] = p[1]

def p_bin_exp(p):
    '''
    exp : '+' exp exp
        | '-' exp exp
        | '*' exp exp
        | '/' exp exp
        | EQ exp exp
        | LE exp exp
    '''
    p[0] = (p[1], p[2], p[3])
    
def p_uminus_exp(p):
    '''
    exp : '-' exp
    '''
    p[0] = ('UMINUS', p[2])
    
def p_not_exp(p):
    '''
    exp : '!' exp
    '''
    p[0] = ('!', p[2])
    
def p_paren_exp(p):
    '''
    exp : '(' exp ')'
    '''
    # parens are not necessary in trees
    p[0] = p[2]
    
def p_var_exp(p):
    '''
    exp : NAME
    '''
    p[0] = ('NAME', p[1])

def p_number_exp(p):
    '''
    exp : NUMBER
    '''
    p[0] = ('NUMBER', int(p[1]))

def p_empty(p):
    '''
    empty :
    '''
    p[0] = ''

def p_error(t):
    print("Syntax error at '%s'" % t.value)

parser = yacc.yacc(debug=False, tabmodule='exp1bytecodeparsetab')
```

In our preamble we import `yacc`, the token definitions and the lexer, and the state object of our abstract machine.  For the purposes of constructing the IR we access the label table to record label definitions, the program list to hold our instructions, and the instruction index variable that keeps track of where the next instruction is inserted into the program list.

### Handling Lists of Instructions

The first really interesting piece of code is the parsing function for the rule,
```
labeled_instr : label_def instr
```
The embedded action associated with this rule is,
```Python
# if label exists record it in the label table
if p[1]:
    state.label_table[p[1]] = state.instr_ix
# append instr to program
state.program.append(p[2])
state.instr_ix += 1
```
The first thing to notice is that we are referencing the data structures of our abstract machine.
If the instruction is labeled then we record the label together with the current address in the 
label table.  We then append the instruction itself to the program list.

Parsing instructions constructs tuples that consist of the name of the instruction together with its arguments. Take a look,
```
'''
instr : PRINT exp ';'
     ...
      | JUMP label ';'
     ...
'''
```
```Python
    # for each instr assemble the appropriate tuple
    if p[1] == 'print':
        p[0] = ('print', p[2])
    ...
    elif p[1] == 'jump':
        p[0] = ('jump', p[2])
    ...
```
Here the tuple for the print instruction consists of the name `print` and the expression that represents the value to be printed out.
The tuple for the jump instruction consists of the name `jump` and the name of the target label.
Similarly for all the other instructions.

### Handling Expressions

Because we are delaying the evaluation of expressions until we have the IR constructed we need to 
have some sort of representation of the expression value that we can evaluate later to actually compute a value.
The idea is that we construct an expression or term tree from the source expression and that term tree can then be evaluated later to compute an actual integer value.
```
'''
exp : '+' exp exp
    | '-' exp exp
    | '*' exp exp
    | '/' exp exp
    | EQ exp exp
    | LE exp exp
'''
```
```Python
p[0] = (p[1], p[2], p[3])
```
Here we construct a term tree from the binary expressions.  Terms are just tuples where the first component of
the tuple is the name of the operator and the second and third components are the subexpressions of the binary expression.  In the embedded actions the variable `p[1]` holds the operator name and
`p[2]` and `p[3]` represent the sub expression terms, respectively.  All three of these items form a term tuple which is assigned
to the variable `p[0]`.

According to these rules the expression,
```
=< + 3 2 * 3 2
```
gives rise to the term tree,
```
('=<', ('+', ('NUMBER', 3), ('NUMBER', 2)), ('*', ('NUMBER', 3), ('NUMBER', 2)))
```
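
To get a feel for how such a term tree comes about without running PLY, here is a small hand-written sketch that builds the same tuples from a whitespace-separated prefix expression. The function `build_tree` and the set `BINARY_OPS` are hypothetical names, and unary minus and parentheses are left out to keep the example short.

```python
# Hand-written stand-in for the parser actions: build a term tree from
# a whitespace-separated prefix expression (binary operators only).
BINARY_OPS = {'+', '-', '*', '/', '=', '=<'}

def build_tree(tokens):
    'consume tokens from the front of the list and return one term tree'
    tok = tokens.pop(0)
    if tok in BINARY_OPS:
        left = build_tree(tokens)
        right = build_tree(tokens)
        return (tok, left, right)
    if tok.isdigit():
        return ('NUMBER', int(tok))
    return ('NAME', tok)

tree = build_tree("=< + 3 2 * 3 2".split())
print(tree)
# ('=<', ('+', ('NUMBER', 3), ('NUMBER', 2)), ('*', ('NUMBER', 3), ('NUMBER', 2)))
```

Each recursive call plays the role of one grammar rule: an operator token produces a three-element tuple, and the leaves are `NUMBER` and `NAME` tuples.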

### Testing the Parser

Let's test run our parser.  We will use Python's built-in pretty printer to print out our `program`, which is
a list of instruction tuples.

In [7]:
from exp1bytecode_interp_state import state
from exp1bytecode_interp_gram import parser
from exp1bytecode_lex import lexer
import pprint
pp = pprint.PrettyPrinter()

Setting up the input stream with our Exp1bytecode program.

In [8]:
input_stream = \
'''
   store x 10 ;
L1:
   print x ;
   store x (- x 1) ;
   jumpT x L1 ;
   stop ;
'''

Running the parser.

In [9]:
parser.parse(input_stream)

We have parsed the input program.  We can look at the structures in our abstract machine and we will find
that it has all been appropriately initialized.

In [10]:
# print out the program list of statement tuples
pp.pprint(state.program)

[('store', 'x', ('NUMBER', 10)),
 ('print', ('NAME', 'x')),
 ('store', 'x', ('-', ('NAME', 'x'), ('NUMBER', 1))),
 ('jumpT', ('NAME', 'x'), 'L1'),
 ('stop',)]


In [11]:
# print out the label table
pp.pprint(state.label_table)

{'L1': 1}


The label table is interesting because here we see that the label `L1` points to instruction 1 in the list of instructions, namely the instruction `print x`.  Exactly what we expected.

In [12]:
# print the symbol table
pp.pprint(state.symbol_table)

{}


The symbol table is empty since we have not executed the program yet! We have just initialized our abstract machine.

### Interpreting the IR

In order to interpret the programs in our IR we need two functions.  The first one is the interpretation of instructions on the program list.

In [13]:
# %load -s interp_program code/exp1bytecode_interp.py
def interp_program():
    'execute abstract bytecode machine'
    
    # We cannot use the list iterator here because we
    # need to be able to interpret jump instructions
    
    # start at the first instruction in program
    state.instr_ix = 0
    
    # keep interpreting until we run out of instructions
    # or we hit a 'stop'
    while True:
        if state.instr_ix == len(state.program):
            # no more instructions
            break
        else:
            # get instruction from program
            instr = state.program[state.instr_ix]
        
        # instruction format: (type, [arg1, arg2, ...])
        type = instr[0]
        
        # interpret instruction
        if type == 'print':
            # PRINT exp
            exp_tree = instr[1]
            val = eval_exp_tree(exp_tree)
            print("> {}".format(val))
            state.instr_ix += 1
        
        elif type == 'store':
            # STORE NAME exp
            var_name = instr[1]
            val = eval_exp_tree(instr[2])
            state.symbol_table[var_name] = val
            state.instr_ix += 1

        elif type == 'jumpT':
            # JUMPT exp label
            val = eval_exp_tree(instr[1])
            if val:
                state.instr_ix = state.label_table.get(instr[2], None)
            else:
                state.instr_ix += 1

        elif type == 'jumpF':
            # JUMPF exp label
            val = eval_exp_tree(instr[1])
            if not val:
                state.instr_ix = state.label_table.get(instr[2], None)
            else:
                state.instr_ix += 1

        elif type == 'jump':
            # JUMP label
            state.instr_ix = state.label_table.get(instr[1], None)
        
        elif type == 'stop':
            # STOP
            break

        elif type == 'noop':
            # NOOP
            state.instr_ix += 1
        
        else:
            raise ValueError("Unexpected instruction type: {}".format(type))


This function consists of one big `while` loop that fetches the next instruction from the program list.  It looks at the first component of the instruction tuple in order to determine what kind of instruction we are looking at and then interprets that instruction accordingly.  Expression terms are evaluated to integer values using the `eval_exp_tree`
function.
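
The fetch-and-execute pattern can also be studied in isolation on a hand-built IR. The following is a reduced, self-contained sketch and not the chapter's implementation: the names `run`, `value_of`, and `jump_target` are hypothetical, only a few instructions are supported, and expressions are limited to numbers, names, and binary minus. As one possible design choice, `jump_target` raises an error for an undefined label instead of letting the instruction index become `None`.

```python
# Self-contained sketch of a fetch-and-execute loop on a hand-built IR.

def value_of(node, symbols):
    'evaluate a tiny subset of expression trees'
    kind = node[0]
    if kind == 'NUMBER':
        return node[1]
    if kind == 'NAME':
        return symbols.get(node[1], 0)
    if kind == '-':
        return value_of(node[1], symbols) - value_of(node[2], symbols)
    raise ValueError("unexpected expression node: {}".format(kind))

def jump_target(labels, name):
    # fail loudly instead of continuing with an index of None
    if name not in labels:
        raise ValueError("undefined label: {}".format(name))
    return labels[name]

def run(program, labels):
    'interpret the program and return the list of printed values'
    symbols, output, ix = {}, [], 0
    while ix < len(program):
        instr = program[ix]          # fetch
        kind = instr[0]              # decode
        if kind == 'print':
            output.append(value_of(instr[1], symbols))
            ix += 1
        elif kind == 'store':
            symbols[instr[1]] = value_of(instr[2], symbols)
            ix += 1
        elif kind == 'jumpT':
            if value_of(instr[1], symbols):
                ix = jump_target(labels, instr[2])
            else:
                ix += 1
        elif kind == 'stop':
            break
        else:
            raise ValueError("unexpected instruction type: {}".format(kind))
    return output

program = [
    ('store', 'x', ('NUMBER', 3)),
    ('print', ('NAME', 'x')),
    ('store', 'x', ('-', ('NAME', 'x'), ('NUMBER', 1))),
    ('jumpT', ('NAME', 'x'), 'L1'),
    ('stop',),
]
print(run(program, {'L1': 1}))   # [3, 2, 1]
```

Note that only the jump instructions assign a new value to the index; every other instruction simply advances it by one.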

In order to have a complete interpreter for our abstract machine we need to provide the function `eval_exp_tree`.

In [14]:
# %load -s eval_exp_tree code/exp1bytecode_interp.py
def eval_exp_tree(node):
    'walk expression tree and evaluate to an integer value'

    # tree nodes are tuples (TYPE, [arg1, arg2,...])
    
    type = node[0]
    
    if type == '+':
        # '+' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left + v_right
    
    elif type == '-':
        # '-' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left - v_right
    
    elif type == '*':
        # '*' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left * v_right
    
    elif type == '/':
        # '/' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left // v_right
    
    elif type == '=':
        # '=' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left == v_right
    
    elif type == '=<':
        # '=<' exp exp
        v_left = eval_exp_tree(node[1])
        v_right = eval_exp_tree(node[2])
        return v_left <= v_right
    
    elif type == 'UMINUS':
        # 'UMINUS' exp
        val = eval_exp_tree(node[1])
        return - val
    
    elif type == 'NAME':
        # 'NAME' var_name
        return state.symbol_table.get(node[1],0)

    elif type == 'NUMBER':
        # NUMBER val
        return node[1]


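We can test our expression evaluation with the expression from before. Because the relational operators are implemented with Python's own comparisons, the result is a Python boolean; Python treats `True` as the integer one and `False` as zero, so this agrees with our convention. The expression is,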
```
=< + 3 2 * 3 2
```

In [15]:
eval_exp_tree(('=<', ('+', ('NUMBER', 3), ('NUMBER', 2)), ('*', ('NUMBER', 3), (('NUMBER', 2)))))


True
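
As a design alternative, the chain of `if`/`elif` branches for the binary operators can be replaced by a lookup table. The sketch below is not the chapter's implementation: `BINARY` and `eval_tree` are hypothetical names, the symbol table is passed in explicitly rather than taken from `state`, and the comparisons are wrapped in `int(...)` so that they yield the integers one and zero.

```python
import operator

# Table-driven variant of the expression evaluator (a sketch).
BINARY = {
    '+':  operator.add,
    '-':  operator.sub,
    '*':  operator.mul,
    '/':  operator.floordiv,            # integer division
    '=':  lambda a, b: int(a == b),     # 1 for true, 0 for false
    '=<': lambda a, b: int(a <= b),
}

def eval_tree(node, symbols):
    'walk an expression tree and evaluate it to an integer value'
    kind = node[0]
    if kind in BINARY:
        return BINARY[kind](eval_tree(node[1], symbols),
                            eval_tree(node[2], symbols))
    if kind == 'UMINUS':
        return -eval_tree(node[1], symbols)
    if kind == 'NAME':
        return symbols.get(node[1], 0)
    if kind == 'NUMBER':
        return node[1]
    raise ValueError("unexpected node type: {}".format(kind))

tree = ('=<', ('+', ('NUMBER', 3), ('NUMBER', 2)),
              ('*', ('NUMBER', 3), ('NUMBER', 2)))
print(eval_tree(tree, {}))   # 1
```

Adding a new binary operator then only requires a new entry in the table, and an unknown node type is reported instead of silently returning `None`.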

### Running our Interpretation Functions

We can now interpret Exp1bytecode.  Recall that the program is already loaded in the abstract machine:

In [16]:
pp.pprint(state.program)

[('store', 'x', ('NUMBER', 10)),
 ('print', ('NAME', 'x')),
 ('store', 'x', ('-', ('NAME', 'x'), ('NUMBER', 1))),
 ('jumpT', ('NAME', 'x'), 'L1'),
 ('stop',)]


Running the program by calling our `interp_program` function.

In [17]:
interp_program()


> 10
> 9
> 8
> 7
> 6
> 5
> 4
> 3
> 2
> 1


After we ran our program, the symbol table is no longer empty, but it now contains our binding for the `x` variable.

In [18]:
print(state.symbol_table)

{'x': 0}


### Toplevel Interpreter Function

In order to tie everything together, here is a toplevel function,

In [19]:
# %load -s interp code/exp1bytecode_interp.py
def interp(input_stream):
    'driver for our Exp1bytecode interpreter.'

    # initialize our abstract machine
    state.initialize()
    
    # build the IR
    parser.parse(input_stream, lexer=lexer)
    
    # interpret the IR
    interp_program()



Set up the environment for our interpreter.

In [23]:
input_stream = \
'''
   store x 10 ;
L1:
   print x ;
   store x (- x 1) ;
   jumpT x L1 ;
   stop ;
'''

Here we go!

In [24]:
interp(input_stream)

> 10
> 9
> 8
> 7
> 6
> 5
> 4
> 3
> 2
> 1


# Summary

We began this chapter by showing that certain programming language features such as the jump instructions in Exp1bytecode cannot be processed using a syntax directed scheme.  In order to interpret languages that have
such features we have to decouple the syntactic phase and the semantic phase of our interpreter.
As an example we constructed an interpreter for Exp1bytecode whose IR is a representation of an abstract machine.
Here the state contained an instruction list, a label table, a symbol table, and an instruction index.
The syntax phase of the interpreter fills this structure out and the semantic phase then uses this information to execute the program now stored in the instruction list.

# Notes

Any compiler book will have a thorough discussion of intermediate representations (Watson, 2017; Cooper & Torczon, 2011; Appel, 2004).  Wikipedia has some nice entries on
[intermediate representations](https://en.wikipedia.org/wiki/Intermediate_representation)
and
[bytecode](https://en.wikipedia.org/wiki/Bytecode).



Watson, D. (2017). *A Practical Approach to Compiler Construction*.

Cooper, K., & Torczon, L. (2011). *Engineering a Compiler*. Elsevier.

Appel, A. W. (2004). *Modern Compiler Implementation in C*. Cambridge University Press.


# Exercises

1. How would you change Exp1bytecode to make it amenable for syntax-directed interpretation? 
(**Hint:** Add structured programming constructs.) Implement a grammar specification for your new language and illustrate that it can support syntax-directed interpretation.

2. Consider our Exp1bytecode language given in the grammar above. Add a new branching instruction called `compare` to the language. The syntax of this instruction is as follows,
```
instr: COMPARE exp exp label label label ';' 
```
and the semantics of this instruction can be described like this,
  - If the first expression has a value less than the second expression then jump to the first label.
  - If the expressions have equal values then jump to the second label.
  - If the first expression has a value larger than the second expression then jump to the third label.

  Modify the interpreter for Exp1bytecode to accommodate this new instruction.

3. Implement the Boolean operations `and` and `or` in the Exp1bytecode interpreter.