## plipy


![cover image](cover-image.gif)


## Programming Language Implementation with Python

Lutz Hamel, University of Rhode Island

**--DRAFT--**

The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT).

### [00 - Front Matter](chap00.ipynb)

There has been a noticeable shift in the approach to implementing programming languages over the past 15 or so years.
Before the introduction of Java in the mid-1990s the only "legitimate" way to implement a programming language was to construct a compiler that translated programs written in a high-level source language such as C into target microprocessor machine code. The success of Java and its virtual machine approach legitimized alternative approaches to language implementation. Since then many successful, modern programming languages such as Perl, Ruby, and Python (to name but a few) have been implemented using alternative approaches. Implementation approaches explored by these languages include incremental interpreters, virtual machines, and just-in-time compilers.
<!-- It is interesting to observe that this shift is not reflected in many of the available text books.  The majority of text books still view the "compile to machine code" as the primary programming language implementation paradigm and therefore ignore the implementation strategies of the majority of modern programming languages that are in use today. -->

Another thread we can follow is the rise of domain-specific languages (Fowler, 2010), sometimes called "little languages" (Bentley, 1986). A domain-specific language is a language designed to solve problems in a specific domain. This is in contrast to
general-purpose languages such as Java or C++, which were designed to solve problems across a large spectrum of domains. It is now generally accepted that developing domain-specific programming languages is a legitimate part of a software engineering approach to developing a software solution.
Domain-specific languages are only rarely implemented using a full-blown compiler. In most cases some sort of interpretation or virtual machine approach is used to implement these languages.
<!-- Observe that this important development in programming languages is pretty much ignored by the majority of current text books. -->

Finally, 
<!-- since many current books on programming language implementation -->
addressing only the "compile to machine code" approach to language implementation makes programming language implementation seem to be an extremely complicated task. That is of course true for a high-performance compiler, but it is not true for the implementation of smaller languages designed to handle specific tasks.
Making an introduction to programming language implementation overly complex (*e.g.*, Cooper & Torczon, 2011, spend 200+ pages on parsing theory alone) denies students access to an interesting way of solving software engineering problems.

The book you are looking at addresses the points raised above.  We take a pragmatic point of view of programming
language implementation with the goal of having the reader implement programming languages "from day one." We look at interpretation, virtual machines, and compilers using small, realistic languages.  One of the strengths of the  "implement from day one" approach is that we start with very simple languages and then build on the acquired expertise incrementally until we show how languages with functions and type systems can be interpreted and/or compiled.  

It turns out that Python is the perfect environment for exploring programming language implementation, thanks to its ability to load and run modules independently from one another. This allows us to explore and study different aspects of a programming language implementation incrementally and interactively. We have purposefully stayed away from a fully object-oriented design for language processors, applying OO design only sparingly. It is the author's experience that a fully object-oriented design for language processors obscures the natural structure of these programs, making them difficult to understand and maintain. Instead we use functional aspects of Python, such as higher-order programming, as well as its native capability of constructing and pattern-matching n-tuples on the fly, giving rise to highly readable language implementations.
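To give a flavor of this style, here is a minimal sketch. It assumes a hypothetical arithmetic expression language; the node tags (`'num'`, `'var'`, `'add'`, `'mul'`) and the names `walk` and `DISPATCH` are illustrative and are not taken from the book's code. Trees are plain nested tuples, each node is taken apart with tuple unpacking, and a dictionary of functions does the dispatching.

```python
# Hypothetical mini-language: arithmetic expressions encoded as nested tuples.
# Node shapes (assumed for illustration only):
#   ('num', value)   ('var', name)   ('add', left, right)   ('mul', left, right)

def eval_num(node, env):
    (_, value) = node                  # unpack the 2-tuple
    return value

def eval_var(node, env):
    (_, name) = node
    return env[name]

def eval_add(node, env):
    (_, left, right) = node            # unpack the 3-tuple
    return walk(left, env) + walk(right, env)

def eval_mul(node, env):
    (_, left, right) = node
    return walk(left, env) * walk(right, env)

# Functions are ordinary values, so a dictionary can act as the dispatch table.
DISPATCH = {
    'num': eval_num,
    'var': eval_var,
    'add': eval_add,
    'mul': eval_mul,
}

def walk(node, env):
    """Evaluate a tuple-encoded expression by dispatching on its tag."""
    tag = node[0]
    if tag not in DISPATCH:
        raise ValueError("unknown node type: {}".format(tag))
    return DISPATCH[tag](node, env)

# x * (2 + 3) with x bound to 4
tree = ('mul', ('var', 'x'), ('add', ('num', 2), ('num', 3)))
result = walk(tree, {'x': 4})
print(result)
```

Supporting a new kind of node in this sketch means writing one more function and adding one entry to the table; no class hierarchy is involved.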

The material in this book is accessible to anyone who has taken a college-level programming course or has equivalent experience. We also expect that the reader has had some exposure to Python programming. Readers who have mastered the material in this book will be prepared for advanced topics such as high-performance compilers and advanced interpreter/virtual machine architectures, for example the implementation of Python itself.

Fowler, M. (2010). [*Domain-specific languages*](https://www.pearson.com/us/higher-education/program/Fowler-Domain-Specific-Languages/PGM305379.html). Pearson Education.

Bentley, J. (1986). [*Programming pearls: little languages*](http://dl.acm.org/citation.cfm?id=315691). Communications of the ACM, 29(8), 711-721.

Cooper, K., & Torczon, L. (2011). [*Engineering a compiler*](https://www.elsevier.com/books/engineering-a-compiler/cooper/978-0-12-088478-0). Elsevier.


## Part I

### [01 - Programming Languages and their Processors](chap01.ipynb)

- The Structure of Programming Languages
  - Parsing
- The Behavior of Programming Languages
- Language Processors
  - Building Blocks
    - Syntax Analysis
    - Semantic Analysis
    - Code Generation
  - Architectures
    - The Reader
    - The Generator
    - The Interpreter
    - The Translator
    - The Simple Translator
  - An Example: The Java Programming Language
- Summary
- Notes
- Exercises

### [02 - A Crash Course in Parsing and Lexing](chap02.ipynb)

- Grammars
  - The Basics
  - Derivations
  - Parse Trees
  - An Example: The Exp0 Language
- Parsers
  - Top-Down Parsing
    - Lookahead Sets
    - Left-Recursive Grammars are not LL(1)
    - Other Grammars that are not LL(1)
    - A Top-Down Parsing Algorithm
  - Bottom-Up Parsing
    - A Bottom-Up Parsing Algorithm
    - A Closer Look at LR(0)
  - Building Parsers by Hand
    - Recursive Descent is LL(1)
- Parser Generators
  - An Example: Our First Language Processor
- Lexical Analysis
  - An Example: The Exp1 Language
- Summary
- Notes
- Exercises

### [03 - Let the Syntax guide You](chap03.ipynb)



* What is Syntax-Directed Interpretation?
* An Interpreter for Exp1 using a Recursive Descent Parser
  - The Interpretation of Exp1 Programs
  - Syntax Directed Interpretation of Expressions
  - The Syntax Directed Interpretation of Variables and Numbers
  - Interpreting an Expression
  - Syntax Directed Interpretation of Statements
  - Interpreting Statements
  - Processing Statement Lists
  - Adding a Toplevel Driver
* An Interpreter for Exp1 using an LR(1) Parser
  - The Exp1 Parser
  - Testing our LR(1) Parser
  - Adding a Toplevel Driver
* Another Take on Syntax-Directed Processing: A Pretty Printer for Exp1
  - The Pretty Printer Parser
  - Testing the Pretty Printer
  - Putting all together
* Summary
* Notes
* Exercises

### [04 - A little Procrastination goes a Long Way: Program Analysis with Intermediate Representations](chap04.ipynb)

* Limits of Syntax-Directed Processing
* Introducing the Exp1bytecode Language
  - The Exp1bytecode Grammar
  - The Lexer for Exp1bytecode
  - Testing our Exp1bytecode Parser
* Trying our Hand at Syntax Directed Processing...
  - Syntax Directed Interpretation Fails!
* Decoupling Syntax Analysis and Semantic Processing
  - An Abstract Machine based IR Design
* The Exp1bytecode Interpreter
  - This Solves Our Jump Problem!
  - IR Implementation
  - The Parser
  - Handling Lists of Instructions
  - Handling Expressions
  - Testing the Parser
  - Interpreting the IR
  - Running our Interpretation Functions
  - Toplevel Interpreter Function
* Summary
* Notes
* Exercises

### [05 - The Magic of Tree Walking](chap05.ipynb)

 * Abstract Syntax Trees
   - The Tuple Representation of ASTs
 * The Cuppa1 Programming Language
 * The Cuppa1 Frontend
   - Statements
   - Statement Lists and Programs
   - Expressions
   - Generating ASTs
 * Tree Walking
   - A Simple Tree Walker
   - Tree Walkers are Plug'n Play
 * An Interpreter for Cuppa1
   - Running the Interpreter
 * A Cuppa1 Pretty Printer with a Twist
   - Architecture of the Cuppa1 Pretty Printer
     - The First Pass Walker
     - The Second Pass Walker
   - Testing the Pretty Printer
 * Summary
 * Notes
 * Exercises

### [06 -  Compilers](chap06.ipynb)

 * A Basic Compiler
     - A Code Generation Tree Walker
     - Formatting the Output
     - Architecture
       - Examples
 * Compiler Correctness
 * Optimization
     - Constant Folding
     - Peephole Optimization
       - The Design of a Peephole Optimizer
 * Putting it All Together
 * Summary
 * Notes
 * Exercises

### [07 - Of Scope & Symbol Tables](chap07.ipynb)

* The Cuppa2 Language
 - Cuppa2 Programs
* A Symbol Table for supporting Scope
 - Symbol Table Design
* A Cuppa2 Interpreter
* Syntactic vs. Semantic Errors
* Compiling Scoped Code
* A Cuppa2 Compiler
 - The Symbol Table and Front End
 - The Code Generator
 - Testing the Compiler
* Summary
* Notes
* Exercises

### [08 - The Almighty Function]()

### [09 - Type Systems]()

### [10 - Structured Data Types]()

## Part II

### [11 - Higher-Order Programming]()

### [12 - Object-Oriented Language Features]()

### [13 - Compiling for Real Machines]()

### [14 - PyPy/RPython: Constructing Fast Interpreters]()

### [15 - LLVM: A Compiler Construction Framework]()

