A Java-based, tree-walk interpreter for a custom programming language. The language features a Python-like, indentation-based syntax, dynamic typing, and standard control flow structures.
This project implements a full evaluator, including lexical analysis (with indentation tracking), recursive descent parsing, Abstract Syntax Tree (AST) generation, and a visitor-based interpreter.
- Indentation-based blocks: Uses significant whitespace (indentation/dedentation) to define scope, similar to Python.
- Dynamic Typing: Supports numbers (doubles), strings, and booleans with dynamic evaluation.
- Control Flow:
if/elseconditionals andwhileloops. - Logical Operators:
and,or, andnot. - Relational & Arithmetic Operations: Standard mathematical and comparison operators.
- Built-in Print:
print()statement supporting multiple arguments.
The language is built on the following Context-Free Grammar:
CompilationUnit := { Statement }
Statement := (IfStatement | WhileStatement | Expression | PrintStatement) NEWLINE
IfStatement := 'if' Expression ':' BlockStatement [ 'else' ':' BlockStatement ]
WhileStatement := 'while' Expression ':' BlockStatement
PrintStatement := 'print' '(' [ Expression { ',' Expression } ] ')'
BlockStatement := NEWLINE INDENT { Statement } DEDENT
Expression := E1 { 'or' E1 }
E1 := E2 { 'and' E2 }
E2 := [ 'not' ] E3
E3 := E4 { ('<' | '<=' | '>' | '>=' | '==' | '!=') E4 }
E4 := E5 { ('+' | '-') E5 }
E5 := E6 { ('*' | '/' | '%') E6 }
E6 := String | Number | Identifier { '=' Expression } | '(' Expression ')'
Identifier := (Letter | '_') { Letter | Digit | '_' }
Number := Digit { Digit } [ '.' { Digit } ]
String := '"' { AnyCharExceptDoubleQuoteOrNewline } '"'
You may notice that the grammar allows assignments to be evaluated as expressions (e.g., Identifier { '=' Expression } inside the E6 production). This deviates from the original Python language, which strictly treats assignments as statements.
This design choice ensures the grammar remains strictly LL(1). If Assignment were defined as a separate Statement production, the recursive descent parser would encounter a structural ambiguity when reading an Identifier. It would not be able to decide whether it was evaluating a standard mathematical/logical expression starting with a variable or parsing an assignment without utilizing a two-token lookahead, which would make the grammar LL(2).
- Lexer.java & Token.java: Scans the raw source code and converts it into a sequence of tokens. It maintains a stack to track indentation levels, generating synthetic INDENT and DEDENT tokens to pass structural block information to the parser.
- Parser.java: A recursive descent parser that strictly follows the LL(1) grammar above. It reads tokens and constructs the Abstract Syntax Tree.
- AST.java: Defines the node structures for the Abstract Syntax Tree (Expressions and Statements) and sets up the interfaces for the Visitor design pattern.
- Interpreter.java: Walks the AST using the Visitor pattern to evaluate expressions and execute statements. It manages a runtime environment (a HashMap) to keep track of variable states and handles truthiness, equality, and arithmetic logic.
- Main.java: The entry point that orchestrates the pipeline: File Reading -> Lexer -> Parser -> Interpreter.