- arrays
- classes with inheritance and virtual methods
- null implicitly converts to any array and to any class
- destructors (garbage collection) for every non-trivial type; they can be disabled with the --disable-destructors compiler flag
- transforming the IR into SSA form
- optimizations:
  - constant propagation
  - copy propagation
  - unnecessary phi elimination
  - dead code elimination
  - GCSE (global common subexpression elimination)
- the variable namespace is independent of the function (and method) namespace, e.g. (accepted code):
  ```
  int foo() { return 42; }

  int main() {
    int foo = 3;
    if (foo() + foo != 45) error();
    return 0;
  }
  ```
- field shadowing is not allowed, e.g. (rejected code):
  ```
  class X {
    int x;
  }

  class Y extends X {
    string x;
  }

  int main() {
    return 0;
  }
  ```
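The constant propagation pass listed above can be illustrated with a minimal C++17 sketch. It assumes a hypothetical toy IR (`Op`, `Instr` and `propagate_constants` are illustrative names, not taken from src/ir/ir.hh) made of straight-line SSA code, so every variable has exactly one definition and a single forward pass is enough. A pass over the full IR would additionally have to handle control flow and phi nodes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical toy SSA IR: every instruction defines exactly one variable.
enum class Op { Const, Add, Mul, Copy };

struct Instr {
    Op op;
    int dst;              // variable defined by this instruction
    int a = -1, b = -1;   // operand variables (-1 when unused)
    std::int64_t imm = 0; // value of an Op::Const instruction
};

// Constant propagation with folding for straight-line SSA code: each variable
// has a single definition, so one forward pass sees every definition before
// its uses. Instructions whose operands are all known become Op::Const.
// (Overflow is ignored for brevity.)
void propagate_constants(std::vector<Instr>& code) {
    std::unordered_map<int, std::int64_t> known;  // variable -> constant value
    auto lookup = [&known](int v) -> std::optional<std::int64_t> {
        auto it = known.find(v);
        if (it == known.end()) return std::nullopt;
        return it->second;
    };
    for (Instr& ins : code) {
        std::optional<std::int64_t> value;
        switch (ins.op) {
            case Op::Const: value = ins.imm; break;
            case Op::Copy:  value = lookup(ins.a); break;
            case Op::Add:
            case Op::Mul: {
                auto x = lookup(ins.a), y = lookup(ins.b);
                if (x && y) value = (ins.op == Op::Add) ? *x + *y : *x * *y;
                break;
            }
        }
        if (value) {
            known[ins.dst] = *value;
            ins = Instr{Op::Const, ins.dst, -1, -1, *value};
        }
    }
}
```

For example, `v2 = v0 + v1` with `v0 = 2` and `v1 = 3` becomes `v2 = 5`, while an instruction that uses an unknown value (such as a function parameter) is left untouched.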
The tests comprehensively cover language constructs, type conversions and the required static analysis. Some tests reflect the language semantics chosen here, so compilers implementing other interpretations may not pass them.
Tests are not divided into categories based on which extensions they require. Test folder structure:
- good/ -- tests that present correct programs
- bad/ -- tests that present incorrect programs
- warnings/ -- tests that present programs producing diagnostic warnings
To build the compiler, just run `make`.
In case of compiler crashes, please try `make DEBUG=8`.
The project is implemented in C++17. The parser was generated with BNFC using its C backend. NASM is used for assembling, the strip command for stripping symbols, and gcc for linking.
- my tests in folders good/, bad/, warnings/
- src/ -- project sources
- src/ByBnfc/ -- source files generated by BNFC from src/Latte.cf
- src/ast/ -- AST and code for transforming the BNFC-generated AST into my AST
- src/ast/ast.hh -- definition of the AST
- src/ast/build.cc.template -- transforming the BNFC-generated AST into my AST
- src/ast/build_gen.py -- script for generating src/ast/build.cc from src/ast/build.cc.template
- src/backend/ -- backend
- src/backend/x86_64.cc -- translating IR to x86_64 assembly
- src/frontend/ -- frontend
- src/frontend/error.hh -- utility to pretty-print compilation errors and warnings
- src/frontend/global_symbols.cc -- building the table of all global symbols, i.e. functions, classes and their fields and methods, plus analyzing inheritance and virtual methods
- src/frontend/type_checker.cc -- checking type correctness
- src/frontend/static_analyzer.cc -- static analysis, i.e. compile-time computations, code reachability and control flow checking
- src/ir/ -- middle end: IR, IR transformations, optimizations
- src/ir/ast_to_ir.cc -- translating AST to IR
- src/ir/bblock_pred_succ_info.cc -- calculating predecessors and successors of every basic block
- src/ir/eliminate_dead_code.cc -- dead code elimination
- src/ir/eliminate_unnecessary_phis.cc -- unnecessary phi elimination
- src/ir/global_subexpression_elimination.cc -- GCSE
- src/ir/ir.hh -- definition of the IR language
- src/ir/ir_printer.cc -- printing of the IR language
- src/ir/make_ssa.cc -- transforming IR to SSA form
- src/ir/optimize.cc -- running all optimizations
- src/ir/propagate_constants.cc -- constant propagation
- src/ir/propagate_copies.cc -- copy propagation
- src/ir/remove_phis.cc -- removing all phis, so that the code is in general no longer in SSA form, but it is ready to be processed by the backend
- src/latc_x86_64.cc -- main code that glues everything together
- src/persistent_map.hh -- persistent map implemented using a treap (a randomized BST)
- src/persistent_map_tester.cc -- test for src/persistent_map.hh
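The treap-based persistent map mentioned above can be sketched as follows. This is a minimal illustration, not the interface of src/persistent_map.hh: `PersistentMap`, `insert` and `find` are hypothetical names. Updates use path copying, so `insert` returns a new map that shares all untouched nodes with the old one, and the old map stays valid; random priorities keep the tree balanced in expectation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <random>
#include <utility>

// Hypothetical persistent map built on a treap (BST by key, max-heap by a
// random priority). Nodes are immutable and shared between versions.
template <class K, class V>
class PersistentMap {
    struct Node {
        K key;
        V val;
        std::uint32_t prio;
        std::shared_ptr<const Node> left, right;
    };
    using Ptr = std::shared_ptr<const Node>;

    Ptr root_;
    explicit PersistentMap(Ptr root) : root_(std::move(root)) {}

    static std::uint32_t random_priority() {
        static std::mt19937 gen(12345);  // fixed seed: reproducible tree shapes
        return static_cast<std::uint32_t>(gen());
    }
    static Ptr make(const Node& n) { return std::make_shared<Node>(n); }

    // Splits t into (keys < key, keys > key); a node equal to key is dropped.
    // Only nodes on the search path are copied.
    static std::pair<Ptr, Ptr> split(const Ptr& t, const K& key) {
        if (!t) return {nullptr, nullptr};
        if (t->key < key) {
            auto [l, r] = split(t->right, key);
            return {make({t->key, t->val, t->prio, t->left, l}), r};
        }
        if (key < t->key) {
            auto [l, r] = split(t->left, key);
            return {l, make({t->key, t->val, t->prio, r, t->right})};
        }
        return {t->left, t->right};
    }

    // Merges a and b, where every key in a is smaller than every key in b.
    static Ptr merge(const Ptr& a, const Ptr& b) {
        if (!a) return b;
        if (!b) return a;
        if (a->prio > b->prio)
            return make({a->key, a->val, a->prio, a->left, merge(a->right, b)});
        return make({b->key, b->val, b->prio, merge(a, b->left), b->right});
    }

public:
    PersistentMap() = default;

    // Returns a new version containing key -> val; *this is left unchanged.
    PersistentMap insert(const K& key, const V& val) const {
        auto [l, r] = split(root_, key);
        Ptr single = make({key, val, random_priority(), nullptr, nullptr});
        return PersistentMap(merge(merge(l, single), r));
    }

    std::optional<V> find(const K& key) const {
        const Node* t = root_.get();
        while (t) {
            if (key < t->key) t = t->left.get();
            else if (t->key < key) t = t->right.get();
            else return t->val;
        }
        return std::nullopt;
    }
};
```

Keeping old versions alive cheaply is handy wherever a pass needs snapshots of a map (for instance per scope or per basic block), although which passes rely on this is not stated in this README.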
The compiler backend targets x86_64 and emits Intel-syntax assembly (assembled with NASM). So far it is quite simple and performs no optimizations of its own. Nonetheless, everything (including memory reclamation) seems to work.
Compilation of assembly is done in three steps:
- running NASM to assemble the code into an object file (*.o),
- stripping local symbols using strip, because NASM always puts local labels into the symbol table, which is unnecessary,
- linking the object file into an executable using gcc.
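The three-step pipeline above could be driven by code along these lines. The helper `assembly_pipeline` is hypothetical, and the concrete flags are assumptions rather than what the compiler is documented to pass: `nasm -f elf64` produces a 64-bit ELF object file, and GNU strip's `--discard-all` (`-x`) removes non-global symbols such as NASM's local labels.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the three commands of the pipeline for the
// assembly file `base + ".s"`. The exact flags are ASSUMPTIONS.
std::vector<std::string> assembly_pipeline(const std::string& base) {
    const std::string asm_file = base + ".s";
    const std::string obj_file = base + ".o";
    return {
        // 1. assemble into a 64-bit ELF object file
        "nasm -f elf64 " + asm_file + " -o " + obj_file,
        // 2. drop non-global symbols, e.g. NASM's local labels
        "strip --discard-all " + obj_file,
        // 3. link the object file into an executable with gcc
        "gcc " + obj_file + " -o " + base,
    };
}
```

Each command could then be run with `std::system` from `<cstdlib>`, aborting the pipeline on a non-zero exit status.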
Arguments are passed on the stack, similarly to the 32-bit x86 C ABI. All registers except RAX are callee-saved; RAX holds the return value and is caller-saved.
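With stack-based argument passing, a callee finds its arguments at fixed offsets from RBP. The sketch below computes those offsets under assumptions that are not stated above: 8-byte argument slots, right-to-left pushes by the caller, and a standard `push rbp` / `mov rbp, rsp` prologue. `arg_offset` and `arg_operand` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for a cdecl-like convention on x86_64. ASSUMPTIONS:
//  - every argument occupies one 8-byte stack slot,
//  - the caller pushes arguments right to left, so argument 0 is pushed last,
//  - the callee begins with `push rbp` followed by `mov rbp, rsp`.
// Then [rbp] holds the saved RBP, [rbp + 8] the return address, and
// argument i is stored at [rbp + 16 + 8 * i].
int arg_offset(int index) { return 16 + 8 * index; }

// NASM (Intel syntax) memory operand referring to argument `index`.
std::string arg_operand(int index) {
    return "qword [rbp + " + std::to_string(arg_offset(index)) + "]";
}
```

Because every register except RAX is callee-saved, a function whose body uses, say, RBX must save and restore it itself (e.g. `push rbx` / `pop rbx`), while a caller that still needs the value in RAX after a call has to save it first.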
Garbage collection is realized via reference counting, just like C++'s std::shared_ptr. Reference-counted types: strings, class instances and arrays.
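To make the reference-counting scheme concrete, here is a small C++17 sketch of intrusive reference counting: the same idea as std::shared_ptr, but with the counter stored inside the object. `RcObject`, `Box`, `retain` and `release` are hypothetical names; the object layout and helpers actually emitted by the compiler are not described here. When the count drops to zero, the object is destroyed, and destroying it first releases the references held in its fields.

```cpp
#include <cassert>

// Hypothetical intrusive reference counting: the counter lives inside the
// object, whereas std::shared_ptr keeps it in a control block.
struct RcObject {
    static inline int live = 0;  // number of live objects, for the usage check
    long refcount = 1;           // the creator holds the first reference
    RcObject() { ++live; }
    RcObject(const RcObject&) = delete;
    RcObject& operator=(const RcObject&) = delete;
    virtual ~RcObject() { --live; }
};

// Takes an extra reference; null references are simply ignored.
void retain(RcObject* o) {
    if (o) ++o->refcount;
}

// Drops a reference; dropping the last one destroys the object.
void release(RcObject* o) {
    if (o && --o->refcount == 0) delete o;
}

// A hypothetical class instance with one reference-typed field.
struct Box : RcObject {
    RcObject* field = nullptr;
    ~Box() override { release(field); }  // destroying a Box releases its field
};
```

As with std::shared_ptr, plain reference counting never frees objects that form a reference cycle.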
The runtime functions are implemented in assembly (using libc) and are pasted at the beginning of every generated *.s file.
Compiler flags:
- --emit-ir -- save the IR (after applying optimizations) to a file (for the file foo/bar.lat, the IR is written to foo/bar.ir)
- --disable-destructors -- disable destructors (the garbage collection implementation)
- --no-optimizations -- disable all optimizations: constant propagation, GCSE, etc.
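The flags above, together with the `--emit-ir` output naming rule (foo/bar.lat becomes foo/bar.ir), could be handled roughly as follows. `Options`, `apply_flag` and `ir_path` are hypothetical; `std::filesystem::path::replace_extension` swaps only the final extension and keeps the directory part.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <string_view>

// Hypothetical representation of the three flags.
struct Options {
    bool emit_ir = false;
    bool destructors = true;
    bool optimizations = true;
};

// Applies one command-line flag; returns false for an unrecognized one.
bool apply_flag(Options& opts, std::string_view flag) {
    if (flag == "--emit-ir") opts.emit_ir = true;
    else if (flag == "--disable-destructors") opts.destructors = false;
    else if (flag == "--no-optimizations") opts.optimizations = false;
    else return false;
    return true;
}

// Output path for --emit-ir: same directory and stem, extension ".ir".
std::filesystem::path ir_path(std::filesystem::path source) {
    return source.replace_extension(".ir");
}
```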