README.md
Lines changed: 12 additions & 16 deletions
@@ -1,6 +1,6 @@
 # chocopy-python-compiler
 
-Ahead-of-time compiler for [Chocopy](https://chocopy.org/), a subset of Python 3.6 with type annotations static type checking.
+Ahead-of-time compiler for [Chocopy](https://chocopy.org/), a subset of Python 3.6 with type annotations and static type checking.
 
 Chocopy is used in compiler courses at several universities. This project has no relation to those courses, and is purely for my own learning/practice/fun.
 
@@ -13,12 +13,13 @@ Progress is documented on my [blog](https://yangdanny97.github.io/blog/):
 
 This compiler is written entirely in Python. Since Chocopy is itself a subset of Python, lexing and parsing can be entirely handled by Python's `ast` module.
 
-This compiler matches the functionality of the first 2 passes (parsing & typechecking) Chocopy's reference compiler implementation, and outputs the AST in a JSON format that is compatible with the reference implementation's backend. That means that you can parse and typecheck the Chocopy file with this compiler, then use the reference implementation's backend to handle assembly code generation.
+The frontend of this compiler matches the functionality of the first 2 passes (parsing & typechecking) of Chocopy's reference compiler implementation, and outputs the AST in a JSON format that is compatible with the reference implementation's backend. That means that you can parse and typecheck the Chocopy file with this compiler, then use the reference implementation's backend to handle assembly code generation.
 
-Additionally, this compiler contains 2 backends not found in the reference implementation:
+This compiler contains multiple backends not found in the reference implementation:
 - Untyped Python 3 source code
 - JVM bytecode, formatted for the Krakatau assembler
 - CIL bytecode, formatted for the Mono ilasm assembler
+- WASM, in WAT format
 
 The test suite includes both static validation of generated/annotated ASTs, as well as runtime tests that actually execute the output programs to check correctness. Many of the AST validation test cases are taken from test suites included in the release code for Berkeley's CS164, with some additional tests written for more coverage.
 
@@ -42,6 +43,7 @@ The input file should have extension `.py`. If the output file is not provided,
 - Python source outputs will be written to a file of the same name/location as the input file, with extension `.out.py`
 - JVM outputs will be written to the same location as the input file, with the extension `.j`
 - CIL outputs will be written to the same location as the input file, with the extension `.cil`
+- WASM outputs will be written to the same location as the input file, with the extension `.wat`
 
 **Flags:**
 
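The default output naming rules in the hunk above can be sketched in Python as follows. This is a hypothetical helper, not the compiler's own code. The function name `default_output_path` and the backend keys are illustrative assumptions. The helper also assumes that every backend reuses the input file's stem, although the README only promises that for the Python backend. The other backends are only guaranteed the same location and extension.

```python
from pathlib import Path

# Hypothetical backend -> extension table, copied from the README's bullet list.
# The key names are illustrative and are not the compiler's real flags.
OUTPUT_SUFFIXES = {
    "python": ".out.py",
    "jvm": ".j",
    "cil": ".cil",
    "wasm": ".wat",
}


def default_output_path(input_file: str, backend: str) -> Path:
    """Return the output path used when no output file is given.

    Assumes every backend keeps the input file's stem. The README only
    guarantees this for the Python backend ("same name/location").
    """
    src = Path(input_file)
    if src.suffix != ".py":
        raise ValueError(f"expected a .py input file, got {src.name!r}")
    # Same directory as the input, same stem, backend-specific extension.
    return src.with_name(src.stem + OUTPUT_SUFFIXES[backend])
```

For example, `default_output_path("tests/runtime/binary_tree.py", "wasm")` yields the path `tests/runtime/binary_tree.wat`.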
@@ -56,7 +58,7 @@ The input file should have extension `.py`. If the output file is not provided,
 - `hoist` - output untyped Python 3 source code w/o nonlocals or nested function definitions
 - `jvm` - output JVM bytecode formatted for the Krakatau assembler
 - `cil` - output CIL bytecode formatted for the Mono ilasm assembler
-- `wasm` - output WASM as plaintext in WAT format (WIP)
+- `wasm` - output WASM as plaintext in WAT format
 
 ## Differences from the reference implementation:
 
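The `hoist` flag's description (no nonlocals, no nested function definitions) corresponds to lambda lifting. Below is a minimal sketch of the general idea. It is not the compiler's actual output. It assumes a closure-conversion style in which the captured variable is boxed in a one-element list acting as a "ref", similar in spirit to the heap refs for nonlocals mentioned in the WASM notes. The function names are hypothetical.

```python
# Before: a nested function that mutates a variable of its enclosing function.
def counter_nested(n: int) -> int:
    count = 0

    def bump(step: int) -> None:
        nonlocal count
        count = count + step

    i = 0
    while i < n:
        bump(2)
        i = i + 1
    return count


# After (hypothetical hoisted shape): the nested function is lifted to top
# level, and the former nonlocal lives in a one-element "ref" list that is
# passed explicitly, so no `nonlocal` declaration is needed.
def bump_hoisted(count_ref: list, step: int) -> None:
    count_ref[0] = count_ref[0] + step


def counter_hoisted(n: int) -> int:
    count_ref = [0]  # box standing in for the former nonlocal `count`
    i = 0
    while i < n:
        bump_hoisted(count_ref, 2)
        i = i + 1
    return count_ref[0]
```

Both versions compute the same result. The hoisted one needs neither nested definitions nor `nonlocal`, because mutation goes through the shared box.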
@@ -110,8 +112,6 @@ The `demo_cil.sh` script is a useful utility to compile and run files with the C
 
 ## WASM Backend Notes:
 
-This is WIP, not all features are supported (the binary tree example itself actually does not work, but you can try another one).
-
 The WASM backend for this compiler outputs WASM in plaintext `.wat` format which can be converted to `.wasm` using `wat2wasm`:
 1. Use this compiler to generate plaintext WebAssembly
@@ -126,26 +126,22 @@ The WASM backend for this compiler outputs WASM in plaintext `.wat` format which
 The `demo_wasm.sh` script is a useful utility to compile and run files with the WASM backend with a single command (provide the path to the input source file as an argument).
 - To run the same example as above, run `./demo_wasm.sh tests/runtime/binary_tree.py`
 
-### WASM Backend - Supported Features:
-- int, bool, string, list
-- most operators
-- assignment
-- control flow
-- stdlib: print, len, and assert
-- globals
+The `wasm.js` file contains all the runtime support needed to run the WASM generated by this compiler. This backend was designed to minimize runtime JavaScript dependencies, so the only imported functions are for assertions and for printing strings, integers, and booleans.
 
 ### WASM Backend - Unsupported Features:
-- nonlocal referencing function param
-- stdlib: input (node.js does not have synchronous I/O out of the box so this is difficult)
+- `input` stdlib function (Node.js does not have synchronous I/O out of the box, so this is difficult)
 
 ### WASM Backend - Memory Format, Safety, and Management:
 
 - strings (utf-8) - first 4 bytes for length, followed by 1 byte for each character
 - lists - first 4 bytes for length, followed by 8 bytes for each element
 - ints - i64
 - pointers (objects, strings, lists) - i32, where `None` is 0
+- objects - first 8 bytes for vtable offset, followed by 8 bytes for each attribute, followed by 8 bytes for each method index. Inherited attribute/method positions are the same as in the parent.
+
+Strings, lists, objects, and refs holding nonlocals are stored in the heap, aligned to 8 bytes. Right now, memory does not get freed/garbage collected once it is allocated, so large programs may run out of memory.
 
-Strings, lists, objects, and refs holding nonlocals are stored in the heap, aligned to 8 bytes. Right now, memory does not get freed/garbage collected once it is allocated. To provide memory safety, string/list indexing have bounds checking and list operations have a null-check, which crashes the program with a generic "unreachable" instruction.
+To provide memory safety, string/list indexing is bounds-checked and list operations are null-checked; a failed check crashes the program with a generic `unreachable` instruction.
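The memory format and safety rules in the final hunk can be made concrete with a small Python model. This is a sketch under stated assumptions, not the backend's allocator. The names `Heap`, `TrapError`, `new_str`, `new_int_list` and `object_size` are hypothetical. The following choices are assumptions:

* a bump allocator that never frees;
* the first allocation placed at address 8, so that pointer 0 can keep meaning `None`;
* only `int` list elements modelled, as i64;
* a Python exception raised where the real code would execute `unreachable`.

WASM linear memory is little-endian, which is why the `struct` formats use `<`.

```python
import struct


class TrapError(Exception):
    """Stand-in for WASM's `unreachable` trap (hypothetical name)."""


class Heap:
    """Toy model of the documented heap layout; not the backend's real allocator.

    Assumptions: bump allocation that never frees, first allocation at
    address 8 so pointer 0 can mean `None`, little-endian values.
    """

    ALIGN = 8

    def __init__(self, size: int = 1024) -> None:
        self.mem = bytearray(size)
        self.next_free = self.ALIGN  # address 0 stays reserved for None

    def alloc(self, nbytes: int) -> int:
        addr = self.next_free  # always a multiple of 8
        end = (addr + nbytes + self.ALIGN - 1) // self.ALIGN * self.ALIGN
        if end > len(self.mem):
            raise MemoryError("heap exhausted: nothing is ever freed")
        self.next_free = end
        return addr

    def new_str(self, s: str) -> int:
        data = s.encode("utf-8")  # 1 byte per character for ASCII text
        addr = self.alloc(4 + len(data))
        struct.pack_into("<i", self.mem, addr, len(data))  # 4-byte length header
        self.mem[addr + 4 : addr + 4 + len(data)] = data
        return addr

    def new_int_list(self, items: list[int]) -> int:
        addr = self.alloc(4 + 8 * len(items))
        struct.pack_into("<i", self.mem, addr, len(items))  # 4-byte length header
        for i, value in enumerate(items):
            struct.pack_into("<q", self.mem, addr + 4 + 8 * i, value)  # one i64 per slot
        return addr

    def length(self, ptr: int) -> int:
        if ptr == 0:  # null check: `None` is pointer 0
            raise TrapError("null pointer")
        return struct.unpack_from("<i", self.mem, ptr)[0]

    def str_index(self, ptr: int, i: int) -> str:
        if not 0 <= i < self.length(ptr):  # bounds check
            raise TrapError("string index out of bounds")
        return chr(self.mem[ptr + 4 + i])

    def list_index(self, ptr: int, i: int) -> int:
        if not 0 <= i < self.length(ptr):  # length() also performs the null check
            raise TrapError("list index out of bounds")
        return struct.unpack_from("<q", self.mem, ptr + 4 + 8 * i)[0]


def object_size(n_attrs: int, n_methods: int) -> int:
    """8 bytes of vtable offset, then 8 bytes per attribute and per method index."""
    return 8 + 8 * n_attrs + 8 * n_methods
```

For instance, with a fresh `Heap()`, `new_str("hi")` returns address 8. The next allocation then starts at 16, the first multiple of 8 after the 6-byte string.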