Skip to content

thesephist/schrift

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
src
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Schrift 🚄

Schrift is an experimental runtime for the Ink programming language, focused on performance and observability.

Schrift is currently ⚠️ under development ⚠️. Many parts of the runtime are not working yet. Specifically, Schrift currently lacks an implementation of the module system, most system interfaces, and an event loop. Schrift also doesn't implement a repl yet.

Motivation

I first wrote the Ink language and its Go-based interpreter as a toy project to learn about parsers and interpreters. Because of that provenance, the original Go interpreter has lots of shortcomings, especially in performance and runtime instrumentation capabilities. The Go runtime for Ink depends on the Go runtime for a call stack, the event loop, and memory management, which is convenient and fast, but limits control.

Schrift is my second attempt at an Ink interpreter, focused on performance and debugging, and better architecture in general. It's not designed to be a complete replacement, but if Schrift is successful, I think it can be better than the Go-based interpreter in almost every metric.

Goals

  • Performance. Schrift will run Ink programs quickly and efficiently, and allow for new optimizations to be introduced into the interpreter easily.
  • Correctness. Schrift's implementation of Ink will try to be 100% compatible with the existing implementation.
  • Debugging experience. Schrift will produce great error messages against erroneous programs, and allow errors to produce useful stack traces.
  • Instrumentation, specifically support for profiling and tracing at runtime, which is just not possible in the Go interpreter's design.

Why Rust?

I chose Rust for Schrift for two reasons. First, I wanted to learn Rust, and this seemed like a good learning project with known parameters that catered well to Rust's strengths as a systems language. Second, in order to see all the improvements I wanted out of a new interpreter design, I wanted as much control over the runtime and memory management as possible, and Rust offers a high degree of control without sacrificing safety.

Usage

Because Schrift is currently under development, there aren't any built binaries you can download. To try Schrift, You'll need to build the project from source.

Clone the repository and open the project

git clone https://github.com/thesephist/schrift.git
cd schrift

You can use Cargo to build a debug or release binary.

cargo build # debug
# -> saved to target/debug/schrift

cargo build --release # release
# -> saved to target/release/schrift

Schrift can currently only run programs from files. There are a few example programs in the ./test directory. You can run any of them like

./schrift test/002.ink

Schrift takes command line flags for debugging the compiler, to expose output of the tokenizer, the parser, and the compiler. These flags are available:

  • --debug-lex: print list of tokens
  • --debug-parse: print AST nodes
  • --debug-analyze: print AST nodes after static analysis transformations
  • --debug-compile: print generated bytecode
  • --debug-optimize: print generated bytecode after optimizations

For example, to see the generated bytecode for test/000.ink, run

./schrift --debug-compile test/000.ink

Design and implementation

Schrift is based on a bytecode compiler with a register-based virtual machine backend. You can read a detailed overview of Schrift's internals on the Ink blog. The Schrift interpreter has 5 stages.

Parse lex.rs, parse.rs

Schrift contains a lexer and a hand-written recursive descent parser for the full Ink language grammar. Comments are discarded by the parser, which produces an abstract syntax tree consumed by the static analyzer and compiler.

Static analysis analyze.rs

The static analyzer performs some light AST transformations, and makes annotations to AST nodes where it is helpful for the code generator, which is the next step in the pipeline. At the moment, the static analyzer will catch some semantic errors, but is mostly a no-op. However, this stage exists to create a space for potential static analysis operations to take place in the future.

Notably, this step would be an ideal place to perform any normalization of expressions for easier code generation. For example, it may convert unnecessary expression groups (e.g. an expression group that wraps a single AST node) into its containing node, which will generate faster code. Static analysis is also a good potential step to do constant folding, which is more difficult after compilation.

Compilation, i.e. (byte)code generation gen.rs

The compiler transforms the AST into a series of bytecode blocks that link together into a format executable by the Ink virtual machine.

Schrift's bytecode is register-based and designed to be an optimized single static assignment (SSA) form of the program. Each function and expression list (block) in Ink is compiled to a separate contiguous block of bytecode, called a Block, to allow for incremental compilation and replacements of parts of a program in a repl. An Ink program is then compiled into a flat list of Blocks that reference each other to form a call graph. Here's a sample Block from the Fibonacci sequence sample (test/007.ink), with annotations. You can produce this output by running Schrift with the --debug-compile flag.

#5                  # this is Block #5 in the program

consts: [           # constants used in this block
    0,              # number constant
    Func(2, []),    # Funcs are other blocks we can jump to
    1,
    Func(3, []),
    Func(4, [])
]

binds: [6]          # implementation detail for closures,
                    # references a parent scope's register
  @0    NOP
  @2    LOAD_CONST 0        # load constant from constant pool
  @3    LOAD_CONST 1
  @1    CALL_IF_EQ @3, @0 == @2, 2
                    # CALL_IF_EQ is the only branching
                    # construct in Schrift. It calls a closure
                    # if two register values are equal.
  @4    LOAD_CONST 2
  @5    LOAD_CONST 3
  @1    CALL_IF_EQ @5, @0 == @4, 1
  @6    NOP
  @0    ESCAPE @0           # escape stack value to vm heap
  @7    LOAD_ESC 0          # load escaped value to stack
  @8    LOAD_CONST 4
  @1    CALL_IF_EQ @8, @0 == @6, 0

The Block is the atomic unit of control flow in Ink. Code cam jump ahead in a Block, but all other control flow is achieved through direct calls of other Blocks in the code.

The Schrift bytecode format tries to take advantage of data locality in the processor as much as possible, and provide a representation of the program fit for the optimization step.

Optimizations optimize.rs

Schrift optimizes generated bytecode before execution. Most optimizations are local to each bytecode Block, and therefore take after peephole optimization techniques. The optimizer tries to:

  • De-duplicate constants
  • Merge and remove redundant instructions
  • Maximize register reuse
  • Monomorphize function calls by argument count (which are variadic when initially generated)

In the future, I'd also like the optimizer to perform:

  • Common subexpression elimination
  • Dead branch / code elimination

Notably, at the moment, inlining optimizations are currently out of scope. This is because the bytecode format lacks a general backward jump instruction that we need to implement a non-tail-recursive loop.

Schrift virtual machine vm.rs

The "virtual machine" and the runtime are closely linked. Currently, the virtual machine is a naive implementation of some call stack and heap state that implements the bytecode specification. I haven't had a chance to come up with a thought-through design yet, so there isn't much to note here.

The VM keeps a growable heap in a Vec<gen::Val> with pointers from the call stack, and has specialized instructions for escaping values from the stack to the heap. The stack is a growable array of registers that can hold values, and each frame points to its bytecode Block which holds some code metadata.

Runtime and garbage collection runtime.rs

Primitive values in Schrift (all values except the composite value "list" or "object") are stack-allocated by default. This makes many use cases of local variables like loop counters efficient. When values are assigned to composites or captured in closures, the escape the local scope, and are heap allocated after-the-fact with the ESCAPE VM instruction that replaces a register value with a reference-counted pointer to the heap.

At the moment, Schrift uses atomic reference counting to manage Ink's heap-allocated memory. This is in contrast to the Go-based Ink interpreter, which uses the Go runtime's garbage collector. This decision was made tentatively for a few reasons:

  • ARC is more memory efficient.
  • ARC tends to have lower GC latency than a mark-and-sweep GC
  • ARC is better suited to Rust's ownership model.
  • ARC trivially allows the runtime to become multithreaded.

Some open questions around GC and memory management in Schrift are being tracked in Issue #2.

About

A more experimental runtime for Ink, focused on perf and instrumentation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published