Description
(Apologies if this has been asked before, I wasn't sure how to search for this)
I am building a programming language that will initially target wasm, using Wasmtime as a runtime. I have been struggling to find a convenient way to actually build a wasm module manually in code without first going through the binary or text representation.
Essentially, the high-level goal is to translate my language's AST structure to an equivalent wasm AST structure that can then be encoded to `.wasm` or printed to `.wat`.
This is a summary of each relevant crate in `wasm-tools`, as far as I've been able to tell:
| Crate | Input | Output |
|---|---|---|
| `wasm-encoder` | AST* | binary |
| `wasmparser` | binary | events representing AST |
| `wasmprinter` | binary | WAT |
| `wat` | WAT | binary |
| `wast` (parsing) | WAT | AST |
| `wast` (encoding) | AST* | binary |
The only options that take an AST as input are `wasm-encoder` and `wast`.
`wasm-encoder` is a great abstraction, but it is fairly low-level and still depends on the caller to manually do some things that could be done for the caller, such as:
- Encoding sections in the correct order
- Ensuring that every function and function import has a corresponding type in the types section
- Maintaining index spaces
- Maintaining sections at all (as opposed to dealing only with functions, tables, etc.)
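
To make that bookkeeping concrete, here is a minimal sketch of the kind of helper I end up writing by hand. It uses plain Rust only and does not call `wasm-encoder`; `ValType`, `FuncType`, and `Indices` are hypothetical names for illustration. It deduplicates function signatures into a type list and hands out function indices, with imported functions occupying the low indices of the function index space.

```rust
use std::collections::HashMap;

/// Value types, reduced to two for brevity (hypothetical, not wasm-encoder's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ValType {
    I32,
    I64,
}

/// A function signature: parameter types and result types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

/// Hypothetical bookkeeping for the type section and the function index space.
#[derive(Default)]
struct Indices {
    types: Vec<FuncType>,                // type section contents, in index order
    type_lookup: HashMap<FuncType, u32>, // signature -> type index
    imported_funcs: u32,
    defined_funcs: u32,
}

impl Indices {
    /// Return the type index for `ty`, appending it to the type list if it is new.
    fn intern_type(&mut self, ty: FuncType) -> u32 {
        if let Some(&idx) = self.type_lookup.get(&ty) {
            return idx;
        }
        let idx = self.types.len() as u32;
        self.types.push(ty.clone());
        self.type_lookup.insert(ty, idx);
        idx
    }

    /// Register an imported function; returns (function index, type index).
    /// Imports come first in the function index space, so this sketch simply
    /// refuses imports once a function has been defined.
    fn import_func(&mut self, ty: FuncType) -> (u32, u32) {
        assert_eq!(self.defined_funcs, 0, "imports must be registered first");
        let type_idx = self.intern_type(ty);
        let func_idx = self.imported_funcs;
        self.imported_funcs += 1;
        (func_idx, type_idx)
    }

    /// Register a defined function; returns (function index, type index).
    fn define_func(&mut self, ty: FuncType) -> (u32, u32) {
        let type_idx = self.intern_type(ty);
        let func_idx = self.imported_funcs + self.defined_funcs;
        self.defined_funcs += 1;
        (func_idx, type_idx)
    }
}
```

The returned indices are what would then be fed into the type, import, function, and code sections, each of which still has to be emitted in the right order.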
A better abstraction would provide an API that works effectively like the WAT format, such as:
- Adding module items in any order and internally grouping them into sections
- Inlining functions with type definitions
- Inlining exports
- A simpler way to reference module items instead of using indices
- Inlining data and element segments with their corresponding tables and memories
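
As a rough illustration of the shape I have in mind (not an existing crate; `ModuleBuilder`, `Item`, and `Instr` are all hypothetical names), the sketch below accepts items in any order, refers to functions by owned names, supports an inline export, and only resolves names to indices when printing. Every function is assumed to have the empty signature to keep it short, duplicate names are not detected, and the output is WAT-like text with the resolved index shown in a comment.

```rust
use std::collections::HashMap;

/// Instructions refer to functions by name; indices are resolved later.
enum Instr {
    Call(String),
}

/// A module item. Items can be added in any order.
enum Item {
    ImportFunc { name: String, module: String, field: String },
    Func { name: String, export: Option<String>, body: Vec<Instr> },
}

#[derive(Default)]
struct ModuleBuilder {
    items: Vec<Item>,
}

impl ModuleBuilder {
    fn add(&mut self, item: Item) -> &mut Self {
        self.items.push(item);
        self
    }

    /// Assign function indices: all imports first, then defined functions.
    fn func_indices(&self) -> HashMap<&str, u32> {
        let imports = self.items.iter().filter_map(|i| match i {
            Item::ImportFunc { name, .. } => Some(name.as_str()),
            _ => None,
        });
        let funcs = self.items.iter().filter_map(|i| match i {
            Item::Func { name, .. } => Some(name.as_str()),
            _ => None,
        });
        imports.chain(funcs).zip(0u32..).collect()
    }

    /// Print the module, grouping imports before functions regardless of
    /// insertion order and replacing names with resolved indices.
    fn to_wat(&self) -> Result<String, String> {
        let idx = self.func_indices();
        let mut out = String::from("(module");
        for item in &self.items {
            if let Item::ImportFunc { name, module, field } = item {
                out += &format!(
                    "\n  (import \"{module}\" \"{field}\" (func (;{};)))",
                    idx[name.as_str()]
                );
            }
        }
        for item in &self.items {
            if let Item::Func { name, export, body } = item {
                out += &format!("\n  (func (;{};)", idx[name.as_str()]);
                if let Some(e) = export {
                    out += &format!(" (export \"{e}\")");
                }
                for Instr::Call(target) in body {
                    let i = idx
                        .get(target.as_str())
                        .ok_or_else(|| format!("unknown function ${target}"))?;
                    out += &format!("\n    call {i}");
                }
                out += ")";
            }
        }
        out += ")\n";
        Ok(out)
    }
}

/// Usage: `main` is added first, calls `helper` before it exists, and the
/// import is added last.
fn example_module() -> ModuleBuilder {
    let mut m = ModuleBuilder::default();
    m.add(Item::Func {
        name: "main".into(),
        export: Some("main".into()),
        body: vec![Instr::Call("helper".into()), Instr::Call("log".into())],
    })
    .add(Item::Func {
        name: "helper".into(),
        export: None,
        body: vec![Instr::Call("log".into())],
    })
    .add(Item::ImportFunc {
        name: "log".into(),
        module: "env".into(),
        field: "log".into(),
    });
    m
}
```

A real version would presumably lower to `wasm-encoder` sections instead of printing text, but the caller-facing part (names instead of indices, no section ordering) is the point of the sketch.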
This brings me to `wast`, which provides an API similar to this. However, the API has some limitations that make it clumsy to use in the way I am intending:
- The AST structure makes heavy use of borrowed strings. This makes sense because it allows the AST to simply reference the parsed source text without cloning, but it prevents assembling the AST using owned strings, which is difficult to work around (I tried a sort of "string container" that could be referenced by the AST but it felt very clumsy).
- The structure mirrors the WAT syntax, which is obviously needed since that's what it parses, but it's pretty verbose for my use case. For example, things like `$` ids and `@name` annotations could be consolidated into a single concept that wraps around indices.
- Most types include a `Span` that references a byte offset in the source text, which is not needed when there is no source text.
All of these things can be worked around, but I still think there is value in having a separate crate that exposes a WAT-like API that makes it easy to assemble wasm module structures. These modules could then be encoded directly to WAT or binary.
As a side note, if anyone is aware of an existing crate that does this, I would love to take a look.