Implementing Incremental Compilation for WebAssembly #5985

Closed
ifreund opened this issue Aug 4, 2020 · 3 comments
Labels
frontend Tokenization, parsing, AstGen, Sema, and Liveness.
ifreund (Member) commented Aug 4, 2020

Intro

I'm opening this issue to start a discussion on how best to do incremental compilation for WebAssembly. As far as I know, this is not something that has been done before, so we may need to get creative. Below you will find my current ideas on how this could work, which I will update as things become more clear.

Overview

A wasm binary is divided up into the following 11 sections:

  1. type - list of function signatures
  2. import - list of imports with info (module, name, type)
  3. function - list mapping function index to type index
  4. table - list of tables and their properties (currently only 1 table is allowed)
  5. memory - list of memories and their properties (currently only 1 memory is allowed)
  6. global - list of globals and constant initialization expressions
  7. export - list of exports, each with a name, type, and index
  8. start - an optional function index of a start function
  9. element - initialization values for tables
  10. code - bodies of the functions, in the same order as the function section
  11. data - initialization values for memories

Additionally, custom sections may appear before, after, or between these 11 sections. We may therefore use custom sections as padding in order to avoid copying section data around during incremental compilation.

For reference, the spec can be found here: https://webassembly.github.io/spec/core/index.html
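For concreteness, each of these sections is encoded as a one-byte section id, followed by the section's byte size as an unsigned LEB128 integer, followed by its contents. This layout is why resizing any section shifts every byte after it in the file. A minimal sketch in Python (illustrative only; the compiler itself is written in Zig):

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128, as used for all
    sizes and indexes in the wasm binary format."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_section(section_id: int, contents: bytes) -> bytes:
    """A wasm section: one id byte, a ULEB128 byte size, then contents."""
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents
```

Because the size prefix itself is variable-length, growing a section's contents can even change the length of its header, compounding the shifting problem that the padding strategies below try to avoid.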

Type Section

Types are referenced by index in the following places:

  • function imports in the import section
  • the function section
  • call_indirect instructions in the code section

As I see it, we have a few possible strategies here for how to implement incremental compilation for this section:

  1. Keep the type list equal to the list of all types referenced in the import and function sections in order, simply listing any duplicate types multiple times. On deletion of an import or function, delete the corresponding type entry as well, shifting all type entries which appear later in the list. Then update all references to these shifted types in the import and function sections.

  2. Always append new types to the end of the type list. Never delete types in order to avoid invalidating type indexes and needing to update other code.

  3. List only unique types. On adding a new type, first check if it's already listed and use that index if so. Never remove types.

  4. List only unique types. Use reference counting to determine when types may be deleted. Shift types and update references on deletion.

Each of these possibilities naturally has its advantages and disadvantages:

  1. This ensures that unused types will be cleaned up. However, the bookkeeping required to achieve this is non-trivial, and it could easily end up taking more memory than e.g. option 3 in practice, as types may be highly duplicated.

  2. This is the simplest option to implement and maintain. It will also likely be the cheapest computationally as it requires no bookkeeping. However, it is the least space-efficient.

  3. This option is nearly as simple to implement as option 2 and could potentially be much more space-efficient.

  4. This option is the most space efficient but also the most complex to implement as there is significant bookkeeping necessary. This is likely more worth implementing than option 1, but it is unclear how much of an advantage, if any, this would give over option 3 in practice.

My recommendation: implement option 2 first. It is the simplest option and upgrading to option 3 would be fairly easy. While option 4 or something like it would use less space, it is unclear whether or not this would be worth the massive increase in complexity in practice.
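The upgrade path from option 2 to option 3 amounts to adding a lookup before each append. A sketch of option 3 in Python (hypothetical names, not the actual Zig code):

```python
class TypeSection:
    """Sketch of option 3: list only unique function signatures and never
    remove them, so type indexes stay valid forever."""
    def __init__(self):
        self.types = []      # type index -> (params, results)
        self.index_of = {}   # (params, results) -> type index

    def intern(self, params: tuple, results: tuple) -> int:
        sig = (params, results)
        if sig in self.index_of:
            return self.index_of[sig]  # reuse the existing index
        idx = len(self.types)
        self.types.append(sig)         # option 2 is just this append,
        self.index_of[sig] = idx       # without the dict lookup above
        return idx
```

Since types are only ever appended, no index stored in the import, function, or code sections is ever invalidated.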

Global Section

The global section contains a list of global constants and variables of any of the four value types. These globals are initialized with a constant expression. Globals are referenced by index in the following places:

  • global exports in the export section
  • global.get and global.set instructions in the code section

Options for how to approach incremental compilation of the globals include:

  1. Add new globals to the end of the list. Update globals in place, potentially shifting memory but never global indexes. Never remove globals in order to avoid shifting indexes.

  2. Same as 1, but keep a list of "dead" globals which are no longer referenced and overwrite these instead of appending to the list if possible.

  3. Some strategy that achieves perfect memory usage of the global list by tracking references for each global index and updating them on shifting indexes.

My recommendation: Implement option 1 first. It is the simplest to implement and likely the fastest-running method. Additionally, the implementation could be extended to use the list of "dead" globals described in option 2 without affecting the rest of the code.
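The extension from option 1 to option 2 can be sketched as follows (Python, illustrative names only): a free list of dead indexes is consulted before appending, so global indexes never shift.

```python
class GlobalSection:
    """Sketch of option 1 extended with option 2's dead list: append new
    globals, reuse dead slots when possible, never shift indexes."""
    def __init__(self):
        self.globals = []  # global index -> (valtype, init_expr), None if dead
        self.dead = []     # indexes available for reuse

    def add(self, valtype: str, init_expr) -> int:
        if self.dead:
            idx = self.dead.pop()    # overwrite a dead global in place
        else:
            idx = len(self.globals)  # plain option 1: append at the end
            self.globals.append(None)
        self.globals[idx] = (valtype, init_expr)
        return idx

    def remove(self, idx: int) -> None:
        self.globals[idx] = None     # the index stays valid; only the
        self.dead.append(idx)        # slot becomes reusable
```

Note that a dead slot still occupies space in the emitted global section (e.g. as a dummy constant) so that later indexes are undisturbed.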

Function, Table, Element, and Code Sections

In wasm, there are two instructions which call a function: call and call_indirect. call takes a function index as a static argument while call_indirect instead takes a type index as a static argument and an index into a table as a dynamic argument, calling the function referenced by the function index found at that location in the table.

This means that if we always use call_indirect instead of call, we can generate the code section with no direct references to function indexes. Function indexes are then referenced only in the following places:

  • function exports in the export section
  • the start section
  • the element section

Why is it important that function indexes are not directly referenced in the code section? If they were, we would be unable to re-order functions without updating all function indexes in the entire code section. This matters because the code section is usually the largest section in the resulting binary by a wide margin. Therefore, updating a function which appears early in the code section may move the function to the end instead of shifting the rest of the code section in the binary.

If this approach is taken, the data stored for each function would be:

  • the index of the table holding the function's current index
  • whether or not the function is the start function (could also read from the start section and compare ids)
  • bookkeeping on where in the export section the function is referenced

An alternative would be to strictly maintain function ordering and pad each function with nop instructions in order to avoid shifting memory due to resize. However, I think this is inferior because of the amount of memory that would be wasted for this padding and because we cannot ensure that function indexes will not be shifted; importing a function prepends to the global function list and implicitly bumps the indexes of all non-imported functions.

Moving functions to the end of the code section on update will also have the effect of making subsequent updates to the same function essentially free, which I believe is highly desirable in practice for incremental compilation.
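The indirection described above can be sketched as follows (Python, hypothetical names): call sites embed a stable table slot rather than a function index, so re-emitting a function at the end of the code section only requires patching one table entry.

```python
class FunctionTable:
    """Sketch of the all-calls-indirect scheme: each function owns a fixed
    table slot; call_indirect sites reference the slot, never the function
    index, so moving a function's body updates exactly one table entry."""
    def __init__(self):
        self.slots = []  # table slot -> current function index

    def declare(self, func_index: int) -> int:
        self.slots.append(func_index)
        return len(self.slots) - 1  # stable slot baked into call sites

    def move_function(self, slot: int, new_func_index: int) -> None:
        # The function grew and was re-emitted at the end of the code
        # section, receiving a new index; call sites are unaffected
        # because they resolve through the table at runtime.
        self.slots[slot] = new_func_index
```

The price is an extra indirection on every call, which is one of the trade-offs revisited in the later comments on this issue.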

Import Section

There are 4 types of imports possible: functions, tables, memories, and globals. Of these, only functions and globals will be imported in practice, as wasm modules are currently limited to a single table and a single memory. Imported functions and globals are prepended to their respective lists, bumping the indexes of all the functions and globals defined in our module.

If we take the path of making all function calls indirect, then performing this index bump on importing a new function should be doable. For globals, however, I don't see any good option. Either we simply regenerate the code section with the new global indexes, or we track the location of every global index in the code section.

Export Section

The export section is made up of a list of name/index pairs for exported functions, globals, tables, and memories. As these are exports, nothing else in our binary depends on the contents of this table; the only challenge is keeping the indexes stored here in sync with the function/global definitions they refer to.

The simplest way to handle this would be to skip incremental compilation for this section and re-emit the section with proper indexes whenever an export is added/removed or an exported function/global index changes.

Alternatively, we could maintain a mapping of function/global index to locations in the export section. Note that nothing prevents a single function or global from being exported under multiple different names.
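That alternative mapping can be sketched as follows (Python, hypothetical names): a reverse index from function index to export entries, so an index change patches only the affected entries, including multiple exports of the same function.

```python
from collections import defaultdict

class ExportIndexMap:
    """Sketch of the reverse mapping: function index -> positions in the
    export list, so a changed index patches only those entries."""
    def __init__(self):
        self.exports = []                 # list of (name, func_index)
        self.by_func = defaultdict(list)  # func index -> export positions

    def export(self, name: str, func_index: int) -> None:
        self.by_func[func_index].append(len(self.exports))
        self.exports.append((name, func_index))

    def reindex(self, old: int, new: int) -> None:
        # Patch every export entry that referenced the old function index.
        for pos in self.by_func.pop(old, []):
            name, _ = self.exports[pos]
            self.exports[pos] = (name, new)
            self.by_func[new].append(pos)
```

Whether this bookkeeping beats simply re-emitting the whole (typically small) export section is exactly the trade-off weighed above.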

Memory and Data Sections

The memory and data sections are where we define and store any chunks of memory that are known at compile time (e.g. static strings). I think our strategy here for how to distribute this data in memory and keep track of it during incremental compilation will end up looking very similar to a simple general purpose allocator implementation. The challenge here is in making sure that all memory which is no longer referenced gets "freed" by our "allocator" so that it can be reused for new static data.

I don't have anything more specific in mind for now, but I think that is OK as the implementation of memory should be fairly decoupled from the implementations of the rest of the sections. It is also not necessary for trivial programs, so we have the opportunity to implement the other sections first and test out the ideas presented above.
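The "simple general purpose allocator" idea above could start out as small as a first-fit free list over offsets into linear memory. A minimal sketch (Python, illustrative only; a real implementation would also coalesce adjacent free blocks):

```python
class DataAllocator:
    """First-fit free-list sketch for static data in linear memory:
    alloc hands out byte offsets, free_block makes regions reusable."""
    def __init__(self, size: int):
        self.free = [(0, size)]  # list of (offset, length) free blocks

    def alloc(self, n: int) -> int:
        for i, (off, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    self.free.pop(i)            # exact fit: consume block
                else:
                    self.free[i] = (off + n, length - n)  # shrink block
                return off
        raise MemoryError("out of static data space")

    def free_block(self, off: int, n: int) -> None:
        self.free.append((off, n))  # no coalescing in this sketch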

Vexu added the frontend (Tokenization, parsing, AstGen, Sema, and Liveness) label Aug 4, 2020
Vexu added this to the 0.7.0 milestone Aug 4, 2020
andrewrk (Member) commented Aug 5, 2020

Type Section

I have a counter proposal for this one.

Background

The stage1 compiler is heavily reliant on being single-threaded. It makes heavy use of interning, especially on types. For example, whenever it creates a pointer type, it checks if the same pointer type already exists, and, if so, uses that same type object. It will only ever create the pointer type object once.

Stage2 takes a different approach. It does not yet support multi-threaded compilations, but such support is planned. In order to accomplish this, care has to be taken to avoid a design that relies too much on shared resources - such as type objects. That is why in stage2, the way types are represented is fundamentally different. If you have a look at Type.zig you will see that there are multiple ways to represent the same type, and there is not necessarily a canonical one. This is to save memory, as well as to make it possible for multiple threads to create and manipulate type objects without synchronizing on a mutex. If types were interned, threads would be required to obtain a mutex before checking/performing the internment.

Internment in this discussion is equivalent to the "List only unique types" strategy listed above in points (3) and (4).

Observation: Unused Types can be Exploited for Padding

Based on my understanding of the wasm binary layout (which is entirely based on this github issue; I have not yet followed up on any of the links to the official spec), it appears that it is possible to have types in the Type section that are unused. Is that correct?

In this case, it should be possible to insert arbitrary "space" into the Types section by inserting intentionally unused types to be used as padding. By keeping track of where these spaces are, it is possible to treat the Types section as something that can be allocated and freed from, without shifting of indexes being necessary.

Conclusion

In the interest of preparing the codebase for multi-threaded compilation, and for simpler memory lifetime management that matches the already existing Decl memory management strategy, I would suggest an approach closer to (1). The idea would be to keep each Decl allocated a corresponding block of indexes from the Types section. If a Decl grew to need more types, it would start overwriting the unused Types, and if it outgrew that padding, only the Types data for that one Decl would get moved to the end of the Types section, and in its original place would be replaced by unused Types for padding. The main idea here is to treat each Decl independently, and have room to grow/shrink for each Decl within the Types section.
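The per-Decl block idea above can be sketched as follows (Python, hypothetical names; the real compiler is Zig): each Decl reserves a run of type indexes with some slack, grows into the slack in place, and relocates only its own types to the end when it outgrows the run, leaving the old run behind as unused-type padding.

```python
class TypeSectionBlocks:
    """Sketch of the per-Decl block scheme: each Decl owns a contiguous
    run of type indexes with slack; outgrowing the run relocates only
    that Decl's types, leaving the old run as unused-type padding."""
    def __init__(self, slack: int = 2):
        self.end = 0        # next free type index
        self.slack = slack  # extra padding indexes per block
        self.blocks = {}    # decl name -> (start, capacity, used)

    def reserve(self, decl: str, ntypes: int) -> int:
        start, cap = self.end, ntypes + self.slack
        self.blocks[decl] = (start, cap, ntypes)
        self.end += cap
        return start

    def grow(self, decl: str, ntypes: int) -> int:
        start, cap, _ = self.blocks[decl]
        if ntypes <= cap:
            self.blocks[decl] = (start, cap, ntypes)  # fits in padding
            return start
        # Outgrew the block: re-reserve at the end; the old run stays
        # behind as dead padding types, so no other indexes shift.
        return self.reserve(decl, ntypes)
```

Nothing here requires a global intern table, which is what keeps the scheme compatible with the planned multi-threaded compilation.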


I haven't made it to reading about the other sections yet, but I will do that tomorrow!

ifreund (Member, Author) commented Aug 18, 2020

A PoC for this approach has been implemented in #6056 (with slightly different details). However, I'm starting to think this path is leading us towards a local maximum. There's another possible approach where we store much more data in memory, including buffers of generated code, and rewrite the entire file on each update.

The various indexes would remain unknown until actually writing the file, which means that there would be no padding needed in the resulting binary. This would make future multithreading much easier as code could be generated in parallel without contention/synchronization over indexes. It also has potential to be faster as there would be no copying around of data in the output file and the resulting file would be much smaller and therefore faster to write. The resulting lack of indirection for function calls and significantly smaller binary size would presumably have a positive impact on the performance of generated binaries as well. The disadvantage is of course increased memory usage of the compiler.

Overall, the approach seems promising and I plan to investigate it before further work on the initial proposal above.

ifreund (Member, Author) commented Aug 19, 2020

A proof of concept for the in-memory approach described in the previous comment has been implemented in #6088. I don't think there's anything actionable left in this issue.

ifreund closed this as completed Aug 19, 2020