New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up preprocessor and parser #12
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
These don’t really affect the preprocessor because it doesn’t care about the details, but it’s less confusing if the modules in the tests are at least syntactically valid.
Since almost all of the preprocessor tests are for fields within a single module body, we can reduce boilerplate (and indentation levels) by introducing an assertion which automatically wraps its arguments in `(module …)`.
Most of the preprocessor tests use syntax which relies upon the ability to omit an explicit type use [0] and have it automatically inserted when abbreviations are expanded. To prevent these tests from breaking when we implement desugaring of type use abbreviations we need to update them to contain full type uses which are unaffected by the expansion of that abbreviation. When adding an unabbreviated type use, I’ve removed the corresponding inline declarations unless they’re the focus of the test. [0] https://webassembly.github.io/spec/core/text/modules.html#abbreviations
This makes it easier to see what’s gone wrong.
As with `offset` and element expressions, the test is demonstrating desugaring of instructions in an invalid program because that’s the correct behaviour for the parsing phase regardless of validation. In practice only constant instructions [0] may appear inside a global definition [1] but we have no way to demonstrate desugaring of constant instructions yet. [0] https://webassembly.github.io/spec/core/valid/instructions.html#valid-constant [1] https://webassembly.github.io/spec/core/valid/modules.html#globals
We need to desugar an inline import after expanding it because its description can contain other abbreviations (e.g. multiple anonymous parameters and results in the case of function imports). I overlooked this earlier because I didn’t write a test to cover it and presumably the official WebAssembly test suite doesn’t cover it either, so no tests failed when I removed inline import support from the parser.
These are the only two remaining kinds of field and neither supports any abbreviations, so we can list them as an intentional no-op case and have all field kinds explicitly represented.
This separation exists in most places in the preprocessor, so for the sake of stylistic consistency we should do it everywhere.
…cessor There’s no reason for these to be different, and the longer names are clearer.
It’s still the wrong way of doing it, but at least now it’s only wrong once.
This logic is repeated in a lot of places, so let’s extract and reuse it. IDs are essentially always optional so I’ve chosen to write a single method which unconditionally tries to read an ID (and may return nil) rather than a pair of #can_read_id?/#read_id methods. That does mean we have to check for `id.nil?` in a couple of places rather than asking a dedicated predicate, but it’s rare enough that I think this trade-off is the right one.
…uction This is how we do it everywhere else, and there’s no reason for it to be different here. The pattern match also has the advantage of tolerating non-string values (e.g. arrays), which makes the #can_read_list? call unnecessary. Removing the numbered parameter changes the block’s arity, so we need to add an explicit dummy parameter to keep #repeatedly happy. In hindsight I don’t think the `until:` API is a good idea so I’m going to change that soon.
This provides a nice expressive interface for dealing with indexes, moves a complicated regexp out of the parser, and puts the implementation in a helper where the preprocessor will be able to share it in future.
In 8ea2aa0 I removed unnecessary `?:` from regular expressions which already didn’t capture unnamed groups (because they also contained named ones), but I neglected to check whether other regular expressions were unintentionally capturing unnamed groups, so I’m fixing that now. Although this adds more syntactic noise to the regular expressions, I think our intent is clearer if we only capture groups which we intend to use, so I’m prepared to pay the price.
The pattern match will raise an exception if it fails, so there’s no need to do it ourselves.
The grammar says a type use appears here [0] so we should be reusing all of the appropriate parsing machinery. This creates the opportunity to check that no named parameters appear in the type use, which we’ll do in the next commit. [0] https://webassembly.github.io/spec/core/text/instructions.html#text-call-indirect
We already do this in #parse_blocktype because of its side condition which requires a type use to contain only unnamed entries [0]. The same side condition applies to `call_indirect`’s type use [1] so we should perform the same check. [0] https://webassembly.github.io/spec/core/text/instructions.html#text-instr-block [1] https://webassembly.github.io/spec/core/text/instructions.html#text-call-indirect
The grammar says zero or more results appear here [0] so we should be reusing the appropriate parsing machinery. [0] https://webassembly.github.io/spec/core/text/instructions.html#parametric-instructions
These are indexes [0], so we should be reusing the index-parsing machinery. [0] https://webassembly.github.io/spec/core/text/instructions.html#table-instructions
This’ll allow the loop to be terminated by the `StopIteration` exception instead of needing to support custom conditions through arguments.
It’s admittedly a bit “exceptions for control flow”, but this way of using `StopIteration` is idiomatic in Ruby so I think giving the caller arbitrary control over loop termination is easier to understand than relying upon a mystery `until:` argument. I’ve resisted the temptation to push `raise StopIteration` further down the call stack (into e.g. #read_list, or even #read) to avoid spooky action at a distance; I think the logic is easiest to understand when `raise StopIteration` appears syntactically within the #repeatedly block. I’ve also chosen a guard-style raise (`raise … if …`) rather than putting the whole body of the block inside a conditional (`if … else raise … end`) because the former puts the `raise` lexically adjacent to the #repeatedly call instead of pushing it further away (e.g. inside the `else` clause) which should improve clarity. This choice also helps to keep the indentation levels under control.
We always use `StopIteration` now.
… parser This avoids us having to do the concatenating ourselves. Again I’ve chosen guard-style `raise StopIteration if …` clauses to keep them adjacent to the #repeatedly call for clarity’s sake.
I’d like to use destructuring with optional block parameters in the implementation of #unzip_pairs, but there’s a bug in YJIT [0] which prevents this from working. The easiest solution for now is to turn YJIT off. [0] Shopify/yjit#313
When `pairs` is empty then `pairs.transpose` is `[]`, which gets destructured into a pair of nils for the two block parameters. We can replace the nils in this case by defaulting each parameter to the empty array.
The only uses of #unzip_pairs are to unzip the results of these methods, so we might as well have them return an unzipped result.
This was the only caller of #parse_declarations which didn’t unzip its result with #unzip_pairs. By doing it consistently across all callers we create the opportunity to have #parse_declarations return unzipped results directly, which we’ll do in the next commit.
The only uses of #unzip_pairs are to unzip the result of the #parse_declarations method, so we might as well have it return an unzipped result.
This is now the only caller of #unzip_pairs, so we can safely build it into #parse_declarations directly.
Validation constrains modules to at most one memory [0] but the parser doesn’t need to worry about that. It creates more regularity in the parser to support arbitrarily many memories (just like functions, tables, globals etc) and let validation deal with any problems. [0] https://webassembly.github.io/spec/core/valid/modules.html#valid-module
Now that these are all arrays (except for `start`) we can consolidate the setup of their initial values.
I’d like to support multiple memories here to match the AST parser, and that change will be easier if we assign `current_module.memory` to a local variable instead of repeatedly using it inline.
Validation won’t actually allow multiple memories [0] but it makes the implementation more regular, and more like the formal WebAssembly language specification, if we assume they’re possible. All of the memory instructions implicitly operate on memory index zero [1] so we’ve now made that explicit in their implementation. The #initialise_memories method no longer raises an error if the memory index is non-zero; this check should’ve been happening during validation anyway (which we’re not doing yet) so losing it doesn’t matter. [0] https://webassembly.github.io/spec/core/valid/modules.html#valid-module [1] https://webassembly.github.io/spec/core/syntax/instructions.html#memory-instructions
Now that modules have multiple memories, we can use the same approach for all relevant fields during allocation [0]. This makes it clear that exports are different and shouldn’t be “built” in the same way; we could’ve already seen this from the definition of module allocation [1]. I’m not going to get distracted by making any more changes to the interpreter at the moment, but we should come back to this in future. [0] https://webassembly.github.io/spec/core/exec/modules.html#allocation [1] https://webassembly.github.io/spec/core/exec/modules.html#alloc-module
tomstuart
added a commit
that referenced
this pull request
Jul 19, 2023
To desugar the type use abbreviation [0] we’ll need access to a list of all the module’s type definitions [1] when preprocessing a type use [2], but the full list of type definitions will only be available once the complete module has been processed. This creates a chicken-and-egg situation: we can’t desugar a type use until we’ve processed all of the enclosing module’s fields, but many of those fields will contain type uses which need to be desugared during processing. This commit breaks the deadlock by refactoring the preprocessor to allow some field-processing methods to return a “deferred result”, namely a proc which must be called with a list of type definitions in order to produce the final S-expression. Deferred results give us the ability to process a module’s fields first and then provide type definitions later; the next step will be to generate the type definitions during processing so we can make use of that ability. This continues the preparatory work from #12 and #13. We’re still in the “make the change easy” [3] phase, so there’s no externally visible difference in the preprocessor’s behaviour yet. [0] https://webassembly.github.io/spec/core/text/modules.html#abbreviations [1] https://webassembly.github.io/spec/core/text/modules.html#types [2] https://webassembly.github.io/spec/core/text/modules.html#type-uses [3] https://twitter.com/KentBeck/status/250733358307500032
tomstuart
added a commit
that referenced
this pull request
Jul 20, 2023
) The #process_module method is ready to provide type definitions to the deferred results produced by preprocessing the module’s fields so that the type use abbreviation [0] can be desugared, but we don’t currently have any way to communicate those type definitions to #process_module. Although the processed fields do include type definition fields, they’re all deferred until we’re ready to provide the type definitions — another chicken-and-egg situation. This commit adds a “list of type definitions” return value to #process_type_definition and refactors other preprocessor methods to allow that value to propagate up to #process_module. The next (and final!) step will be to populate it with real type definitions and use those to desugar the type use abbreviation. This concludes the preparatory work begun in #12, #13 and #14, and therefore still doesn’t introduce any externally visible changes to the preprocessor’s behaviour. [0] https://webassembly.github.io/spec/core/text/modules.html#abbreviations
tomstuart
added a commit
that referenced
this pull request
Aug 20, 2023
…tions (#17) Despite #12, the preprocessor, parser and interpreter (and their tests) were still a little messy and inconsistent. This commit cleans them up to fix many minor annoyances in preparation for implementing instruction abbreviations. The problems addressed are mostly related to naming and code organisation. The largest change is that all of the preprocessor tests have been updated to no longer depend upon the remaining instruction abbreviations (i.e. folded instructions and missing `call_index` index) which we intend to desugar.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The preprocessor and parser (and their tests) were becoming a little messy and inconsistent. This PR cleans them up to fix many minor annoyances in preparation for implementing the type use abbreviation.
The problems addressed are mostly related to naming and code organisation. The most significant change is that the instructions inside a global definition’s initialiser expression and the expanded result of an inline import are now being preprocessed correctly; I’d previously overlooked the need to do this because neither case is exercised by the WebAssembly test suite for any of the abbreviations implemented so far.