Skip to content

Support componentizing modules that have duplicate imports. #2145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 6, 2025

Conversation

erikrose
Copy link
Contributor

@erikrose erikrose commented Apr 15, 2025

When the --deduplicate-imports option is passed to wasm-tools component new, we rewrite the input module to compact out the duplicates. This allows tinyGo's output to be wrapped in a component. Closes #2063.

Using Reencode takes the maintenance pressure I feared off the deduplication engine; it won't have to chase future changes to the wasm format. Instead, Reencode should adapt to them and continue calling function_index() in all the right places.

Suggestions on a better place to put wasi_snapshot_preview1.wasm adapter are welcome.

@erikrose erikrose force-pushed the rewrite-dupe-modules branch from 0e0b765 to 4feaa10 Compare April 15, 2025 21:18
@erikrose erikrose changed the title Add beginnings of function-deduplicating module rewriter, which will allow tinyGo output to be wrapped in a component. Add function-deduplicating module rewriter Apr 16, 2025
@alexcrichton
Copy link
Member

To confirm on this: is it ok to lose all debuginfo for these modules? The downside of rewriting the module is that it changes all the binary offsets of instructions in a module which causes debuginfo to become invalidated. Does TinyGo have DWARF debugging information that would be lost?

If debuginfo would be lost, that's where in #2063 I was thinking that the imports would all be preserved (as to not tamper with the function index space) and then the instantiation would provide the same function multiple times to the module.

If debuginfo is not lost, then this strategy is probably ok too. I'd prefer if it were off-by-default though due to the consequences of losing debuginfo and rewriting the binary, so mind adding a CLI flag along the lines of --dedupe-imports-if-necessary or something like that?

@erikrose
Copy link
Contributor Author

erikrose commented Apr 21, 2025

In an ideal world, I too would prefer that the import-preserving strategy had worked, but it gradually became a real rat's nest. I had the imports renaming and was getting close to having the requisite aliases created and reimported into a go-between module that would handle the field-level renamings. But at that point I looked ahead and saw that not only would the types of imports need to be changed (AdapterExports would no longer be coming directly from adapters, for instance) but the shimming machinery would need attention as well, along with who knows what next. There seemed to be a pattern of more discovered complexity as I went, and the code to support what ought to be a niche and perhaps even temporary feature was spreading all over the codebase.

So, feeling discouraged, I took a day to spike out the rewriting approach to weigh its complexity, and, what-do-you-know, it turned out simple, short, and centralized. I had avoided this tack at first for fear of a maintenance burden as constructs get added to wasm, but it turns out Reencode already provides a function_index() hook, so no further burden is incurred.

The casualty, as you point out, is custom sections that refer to byte offsets. @sunfishcode's thinking was to be extremely conservative, stripping any custom sections except ones known not to contain byte offsets: specifically, producers and target features. That way, we at least don't start spitting out (potentially non-obvious) garbage during a debug session. I think it's totally reasonable to add a flag to avoid surprise, however. (To answer your question, it looks like TinyGo does emit debug info.)

@alexcrichton
Copy link
Member

I agree yeah the Reencode trait makes this rewrite pretty easy, that was some of the main motivation for developing the trait! The downside though is that this functionality will have to be off-by-default as it breaks debuginfo. That's why I'd be curious to push on the import renaming strategy a bit more perhaps?

I'm a bit surprised that the implementation became pretty complicated, but I also didn't read the first few commits in fine detail so I could very well be missing things too. My hope would be that validate_module would get updated to take not just the "raw module" but a map of some kind tracking import renaming. For example duplicate imports would always get rewritten and the map here would say "import foo::bar2 was originally from foo::bar" or something like that).

Then the ImportMap would take this auxiliary information and would use it during the classify function, e.g. by overwriting the module/field combo that's being used to perform lookups with what the original names were. That would then naturally become an Import and hook up the rest of the machinery in wit-component. My hope is that after the modifications in validation.rs, plus import renaming, you wouldn't need to do anything else. In theory adapters/exports/shims/etc all wouldn't need to be updated.

@erikrose
Copy link
Contributor Author

erikrose commented Apr 23, 2025

I'm a bit surprised that the implementation became pretty complicated

You're not alone! :-)

You're welcome to peruse my abandoned branch, though it's messy and has a few course reversals. The commit messages are a decent read, but the code is best read all at once. As you can see, I tried to do most of my intervention in insert_import() (where I mapped the import names to unique ones) and encode_core_instantiation() (where I used the mappings).

Since you point to classify() in your comment and I hadn't really considered doing anything to it, I'll have one more look with that in mind. I have some other deadlines coming up, so I'll have to be careful with my time, but maybe something elegant will emerge! If not, this Reencode approach could at least be a temporary solution. It kicks in only for modules which currently don't work at all, and we could always replace it with an in-place renaming later if we figure out how.

@alexcrichton
Copy link
Member

Ah yeah I think the main difference is that the structure of ImportMap doesn't change. For example with:

``wasm
(import "wasi_snapshot_preview1" "fd_write" (func (type 5)))
(import "wasi_snapshot_preview1" "fd_write" (func (type 5)))


that'd get rewritten to:

```wasm
(import "wasi_snapshot_preview1" "fd_write" (func (type 5)))
(import "wasi_snapshot_preview1" "fd_write_2" (func (type 5)))

which would then do "duplicate work" for each import to determine that they both get wired up to the adapter's fd_write import. During classify of wasi_snapshot_preview1::fd_write_2 an auxiliary map would be consulted to determine "ok we're actually classifying wasi_snapshot_preview1::fd_write instead" but the result is written to the import of wasi_snapshot_preview1::fd_write_2. That way I don't think much in encoding.rs it needs to change, it'll see "just yet another import" where the core wasm name is guaranteed unique via rewriting

@erikrose
Copy link
Contributor Author

Tracing through this now. So far, what I've (re-)learned is that classify() doesn't do anything with field names (at least as far as imports of functions from adapters goes)—just module names. So I'm back to concluding that the mapping has to be applied during encoding.

@erikrose erikrose force-pushed the rewrite-dupe-modules branch 4 times, most recently from 2d5c417 to 1b89d9b Compare May 1, 2025 20:57
@erikrose erikrose changed the title Add function-deduplicating module rewriter Support componentizing modules that have duplicate imports. May 1, 2025
@erikrose erikrose marked this pull request as ready for review May 1, 2025 21:05
@erikrose
Copy link
Contributor Author

erikrose commented May 1, 2025

@alexcrichton Okay, I think we're ready to rock here! I gave the non-rewriting approach another Ideal Engineering Day and kicked over several obstacles but ultimately ran out the clock. (Here's where I got, in case you're curious.) This version, however, works great and is ready for your review. Thanks so much for all your help getting this far!

erikrose added 2 commits May 2, 2025 09:50
When the `--deduplicate-imports` option is passed to `wasm-tools component new`, we rewrite the input module to compact out the duplicates. This allows tinyGo's output to be wrapped in a component. Closes bytecodealliance#2063.

Using `Reencode` takes the maintenance pressure I feared off the deduplication engine; it won't have to chase future changes to the wasm format. Instead, `Reencode` should adapt to them and continue calling `function_index()` in all the right places.
@erikrose erikrose force-pushed the rewrite-dupe-modules branch from 1b89d9b to 549e0f3 Compare May 2, 2025 13:50
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, just some minor comments below.

FWIW I think one thing missing on your other branch is we'll still want to rewrite the module, but instead only the import section instead of the entire module. This PR drops imports, where what I was thinking was that we'd rename imports in-place. That I think would integrate much well with the rest of the system, basically "pretend the user specified some other module". Again though I'm ok if you prefer this approach, but wanted to leave this thought nonetheless.

Comment on lines 1 to 14
;; RUN: component new % -o component.wasm --adapt tests/cli/wasi_snapshot_preview1.wasm --deduplicate-imports
(module
;; The second of these imports gets stripped out, letting `component new` succeed.
(func $i (import "wasi_snapshot_preview1" "proc_exit") (param i32))
(func $j (import "wasi_snapshot_preview1" "proc_exit") (param i32))
(func (export "exported_func")
i32.const 42
call $i
i32.const 42
call $j
)
(memory (;0;) 1)
(export "memory" (memory 0))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test might be better suited for crates/wit-component/tests/components/* due to the multi-file nature of this test. That'll also avoid the need to check in a wasi_snapshot_preview1.wasm binary since there are already other tests with textual forms of adapters in that directory too. Additionally that'll avoid the need to have -o component.wasm which isn't a checked-in test expectation either, the test expectation will show up as text.

The main thing is you'll have to add new plumbing for --deduplicate-imports in that test suite since right now all the component tests are run with default settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, it's fun and funny that adapt-wasi-snapshot-preview1.wat, over in wit-component, chose proc_exit as a "hello world" routine, just as I did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, how does that look? I did pretty much the simplest thing that could possibly work to signal that the harness should deduplicate. (Since we got this far without needing these sorts of options, I don't expect a mad gallop of more. And if there is a need that exceeds this syntax, it's easy to change the spelling of this one test.)

@alexcrichton alexcrichton added this pull request to the merge queue May 6, 2025
Merged via the queue into bytecodealliance:main with commit d2c36f4 May 6, 2025
32 checks passed
@erikrose
Copy link
Contributor Author

erikrose commented May 6, 2025

Thanks, Alex!

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request May 12, 2025
This commit is a refinement to the support added in bytecodealliance#2145 to support
duplicate imports by default in wasm-tools, namely by avoiding changing
any binary offsets in a module except for the import section. This
implements a strategy of renaming imports in the module that goes into a
component to be unique. Classification of what an import maps to is then
ensured to use the original names, not the current names, in the module.
github-merge-queue bot pushed a commit that referenced this pull request May 27, 2025
…2187)

* Support duplicate imports by default in `wasm-tools component new`

This commit is a refinement to the support added in #2145 to support
duplicate imports by default in wasm-tools, namely by avoiding changing
any binary offsets in a module except for the import section. This
implements a strategy of renaming imports in the module that goes into a
component to be unique. Classification of what an import maps to is then
ensured to use the original names, not the current names, in the module.

* Fix clippy lints

* Review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tolerate duplicate imports when creating components
2 participants