Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringLifting? #7370

Open
kripken opened this issue Mar 13, 2025 · 4 comments
Open

StringLifting? #7370

kripken opened this issue Mar 13, 2025 · 4 comments

Comments

@kripken
Copy link
Member

kripken commented Mar 13, 2025

It would be nice to allow toolchains to emit magic JS string imports all the time, which would make the output immediately runnable in VMs. That would be instead of emitting stringref and letting Binaryen lower it. The benefit of stringref is that Binaryen can optimize strings (it has them in the IR), but in a debug build you don't need that, and just want to run the build. Right now, toolchains can do some work to emit either JS string imports or stringref, depending on build type (debug or optimized), but we could save them the effort if Binaryen could read JS string imports and turn them into optimizable stringref.

We already have a StringLowering pass that turns stringref into JS string imports, which works well, so we could have a StringLifting that does the inverse. However, the inverse problem is a lot harder, consider e.g.

(module
  (import "\'" "foo" (global $string.foo externref))
  (func $use
    (local $temp externref)
    (local.set $temp (global.get $string.foo))
  )
)

We can turn that imported JS string into a string.const, but the type would change from externref to stringref, and no longer fit in the local.

The existing lowering pass handles this by just turning every stringref into externref, which is fine as the goal is to lower away all native wasm strings. But we can't do that in a lifting pass, as there might be legitimate and unrelated externref uses to keep.

Inferring the types to change (in locals, globals, params, results, struct and array fields, tags, etc. etc.) would be... challenging, and likely brittle.

If we used type imports this could work - toolchains would not use raw externref but something more specific. But that proposal is far off (phase 1), so toolchains can't depend on it.

We could use custom annotations instead. I looked a little into how that might work, but it seems like in contexts.h, where we get the annotations, we'd need to do something with them. That seems like a widespread and annoying change at the parsing level. Perhaps instead we could stash the annotations on the IR or on the side (like we do with debug info), and then a lifting pass could use that?

To be honest that doesn't seem very appealing either, both in terms of needed work on the Binaryen side, and toolchains - they'd need to add these annotations everywhere, and forgetting some annotation - say on some struct field - would lead to very odd errors.

As all of this is meant to save toolchains time, I looked at a huge 33MB wasm file from Java (the largest file I have that uses strings). Running --string-lowering-magic-imports (and reading and writing the binary) takes 14 seconds on my modest laptop - on a beefy machine it would be significantly faster. So we are talking single digits of seconds here, most likely. In that case, the benefit to toolchains seems pretty modest?

@tlively What are your thoughts here?

@tlively
Copy link
Member

tlively commented Mar 13, 2025

The benefits are much larger than just saving some optimization time. I expect most producers not to emit stringref at all and to directly use the standard string builtins instead. This is both to keep wasm-opt an optional component of their toolchain and to avoid straying from the bounds of standard Wasm. For such producers, we simply cannot do any string optimizations today. With string lifting, these producers would newly benefit from our optimizations.

To fix the type problem, we should keep our string type but make it a subtype of extern rather than any. It's ok that this does not follow the stringref proposal because unlowered stringref cannot be run anywhere that matters.

There are more subtle typing problems that come from introducing more refined types internally than we can emit in the final output. What if two rec groups differ only in that one uses externref where the other using stringref? When the binary is written, those rec groups will suddenly and unintentionally start defining the same types.

We will have the same problem with using exact types internally even when custom descriptors are not enabled, so I have it on my plate to solve this. The same solution can apply to stringrefs as well.

For parsing, we can either unconditionally parse calls to the properly typed and named imports as string operations, or we can do so only when a string feature is enabled or other command line flag provided, or we can do so only when the imports are annotated.

For simplicity and to do the best thing in the most cases, I would always parse calls to the imports as string operations with a command line flag to opt out of this behavior.

@kripken
Copy link
Member Author

kripken commented Mar 13, 2025

To fix the type problem, we should keep our string type but make it a subtype of extern rather than any

Makes sense, I see how making string a subtype of extern can work here.

The benefits are much larger than just saving some optimization time. I expect most producers not to emit stringref at all and to directly use the standard string builtins instead.

I agree that this would allow us to optimize more producers' code. But do you have any in mind?

AFAIK Java uses stringref already while Kotlin and Dart do not use imported JS strings (at least that is what I see in files from recent testcases).

(Your point still stands that producers may use string imports and it would be nice to optimize those, but this would feel a lot more worth fixing if we have such examples already.)

@tlively
Copy link
Member

tlively commented Mar 13, 2025

I know Dart is transitioning to builtin strings, and I would expect Kotlin to use them eventually as well, at least for explicit JS interop use cases.

@kripken
Copy link
Member Author

kripken commented Mar 13, 2025

Great, thanks, that's what I was missing then. Good to know.

I'll look into making stringref a subtype of externref then, to start.

kripken added a commit that referenced this issue Mar 20, 2025
StringLowering converts strings to externs, which makes sense as we
lower stringrefs to imported JS strings. For the reverse transform it is
convenient to just have strings be subtypes of ext, see #7370 - that makes it
simple to switch stringref to externref and vice versa.

This also adds support for internalizing externref strings, which we
represent as anyref literals (basically a hidden subtype of anyref).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants