-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringLifting? #7370
Comments
The benefits are much larger than just saving some optimization time. I expect most producers not to emit stringref at all and to directly use the standard string builtins instead. This is both to keep wasm-opt an optional component of their toolchain and to avoid straying from the bounds of standard Wasm. For such producers, we simply cannot do any string optimizations today. With string lifting, these producers would newly benefit from our optimizations. To fix the type problem, we should keep our There are more subtle typing problems that come from introducing more refined types internally than we can emit in the final output. What if two rec groups differ only in that one uses We will have the same problem with using exact types internally even when custom descriptors are not enabled, so I have it on my plate to solve this. The same solution can apply to stringrefs as well. For parsing, we can either unconditionally parse calls to the properly typed and named imports as string operations, or we can do so only when a string feature is enabled or other command line flag provided, or we can do so only when the imports are annotated. For simplicity and to do the best thing in the most cases, I would always parse calls to the imports as string operations with a command line flag to opt out of this behavior. |
Makes sense, I see how making string a subtype of extern can work here.
I agree that this would allow us to optimize more producers' code. But do you have any in mind? AFAIK Java uses stringref already while Kotlin and Dart do not use imported JS strings (at least that is what I see in files from recent testcases). (Your point still stands that producers may use string imports and it would be nice to optimize those, but this would feel a lot more worth fixing if we have such examples already.) |
I know Dart is transitioning to builtin strings, and I would expect Kotlin to use them eventually as well, at least for explicit JS interop use cases. |
Great, thanks, that's what I was missing then. Good to know. I'll look into making stringref a subtype of externref then, to start. |
StringLowering converts strings to externs, which makes sense as we lower stringrefs to imported JS strings. For the reverse transform it is convenient to just have strings be subtypes of ext, see #7370 - that makes it simple to switch stringref to externref and vice versa. This also adds support for internalizing externref strings, which we represent as anyref literals (basically a hidden subtype of anyref).
It would be nice to allow toolchains to emit magic JS string imports all the time, which would make the output immediately runnable in VMs. That would be instead of emitting stringref and letting Binaryen lower it. The benefit of stringref is that Binaryen can optimize strings (it has them in the IR), but in a debug build you don't need that, and just want to run the build. Right now, toolchains can do some work to emit either JS string imports or stringref, depending on build type (debug or optimized), but we could save them the effort if Binaryen could read JS string imports and turn them into optimizable stringref.
We already have a StringLowering pass that turns stringref into JS string imports, which works well, so we could have a StringLifting that does the inverse. However, the inverse problem is a lot harder, consider e.g.
We can turn that imported JS string into a
string.const
, but the type would change fromexternref
tostringref
, and no longer fit in the local.The existing lowering pass handles this by just turning every
stringref
intoexternref
, which is fine as the goal is to lower away all native wasm strings. But we can't do that in a lifting pass, as there might be legitimate and unrelatedexternref
uses to keep.Inferring the types to change (in locals, globals, params, results, struct and array fields, tags, etc. etc.) would be... challenging, and likely brittle.
If we used type imports this could work - toolchains would not use raw externref but something more specific. But that proposal is far off (phase 1), so toolchains can't depend on it.
We could use custom annotations instead. I looked a little into how that might work, but it seems like in
contexts.h
, where we get the annotations, we'd need to do something with them. That seems like a widespread and annoying change at the parsing level. Perhaps instead we could stash the annotations on the IR or on the side (like we do with debug info), and then a lifting pass could use that?To be honest that doesn't seem very appealing either, both in terms of needed work on the Binaryen side, and toolchains - they'd need to add these annotations everywhere, and forgetting some annotation - say on some struct field - would lead to very odd errors.
As all of this is meant to save toolchains time, I looked at a huge 33MB wasm file from Java (the largest file I have that uses strings). Running
--string-lowering-magic-imports
(and reading and writing the binary) takes 14 seconds on my modest laptop - on a beefy machine it would be significantly faster. So we are talking single digits of seconds here, most likely. In that case, the benefit to toolchains seems pretty modest?@tlively What are your thoughts here?
The text was updated successfully, but these errors were encountered: