
Integrating LLVM optimizations with wasm-opt #7634


Draft
wants to merge 1 commit into base: main

Conversation

xuruiyang2002
Contributor

This draft is about leveraging LLVM's optimizer (llvm opt) to benefit wasm-opt.

Languages like C/C++ and Rust compile through LLVM and already benefit from its optimizations. However, not all languages do (GC languages like Java, Kotlin, Dart, etc. do not come from LLVM). wasm-opt aims to take the role of the toolchain optimizer, but because it works at the AST level it cannot perform some optimizations. For example, wasm-opt cannot eliminate the redundant store (the first one) here:

    ;; Store 1 into memory at address 0:
    (i32.const 0)     
    (i32.const 1)     
    (i32.store)       
    
    ;; Store 0 into memory at address 0:
    (i32.const 0)     
    (i32.const 0)     
    (i32.store)       
  
    ;; Load the value from memory address 0 and return it:
    (i32.const 0)     
    (i32.load)

The general idea is: translate Binaryen IR (from LLVM-compatible code) into LLVM IR, let llvm-opt optimize it, and then get back the optimized result. The most closely related work is Speeding up SMT Solving via Compiler Optimization (FSE 2023), which uses a similar approach by translating SMT queries into LLVM IR to benefit from LLVM optimizations.
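
For concreteness, here is a minimal sketch (not code from this PR) of the "let llvm-opt optimize it" step using LLVM's new pass manager; the function name optimizeTranslatedModule and the choice of the O2 pipeline are my assumptions. O2 includes DSE and GVN, which would delete the dead first store in the example above and forward the remaining store to the load:

    #include "llvm/Analysis/CGSCCPassManager.h"
    #include "llvm/Analysis/LoopAnalysisManager.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Passes/PassBuilder.h"

    // Run LLVM's default O2 pipeline over a module produced by a
    // (hypothetical) Binaryen IR => LLVM IR translation.
    void optimizeTranslatedModule(llvm::Module& M) {
      llvm::LoopAnalysisManager LAM;
      llvm::FunctionAnalysisManager FAM;
      llvm::CGSCCAnalysisManager CGAM;
      llvm::ModuleAnalysisManager MAM;

      llvm::PassBuilder PB;
      PB.registerModuleAnalyses(MAM);
      PB.registerCGSCCAnalyses(CGAM);
      PB.registerFunctionAnalyses(FAM);
      PB.registerLoopAnalyses(LAM);
      PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

      // O2 contains DSE (removes the overwritten store) and GVN (forwards the
      // stored 0 to the following load).
      llvm::ModulePassManager MPM =
          PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
      MPM.run(M, MAM);
    }

The remaining (and harder) parts are the two translation steps around this call, which are what this draft is about.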

An earlier prototype implementing this idea can be found in this PR: main...kripken:binaryen:llvm. That experiment used existing tools like wabt, emcc, and llvm-opt, but a direct 1-to-1 translation may be better.

(I'll continue this if time allows)

@kripken
Member

kripken commented Jun 4, 2025

I think there is a lot of potential here!

Btw, I remembered in #7637 (comment) that our dataflow IR may be useful here, which is SSA-like:

https://github.com/WebAssembly/binaryen/tree/main/src/dataflow

There is a simple pass that uses it:

https://github.com/WebAssembly/binaryen/blob/main/src/passes/DataFlowOpts.cpp

I'm not sure, but an option might be to use the existing Binaryen IR => DataFlow IR, and add DataFlow IR => LLVM IR (and the last part could be simpler since it would be SSA => SSA).


// Create global memory buffer
ArrayType* memType = ArrayType::get(llvmBuilder->getInt8Ty(), totalSize);
GlobalVariable* llvmMem = new GlobalVariable(
Contributor


How would this be rewritten back into, presumably, multi-memory WASM? Or should this pass only work on single-memory WASM?

Contributor Author


I'm still relatively new to compilers and WebAssembly (currently studying both), so please forgive any naivety in my code. This was just a small attempt...

Contributor Author


As a newbie, I’m trying to learn compilers and systems like LLVM and wasm. However, the docs are huge and not very beginner-friendly. Any advice on where and how to start? Thanks!

Value* visitStore(Store* store) {
  // 1. Get the @wasm_memory global variable.
  GlobalVariable* wasmMemory =
    llvmMod->getGlobalVariable("wasm_memory", true);
Contributor


If multiple memories are supported, this would be unsound.
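
One way this might be addressed (a sketch under my own assumptions, not part of this PR): emit one LLVM global per wasm memory and key each load/store lookup by the memory's name, e.g. "wasm_memory." + name, instead of the single hard-coded "wasm_memory". The helper declareMemories and its parameters are illustrative only:

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/GlobalVariable.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Module.h"
    #include <cstdint>
    #include <string>
    #include <utility>
    #include <vector>

    // Declare one zero-initialized byte-array global per wasm memory. Loads
    // and stores would then call getGlobalVariable("wasm_memory." + name) for
    // the memory they actually target.
    void declareMemories(
        llvm::Module& M, llvm::IRBuilder<>& B,
        const std::vector<std::pair<std::string, uint64_t>>& memories) {
      for (const auto& [name, byteSize] : memories) {
        auto* memType = llvm::ArrayType::get(B.getInt8Ty(), byteSize);
        new llvm::GlobalVariable(M, memType, /*isConstant=*/false,
                                 llvm::GlobalValue::InternalLinkage,
                                 llvm::ConstantAggregateZero::get(memType),
                                 "wasm_memory." + name);
      }
    }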

Contributor Author


The initial thought would be:

  1. For the MVP, we just need to map each instruction while carefully handling semantics gaps such as UB in LLVM, inconsistencies in the FP spec, and other low-level differences between the source and target semantics (one such gap is sketched below).
  2. For non-MVP features (GC), we could perform code slicing to collect the non-GC parts (which are LLVM-optimizable), transpile and send them to the LLVM optimizer, then retrieve the optimized code and "stitch" it back in.

So, in my humble view, it's better to start with the MVP first.
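
To illustrate the kind of semantics gap item 1 refers to, here is a hedged sketch (my example, not PR code): wasm's i32.div_s traps on division by zero and on INT32_MIN / -1, while LLVM's sdiv is undefined behavior in both cases, so a direct 1-to-1 mapping needs explicit guards. The helper name visitDivS is hypothetical:

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    #include <cstdint>

    // Translate i32.div_s: branch to a trapping block when wasm semantics
    // require a trap, and only then emit the plain sdiv.
    llvm::Value* visitDivS(llvm::IRBuilder<>& B, llvm::Value* lhs, llvm::Value* rhs) {
      llvm::Function* F = B.GetInsertBlock()->getParent();
      llvm::LLVMContext& C = B.getContext();

      // mustTrap = (rhs == 0) || (lhs == INT32_MIN && rhs == -1)
      llvm::Value* isZero = B.CreateICmpEQ(rhs, B.getInt32(0));
      llvm::Value* overflows =
          B.CreateAnd(B.CreateICmpEQ(lhs, B.getInt32(INT32_MIN)),
                      B.CreateICmpEQ(rhs, B.getInt32(-1)));
      llvm::Value* mustTrap = B.CreateOr(isZero, overflows);

      llvm::BasicBlock* trapBB = llvm::BasicBlock::Create(C, "div.trap", F);
      llvm::BasicBlock* contBB = llvm::BasicBlock::Create(C, "div.cont", F);
      B.CreateCondBr(mustTrap, trapBB, contBB);

      // Model the wasm trap as llvm.trap followed by unreachable.
      B.SetInsertPoint(trapBB);
      B.CreateCall(
          llvm::Intrinsic::getDeclaration(F->getParent(), llvm::Intrinsic::trap));
      B.CreateUnreachable();

      B.SetInsertPoint(contBB);
      return B.CreateSDiv(lhs, rhs);
    }

Similar care is needed for i32.rem_s and the trapping float-to-int truncations.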


Value* visitConst(Const* c) {
  assert(c->type.isBasic());
  switch (c->type.getBasic()) {
Contributor


Note that LLVM supports Wasm's externref.

@kripken
Member

kripken commented Jun 12, 2025

There is now a proposal to add wasm input to upstream LLVM:

https://discourse.llvm.org/t/rfc-mlir-dialect-for-webassembly/86758

If accepted, that could be very useful here, as it would let some wasm modules be read by LLVM, optimized, and re-emitted as Wasm.

They will never support all of wasm (like GC, I assume), but we could do work on our side to "filter" out the parts they can't handle, let them optimize, and then re-apply the filtered parts, something like that. That might still be a lot of work for us, but a lot less than otherwise.

@xuruiyang2002
Contributor Author

Thanks for sharing, and I'll read it carefully.
