Given the following code:
(module
  (import "External" "external_function" (func $external_function))
  (func $_start call $main)
  (func $main
    (local $0 i32)
    (local $1 i32)
    block ;; label = @1
      block (result i32) ;; label = @2
        i32.const 1
        local.set $0
        i32.const 0
      end
      br_if 0 (;@1;)
      i32.const 1
      local.set $1
    end
    local.get $1
    local.set $0
    i32.const 1
    local.set $1
    block ;; label = @1
      local.get $0
      br_if 0 (;@1;)
      i32.const 0
      i32.load8_u
      local.tee $0
      br_if 0 (;@1;)
      call $external_function
      i32.const 0
      local.set $1
    end
    i32.const 0
    local.get $1
    local.tee $0
    i32.store)
  (memory $0 258 258)
  (export "_start" (func $_start)))
wasm-opt (4f9374c) at -O3 cannot remove the unused branch, while -O2 can. In detail, -O2 produces (simplified):
(func $_start
 (i32.store
  (i32.const 0)
  (i32.const 1)
 )
)
while -O3 produces (simplified):
(func $_start
 (local $0 i32)
 (block $block1
  (br_if $block1
   (local.tee $0
    (i32.const 1)
   )
  )
  (br_if $block1
   (i32.load8_u
    (i32.const 0)
   )
  )
  (call $external_function)
  (local.set $0
   (i32.const 0)
  )
 )
 (i32.store
  (i32.const 0)
  (local.get $0)
 )
)
which is a significant optimization regression.
xuruiyang2002 commented on Jun 3, 2025
I think it is the CoalesceLocals pass that gets in the way of removing the unused block: it introduces the local.tee instruction, and that local.tee hinders the analysis of the first br_if's condition.

The direct solution I can think of is handling local.tee in optimizeBoolean. Is there a better way?

kripken commented on Jun 3, 2025
I think the natural place to fix this is in RemoveUnusedBrs, which aims to remove branches we don't need. If the fallthrough of the condition is constant, we can infer whether the branch is taken and apply that (keeping dropped children as needed). The specific place would be to add a visitBreak in the FinalOptimizer class.

opt for WebAssembly#7637
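To make that concrete, the first br_if in the -O3 output above has a condition that falls through to the constant 1, so the branch is always taken. A hand-written sketch of the rewrite (not the pass's actual output; the effect on $0 is kept, and the br_if becomes an unconditional br):

;; before: the condition falls through to the constant 1
(br_if $block1
 (local.tee $0
  (i32.const 1)
 )
)

;; after: keep the side effect on $0, then branch unconditionally
(local.set $0
 (i32.const 1)
)
(br $block1)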
xuruiyang2002 commented on Jun 4, 2025
Yes, it's more natural to fix this in the RemoveUnusedBrs pass.

By the way, I am also curious about the root cause: why can't this pattern be optimized? The module above is the minimal snippet that triggers the missed optimization, and removing any of its pieces makes the problem disappear. Why? Are the mixed cases difficult to analyze?
BTW, I already understand the benefits of optimizing on an AST IR (e.g., it is direct and matches the language). However, it seems wasm-opt suffers a lot from code with side effects. For example: getDroppedChildrenAndAppend (while LLVM does not have to bother with this). All of these lead to missed optimizations. In LLVM, we don't have to worry about so many side-effect ordering issues, right? How can we understand the pros, and especially the cons, of a language-specific IR? Thank you.
xuruiyang2002 commented on Jun 4, 2025
(Sorry for accidentally closing this issue... could you please reopen it?)
kripken commented on Jun 4, 2025
I'm not sure what you mean by "mixed cases"? What is the problem there aside from not handling the local.tee properly?

Yes, this is a tradeoff. Binaryen tries to stay very close to wasm, and that has some downsides, such as the fact that anything can be an expression with side effects.
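For instance, a br_if whose condition falls through to the constant 0 is never taken, but if the condition also contains a call, the branch cannot simply be deleted; the condition has to be kept and dropped, which is what helpers like getDroppedChildrenAndAppend handle. A hand-written sketch (not actual wasm-opt output, and $label is just a placeholder):

;; before: the condition always evaluates to 0, but it also performs a call
(br_if $label
 (block (result i32)
  (call $external_function)
  (i32.const 0)
 )
)

;; after: the branch is gone, but the call must be preserved
(drop
 (block (result i32)
  (call $external_function)
  (i32.const 0)
 )
)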
The general idea in the past was that we'd handle side effects like that by "flattening" the code, as -O4 does using --flatten, which avoids nested side effects. However, as wasm grew, flattening the code like that became harder (e.g. GC and exceptions add more branches, pops off the stack, etc., which flatten doesn't model well). So it is better not to rely on flattening now.

Another idea in the past was to use a "dataflow" IR which is in SSA form, like LLVM; see
https://github.com/WebAssembly/binaryen/blob/main/src/dataflow/node.h
That worked well enough for integration with Souper, but it does add overhead to convert the IR like that.
Overall, it is better to add getDroppedChildrenAndAppend in relevant places, I think. There is probably a long list of missing places, though I also think we've covered the great majority of common cases already.

As for tradeoffs, all these downsides should be compared to the benefit of optimizing on an IR that matches wasm closely. By doing that we don't miss out on low-level opportunities. LLVM, for example, has to optimize not just LLVM IR but also the backend codegen after lowering (e.g. after turning SSA into concrete registers, new opportunities can arise).
xuruiyang2002 commented on Jun 6, 2025
Thanks for the careful explanation. There were some misunderstandings in my question, but I understand now.
RemoveUnusedBrs: Optimize breaks with constant fallthroughs (#7639)