Skip to content

Significant optimization regression due to failing remove unused branch #7637

@xuruiyang2002

Description

@xuruiyang2002
Contributor

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $_start call $main)
  (func $main
  (local $0 i32)
    (local $1 i32)
    block  ;; label = @1
      block (result i32)  ;; label = @2
        i32.const 1
        local.set $0
        i32.const 0
      end
      br_if 0 (;@1;)
      i32.const 1
      local.set $1
    end
    local.get $1
    local.set $0
    i32.const 1
    local.set $1
    block  ;; label = @1
      local.get $0
      br_if 0 (;@1;)
      i32.const 0
      i32.load8_u
      local.tee $0
      br_if 0 (;@1;)
      call $external_function
      i32.const 0
      local.set $1
    end
    i32.const 0
    local.get $1
    local.tee $0
    i32.store)
  (memory $0 258 258)
  (export "_start" (func $_start)))

wasm-opt (4f9374c) cannot remove unused branch by -O3, while -O2 can. In details, the -O2 produces (simplified):

 (func $_start
  (i32.store
   (i32.const 0)
   (i32.const 1)
  )
 )

while -O3 produces (simplified):

 (func $_start
  (local $0 i32)
  (block $block1
   (br_if $block1
    (local.tee $0
     (i32.const 1)
    )
   )
   (br_if $block1
    (i32.load8_u
     (i32.const 0)
    )
   )
   (call $external_function)
   (local.set $0
    (i32.const 0)
   )
  )
  (i32.store
   (i32.const 0)
   (local.get $0)
  )
 )

which is a significant optimization regression.

Activity

xuruiyang2002

xuruiyang2002 commented on Jun 3, 2025

@xuruiyang2002
ContributorAuthor

I think it is the CoalesceLocals pass that pollutes the unused block removal by introducing the local.tee instruction, and the local.tee hinders the analysis of the condition of first br_if.

The direct solution I think is handling local.tee in optimizeBoolean. Is there any better way?

kripken

kripken commented on Jun 3, 2025

@kripken
Member

I think the natural place to fix this is in RemoveUnusedBrs, which aims to remove branches we don't need. If the fallthrough of the condition is constant, we can infer how the branch happens, and apply that (keeping dropped children as needed). The specific place there would be to add a visitBreak in the FinalOptimizer class.

added a commit that references this issue on Jun 4, 2025
xuruiyang2002

xuruiyang2002 commented on Jun 4, 2025

@xuruiyang2002
ContributorAuthor

Yes, it's more natural to fix in the RemoveUnusedBrs pass.

By the way, I am also curious about the root cause, why cannot optimize this pattern:

 (func $_start
  (local $0 i32)
  (block $block
   (br_if $block   <----------------  item1
     (local.tee $0
     (i32.const 1)
    )
   )
   (br_if $block  <---------------- item2
    (i32.load
     (i32.const 0)
    )
   )
   (local.set $0 <--------------- item3
    (i32.const 0)
   )
  )
  (i32.store <------------------- item4
   (i32.const 0)
   (local.get $0)
  )
 )

Above is the minimal code snippet which can trigger the missed optimization, and removing any of the item above won't trigger. Why? Does the mixed cases difficult to analyze?


BTW, I already understand the benefits of optimizing on AST IR (e.g., direct, language-specified) from

  1. Future of Binaryen in a stack machine world? #663
  2. Why compile to WebAssembly using Binaryen?

However, it seems wasm-opt suffer a lot from side-effect code. For example:

  • ASTs with side effects often require manually using getDroppedChildrenAndAppend (while in LLVM it doesn't bother to)
  • Mixed cases are hard to analyze (example above)

All of these lead to missed optimizations. In LLVM, we don’t have to worry about so many side-effect ordering issues — right? How can we understand the pros and especially the cons of language-specific IR. Thank you.

xuruiyang2002

xuruiyang2002 commented on Jun 4, 2025

@xuruiyang2002
ContributorAuthor

(Sorry for accidentally closed this issue... could you please reopen it)

kripken

kripken commented on Jun 4, 2025

@kripken
Member

I'm not sure what you mean by "mixed cases"? What is the problem there aside from not handling the local.tee properly?

All of these lead to missed optimizations. In LLVM, we don’t have to worry about so many side-effect ordering issues — right?

Yes, this is a tradeoff. Binaryen tries to be very close to wasm, and that has some downsides, like anything can be an expression with side effects.

The general idea in the past was that we'd handle side effects like that by "flattening" the code, as -O4 does, using --flatten, which avoids nested side effects. However, as wasm grew, flattening code like that became harder (e.g. GC and exceptions add more branches, pops off the stack, etc., which flatten doesn't model well). So it is better not to rely on flattening now.

Another idea in the past was to use a "dataflow" IR which is in SSA form, like LLVM, see

https://github.com/WebAssembly/binaryen/blob/main/src/dataflow/node.h

That worked well enough for integration with Souper, but it does add overhead to convert the IR like that.

Overall, it is better to add getDroppedChildrenAndAppend in relevant places, I think. There is probably a long list of missing places, though also I think we've covered the great majority of common cases already.

As for tradeoffs, all these downsides should be compared to the benefit of optimizing on an IR that matches wasm closely. By doing that we don't miss out on low-level opportunities. LLVM, for example, has to optimize not just LLVM IR but also the backend codegen after lowering (e.g. after turning SSA into concrete registers, new opportunities can arise).

xuruiyang2002

xuruiyang2002 commented on Jun 6, 2025

@xuruiyang2002
ContributorAuthor

Thanks for careful explanation.

I'm not sure what you mean by "mixed cases"? What is the problem there aside from not handling the local.tee properly?

Well, some misunderstandings in my question, but now I understand now.

added a commit that references this issue on Jun 24, 2025
9bd8871
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @kripken@xuruiyang2002

      Issue actions

        Significant optimization regression due to failing remove unused branch · Issue #7637 · WebAssembly/binaryen