[LV] Enable auto-vectorisation of loops with uncountable exits #133099

Open
wants to merge 1 commit into
base: main

david-arm
Contributor

Until now, vectorisation of some early exit loops with uncountable
exits was controlled by a flag that was off by default. Now that we
have efficient code generation for vectorising such loops (see
PR #130766), and we are still far enough away from the next LLVM
release, it seems like a good time to enable the feature by default.
If any issues arise post-commit, the change can easily be reverted.

Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I then performed a bootstrap build of clang,
which built cleanly with 64 extra early exit loops vectorising.

I also built and ran SPEC2017 successfully with no change in
performance for both neoverse-v1 and neoverse-v2.
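
For readers unfamiliar with the term, the following is a minimal C++ sketch of the loop shape in question: a loop with one countable exit (the induction variable reaching its bound) and one uncountable early exit (a data-dependent compare). It is modelled loosely on the two-buffer pattern used by the early-exit tests, but the function names, the `init_mem` body, and the `poke` parameter are hypothetical and exist only for illustration. Whether a given compiler build actually vectorises this loop depends on the target, cost model, and optimisation level.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the tests' init_mem declaration: fills a buffer
// with a deterministic byte pattern.
static void init_mem(std::uint8_t *p, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    p[i] = static_cast<std::uint8_t>(i * 7 + 1);
}

// Returns the first index in [3, 67) at which the two buffers differ, or 67
// if they match over that range. `poke` (hypothetical) plants a mismatch at
// the given index; pass a value >= 1024 to plant none.
std::size_t first_mismatch(std::size_t poke) {
  std::uint8_t p1[1024]; // local arrays: size and bounds are known
  std::uint8_t p2[1024];
  init_mem(p1, 1024);
  init_mem(p2, 1024);
  if (poke < 1024)
    p2[poke] ^= 0xFF; // flip every bit so the bytes are guaranteed to differ

  for (std::size_t i = 3; i < 67; ++i) { // countable exit: bound is known
    if (p1[i] != p2[i])                  // uncountable exit: depends on data
      return i;
  }
  return 67;
}
```

Calling `first_mismatch(10)` returns 10 via the early exit, while `first_mismatch(1024)` runs the loop to completion and returns 67.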

@llvmbot
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-vectorizers

Author: David Sherwood (david-arm)

Full diff: https://github.com/llvm/llvm-project/pull/133099.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/multi_early_exit.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+1-1)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5244a5e7b1c41..7010715947a54 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -396,7 +396,7 @@ static cl::opt<bool> UseWiderVFIfCallVariantsPresent(
     cl::desc("Try wider VFs if they enable the use of vector variants"));
 
 static cl::opt<bool> EnableEarlyExitVectorization(
-    "enable-early-exit-vectorization", cl::init(false), cl::Hidden,
+    "enable-early-exit-vectorization", cl::init(true), cl::Hidden,
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));
 
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
index 7d5b73477f6ed..b30f46195a326 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s --check-prefixes=CHECK
+; RUN: opt -S < %s -p loop-vectorize | FileCheck %s --check-prefixes=CHECK
 
 target triple = "aarch64-unknown-linux-gnu"
 
diff --git a/llvm/test/Transforms/LoopVectorize/multi_early_exit.ll b/llvm/test/Transforms/LoopVectorize/multi_early_exit.ll
index 0e753a535cd2d..94af5b7c7607d 100644
--- a/llvm/test/Transforms/LoopVectorize/multi_early_exit.ll
+++ b/llvm/test/Transforms/LoopVectorize/multi_early_exit.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 
diff --git a/llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll b/llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll
index 4027f6a0f5dfd..7759c10032e9b 100644
--- a/llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll
+++ b/llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 
diff --git a/llvm/test/Transforms/LoopVectorize/single_early_exit.ll b/llvm/test/Transforms/LoopVectorize/single_early_exit.ll
index dedf5f0be624e..4b580e42f009e 100644
--- a/llvm/test/Transforms/LoopVectorize/single_early_exit.ll
+++ b/llvm/test/Transforms/LoopVectorize/single_early_exit.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization -force-vector-width=4 | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize -force-vector-width=4 | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 
diff --git a/llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll b/llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll
index 14651d60e1532..df9d2b477ec68 100644
--- a/llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll
+++ b/llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization -force-vector-width=4 | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize -force-vector-width=4 | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 
diff --git a/llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll b/llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll
index 51cfc72752014..da26c962c7d2b 100644
--- a/llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll
+++ b/llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll
@@ -1,4 +1,4 @@
-; RUN: opt -S < %s -p loop-vectorize,'print<loops>' -disable-output -enable-early-exit-vectorization 2>&1 | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize,'print<loops>' -disable-output 2>&1 | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 

@llvmbot
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-llvm-transforms

Contributor

@fhahn fhahn left a comment


I think it would be good to fix #128061 before enabling this, as without that fix we may currently compute incorrect bounds for early-exit loops, and those bounds are in turn used in isDereferenceableAndAlignedInLoop.

@david-arm
Contributor Author

> I think it would be good to fix #128061 before enabling, as without it we currently may compute incorrect bounds for early-exit loops, which in turn are used in isDereferenceableAndAlignedInLoop

Oh yeah sure, that makes absolute sense. That's on my list of PRs to review today!

@lukel97
Contributor

lukel97 commented Mar 31, 2025

I tried this out on RISC-V and we get 107% more loops vectorized on one of the SPEC CPU 2017 benchmarks, with an overall geomean 5% improvement across all the benchmarks.

There's also an LNT run here, and there are no regressions. So no issues on the RISC-V side!

@david-arm
Contributor Author

> I tried this out on RISC-V and we get 107% more loops vectorized on one of the SPEC CPU 2017 benchmarks, with an overall geomean 5% improvement across all the benchmarks.
>
> There's also an LNT run here and there's no regressions. So no issues on the RISC-V side!

OK great! I'm glad that you see no regressions for RISC-V, although it's worth saying this patch only starts vectorising early exit loops if we know something about the object we're loading from. For example, if you can prove that all loads are within the bounds of an alloca then it's safe to vectorise.
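
To make the dereferenceability point concrete, here is a rough C++ model (not the vectoriser's actual code generation) of why the bounds matter. A scalar search never reads past the element that triggers the exit, whereas a vectorised version loads a whole group of lanes before it knows whether the exit fires. The width `VF`, the function names, and the `highest_read` out-parameter are all hypothetical and exist only to expose which indices get touched.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 4; // hypothetical vector width, for illustration

// Scalar reference: reads data[i] only up to and including the first match.
std::size_t find_scalar(const std::uint8_t *data, std::size_t n,
                        std::uint8_t key) {
  for (std::size_t i = 0; i < n; ++i)
    if (data[i] == key)
      return i;
  return n;
}

// Chunked model of a vectorised early-exit loop. Assumes n is a multiple of
// VF. Records the highest index read in *highest_read (left at 0 if n == 0).
std::size_t find_chunked(const std::uint8_t *data, std::size_t n,
                         std::uint8_t key, std::size_t *highest_read) {
  *highest_read = 0;
  for (std::size_t i = 0; i < n; i += VF) {
    bool hit[VF];
    // Models a wide load plus compare: all VF lanes are read up front,
    // including lanes that lie after the element that ends the loop.
    for (std::size_t lane = 0; lane < VF; ++lane)
      hit[lane] = (data[i + lane] == key);
    *highest_read = i + VF - 1;
    // Models locating the first active lane of the compare mask.
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (hit[lane])
        return i + lane;
  }
  return n;
}
```

With a 16-byte buffer whose only match is at index 5, both functions return 5, but the chunked version has read up to index 7. Those extra reads are harmless only when the whole range is known to be dereferenceable, which is the situation described above for loads provably within the bounds of an alloca.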
