
[RISCV] Optimize vp.splice with 0 offset. #145533


Merged
topperc merged 2 commits into llvm:main from the pr/slidedown-zero branch on Jun 24, 2025

Conversation

@topperc commented Jun 24, 2025

We can skip the slidedown if the offset is 0.

@llvmbot commented Jun 24, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

We can skip the slidedown if the offset is 0.
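
As a concrete illustration, the zero-offset case from the fixed-vector test added below now lowers without the slide-by-0 of the first operand, leaving a single vslideup (the IR and expected codegen are copied from that test):

; The vslidedown of %va would use an offset of 0, i.e. a no-op, so only the
; vslideup of %vb by %evla is emitted.
define <2 x i64> @test_vp_splice_v2i64_zero_offset(<2 x i64> %va, <2 x i64> %vb, i32 zeroext %evla, i32 zeroext %evlb) {
  %v = call <2 x i64> @llvm.experimental.vp.splice.v2i64(<2 x i64> %va, <2 x i64> %vb, i32 0, <2 x i1> splat (i1 1), i32 %evla, i32 %evlb)
  ret <2 x i64> %v
}
; Resulting codegen (from the test's CHECK lines):
;   vsetvli zero, a1, e64, m1, ta, ma
;   vslideup.vx v8, v9, a0
;   ret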


Full diff: https://github.com/llvm/llvm-project/pull/145533.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll (+11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vp-splice.ll (+24-14)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 13ee3ee63d1a6..2b3f8d1cdf60f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -13297,10 +13297,11 @@ RISCVTargetLowering::lowerVPSpliceExperimental(SDValue Op,
     DownOffset = DAG.getNode(ISD::SUB, DL, XLenVT, EVL1, UpOffset);
   }
 
-  SDValue SlideDown =
-      getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
-                    Op1, DownOffset, Mask, UpOffset);
-  SDValue Result = getVSlideup(DAG, Subtarget, DL, ContainerVT, SlideDown, Op2,
+  if (ImmValue != 0)
+    Op1 = getVSlidedown(DAG, Subtarget, DL, ContainerVT,
+                        DAG.getUNDEF(ContainerVT), Op1, DownOffset, Mask,
+                        UpOffset);
+  SDValue Result = getVSlideup(DAG, Subtarget, DL, ContainerVT, Op1, Op2,
                                UpOffset, Mask, EVL2, RISCVVType::TAIL_AGNOSTIC);
 
   if (IsMaskVector) {
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll
index 7245d8a11f563..7bf22247093f7 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll
@@ -30,6 +30,17 @@ define <2 x i64> @test_vp_splice_v2i64_negative_offset(<2 x i64> %va, <2 x i64>
   ret <2 x i64> %v
 }
 
+define <2 x i64> @test_vp_splice_v2i64_zero_offset(<2 x i64> %va, <2 x i64> %vb, i32 zeroext %evla, i32 zeroext %evlb) {
+; CHECK-LABEL: test_vp_splice_v2i64_zero_offset:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a1, e64, m1, ta, ma
+; CHECK-NEXT:    vslideup.vx v8, v9, a0
+; CHECK-NEXT:    ret
+
+  %v = call <2 x i64> @llvm.experimental.vp.splice.v2i64(<2 x i64> %va, <2 x i64> %vb, i32 0, <2 x i1> splat (i1 1), i32 %evla, i32 %evlb)
+  ret <2 x i64> %v
+}
+
 define <2 x i64> @test_vp_splice_v2i64_masked(<2 x i64> %va, <2 x i64> %vb, <2 x i1> %mask, i32 zeroext %evla, i32 zeroext %evlb) {
 ; CHECK-LABEL: test_vp_splice_v2i64_masked:
 ; CHECK:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vp-splice.ll b/llvm/test/CodeGen/RISCV/rvv/vp-splice.ll
index ffeb493989103..2a59cf2c12f01 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vp-splice.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vp-splice.ll
@@ -40,6 +40,16 @@ define <vscale x 2 x i64> @test_vp_splice_nxv2i64_negative_offset(<vscale x 2 x
   ret <vscale x 2 x i64> %v
 }
 
+define <vscale x 2 x i64> @test_vp_splice_nxv2i64_zero_offset(<vscale x 2 x i64> %va, <vscale x 2 x i64> %vb, i32 zeroext %evla, i32 zeroext %evlb) {
+; CHECK-LABEL: test_vp_splice_nxv2i64_zero_offset:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a1, e64, m2, ta, ma
+; CHECK-NEXT:    vslideup.vx v8, v10, a0
+; CHECK-NEXT:    ret
+  %v = call <vscale x 2 x i64> @llvm.experimental.vp.splice.nxv2i64(<vscale x 2 x i64> %va, <vscale x 2 x i64> %vb, i32 0, <vscale x 2 x i1> splat (i1 1), i32 %evla, i32 %evlb)
+  ret <vscale x 2 x i64> %v
+}
+
 define <vscale x 2 x i64> @test_vp_splice_nxv2i64_masked(<vscale x 2 x i64> %va, <vscale x 2 x i64> %vb, <vscale x 2 x i1> %mask, i32 zeroext %evla, i32 zeroext %evlb) {
 ; CHECK-LABEL: test_vp_splice_nxv2i64_masked:
 ; CHECK:       # %bb.0:
@@ -295,10 +305,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64(<vscale x 16 x i64> %va, <vs
 ; CHECK-NEXT:    addi a5, a5, -1
 ; CHECK-NEXT:    slli a1, a4, 3
 ; CHECK-NEXT:    mv a7, a2
-; CHECK-NEXT:    bltu a2, a5, .LBB21_2
+; CHECK-NEXT:    bltu a2, a5, .LBB22_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    mv a7, a5
-; CHECK-NEXT:  .LBB21_2:
+; CHECK-NEXT:  .LBB22_2:
 ; CHECK-NEXT:    addi sp, sp, -80
 ; CHECK-NEXT:    sd ra, 72(sp) # 8-byte Folded Spill
 ; CHECK-NEXT:    sd s0, 64(sp) # 8-byte Folded Spill
@@ -311,10 +321,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64(<vscale x 16 x i64> %va, <vs
 ; CHECK-NEXT:    slli a7, a7, 3
 ; CHECK-NEXT:    addi a6, sp, 64
 ; CHECK-NEXT:    mv t0, a2
-; CHECK-NEXT:    bltu a2, a4, .LBB21_4
+; CHECK-NEXT:    bltu a2, a4, .LBB22_4
 ; CHECK-NEXT:  # %bb.3:
 ; CHECK-NEXT:    mv t0, a4
-; CHECK-NEXT:  .LBB21_4:
+; CHECK-NEXT:  .LBB22_4:
 ; CHECK-NEXT:    vl8re64.v v24, (a5)
 ; CHECK-NEXT:    add a5, a6, a7
 ; CHECK-NEXT:    vl8re64.v v0, (a0)
@@ -328,10 +338,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64(<vscale x 16 x i64> %va, <vs
 ; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; CHECK-NEXT:    vse64.v v16, (a6)
 ; CHECK-NEXT:    mv a0, a3
-; CHECK-NEXT:    bltu a3, a4, .LBB21_6
+; CHECK-NEXT:    bltu a3, a4, .LBB22_6
 ; CHECK-NEXT:  # %bb.5:
 ; CHECK-NEXT:    mv a0, a4
-; CHECK-NEXT:  .LBB21_6:
+; CHECK-NEXT:  .LBB22_6:
 ; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; CHECK-NEXT:    vse64.v v0, (a5)
 ; CHECK-NEXT:    sub a2, a3, a4
@@ -363,10 +373,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64_negative_offset(<vscale x 16
 ; CHECK-NEXT:    addi a6, a6, -1
 ; CHECK-NEXT:    slli a1, a5, 3
 ; CHECK-NEXT:    mv a4, a2
-; CHECK-NEXT:    bltu a2, a6, .LBB22_2
+; CHECK-NEXT:    bltu a2, a6, .LBB23_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    mv a4, a6
-; CHECK-NEXT:  .LBB22_2:
+; CHECK-NEXT:  .LBB23_2:
 ; CHECK-NEXT:    addi sp, sp, -80
 ; CHECK-NEXT:    sd ra, 72(sp) # 8-byte Folded Spill
 ; CHECK-NEXT:    sd s0, 64(sp) # 8-byte Folded Spill
@@ -379,10 +389,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64_negative_offset(<vscale x 16
 ; CHECK-NEXT:    slli a4, a4, 3
 ; CHECK-NEXT:    addi a7, sp, 64
 ; CHECK-NEXT:    mv t0, a2
-; CHECK-NEXT:    bltu a2, a5, .LBB22_4
+; CHECK-NEXT:    bltu a2, a5, .LBB23_4
 ; CHECK-NEXT:  # %bb.3:
 ; CHECK-NEXT:    mv t0, a5
-; CHECK-NEXT:  .LBB22_4:
+; CHECK-NEXT:  .LBB23_4:
 ; CHECK-NEXT:    vl8re64.v v24, (a6)
 ; CHECK-NEXT:    add a6, a7, a4
 ; CHECK-NEXT:    vl8re64.v v0, (a0)
@@ -396,10 +406,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64_negative_offset(<vscale x 16
 ; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; CHECK-NEXT:    vse64.v v16, (a7)
 ; CHECK-NEXT:    mv a0, a3
-; CHECK-NEXT:    bltu a3, a5, .LBB22_6
+; CHECK-NEXT:    bltu a3, a5, .LBB23_6
 ; CHECK-NEXT:  # %bb.5:
 ; CHECK-NEXT:    mv a0, a5
-; CHECK-NEXT:  .LBB22_6:
+; CHECK-NEXT:  .LBB23_6:
 ; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; CHECK-NEXT:    vse64.v v0, (a6)
 ; CHECK-NEXT:    sub a2, a3, a5
@@ -410,10 +420,10 @@ define <vscale x 16 x i64> @test_vp_splice_nxv16i64_negative_offset(<vscale x 16
 ; CHECK-NEXT:    li a3, 8
 ; CHECK-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
 ; CHECK-NEXT:    vse64.v v24, (a5)
-; CHECK-NEXT:    bltu a4, a3, .LBB22_8
+; CHECK-NEXT:    bltu a4, a3, .LBB23_8
 ; CHECK-NEXT:  # %bb.7:
 ; CHECK-NEXT:    li a4, 8
-; CHECK-NEXT:  .LBB22_8:
+; CHECK-NEXT:  .LBB23_8:
 ; CHECK-NEXT:    sub a2, a6, a4
 ; CHECK-NEXT:    add a1, a2, a1
 ; CHECK-NEXT:    vle64.v v16, (a1)

@preames left a comment

LGTM

Another case here possibly worth optimizing: if the index is zero and evla is a multiple of the register size, the slideup can become a whole-register move.

Though maybe some of this should be done at the IR level instead of in the DAG? I can't quite see how to frame this transform, but the one I just proposed could be expressed as a vector extract/insert.
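
A hypothetical sketch of that IR-level extract/insert framing, assuming a zero splice offset and evla equal to exactly one register's worth of i64 elements (vscale elements at SEW=64); the function name and types are illustrative only, and the vp.splice mask/EVL tail semantics are glossed over here:

; With a zero offset and evla landing on a register boundary, the elements of
; %vb start at a whole-register position, so the slideup could be expressed as
; a subvector insert, which can lower to a whole register move.
declare <vscale x 1 x i64> @llvm.vector.extract.nxv1i64.nxv2i64(<vscale x 2 x i64>, i64)
declare <vscale x 2 x i64> @llvm.vector.insert.nxv2i64.nxv1i64(<vscale x 2 x i64>, <vscale x 1 x i64>, i64)

define <vscale x 2 x i64> @splice_zero_offset_as_insert(<vscale x 2 x i64> %va, <vscale x 2 x i64> %vb) {
  ; Take the low register's worth of elements from %vb ...
  %vb.lo = call <vscale x 1 x i64> @llvm.vector.extract.nxv1i64.nxv2i64(<vscale x 2 x i64> %vb, i64 0)
  ; ... and place it one register up in %va; for a scalable subvector the
  ; index 1 is scaled by vscale, so the insert lands on a register boundary.
  %v = call <vscale x 2 x i64> @llvm.vector.insert.nxv2i64.nxv1i64(<vscale x 2 x i64> %va, <vscale x 1 x i64> %vb.lo, i64 1)
  ret <vscale x 2 x i64> %v
}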

@topperc topperc merged commit 7150b2c into llvm:main Jun 24, 2025
7 checks passed
@topperc topperc deleted the pr/slidedown-zero branch June 24, 2025 17:02
DrSergei pushed a commit to DrSergei/llvm-project that referenced this pull request Jun 24, 2025
We can skip the slidedown if the offset is 0.
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
We can skip the slidedown if the offset is 0.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
We can skip the slidedown if the offset is 0.