[DAGCombiner] Forward vector store to vector load with extract_subvector #145707

Status: Open. 2 commits to be merged into base branch main.

joe-rivos
Loading a smaller fixed vector type from a stored larger fixed vector type
can be substituted with an extract_subvector, provided the smaller type is
entirely contained in the larger type, and an extract_subvector would be
legal for the given offset.

Granted, on RISC-V the result has the same number of instructions, but it avoids the loads.
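
To illustrate why the substitution preserves the loaded value, here is a minimal standalone sketch. It is not LLVM code: `Wide`, `Narrow`, `makeLanes`, `loadAfterStore`, and `extractSubvector` are hypothetical names introduced only for this example, and each `half` lane is modeled by a 16-bit integer payload. The memory path (store the wide vector, then load at a byte offset) and the register path (slice the lanes directly) should produce the same lanes whenever the byte offset is a whole number of elements and the load stays inside the stored bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins: each "half" lane is modeled by its 16-bit payload.
using Wide = std::array<std::uint16_t, 32>;   // models <32 x half>
using Narrow = std::array<std::uint16_t, 8>;  // models <8 x half>

// Fill lane i with the value i so that lanes are easy to tell apart.
inline Wide makeLanes() {
  Wide V{};
  for (std::size_t I = 0; I < V.size(); ++I)
    V[I] = static_cast<std::uint16_t>(I);
  return V;
}

// Memory path: store the wide vector, then load 8 lanes at ByteOffset.
inline Narrow loadAfterStore(const Wide &V, std::size_t ByteOffset) {
  constexpr std::size_t WideBytes = 32 * sizeof(std::uint16_t);
  constexpr std::size_t NarrowBytes = 8 * sizeof(std::uint16_t);
  assert(ByteOffset + NarrowBytes <= WideBytes && "load must stay in bounds");
  unsigned char Mem[WideBytes];
  std::memcpy(Mem, V.data(), WideBytes);
  Narrow R{};
  std::memcpy(R.data(), Mem + ByteOffset, NarrowBytes);
  return R;
}

// Register path: take lanes [EltIdx, EltIdx + 8) without touching memory.
inline Narrow extractSubvector(const Wide &V, std::size_t EltIdx) {
  assert(EltIdx + 8 <= V.size() && "subvector must stay in bounds");
  Narrow R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = V[EltIdx + I];
  return R;
}
```

With 2-byte lanes, a load at byte offset 16 corresponds to element index 8, so `loadAfterStore(V, 16)` and `extractSubvector(V, 8)` compare equal for any `V`.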

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the llvm:SelectionDAG label (SelectionDAGISel as well) on Jun 25, 2025
llvmbot commented Jun 25, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: None (joe-rivos)

Changes

Loading a smaller fixed vector type from a stored larger fixed vector type
can be substituted with an extract_subvector, provided the smaller type is
entirely contained in the larger type, and an extract_subvector would be
legal for the given offset.

Granted, on RISC-V the result has the same number of instructions, but it avoids the loads.


Full diff: https://github.com/llvm/llvm-project/pull/145707.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+21)
  • (added) llvm/test/CodeGen/RISCV/forward-vec-store.ll (+66)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 91f696e8fe88e..6c213fd6de268 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -19913,6 +19913,27 @@ SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
     }
   }
 
+  // Loading a smaller fixed vector type from a stored larger fixed vector type
+  // can be substituted with an extract_subvector, provided the smaller type is
+  // entirely contained in the larger type, and an extract_subvector would be
+  // legal for the given offset.
+  if (TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, LDType) &&
+      LDType.isFixedLengthVector() && STType.isFixedLengthVector() &&
+      !ST->isTruncatingStore() && LD->getExtensionType() == ISD::NON_EXTLOAD &&
+      LDType.getVectorElementType() == STType.getVectorElementType() &&
+      (Offset * 8 + LDType.getFixedSizeInBits() <=
+       STType.getFixedSizeInBits()) &&
+      (Offset % LDType.getScalarStoreSize() == 0)) {
+    unsigned EltOffset = Offset / LDType.getScalarStoreSize();
+    // The extract index must be a multiple of the result's element count.
+    if (EltOffset % LDType.getVectorElementCount().getFixedValue() == 0) {
+      auto DL = SDLoc(LD);
+      SDValue VecIdx = DAG.getVectorIdxConstant(EltOffset, DL);
+      Val = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, LDType, Val, VecIdx);
+      return ReplaceLd(LD, Val, Chain);
+    }
+  }
+
   // TODO: Deal with nonzero offset.
   if (LD->getBasePtr().isUndef() || Offset != 0)
     return SDValue();
diff --git a/llvm/test/CodeGen/RISCV/forward-vec-store.ll b/llvm/test/CodeGen/RISCV/forward-vec-store.ll
new file mode 100644
index 0000000000000..e8dad81eb1e47
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/forward-vec-store.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mattr=+v,+d,+zvfh -o - %s | FileCheck %s
+
+define void @forward_store(<32 x half> %halves, ptr %p, ptr %p2, ptr %p3, ptr %p4) {
+; CHECK-LABEL: forward_store:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    li a4, 32
+; CHECK-NEXT:    vsetivli zero, 8, e16, m2, ta, ma
+; CHECK-NEXT:    vslidedown.vi v16, v8, 8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m4, ta, ma
+; CHECK-NEXT:    vslidedown.vi v12, v8, 16
+; CHECK-NEXT:    vsetvli zero, a4, e16, m4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m4, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 24
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vse16.v v16, (a1)
+; CHECK-NEXT:    vse16.v v12, (a2)
+; CHECK-NEXT:    vse16.v v8, (a3)
+; CHECK-NEXT:    ret
+  store <32 x half> %halves, ptr %p, align 256
+  %gep1 = getelementptr inbounds nuw i8, ptr %p, i32 16
+  %gep2 = getelementptr inbounds nuw i8, ptr %p, i32 32
+  %gep3 = getelementptr inbounds nuw i8, ptr %p, i32 48
+  %ld1 = load <8 x half>, ptr %gep1, align 4
+  %ld2 = load <8 x half>, ptr %gep2, align 4
+  %ld3 = load <8 x half>, ptr %gep3, align 4
+  store <8 x half> %ld1, ptr %p2
+  store <8 x half> %ld2, ptr %p3
+  store <8 x half> %ld3, ptr %p4
+  ret void
+}
+
+define void @no_forward_store(<32 x half> %halves, ptr %p, ptr %p2, ptr %p3, ptr %p4) {
+; CHECK-LABEL: no_forward_store:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a4, a0, 8
+; CHECK-NEXT:    li a5, 32
+; CHECK-NEXT:    vsetvli zero, a5, e16, m4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a0)
+; CHECK-NEXT:    addi a5, a0, 16
+; CHECK-NEXT:    addi a0, a0, 64
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a4)
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vle32.v v9, (a5)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vle16.v v10, (a0)
+; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vse32.v v9, (a2)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vse16.v v10, (a3)
+; CHECK-NEXT:    ret
+  store <32 x half> %halves, ptr %p, align 256
+  %gep1 = getelementptr inbounds nuw i8, ptr %p, i32 8
+  %gep2 = getelementptr inbounds nuw i8, ptr %p, i32 16
+  %gep3 = getelementptr inbounds nuw i8, ptr %p, i32 64
+  %ld1 = load <8 x half>, ptr %gep1, align 4
+  %ld2 = load <4 x i32>, ptr %gep2, align 4
+  %ld3 = load <8 x half>, ptr %gep3, align 4
+  store <8 x half> %ld1, ptr %p2
+  store <4 x i32> %ld2, ptr %p3
+  store <8 x half> %ld3, ptr %p4
+  ret void
+}
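
The offsets used in the two test functions can be checked by hand against the arithmetic conditions in the patch. The following is a simplified standalone sketch of those conditions only. `FixedVecTy` and `forwardedExtractIndex` are hypothetical names, not LLVM APIs, and the sketch assumes a non-negative offset, a non-truncating store, a non-extending load, and a target where `EXTRACT_SUBVECTOR` is legal or custom for the load type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified description of a fixed-length vector type.
struct FixedVecTy {
  unsigned NumElts;  // number of lanes
  unsigned EltBits;  // bits per lane
  bool IsFloat;      // distinguishes e.g. half from i16

  std::uint64_t sizeInBits() const {
    return static_cast<std::uint64_t>(NumElts) * EltBits;
  }
  std::uint64_t scalarStoreSize() const { return (EltBits + 7) / 8; }
  bool sameElementType(const FixedVecTy &O) const {
    return EltBits == O.EltBits && IsFloat == O.IsFloat;
  }
};

// Returns the extract index (in elements) if a load of type LD at ByteOffset
// into a stored value of type ST passes the arithmetic checks, or -1 if not.
inline long forwardedExtractIndex(FixedVecTy ST, FixedVecTy LD,
                                  std::uint64_t ByteOffset) {
  // Element types must match exactly.
  if (!LD.sameElementType(ST))
    return -1;
  // The loaded bits must lie entirely inside the stored bits.
  if (ByteOffset * 8 + LD.sizeInBits() > ST.sizeInBits())
    return -1;
  // The byte offset must land on an element boundary.
  if (ByteOffset % LD.scalarStoreSize() != 0)
    return -1;
  std::uint64_t EltOffset = ByteOffset / LD.scalarStoreSize();
  // The extract index must be a multiple of the result's element count.
  if (EltOffset % LD.NumElts != 0)
    return -1;
  return static_cast<long>(EltOffset);
}
```

For `forward_store`, byte offsets 16, 32, and 48 into `<32 x half>` give element indices 8, 16, and 24, which match the `vslidedown.vi` immediates in the CHECK lines. For `no_forward_store`, offset 8 gives element index 4 (not a multiple of 8), the `<4 x i32>` load has a different element type, and offset 64 reads past the 512 stored bits, so all three loads remain.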

@joe-rivos
Author

Tagging reviewers:
@david-arm
@preames

joe-rivos force-pushed the vector-store-forward branch from 7f8bd0e to 32572df on June 26, 2025 09:52