[DAGCombiner] Forward vector store to vector load with extract_subvector #145707
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this page.
If this is not working for you, it is probably because you do not have write permissions for the repository. In that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.
If you have further questions, they may be answered by the LLVM GitHub User Guide.
You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
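@llvm/pr-subscribers-llvm-selectiondag
Author: None (joe-rivos)
Changes
Loading a smaller fixed vector type from a stored larger fixed vector type can be substituted with an extract_subvector, provided the smaller type is entirely contained in the larger type, and an extract_element would be legal for the given offset.
Granted, the result for RISCV is the same number of instructions, but we avoid the loads.
Full diff: https://github.com/llvm/llvm-project/pull/145707.diff
2 Files Affected: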
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 91f696e8fe88e..6c213fd6de268 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -19913,6 +19913,27 @@ SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
}
}
+ // Loading a smaller fixed vector type from a stored larger fixed vector type
+ // can be substituted with an extract_subvector, provided the smaller type
+ // is entirely contained in the larger type, and an extract_element would be
+ // legal for the given offset.
+ if (TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, LDType) &&
+ LDType.isFixedLengthVector() && STType.isFixedLengthVector() &&
+ !ST->isTruncatingStore() && LD->getExtensionType() == ISD::NON_EXTLOAD &&
+ LDType.getVectorElementType() == STType.getVectorElementType() &&
+ (Offset * 8 + LDType.getFixedSizeInBits() <=
+ STType.getFixedSizeInBits()) &&
+ (Offset % LDType.getScalarStoreSize() == 0)) {
+ unsigned EltOffset = Offset / LDType.getScalarStoreSize();
+ // The extract index must be a multiple of the result's element count.
+ if (EltOffset % LDType.getVectorElementCount().getFixedValue() == 0) {
+ auto DL = SDLoc(LD);
+ SDValue VecIdx = DAG.getVectorIdxConstant(EltOffset, DL);
+ Val = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, LDType, Val, VecIdx);
+ return ReplaceLd(LD, Val, Chain);
+ }
+ }
+
// TODO: Deal with nonzero offset.
if (LD->getBasePtr().isUndef() || Offset != 0)
return SDValue();
diff --git a/llvm/test/CodeGen/RISCV/forward-vec-store.ll b/llvm/test/CodeGen/RISCV/forward-vec-store.ll
new file mode 100644
index 0000000000000..e8dad81eb1e47
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/forward-vec-store.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mattr=+v,+d,+zvfh -o - %s | FileCheck %s
+
+define void @forward_store(<32 x half> %halves, ptr %p, ptr %p2, ptr %p3, ptr %p4) {
+; CHECK-LABEL: forward_store:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a4, 32
+; CHECK-NEXT: vsetivli zero, 8, e16, m2, ta, ma
+; CHECK-NEXT: vslidedown.vi v16, v8, 8
+; CHECK-NEXT: vsetivli zero, 8, e16, m4, ta, ma
+; CHECK-NEXT: vslidedown.vi v12, v8, 16
+; CHECK-NEXT: vsetvli zero, a4, e16, m4, ta, ma
+; CHECK-NEXT: vse16.v v8, (a0)
+; CHECK-NEXT: vsetivli zero, 8, e16, m4, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 24
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vse16.v v16, (a1)
+; CHECK-NEXT: vse16.v v12, (a2)
+; CHECK-NEXT: vse16.v v8, (a3)
+; CHECK-NEXT: ret
+ store <32 x half> %halves, ptr %p, align 256
+ %gep1 = getelementptr inbounds nuw i8, ptr %p, i32 16
+ %gep2 = getelementptr inbounds nuw i8, ptr %p, i32 32
+ %gep3 = getelementptr inbounds nuw i8, ptr %p, i32 48
+ %ld1 = load <8 x half>, ptr %gep1, align 4
+ %ld2 = load <8 x half>, ptr %gep2, align 4
+ %ld3 = load <8 x half>, ptr %gep3, align 4
+ store <8 x half> %ld1, ptr %p2
+ store <8 x half> %ld2, ptr %p3
+ store <8 x half> %ld3, ptr %p4
+ ret void
+}
+
+define void @no_forward_store(<32 x half> %halves, ptr %p, ptr %p2, ptr %p3, ptr %p4) {
+; CHECK-LABEL: no_forward_store:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi a4, a0, 8
+; CHECK-NEXT: li a5, 32
+; CHECK-NEXT: vsetvli zero, a5, e16, m4, ta, ma
+; CHECK-NEXT: vse16.v v8, (a0)
+; CHECK-NEXT: addi a5, a0, 16
+; CHECK-NEXT: addi a0, a0, 64
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vle16.v v8, (a4)
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vle32.v v9, (a5)
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vle16.v v10, (a0)
+; CHECK-NEXT: vse16.v v8, (a1)
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vse32.v v9, (a2)
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vse16.v v10, (a3)
+; CHECK-NEXT: ret
+ store <32 x half> %halves, ptr %p, align 256
+ %gep1 = getelementptr inbounds nuw i8, ptr %p, i32 8
+ %gep2 = getelementptr inbounds nuw i8, ptr %p, i32 16
+ %gep3 = getelementptr inbounds nuw i8, ptr %p, i32 64
+ %ld1 = load <8 x half>, ptr %gep1, align 4
+ %ld2 = load <4 x i32>, ptr %gep2, align 4
+ %ld3 = load <8 x half>, ptr %gep3, align 4
+ store <8 x half> %ld1, ptr %p2
+ store <4 x i32> %ld2, ptr %p3
+ store <8 x half> %ld3, ptr %p4
+ ret void
+}
Tagging reviewers:
Loading a smaller fixed vector type from a stored larger fixed vector type can be substituted with an extract_subvector, provided the smaller type is entirely contained in the larger type, and an extract_element would be legal for the given offset.
Force-pushed from 7f8bd0e to 32572df.
Loading a smaller fixed vector type from a stored larger fixed vector type
can be substituted with an extract_subvector, provided the smaller type
is entirely contained in the larger type, and an extract_element would be
legal for the given offset.
Granted, the result for RISCV is the same number of instructions, but we avoid the loads.
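For readers who want to check the offset arithmetic against the two test functions, here is a minimal standalone sketch of the conditions in the patch. It is not LLVM code: `FixedVec` and `subvectorIndex` are hypothetical names made up for illustration, the target legality query (`isOperationLegalOrCustom`) and the truncating-store/extending-load checks are left out, and the negative-offset guard is an assumption of this sketch rather than something taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a fixed-length vector type (not an LLVM class).
struct FixedVec {
  unsigned EltBits; // element size in bits, e.g. 16 for half
  unsigned NumElts; // number of elements
  bool IsFloat;     // distinguishes e.g. half from i16
  unsigned sizeInBits() const { return EltBits * NumElts; }
  unsigned eltStoreBytes() const { return (EltBits + 7) / 8; }
  bool sameEltType(const FixedVec &O) const {
    return EltBits == O.EltBits && IsFloat == O.IsFloat;
  }
};

// Returns the extract_subvector index (in elements) if a load of type Ld at
// byte offset Offset from the store address can be taken from a stored value
// of type St, or std::nullopt if the sketch's conditions do not hold.
inline std::optional<unsigned> subvectorIndex(const FixedVec &St,
                                              const FixedVec &Ld,
                                              int64_t Offset) {
  // Assumption of this sketch: reject negative offsets up front.
  if (Offset < 0)
    return std::nullopt;
  // Element types must match.
  if (!St.sameEltType(Ld))
    return std::nullopt;
  // The loaded bytes must lie entirely inside the stored value.
  if (Offset * 8 + int64_t(Ld.sizeInBits()) > int64_t(St.sizeInBits()))
    return std::nullopt;
  // The byte offset must land on an element boundary.
  if (Offset % int64_t(Ld.eltStoreBytes()) != 0)
    return std::nullopt;
  int64_t EltOffset = Offset / int64_t(Ld.eltStoreBytes());
  // The extract index must be a multiple of the result's element count.
  if (EltOffset % int64_t(Ld.NumElts) != 0)
    return std::nullopt;
  return unsigned(EltOffset);
}
```

With a stored <32 x half> and <8 x half> loads, byte offsets 16, 32 and 48 map to element indices 8, 16 and 24, which line up with the vslidedown.vi immediates in forward_store. In no_forward_store, offset 8 gives element index 4 (not a multiple of 8), the <4 x i32> load has a different element type, and offset 64 runs past the end of the stored value, so none of them qualify.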