[IA] Remove recursive [de]interleaving support #143875
Conversation
Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and others that are no longer needed, e.g. those exercising the recursive tree, have been deleted. This closes off the work started in llvm#139893.
@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that will emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and some others were deleted that are no longer needed, e.g. to do with the recursive tree. This closes off the work started in #139893.

Patch is 98.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143875.diff

10 Files Affected:
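To illustrate the difference, here is a minimal sketch on the interleave/store side (the <4 x i32> operand types and the %a..%d/%p names are chosen for the example, mirroring the factor-4 store tests touched below): the vectorizer used to build a factor-4 interleave out of a tree of llvm.vector.interleave2 calls, which this pass then had to recognise and flatten, whereas it now emits one llvm.vector.interleave4 call directly.

; Old recursive form: factor 4 expressed as a tree of interleave2 calls.
; Note the leaf order (a, c, b, d) that the pass had to reorder back to a, b, c, d.
%v0 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %c)
%v1 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %b, <4 x i32> %d)
%v2 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v0, <8 x i32> %v1)
store <16 x i32> %v2, ptr %p

; New single-intrinsic form, the only shape the pass now handles.
%v = call <16 x i32> @llvm.vector.interleave4.v16i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d)
store <16 x i32> %v, ptr %p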
diff --git a/llvm/lib/CodeGen/InterleavedAccessPass.cpp b/llvm/lib/CodeGen/InterleavedAccessPass.cpp
index 49f1504d244ed..9c4c86cebe7e5 100644
--- a/llvm/lib/CodeGen/InterleavedAccessPass.cpp
+++ b/llvm/lib/CodeGen/InterleavedAccessPass.cpp
@@ -629,173 +629,12 @@ static unsigned getIntrinsicFactor(const IntrinsicInst *II) {
}
}
-// For an (de)interleave tree like this:
-//
-// A C B D
-// |___| |___|
-// |_____|
-// |
-// A B C D
-//
-// We will get ABCD at the end while the leaf operands/results
-// are ACBD, which are also what we initially collected in
-// getVectorInterleaveFactor / getVectorDeinterleaveFactor. But TLI
-// hooks (e.g. lowerDeinterleaveIntrinsicToLoad) expect ABCD, so we need
-// to reorder them by interleaving these values.
-static void interleaveLeafValues(MutableArrayRef<Value *> SubLeaves) {
- unsigned NumLeaves = SubLeaves.size();
- assert(isPowerOf2_32(NumLeaves) && NumLeaves > 1);
- if (NumLeaves == 2)
- return;
-
- const unsigned HalfLeaves = NumLeaves / 2;
- // Visit the sub-trees.
- interleaveLeafValues(SubLeaves.take_front(HalfLeaves));
- interleaveLeafValues(SubLeaves.drop_front(HalfLeaves));
-
- SmallVector<Value *, 8> Buffer;
- // a0 a1 a2 a3 b0 b1 b2 b3
- // -> a0 b0 a1 b1 a2 b2 a3 b3
- for (unsigned i = 0U; i < NumLeaves; ++i)
- Buffer.push_back(SubLeaves[i / 2 + (i % 2 ? HalfLeaves : 0)]);
-
- llvm::copy(Buffer, SubLeaves.begin());
-}
-
-static bool
-getVectorInterleaveFactor(IntrinsicInst *II, SmallVectorImpl<Value *> &Operands,
- SmallVectorImpl<Instruction *> &DeadInsts) {
- assert(isInterleaveIntrinsic(II->getIntrinsicID()));
-
- // Visit with BFS
- SmallVector<IntrinsicInst *, 8> Queue;
- Queue.push_back(II);
- while (!Queue.empty()) {
- IntrinsicInst *Current = Queue.front();
- Queue.erase(Queue.begin());
-
- // All the intermediate intrinsics will be deleted.
- DeadInsts.push_back(Current);
-
- for (unsigned I = 0; I < getIntrinsicFactor(Current); ++I) {
- Value *Op = Current->getOperand(I);
- if (auto *OpII = dyn_cast<IntrinsicInst>(Op))
- if (OpII->getIntrinsicID() == Intrinsic::vector_interleave2) {
- Queue.push_back(OpII);
- continue;
- }
-
- // If this is not a perfectly balanced tree, the leaf
- // result types would be different.
- if (!Operands.empty() && Op->getType() != Operands.back()->getType())
- return false;
-
- Operands.push_back(Op);
- }
- }
-
- const unsigned Factor = Operands.size();
- // Currently we only recognize factors 2...8 and other powers of 2.
- // FIXME: should we assert here instead?
- if (Factor <= 1 ||
- (!isPowerOf2_32(Factor) && Factor != getIntrinsicFactor(II)))
- return false;
-
- // Recursively interleaved factors need to have their values reordered
- // TODO: Remove once the loop vectorizer no longer recursively interleaves
- // factors 4 + 8
- if (isPowerOf2_32(Factor) && getIntrinsicFactor(II) == 2)
- interleaveLeafValues(Operands);
- return true;
-}
-
-static bool
-getVectorDeinterleaveFactor(IntrinsicInst *II,
- SmallVectorImpl<Value *> &Results,
- SmallVectorImpl<Instruction *> &DeadInsts) {
- assert(isDeinterleaveIntrinsic(II->getIntrinsicID()));
- using namespace PatternMatch;
- if (!II->hasNUses(getIntrinsicFactor(II)))
- return false;
-
- // Visit with BFS
- SmallVector<IntrinsicInst *, 8> Queue;
- Queue.push_back(II);
- while (!Queue.empty()) {
- IntrinsicInst *Current = Queue.front();
- Queue.erase(Queue.begin());
- assert(Current->hasNUses(getIntrinsicFactor(Current)));
-
- // All the intermediate intrinsics will be deleted from the bottom-up.
- DeadInsts.insert(DeadInsts.begin(), Current);
-
- SmallVector<ExtractValueInst *> EVs(getIntrinsicFactor(Current), nullptr);
- for (User *Usr : Current->users()) {
- if (!isa<ExtractValueInst>(Usr))
- return 0;
-
- auto *EV = cast<ExtractValueInst>(Usr);
- // Intermediate ExtractValue instructions will also be deleted.
- DeadInsts.insert(DeadInsts.begin(), EV);
- ArrayRef<unsigned> Indices = EV->getIndices();
- if (Indices.size() != 1)
- return false;
-
- if (!EVs[Indices[0]])
- EVs[Indices[0]] = EV;
- else
- return false;
- }
-
- // We have legal indices. At this point we're either going
- // to continue the traversal or push the leaf values into Results.
- for (ExtractValueInst *EV : EVs) {
- // Continue the traversal. We're playing safe here and matching only the
- // expression consisting of a perfectly balanced binary tree in which all
- // intermediate values are only used once.
- if (EV->hasOneUse() &&
- match(EV->user_back(),
- m_Intrinsic<Intrinsic::vector_deinterleave2>()) &&
- EV->user_back()->hasNUses(2)) {
- auto *EVUsr = cast<IntrinsicInst>(EV->user_back());
- Queue.push_back(EVUsr);
- continue;
- }
-
- // If this is not a perfectly balanced tree, the leaf
- // result types would be different.
- if (!Results.empty() && EV->getType() != Results.back()->getType())
- return false;
-
- // Save the leaf value.
- Results.push_back(EV);
- }
- }
-
- const unsigned Factor = Results.size();
- // Currently we only recognize factors of 2...8 and other powers of 2.
- // FIXME: should we assert here instead?
- if (Factor <= 1 ||
- (!isPowerOf2_32(Factor) && Factor != getIntrinsicFactor(II)))
- return 0;
-
- // Recursively interleaved factors need to have their values reordered
- // TODO: Remove once the loop vectorizer no longer recursively interleaves
- // factors 4 + 8
- if (isPowerOf2_32(Factor) && getIntrinsicFactor(II) == 2)
- interleaveLeafValues(Results);
- return true;
-}
-
static Value *getMask(Value *WideMask, unsigned Factor,
ElementCount LeafValueEC) {
if (auto *IMI = dyn_cast<IntrinsicInst>(WideMask)) {
- SmallVector<Value *, 8> Operands;
- SmallVector<Instruction *, 8> DeadInsts;
- if (getVectorInterleaveFactor(IMI, Operands, DeadInsts)) {
- assert(!Operands.empty());
- if (Operands.size() == Factor && llvm::all_equal(Operands))
- return Operands[0];
+ if (isInterleaveIntrinsic(IMI->getIntrinsicID()) &&
+ getIntrinsicFactor(IMI) == Factor && llvm::all_equal(IMI->args())) {
+ return IMI->getArgOperand(0);
}
}
@@ -830,13 +669,19 @@ bool InterleavedAccessImpl::lowerDeinterleaveIntrinsic(
if (!LoadedVal->hasOneUse() || !isa<LoadInst, VPIntrinsic>(LoadedVal))
return false;
- SmallVector<Value *, 8> DeinterleaveValues;
- SmallVector<Instruction *, 8> DeinterleaveDeadInsts;
- if (!getVectorDeinterleaveFactor(DI, DeinterleaveValues,
- DeinterleaveDeadInsts))
+ const unsigned Factor = getIntrinsicFactor(DI);
+ if (!DI->hasNUses(Factor))
return false;
-
- const unsigned Factor = DeinterleaveValues.size();
+ SmallVector<Value *, 8> DeinterleaveValues(Factor);
+ for (auto *User : DI->users()) {
+ auto *Extract = dyn_cast<ExtractValueInst>(User);
+ if (!Extract || Extract->getNumIndices() != 1)
+ return false;
+ unsigned Idx = Extract->getIndices()[0];
+ if (DeinterleaveValues[Idx])
+ return false;
+ DeinterleaveValues[Idx] = Extract;
+ }
if (auto *VPLoad = dyn_cast<VPIntrinsic>(LoadedVal)) {
if (VPLoad->getIntrinsicID() != Intrinsic::vp_load)
@@ -869,7 +714,9 @@ bool InterleavedAccessImpl::lowerDeinterleaveIntrinsic(
return false;
}
- DeadInsts.insert_range(DeinterleaveDeadInsts);
+ for (Value *V : DeinterleaveValues)
+ DeadInsts.insert(cast<Instruction>(V));
+ DeadInsts.insert(DI);
// We now have a target-specific load, so delete the old one.
DeadInsts.insert(cast<Instruction>(LoadedVal));
return true;
@@ -883,12 +730,8 @@ bool InterleavedAccessImpl::lowerInterleaveIntrinsic(
if (!isa<StoreInst, VPIntrinsic>(StoredBy))
return false;
- SmallVector<Value *, 8> InterleaveValues;
- SmallVector<Instruction *, 8> InterleaveDeadInsts;
- if (!getVectorInterleaveFactor(II, InterleaveValues, InterleaveDeadInsts))
- return false;
-
- const unsigned Factor = InterleaveValues.size();
+ SmallVector<Value *, 8> InterleaveValues(II->args());
+ const unsigned Factor = getIntrinsicFactor(II);
if (auto *VPStore = dyn_cast<VPIntrinsic>(StoredBy)) {
if (VPStore->getIntrinsicID() != Intrinsic::vp_store)
@@ -922,7 +765,7 @@ bool InterleavedAccessImpl::lowerInterleaveIntrinsic(
// We now have a target-specific store, so delete the old one.
DeadInsts.insert(cast<Instruction>(StoredBy));
- DeadInsts.insert_range(InterleaveDeadInsts);
+ DeadInsts.insert(II);
return true;
}
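For context on the InterleavedAccessPass changes above, here is a rough sketch of the only shape lowerDeinterleaveIntrinsic now needs to match (types and value names are assumed for illustration, following the factor-4 tests in the files below): a load whose sole use is a single llvm.vector.deinterleaveN call, each of whose N results is taken by exactly one single-index extractvalue.

; Single deinterleave4 of a loaded vector; the pass collects the four
; extractvalue results directly instead of walking a deinterleave2 tree.
%vec = load <8 x i32>, ptr %p
%d = call { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } @llvm.vector.deinterleave4.v8i32(<8 x i32> %vec)
%t0 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 0
%t1 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 1
%t2 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 2
%t3 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 3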
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
index c2ae1ce491389..3e822d357b667 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
@@ -293,31 +293,6 @@ define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_fact
ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res3
}
-; TODO: Remove once recursive deinterleaving support is removed
-define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_factor4_recursive(ptr %p) {
-; CHECK-LABEL: vector_deinterleave_load_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vlseg4e8.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <32 x i8>, ptr %p
- %d0 = call {<16 x i8>, <16 x i8>} @llvm.vector.deinterleave2.v32i8(<32 x i8> %vec)
- %d0.0 = extractvalue { <16 x i8>, <16 x i8> } %d0, 0
- %d0.1 = extractvalue { <16 x i8>, <16 x i8> } %d0, 1
- %d1 = call {<8 x i8>, <8 x i8>} @llvm.vector.deinterleave2.v16i8(<16 x i8> %d0.0)
- %t0 = extractvalue { <8 x i8>, <8 x i8> } %d1, 0
- %t2 = extractvalue { <8 x i8>, <8 x i8> } %d1, 1
- %d2 = call {<8 x i8>, <8 x i8>} @llvm.vector.deinterleave2.v16i8(<16 x i8> %d0.1)
- %t1 = extractvalue { <8 x i8>, <8 x i8> } %d2, 0
- %t3 = extractvalue { <8 x i8>, <8 x i8> } %d2, 1
-
- %res0 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } undef, <8 x i8> %t0, 0
- %res1 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res0, <8 x i8> %t1, 1
- %res2 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res1, <8 x i8> %t2, 2
- %res3 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res2, <8 x i8> %t3, 3
- ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res3
-}
-
define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_factor5(ptr %p) {
; CHECK-LABEL: vector_deinterleave_load_factor5:
; CHECK: # %bb.0:
@@ -414,45 +389,3 @@ define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <
%res7 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res6, <8 x i8> %t6, 7
ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res7
}
-
-; TODO: Remove once recursive deinterleaving support is removed
-define {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @vector_deinterleave_load_factor8_recursive(ptr %ptr) {
-; CHECK-LABEL: vector_deinterleave_load_factor8_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vlseg8e32.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <16 x i32>, ptr %ptr
- %d0 = call { <8 x i32>, <8 x i32> } @llvm.vector.deinterleave2.v16i32(<16 x i32> %vec)
- %d0.0 = extractvalue { <8 x i32>, <8 x i32> } %d0, 0
- %d0.1 = extractvalue { <8 x i32>, <8 x i32> } %d0, 1
- %d1 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %d0.0)
- %d1.0 = extractvalue { <4 x i32>, <4 x i32> } %d1, 0
- %d1.1 = extractvalue { <4 x i32>, <4 x i32> } %d1, 1
- %d2 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %d0.1)
- %d2.0 = extractvalue { <4 x i32>, <4 x i32> } %d2, 0
- %d2.1 = extractvalue { <4 x i32>, <4 x i32> } %d2, 1
-
- %d3 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d1.0)
- %t0 = extractvalue { <2 x i32>, <2 x i32> } %d3, 0
- %t4 = extractvalue { <2 x i32>, <2 x i32> } %d3, 1
- %d4 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d1.1)
- %t2 = extractvalue { <2 x i32>, <2 x i32> } %d4, 0
- %t6 = extractvalue { <2 x i32>, <2 x i32> } %d4, 1
- %d5 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d2.0)
- %t1 = extractvalue { <2 x i32>, <2 x i32> } %d5, 0
- %t5 = extractvalue { <2 x i32>, <2 x i32> } %d5, 1
- %d6 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d2.1)
- %t3 = extractvalue { <2 x i32>, <2 x i32> } %d6, 0
- %t7 = extractvalue { <2 x i32>, <2 x i32> } %d6, 1
-
- %res0 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } undef, <2 x i32> %t0, 0
- %res1 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res0, <2 x i32> %t1, 1
- %res2 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res1, <2 x i32> %t2, 2
- %res3 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res2, <2 x i32> %t3, 3
- %res4 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res3, <2 x i32> %t4, 4
- %res5 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res4, <2 x i32> %t5, 5
- %res6 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res5, <2 x i32> %t6, 6
- %res7 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res6, <2 x i32> %t7, 7
- ret { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res7
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
index c394e7aa2e3e8..a49eeed3605c5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
@@ -203,20 +203,6 @@ define void @vector_interleave_store_factor4(<4 x i32> %a, <4 x i32> %b, <4 x i3
ret void
}
-; TODO: Remove once recursive interleaving support is removed
-define void @vector_interleave_store_factor4_recursive(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, ptr %p) {
-; CHECK-LABEL: vector_interleave_store_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vsseg4e32.v v8, (a0)
-; CHECK-NEXT: ret
- %v0 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %c)
- %v1 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %b, <4 x i32> %d)
- %v2 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v0, <8 x i32> %v1)
- store <16 x i32> %v2, ptr %p
- ret void
-}
-
define void @vector_interleave_store_factor5(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, <4 x i32> %e, ptr %p) {
; CHECK-LABEL: vector_interleave_store_factor5:
; CHECK: # %bb.0:
@@ -260,23 +246,3 @@ define void @vector_interleave_store_factor8(<4 x i32> %a, <4 x i32> %b, <4 x i3
store <32 x i32> %v, ptr %p
ret void
}
-
-; TODO: Remove once recursive interleaving support is removed
-define void @vector_interleave_store_factor8_recursive(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, <4 x i32> %e, <4 x i32> %f, <4 x i32> %g, <4 x i32> %h, ptr %p) {
-; CHECK-LABEL: vector_interleave_store_factor8_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vsseg8e32.v v8, (a0)
-; CHECK-NEXT: ret
- %v0 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %e)
- %v1 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %c, <4 x i32> %g)
- %v2 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v0, <8 x i32> %v1)
-
- %v3 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %b, <4 x i32> %f)
- %v4 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %d, <4 x i32> %h)
- %v5 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v3, <8 x i32> %v4)
-
- %v6 = call <32 x i32> @llvm.vector.interleave2.v32i32(<16 x i32> %v2, <16 x i32> %v5)
- store <32 x i32> %v6, ptr %p
- ret void
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
index 8ac4c7447c7d4..5e3ae2faf1a53 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
@@ -302,15 +302,11 @@ define {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @vpload_factor4_intrinsics(p
; CHECK-NEXT: vlseg4e32.v v8, (a0)
; CHECK-NEXT: ret
%wide.masked.load = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %ptr, <8 x i1> splat (i1 true), i32 8)
- %d0 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %wide.masked.load)
- %d0.0 = extractvalue { <4 x i32>, <4 x i32> } %d0, 0
- %d0.1 = extractvalue { <4 x i32>, <4 x i32> } %d0, 1
- %d1 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.0)
- %t0 = extractvalue { <2 x i32>, <2 x i32> } %d1, 0
- %t2 = extractvalue { <2 x i32>, <2 x i32> } %d1, 1
- %d2 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.1)
- %t1 = extractvalue { <2 x i32>, <2 x i32> } %d2, 0
- %t3 = extractvalue { <2 x i32>, <2 x i32> } %d2, 1
+ %d = call { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } @llvm.vector.deinterleave4.v8i32(<8 x i32> %wide.masked.load)
+ %t0 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 0
+ %t1 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 1
+ %t2 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 2
+ %t3 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 3
%res0 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } poison, <2 x i32> %t0, 0
%res1 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res0, <2 x i32> %t1, 1
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
index 9344c52098684..b11db3d61f693 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
@@ -380,31 +380,6 @@ define { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x
ret { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8> } %res3
}
-; TODO: Remove once recursive deinterleaving support is removed
-define { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8> } @vector_deinterleave_load_factor4_recursive(ptr %p) {
-; CHECK-LABEL: vector_deinterleave_load_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
-; CHECK-NEXT: vlseg4e8.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <vscale x 32 x i8>, ptr %p
- %d0 = call {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %vec)
- %d0.0 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %d0, 0
- %d0.1 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %d0, 1
- %d1 = call {<vscale x 8 x i8>, <vscale x 8 x i8>} @llvm.vector.deinterleave2.nxv16i8(<vscale x 16 x i8> %d0.0)
- %t0 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d1, 0
- %t2 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d1, 1
- %d2 = call {<vscale x 8 x i8>, <vscale x 8 x i8>} @llvm.vector.deinterleave2.nxv16i8(<vscale x 16 x i8> %d0.1)
- %t1 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d2, 0
- %t3 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d2, 1
-
- %res0 = insertvalue { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>...
[truncated]
Gentle ping
LGTM
LGTM, thanks