-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[ISel] Commute FMUL and inserting zero into vector lane #146096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Julian Nagele (juliannagele) ChangesWhen inserting zero into a vector that is the result of a multiplication we can do the insertion earlier, i.e. into an operand of the fmul instead. Full diff: https://github.com/llvm/llvm-project/pull/146096.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c8b1eafd35495..867833f67b822 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -26268,11 +26268,43 @@ static SDValue removeRedundantInsertVectorElt(SDNode *N) {
return ExtractVec;
}
+static SDValue commuteInsertVectorEltFMul(SDNode *N, SelectionDAG &DAG) {
+ assert(N->getOpcode() == ISD::INSERT_VECTOR_ELT && "Unexpected node!");
+ SDValue InsertVec = N->getOperand(0);
+ SDValue InsertVal = N->getOperand(1);
+ SDValue InsertIdx = N->getOperand(2);
+
+ // Only handle constant 0 insertion...
+ if (!(isNullConstant(InsertVal) || isNullFPConstant(InsertVal)))
+ return SDValue();
+ // ... into the result of an FMUL.
+ if (InsertVec.getOpcode() != ISD::FMUL)
+ return SDValue();
+
+ // Insert into the operand of FMUL instead.
+ SDValue FMulOp = InsertVec.getOperand(0);
+
+ if (!InsertVec.hasOneUse() || !FMulOp.hasOneUse())
+ return SDValue();
+
+ SDValue InsertOp =
+ DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(N), FMulOp.getValueType(),
+ FMulOp, InsertVal, InsertIdx);
+ SDValue FMul =
+ DAG.getNode(ISD::FMUL, SDLoc(InsertVec), InsertVec.getValueType(),
+ InsertOp, InsertVec.getOperand(1));
+ DAG.ReplaceAllUsesWith(N, &FMul);
+ return FMul;
+}
+
static SDValue
performInsertVectorEltCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
if (SDValue Res = removeRedundantInsertVectorElt(N))
return Res;
+ if (SDValue Res = commuteInsertVectorEltFMul(N, DCI.DAG))
+ return Res;
+
return performPostLD1Combine(N, DCI, true);
}
diff --git a/llvm/test/CodeGen/AArch64/arm64-vmul.ll b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
index 937a17ca6c1e0..1e33f81b9b835 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vmul.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
@@ -1186,6 +1186,16 @@ define double @fmul_lane_d(double %A, <2 x double> %vec) nounwind {
ret double %res
}
+define <4 x float> @fmul_insert_zero(<4 x float> %A, <4 x float> %B) {
+; CHECK-LABEL: fmul_insert_zero:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov.s v0[3], wzr
+; CHECK-NEXT: fmul.4s v0, v0, v1
+; CHECK-NEXT: ret
+ %mul = fmul <4 x float> %A, %B
+ %mul_set_lane = insertelement <4 x float> %mul, float 0.000000e+00, i64 3
+ ret <4 x float> %mul_set_lane
+}
define <2 x float> @fmulx_lane_2s(<2 x float> %A, <2 x float> %B) nounwind {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give context why this is better? Thanks
We've observed that doing the insert on an operand of the mul we potentially reduce the critical path, in particular in combination with #146538 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add maybe a bit more context, commuting the ops may allow the insert to happen earlier, while if we do it after the mul we need to wait for the mul (and its other operand) to be completed.
It should not increase the critial path I think, so in the worst case should not make things worse
OK sounds good. Like you said I don't think this would be worse (and I guess there is a chance to generate fma?). Could we add a test for a case that shows improvement? How do we know which operand to zero? If it needs trace info then that is only available later in MachineCombiner. (Sometimes I can see that it replies on other DAG optimizations, in which case it sounds OK, it just might not always be reliable). |
When inserting zero into a vector that is the result of a multiplication we can do the insertion earlier, i.e. into an operand of the fmul instead.