[ISel] Commute FMUL and inserting zero into vector lane #146096

juliannagele · 2025-06-27T15:39:55Z

When inserting zero into a vector that is the result of a multiplication we can do the insertion earlier, i.e. into an operand of the fmul instead.

llvmbot · 2025-06-27T15:40:29Z

@llvm/pr-subscribers-backend-aarch64

Author: Julian Nagele (juliannagele)

Changes

When inserting zero into a vector that is the result of a multiplication we can do the insertion earlier, i.e. into an operand of the fmul instead.

Full diff: https://github.com/llvm/llvm-project/pull/146096.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+32)
(modified) llvm/test/CodeGen/AArch64/arm64-vmul.ll (+10)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c8b1eafd35495..867833f67b822 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -26268,11 +26268,43 @@ static SDValue removeRedundantInsertVectorElt(SDNode *N) {
   return ExtractVec;
 }
 
+static SDValue commuteInsertVectorEltFMul(SDNode *N, SelectionDAG &DAG) {
+  assert(N->getOpcode() == ISD::INSERT_VECTOR_ELT && "Unexpected node!");
+  SDValue InsertVec = N->getOperand(0);
+  SDValue InsertVal = N->getOperand(1);
+  SDValue InsertIdx = N->getOperand(2);
+
+  // Only handle constant 0 insertion...
+  if (!(isNullConstant(InsertVal) || isNullFPConstant(InsertVal)))
+    return SDValue();
+  // ... into the result of an FMUL.
+  if (InsertVec.getOpcode() != ISD::FMUL)
+    return SDValue();
+
+  // Insert into the operand of FMUL instead.
+  SDValue FMulOp = InsertVec.getOperand(0);
+
+  if (!InsertVec.hasOneUse() || !FMulOp.hasOneUse())
+    return SDValue();
+
+  SDValue InsertOp =
+      DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(N), FMulOp.getValueType(),
+                  FMulOp, InsertVal, InsertIdx);
+  SDValue FMul =
+      DAG.getNode(ISD::FMUL, SDLoc(InsertVec), InsertVec.getValueType(),
+                  InsertOp, InsertVec.getOperand(1));
+  DAG.ReplaceAllUsesWith(N, &FMul);
+  return FMul;
+}
+
 static SDValue
 performInsertVectorEltCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
   if (SDValue Res = removeRedundantInsertVectorElt(N))
     return Res;
 
+  if (SDValue Res = commuteInsertVectorEltFMul(N, DCI.DAG))
+    return Res;
+
   return performPostLD1Combine(N, DCI, true);
 }
 
diff --git a/llvm/test/CodeGen/AArch64/arm64-vmul.ll b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
index 937a17ca6c1e0..1e33f81b9b835 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vmul.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
@@ -1186,6 +1186,16 @@ define double @fmul_lane_d(double %A, <2 x double> %vec) nounwind {
   ret double %res
 }
 
+define <4 x float> @fmul_insert_zero(<4 x float> %A, <4 x float> %B) {
+; CHECK-LABEL: fmul_insert_zero:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov.s v0[3], wzr
+; CHECK-NEXT:    fmul.4s v0, v0, v1
+; CHECK-NEXT:    ret
+  %mul = fmul <4 x float> %A, %B
+  %mul_set_lane = insertelement <4 x float> %mul, float 0.000000e+00, i64 3
+  ret <4 x float> %mul_set_lane
+}
 
 
 define <2 x float> @fmulx_lane_2s(<2 x float> %A, <2 x float> %B) nounwind {

davemgreen

Can you give context why this is better? Thanks

juliannagele · 2025-07-01T14:25:38Z

We've observed that doing the insert on an operand of the mul we potentially reduce the critical path, in particular in combination with #146538

fhahn

To add maybe a bit more context, commuting the ops may allow the insert to happen earlier, while if we do it after the mul we need to wait for the mul (and its other operand) to be completed.

It should not increase the critial path I think, so in the worst case should not make things worse

davemgreen · 2025-07-02T08:20:12Z

OK sounds good. Like you said I don't think this would be worse (and I guess there is a chance to generate fma?). Could we add a test for a case that shows improvement?

How do we know which operand to zero? If it needs trace info then that is only available later in MachineCombiner. (Sometimes I can see that it replies on other DAG optimizations, in which case it sounds OK, it just might not always be reliable).

juliannagele added 2 commits June 27, 2025 15:32

Add test for inserting zero into vector lane of fmul result

93a6be0

[ISel] Commute FMUL and inserting zero into vector lane

e08422d

juliannagele requested a review from fhahn June 27, 2025 15:40

llvmbot added the backend:AArch64 label Jun 27, 2025

juliannagele requested a review from aemerson June 27, 2025 15:40

davemgreen reviewed Jun 29, 2025

View reviewed changes

fhahn reviewed Jul 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ISel] Commute FMUL and inserting zero into vector lane #146096

[ISel] Commute FMUL and inserting zero into vector lane #146096

Uh oh!

juliannagele commented Jun 27, 2025

Uh oh!

llvmbot commented Jun 27, 2025

Uh oh!

davemgreen left a comment

Uh oh!

juliannagele commented Jul 1, 2025

Uh oh!

fhahn left a comment

Uh oh!

davemgreen commented Jul 2, 2025

Uh oh!

Uh oh!

[ISel] Commute FMUL and inserting zero into vector lane #146096

Are you sure you want to change the base?

[ISel] Commute FMUL and inserting zero into vector lane #146096

Uh oh!

Conversation

juliannagele commented Jun 27, 2025

Uh oh!

llvmbot commented Jun 27, 2025

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

juliannagele commented Jul 1, 2025

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

davemgreen commented Jul 2, 2025

Uh oh!

Uh oh!