Skip to content

Conversation

tomeksowi
Copy link
Member

Unordered floating-point comparisons are achieved by neg'ing the ordered counterparts. Avoid that by reversing both the FP comparison and the zero-comparison fused with the branch.

Also, minor codegen haircut.

Part of #84834, cc @dotnet/samsung

@tomeksowi tomeksowi marked this pull request as draft May 23, 2025 14:53
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 23, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 23, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tomeksowi
Copy link
Member Author

No regressions.

Diffs are based on 94,874 contexts (52,265 MinOpts, 42,609 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-23,124 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 111,279,548 -23,124 -0.03%
MinOpts (-604 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 80,673,320 -604 -0.00%
FullOpts (-22,520 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 30,606,228 -22,520 -0.05%
Example diffs
linux.riscv64.Checked.mch
-12 (-11.11%) : 86422.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
@@ -23,22 +23,19 @@ G_M44467_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M44467_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             feq.s          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG06
-						;; size=16 bbWeight=1 PerfScore 7.50
+            bnez           t6, G_M44467_IG06
+						;; size=12 bbWeight=1 PerfScore 7.00
 G_M44467_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             flt.s          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG05
+            bnez           t6, G_M44467_IG05
             flt.s          a0, fa1, fa0
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG04
+            bnez           t6, G_M44467_IG04
             addi           a0, zero, 0xD1FFAB1E
             j              G_M44467_IG07
-						;; size=40 bbWeight=0.50 PerfScore 8.50
+						;; size=32 bbWeight=0.50 PerfScore 8.00
 G_M44467_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M44467_IG07
@@ -56,7 +53,7 @@ G_M44467_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 27 (0x0001b) Actual length = 108 (0x00006c)
+  Function Length   : 24 (0x00018) Actual length = 96 (0x000060)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-11.11%) : 86423.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
@@ -23,22 +23,19 @@ G_M37116_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M37116_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             feq.d          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG06
-						;; size=16 bbWeight=1 PerfScore 7.50
+            bnez           t6, G_M37116_IG06
+						;; size=12 bbWeight=1 PerfScore 7.00
 G_M37116_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             flt.d          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG05
+            bnez           t6, G_M37116_IG05
             flt.d          a0, fa1, fa0
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG04
+            bnez           t6, G_M37116_IG04
             addi           a0, zero, 0xD1FFAB1E
             j              G_M37116_IG07
-						;; size=40 bbWeight=0.50 PerfScore 8.50
+						;; size=32 bbWeight=0.50 PerfScore 8.00
 G_M37116_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M37116_IG07
@@ -56,7 +53,7 @@ G_M37116_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 27 (0x0001b) Actual length = 108 (0x00006c)
+  Function Length   : 24 (0x00018) Actual length = 96 (0x000060)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-10.26%) : 47378.dasm - BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
@@ -25,31 +25,27 @@ G_M56740_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             lui            a1, 0xD1FFAB1E
             fmv.w.x        ft4, a1
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG06
-						;; size=28 bbWeight=1 PerfScore 9.50
+            bnez           t6, G_M56740_IG06
+						;; size=24 bbWeight=1 PerfScore 9.00
 G_M56740_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             fmv.w.x        ft4, zero
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG05
+            bnez           t6, G_M56740_IG05
             lui            a1, 0xD1FFAB1E
             fmv.w.x        ft4, a1
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG04
+            bnez           t6, G_M56740_IG04
             auipc          t6, 0xD1FFAB1E
             flw            ft4, 0xD1FFAB1E(t6)
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            bnez           t6, G_M56740_IG07
+            beqz           t6, G_M56740_IG07
             addi           a0, zero, 0xD1FFAB1E
             j              G_M56740_IG07
-						;; size=76 bbWeight=0.50 PerfScore 14.50
+						;; size=64 bbWeight=0.50 PerfScore 13.75
 G_M56740_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M56740_IG07
@@ -69,7 +65,7 @@ G_M56740_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 RWD00  	dd	FF7FFFFFh		; -3.40282e+38
 
 
-; Total bytes of code 156, prolog size 16, PerfScore 42.75, instruction count 38, allocated bytes for code 156 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
+; Total bytes of code 140, prolog size 16, PerfScore 41.50, instruction count 34, allocated bytes for code 140 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -80,7 +76,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 39 (0x00027) Actual length = 156 (0x00009c)
+  Function Length   : 35 (0x00023) Actual length = 140 (0x00008c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 94768.dasm - Microsoft.Diagnostics.Tracing.ZippedETLWriter:PrepForWrite():System.Collections.Generic.List`1[System.String]:this (FullOpts)

No diffs found?

+0 (0.00%) : 94736.dasm - Microsoft.Diagnostics.Tracing.TraceEventNativeMethods:IsElevated():System.Nullable`1[ubyte] (FullOpts)

No diffs found?

+0 (0.00%) : 94480.dasm - Microsoft.Diagnostics.Tracing.EventPipeEventMetaDataHeader:.ctor(FastSerialization.PinnedStreamReader,int,int,int,int,int,System.String):this (FullOpts)

No diffs found?

Details

Size improvements/regressions per collection

Collection Contexts with diffs Improvements Regressions Same size Improvements (bytes) Regressions (bytes)
linux.riscv64.Checked.mch 11,769 698 0 11,071 -23,124 +0

PerfScore improvements/regressions per collection

Collection Contexts with diffs Improvements Regressions Same PerfScore Improvements (PerfScore) Regressions (PerfScore) PerfScore Overall in FullOpts
linux.riscv64.Checked.mch 11,769 684 0 11,085 -0.48% 0.00% -0.0076%

Context information

Collection Diffed contexts MinOpts FullOpts Missed, base Missed, diff
linux.riscv64.Checked.mch 94,874 52,265 42,609 3 (0.00%) 3 (0.00%)

jit-analyze output

@tomeksowi tomeksowi marked this pull request as ready for review May 26, 2025 07:28
@risc-vv
Copy link

risc-vv commented May 27, 2025

d609565 is being scheduled for building and testing

GIT: d6095657b1b72c093baf6b57a8b6f05a47ec5f56
REPO: dotnet/runtime
BRANCH: main

@JulieLeeMSFT JulieLeeMSFT requested a review from jakobbotsch June 23, 2025 15:57
@JulieLeeMSFT
Copy link
Member

@jakobbotsch, PTAL.

@jakobbotsch jakobbotsch reopened this Jun 24, 2025
Comment on lines 3144 to 3147
if (isUnordered)
{
if (tree->OperIs(GT_LT))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp2, regOp1);
}
else if (tree->OperIs(GT_LE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp2, regOp1);
}
else if (tree->OperIs(GT_NE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_feq_s : INS_feq_d, cmpSize, targetReg, regOp1, regOp2);
}
else if (tree->OperIs(GT_GT))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp1, regOp2);
}
else if (tree->OperIs(GT_GE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp1, regOp2);
}
else
{
unreached();
}
emit->emitIns_R_R_I(INS_xori, EA_8BYTE, targetReg, targetReg, 1);
oper = GenTree::ReverseRelop(oper);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar swapping is done in a platform agnostic way in TryLowerConditionToFlagsNode based on GenCondition::PreferSwap. Can RISC-V use the same mechanism?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryLowerConditionToFlagsNode is not called on RISC-V, it looks more geared for architectures with a special flag register for conditions. Floating comparisons on RISC-V are written to a normal GP register.

Maybe I can swap it in LowerCompare in a separate PR? Some of the stuff from #115039 also looks suitable for moving from codegen to lowering (there's similar swapping and negating going on when the integer comparison is not fed into a branch), which would simplify register allocation, containing immediates, and also sign-extension elimination in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, makes sense.

@jakobbotsch jakobbotsch merged commit a0397e7 into dotnet:main Jun 25, 2025
115 of 117 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants