Skip to content

[RISC-V] Reverse unordered FP comparisons in branches #115943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 25, 2025

Conversation

tomeksowi
Copy link
Contributor

Unordered floating-point comparisons are achieved by neg'ing the ordered counterparts. Avoid that by reversing both the FP comparison and the zero-comparison fused with the branch.

Also, minor codegen haircut.

Part of #84834, cc @dotnet/samsung

@tomeksowi tomeksowi marked this pull request as draft May 23, 2025 14:53
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 23, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 23, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tomeksowi
Copy link
Contributor Author

No regressions.

Diffs are based on 94,874 contexts (52,265 MinOpts, 42,609 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-23,124 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 111,279,548 -23,124 -0.03%
MinOpts (-604 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 80,673,320 -604 -0.00%
FullOpts (-22,520 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
linux.riscv64.Checked.mch 30,606,228 -22,520 -0.05%
Example diffs
linux.riscv64.Checked.mch
-12 (-11.11%) : 86422.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
@@ -23,22 +23,19 @@ G_M44467_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M44467_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             feq.s          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG06
-						;; size=16 bbWeight=1 PerfScore 7.50
+            bnez           t6, G_M44467_IG06
+						;; size=12 bbWeight=1 PerfScore 7.00
 G_M44467_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             flt.s          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG05
+            bnez           t6, G_M44467_IG05
             flt.s          a0, fa1, fa0
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M44467_IG04
+            bnez           t6, G_M44467_IG04
             addi           a0, zero, 0xD1FFAB1E
             j              G_M44467_IG07
-						;; size=40 bbWeight=0.50 PerfScore 8.50
+						;; size=32 bbWeight=0.50 PerfScore 8.00
 G_M44467_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M44467_IG07
@@ -56,7 +53,7 @@ G_M44467_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 27 (0x0001b) Actual length = 108 (0x00006c)
+  Function Length   : 24 (0x00018) Actual length = 96 (0x000060)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-11.11%) : 86423.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
@@ -23,22 +23,19 @@ G_M37116_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M37116_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             feq.d          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG06
-						;; size=16 bbWeight=1 PerfScore 7.50
+            bnez           t6, G_M37116_IG06
+						;; size=12 bbWeight=1 PerfScore 7.00
 G_M37116_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             flt.d          a0, fa0, fa1
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG05
+            bnez           t6, G_M37116_IG05
             flt.d          a0, fa1, fa0
-            xori           a0, a0, 1
             sext.w         t6, a0
-            beqz           t6, G_M37116_IG04
+            bnez           t6, G_M37116_IG04
             addi           a0, zero, 0xD1FFAB1E
             j              G_M37116_IG07
-						;; size=40 bbWeight=0.50 PerfScore 8.50
+						;; size=32 bbWeight=0.50 PerfScore 8.00
 G_M37116_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M37116_IG07
@@ -56,7 +53,7 @@ G_M37116_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 27 (0x0001b) Actual length = 108 (0x00006c)
+  Function Length   : 24 (0x00018) Actual length = 96 (0x000060)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-10.26%) : 47378.dasm - BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
@@ -25,31 +25,27 @@ G_M56740_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             lui            a1, 0xD1FFAB1E
             fmv.w.x        ft4, a1
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG06
-						;; size=28 bbWeight=1 PerfScore 9.50
+            bnez           t6, G_M56740_IG06
+						;; size=24 bbWeight=1 PerfScore 9.00
 G_M56740_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             fmv.w.x        ft4, zero
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG05
+            bnez           t6, G_M56740_IG05
             lui            a1, 0xD1FFAB1E
             fmv.w.x        ft4, a1
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            beqz           t6, G_M56740_IG04
+            bnez           t6, G_M56740_IG04
             auipc          t6, 0xD1FFAB1E
             flw            ft4, 0xD1FFAB1E(t6)
             flt.s          a1, ft4, fa0
-            xori           a1, a1, 1
             sext.w         t6, a1
-            bnez           t6, G_M56740_IG07
+            beqz           t6, G_M56740_IG07
             addi           a0, zero, 0xD1FFAB1E
             j              G_M56740_IG07
-						;; size=76 bbWeight=0.50 PerfScore 14.50
+						;; size=64 bbWeight=0.50 PerfScore 13.75
 G_M56740_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
             j              G_M56740_IG07
@@ -69,7 +65,7 @@ G_M56740_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 RWD00  	dd	FF7FFFFFh		; -3.40282e+38
 
 
-; Total bytes of code 156, prolog size 16, PerfScore 42.75, instruction count 38, allocated bytes for code 156 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
+; Total bytes of code 140, prolog size 16, PerfScore 41.50, instruction count 34, allocated bytes for code 140 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -80,7 +76,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 39 (0x00027) Actual length = 156 (0x00009c)
+  Function Length   : 35 (0x00023) Actual length = 140 (0x00008c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 94768.dasm - Microsoft.Diagnostics.Tracing.ZippedETLWriter:PrepForWrite():System.Collections.Generic.List`1[System.String]:this (FullOpts)

No diffs found?

+0 (0.00%) : 94736.dasm - Microsoft.Diagnostics.Tracing.TraceEventNativeMethods:IsElevated():System.Nullable`1[ubyte] (FullOpts)

No diffs found?

+0 (0.00%) : 94480.dasm - Microsoft.Diagnostics.Tracing.EventPipeEventMetaDataHeader:.ctor(FastSerialization.PinnedStreamReader,int,int,int,int,int,System.String):this (FullOpts)

No diffs found?

Details

Size improvements/regressions per collection

Collection Contexts with diffs Improvements Regressions Same size Improvements (bytes) Regressions (bytes)
linux.riscv64.Checked.mch 11,769 698 0 11,071 -23,124 +0

PerfScore improvements/regressions per collection

Collection Contexts with diffs Improvements Regressions Same PerfScore Improvements (PerfScore) Regressions (PerfScore) PerfScore Overall in FullOpts
linux.riscv64.Checked.mch 11,769 684 0 11,085 -0.48% 0.00% -0.0076%

Context information

Collection Diffed contexts MinOpts FullOpts Missed, base Missed, diff
linux.riscv64.Checked.mch 94,874 52,265 42,609 3 (0.00%) 3 (0.00%)

jit-analyze output

@tomeksowi tomeksowi marked this pull request as ready for review May 26, 2025 07:28
@risc-vv
Copy link

risc-vv commented May 27, 2025

d609565 is being scheduled for building and testing

GIT: d6095657b1b72c093baf6b57a8b6f05a47ec5f56
REPO: dotnet/runtime
BRANCH: main

@JulieLeeMSFT JulieLeeMSFT requested a review from jakobbotsch June 23, 2025 15:57
@JulieLeeMSFT
Copy link
Member

@jakobbotsch, PTAL.

@jakobbotsch jakobbotsch reopened this Jun 24, 2025
Comment on lines 3144 to 3147
if (isUnordered)
{
if (tree->OperIs(GT_LT))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp2, regOp1);
}
else if (tree->OperIs(GT_LE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp2, regOp1);
}
else if (tree->OperIs(GT_NE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_feq_s : INS_feq_d, cmpSize, targetReg, regOp1, regOp2);
}
else if (tree->OperIs(GT_GT))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp1, regOp2);
}
else if (tree->OperIs(GT_GE))
{
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp1, regOp2);
}
else
{
unreached();
}
emit->emitIns_R_R_I(INS_xori, EA_8BYTE, targetReg, targetReg, 1);
oper = GenTree::ReverseRelop(oper);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar swapping is done in a platform agnostic way in TryLowerConditionToFlagsNode based on GenCondition::PreferSwap. Can RISC-V use the same mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryLowerConditionToFlagsNode is not called on RISC-V, it looks more geared for architectures with a special flag register for conditions. Floating comparisons on RISC-V are written to a normal GP register.

Maybe I can swap it in LowerCompare in a separate PR? Some of the stuff from #115039 also looks suitable for moving from codegen to lowering (there's similar swapping and negating going on when the integer comparison is not fed into a branch), which would simplify register allocation, containing immediates, and also sign-extension elimination in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, makes sense.

@jakobbotsch jakobbotsch merged commit a0397e7 into dotnet:main Jun 25, 2025
115 of 117 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants