-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[RISC-V] Reverse unordered FP comparisons in branches #115943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISC-V] Reverse unordered FP comparisons in branches #115943
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
No regressions. Diffs are based on 94,874 contexts (52,265 MinOpts, 42,609 FullOpts). MISSED contexts: 3 (0.00%) Overall (-23,124 bytes)
MinOpts (-604 bytes)
FullOpts (-22,520 bytes)
Example diffslinux.riscv64.Checked.mch-12 (-11.11%) : 86422.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)@@ -23,22 +23,19 @@ G_M44467_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=16 bbWeight=1 PerfScore 9.00
G_M44467_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
feq.s a0, fa0, fa1
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M44467_IG06
- ;; size=16 bbWeight=1 PerfScore 7.50
+ bnez t6, G_M44467_IG06
+ ;; size=12 bbWeight=1 PerfScore 7.00
G_M44467_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
flt.s a0, fa0, fa1
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M44467_IG05
+ bnez t6, G_M44467_IG05
flt.s a0, fa1, fa0
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M44467_IG04
+ bnez t6, G_M44467_IG04
addi a0, zero, 0xD1FFAB1E
j G_M44467_IG07
- ;; size=40 bbWeight=0.50 PerfScore 8.50
+ ;; size=32 bbWeight=0.50 PerfScore 8.00
G_M44467_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
addi a0, zero, 0xD1FFAB1E
j G_M44467_IG07
@@ -56,7 +53,7 @@ G_M44467_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=6af2524c) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareSingle(float,float):int (FullOpts)
; ============================================================
Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 27 (0x0001b) Actual length = 108 (0x00006c)
+ Function Length : 24 (0x00018) Actual length = 96 (0x000060)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) -12 (-11.11%) : 86423.dasm - Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)@@ -23,22 +23,19 @@ G_M37116_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=16 bbWeight=1 PerfScore 9.00
G_M37116_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
feq.d a0, fa0, fa1
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M37116_IG06
- ;; size=16 bbWeight=1 PerfScore 7.50
+ bnez t6, G_M37116_IG06
+ ;; size=12 bbWeight=1 PerfScore 7.00
G_M37116_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
flt.d a0, fa0, fa1
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M37116_IG05
+ bnez t6, G_M37116_IG05
flt.d a0, fa1, fa0
- xori a0, a0, 1
sext.w t6, a0
- beqz t6, G_M37116_IG04
+ bnez t6, G_M37116_IG04
addi a0, zero, 0xD1FFAB1E
j G_M37116_IG07
- ;; size=40 bbWeight=0.50 PerfScore 8.50
+ ;; size=32 bbWeight=0.50 PerfScore 8.00
G_M37116_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
addi a0, zero, 0xD1FFAB1E
j G_M37116_IG07
@@ -56,7 +53,7 @@ G_M37116_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 108, prolog size 16, PerfScore 34.75, instruction count 27, allocated bytes for code 108 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
+; Total bytes of code 96, prolog size 16, PerfScore 33.75, instruction count 24, allocated bytes for code 96 (MethodHash=2f356f03) for method Microsoft.VisualBasic.CompilerServices.Operators:CompareDouble(double,double):int (FullOpts)
; ============================================================
Unwind Info:
@@ -67,7 +64,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 27 (0x0001b) Actual length = 108 (0x00006c)
+ Function Length : 24 (0x00018) Actual length = 96 (0x000060)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) -16 (-10.26%) : 47378.dasm - BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)@@ -25,31 +25,27 @@ G_M56740_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
lui a1, 0xD1FFAB1E
fmv.w.x ft4, a1
flt.s a1, ft4, fa0
- xori a1, a1, 1
sext.w t6, a1
- beqz t6, G_M56740_IG06
- ;; size=28 bbWeight=1 PerfScore 9.50
+ bnez t6, G_M56740_IG06
+ ;; size=24 bbWeight=1 PerfScore 9.00
G_M56740_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
fmv.w.x ft4, zero
flt.s a1, ft4, fa0
- xori a1, a1, 1
sext.w t6, a1
- beqz t6, G_M56740_IG05
+ bnez t6, G_M56740_IG05
lui a1, 0xD1FFAB1E
fmv.w.x ft4, a1
flt.s a1, ft4, fa0
- xori a1, a1, 1
sext.w t6, a1
- beqz t6, G_M56740_IG04
+ bnez t6, G_M56740_IG04
auipc t6, 0xD1FFAB1E
flw ft4, 0xD1FFAB1E(t6)
flt.s a1, ft4, fa0
- xori a1, a1, 1
sext.w t6, a1
- bnez t6, G_M56740_IG07
+ beqz t6, G_M56740_IG07
addi a0, zero, 0xD1FFAB1E
j G_M56740_IG07
- ;; size=76 bbWeight=0.50 PerfScore 14.50
+ ;; size=64 bbWeight=0.50 PerfScore 13.75
G_M56740_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
addi a0, zero, 0xD1FFAB1E
j G_M56740_IG07
@@ -69,7 +65,7 @@ G_M56740_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
RWD00 dd FF7FFFFFh ; -3.40282e+38
-; Total bytes of code 156, prolog size 16, PerfScore 42.75, instruction count 38, allocated bytes for code 156 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
+; Total bytes of code 140, prolog size 16, PerfScore 41.50, instruction count 34, allocated bytes for code 140 (MethodHash=4c11225b) for method BringUpTest_JTrueGtFP:JTrueGtFP(float):int (FullOpts)
; ============================================================
Unwind Info:
@@ -80,7 +76,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 39 (0x00027) Actual length = 156 (0x00009c)
+ Function Length : 35 (0x00023) Actual length = 140 (0x00008c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) +0 (0.00%) : 94768.dasm - Microsoft.Diagnostics.Tracing.ZippedETLWriter:PrepForWrite():System.Collections.Generic.List`1[System.String]:this (FullOpts)No diffs found? +0 (0.00%) : 94736.dasm - Microsoft.Diagnostics.Tracing.TraceEventNativeMethods:IsElevated():System.Nullable`1[ubyte] (FullOpts)No diffs found? +0 (0.00%) : 94480.dasm - Microsoft.Diagnostics.Tracing.EventPipeEventMetaDataHeader:.ctor(FastSerialization.PinnedStreamReader,int,int,int,int,int,System.String):this (FullOpts)No diffs found? DetailsSize improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output |
d609565 is being scheduled for building and testingGIT: |
@jakobbotsch, PTAL. |
if (isUnordered) | ||
{ | ||
if (tree->OperIs(GT_LT)) | ||
{ | ||
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp2, regOp1); | ||
} | ||
else if (tree->OperIs(GT_LE)) | ||
{ | ||
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp2, regOp1); | ||
} | ||
else if (tree->OperIs(GT_NE)) | ||
{ | ||
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_feq_s : INS_feq_d, cmpSize, targetReg, regOp1, regOp2); | ||
} | ||
else if (tree->OperIs(GT_GT)) | ||
{ | ||
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_fle_s : INS_fle_d, cmpSize, targetReg, regOp1, regOp2); | ||
} | ||
else if (tree->OperIs(GT_GE)) | ||
{ | ||
emit->emitIns_R_R_R(cmpSize == EA_4BYTE ? INS_flt_s : INS_flt_d, cmpSize, targetReg, regOp1, regOp2); | ||
} | ||
else | ||
{ | ||
unreached(); | ||
} | ||
emit->emitIns_R_R_I(INS_xori, EA_8BYTE, targetReg, targetReg, 1); | ||
oper = GenTree::ReverseRelop(oper); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar swapping is done in a platform agnostic way in TryLowerConditionToFlagsNode
based on GenCondition::PreferSwap
. Can RISC-V use the same mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TryLowerConditionToFlagsNode
is not called on RISC-V, it looks more geared for architectures with a special flag register for conditions. Floating comparisons on RISC-V are written to a normal GP register.
Maybe I can swap it in LowerCompare in a separate PR? Some of the stuff from #115039 also looks suitable for moving from codegen to lowering (there's similar swapping and negating going on when the integer comparison is not fed into a branch), which would simplify register allocation, containing immediates, and also sign-extension elimination in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, makes sense.
Unordered floating-point comparisons are achieved by neg'ing the ordered counterparts. Avoid that by reversing both the FP comparison and the zero-comparison fused with the branch.
Also, minor codegen haircut.
Part of #84834, cc @dotnet/samsung