-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arm64: Add return cases into optimise bools #112945
base: main
Are you sure you want to change the base?
arm64: Add return cases into optimise bools #112945
Conversation
jonathandavies-arm
commented
Feb 26, 2025
- Fixes JIT: simple range check is not branchless #96819
@a74nh @kunalspathak @dotnet/arm64-contrib |
@@ -1031,8 +1034,8 @@ bool OptBoolsDsc::optOptimizeCompareChainCondBlock() | |||
int op1Cost = cond1->GetCostEx(); | |||
int op2Cost = cond2->GetCostEx(); | |||
// The cost of combing three simple conditions is 32. | |||
int maxOp1Cost = op1IsCondChain ? 31 : 7; | |||
int maxOp2Cost = op2IsCondChain ? 31 : 7; | |||
int maxOp1Cost = op1IsCondChain ? 31 : 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't increase the maxOpCost values then these changes won't really have much of an effect.
This is one of the tests I added.
[MethodImpl(MethodImplOptions.NoInlining)]
static bool CmpOptimizeBoolsReturn(int a, int lower, int upper)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, ge
//ARM64-FULL-LINE: cset {{x[0-9]+}}, le
return a >= lower && a <= upper;
}
and when it gets to optOptimizeCompareChainCondBlock
cond2 is this and it has a opCost of 8.
N003 ( 8, 5) [000008] -----+----- * EQ int $101
N001 ( 3, 2) [000006] -----+----- +--* LCL_VAR int V01 arg1 u:1 (last use) $81
N002 ( 1, 2) [000007] -----+----- \--* CNS_INT int 1 $41
I'm increasing these values so that these bool optimisations can also happen on return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I originally added these costs, they were just best guesses to ensures we don't optimise for long chains. Tweaking it should be fine.
@@ -962,23 +962,26 @@ bool OptBoolsDsc::optOptimizeRangeTests() | |||
// | |||
bool OptBoolsDsc::optOptimizeCompareChainCondBlock() | |||
{ | |||
assert((m_b1 != nullptr) && (m_b2 != nullptr) && (m_b3 == nullptr)); | |||
assert((m_b1 != nullptr) && (m_b2 != nullptr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed by @jakobbotsch in #96819 (comment), should we still consider doing this based on the PGO information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"That would require us to have reliable PGO data at the point of "optimize bools" though, which we don't have yet"
- I think that's still the current state of PGO. So, not much we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think our PGO is in a good shape now (certainly much better than it was back then). cc @amanasifkhalid @AndyAyersMS for thoughts on using PGO information during optOptimizeBools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we ought to use it as a matter of principle, and identify cases where the PGO data fails us as motivation for introducing another profile repair call site somewhere midway through compilation. I don't think it's all that common for the profile data to be completely wrecked post-morph, and in the cases where it is, there is usually a bigger underlying problem we've yet to solve (i.e. method call counts for OSR methods), but I don't think that should block us from leveraging PGO data at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use PGO in optimize bools
, then it should be used in if conversion
too, given they are fairly interlinked.
I'd like to suggest doing that in a separate PR, as I suspect it'll require some back and too in deciding what the exact cut off points are. Whereas this PR should only be causing diffs to happen in return cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view if-conversion and this optimization as orthogonal optimizations: converting short circuiting conditionals into bitwise ops can be beneficial even on platforms that do not benefit from if conversion.
With that said, I am ok with not switching to utilize PGO information in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an issue here: #113080
(including if conversion, to save opening two issues)
@jonathandavies-arm - there are bunch of failures in CI. Do you mind taking a look? |
src/coreclr/jit/optimizebools.cpp
Outdated
} | ||
else | ||
{ | ||
return false; | ||
GenTree* root = m_b3->firstStmt()->GetRootNode(); | ||
if (root->OperIsSimple()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this trying to test for? Why is this logic only necessary for unops/binops, and not any other node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_b3 is non null when block 2 first node is a return. In this case block 3 is also a return node. If block 3 returns a 0 then first condition (found in block 1) returns 0 when its condition is true. When we OR the 2 conditions together and NE them it will produce an incorrect result. So when block 3 returns 0 we need to inverse the first condition.
If we look at this function we can see what happens without this change.
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool GtAndGt(int x, int y)
{
return (x > 1 && y > 1);
}
It will have these blocks when it enters the optimize bools phase
Before PHASE Optimize bools
------------ BB01 [0000] [000..004) -> BB03(0.5),BB02(0.5) (cond), preds={} succs={BB02,BB03}
***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x002 )
N004 ( 5, 6) [000003] -----+----- * JTRUE void $VN.Void
N003 ( 3, 4) [000002] J----+-N--- \--* LE int $100
N001 ( 1, 1) [000000] -----+----- +--* LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 2) [000001] -----+----- \--* CNS_INT int 1 $41
------------ BB02 [0001] [004..009) (return), preds={BB01} succs={}
***** BB02 [0001]
STMT00002 ( 0x004[E-] ... 0x008 )
N004 ( 9, 6) [000009] -----+----- * RETURN int $VN.Void
N003 ( 8, 5) [000008] -----+----- \--* GT int $101
N001 ( 3, 2) [000006] -----+----- +--* LCL_VAR int V01 arg1 u:1 (last use) $81
N002 ( 1, 2) [000007] -----+----- \--* CNS_INT int 1 $41
------------ BB03 [0002] [009..00B) (return), preds={BB01} succs={}
***** BB03 [0002]
STMT00001 ( 0x009[E-] ... 0x00A )
N002 ( 2, 3) [000005] -----+----- * RETURN int $VN.Void
N001 ( 1, 2) [000010] -----+----- \--* CNS_INT int 0 $40
After PHASE Optimize bools
------------ BB01 [0000] [000..009) (return), preds={} succs={}
This shows the block after optimize bools without this else setting reverseFirstCondition.
***** BB01 [0000]
STMT00002 ( 0x004[E-] ... 0x008 )
N010 ( 21, 14) [000009] -----+----- * RETURN int $VN.Void
N009 ( 20, 13) [000013] ----------- \--* NE int
N007 ( 15, 10) [000011] -------N--- +--* OR int
N003 ( 6, 4) [000002] -----+-N--- | +--* LE int $100
N001 ( 1, 1) [000000] -----+----- | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 2) [000001] -----+----- | | \--* CNS_INT int 1 $41
N006 ( 8, 5) [000008] -----+----- | \--* GT int $101
N004 ( 3, 2) [000006] -----+----- | +--* LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 2) [000007] -----+----- | \--* CNS_INT int 1 $41
N008 ( 1, 2) [000012] ----------- \--* CNS_INT int 0
And this show it with the else setting reverseFirstCondition
------------ BB01 [0000] [000..009) (return), preds={} succs={}
***** BB01 [0000]
STMT00002 ( 0x004[E-] ... 0x008 )
N010 ( 21, 14) [000009] -----+----- * RETURN int $VN.Void
N009 ( 20, 13) [000013] ----------- \--* NE int
N007 ( 15, 10) [000011] -------N--- +--* AND int
N003 ( 6, 4) [000002] -----+-N--- | +--* GT int
N001 ( 1, 1) [000000] -----+----- | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 2) [000001] -----+----- | | \--* CNS_INT int 1 $41
N006 ( 8, 5) [000008] -----+----- | \--* GT int $101
N004 ( 3, 2) [000006] -----+----- | +--* LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 2) [000007] -----+----- | \--* CNS_INT int 1 $41
N008 ( 1, 2) [000012] ----------- \--* CNS_INT int 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I suggest replacing this root->OperIsSimple()
by assert(root->OperIs(GT_RETURN))
.
Also, I have noticed that the code is not set up to handle Swift returns correctly. This is preexisting, but can you please fix this while you are here? I do not think these optimizations should run for Swift returns since they do not validate anything about the error return.
The fix is just to remove GT_SIWFT_ERROR_RET
in these lines:
runtime/src/coreclr/jit/optimizebools.cpp
Lines 1160 to 1172 in 78142b0
if (!testTree2->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET)) | |
{ | |
return nullptr; | |
} | |
Statement* s3 = m_b3->firstStmt(); | |
if (s3->GetPrevStmt() != s3) | |
{ | |
return nullptr; | |
} | |
GenTree* testTree3 = s3->GetRootNode(); | |
if (!testTree3->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with changing this to an assert is that from running optboolsreturn.dll
I see that we sometimes see a GT_SELECT_INV
. Unfortunately this doesn't happen in a test and I can't see what code is reaching this code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code needs to be moved to happen after optOptimizeBoolsChkBlkCond
has been called, since that is validating that the shape of the IR allows the optimization to kick in. You can move the existing checks on the targets of the blocks as well.
…BoolsChkBlkCond()
GenTree* cond2 = m_testInfo2.testTree->gtGetOp1(); | ||
|
||
if (m_b3 != nullptr) | ||
{ | ||
GenTree* root = m_b3->firstStmt()->GetRootNode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenTree* root = m_b3->firstStmt()->GetRootNode(); | |
assert(m_b1->FalseTargetIs(m_b2) && m_b1->TrueTargetIs(m_b3)); | |
GenTree* root = m_b3->firstStmt()->GetRootNode(); |