Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64: Add return cases into optimise bools #112945

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jonathandavies-arm
Copy link
Contributor

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 26, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 26, 2025
@jonathandavies-arm
Copy link
Contributor Author

@a74nh @kunalspathak @dotnet/arm64-contrib

@@ -1031,8 +1034,8 @@ bool OptBoolsDsc::optOptimizeCompareChainCondBlock()
int op1Cost = cond1->GetCostEx();
int op2Cost = cond2->GetCostEx();
// The cost of combing three simple conditions is 32.
int maxOp1Cost = op1IsCondChain ? 31 : 7;
int maxOp2Cost = op2IsCondChain ? 31 : 7;
int maxOp1Cost = op1IsCondChain ? 31 : 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't increase the maxOpCost values then these changes won't really have much of an effect.
This is one of the tests I added.

        [MethodImpl(MethodImplOptions.NoInlining)]
        static bool CmpOptimizeBoolsReturn(int a, int lower, int upper)
        {
            //ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
            //ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, ge
            //ARM64-FULL-LINE: cset {{x[0-9]+}}, le
            return a >= lower && a <= upper;
        }

and when it gets to optOptimizeCompareChainCondBlock cond2 is this and it has a opCost of 8.

N003 (  8,  5) [000008] -----+-----                         *  EQ        int    $101
N001 (  3,  2) [000006] -----+-----                         +--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
N002 (  1,  2) [000007] -----+-----                         \--*  CNS_INT   int    1 $41

I'm increasing these values so that these bool optimisations can also happen on return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I originally added these costs, they were just best guesses to ensures we don't optimise for long chains. Tweaking it should be fine.

@@ -962,23 +962,26 @@ bool OptBoolsDsc::optOptimizeRangeTests()
//
bool OptBoolsDsc::optOptimizeCompareChainCondBlock()
{
assert((m_b1 != nullptr) && (m_b2 != nullptr) && (m_b3 == nullptr));
assert((m_b1 != nullptr) && (m_b2 != nullptr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed by @jakobbotsch in #96819 (comment), should we still consider doing this based on the PGO information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"That would require us to have reliable PGO data at the point of "optimize bools" though, which we don't have yet"

  • I think that's still the current state of PGO. So, not much we can do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think our PGO is in a good shape now (certainly much better than it was back then). cc @amanasifkhalid @AndyAyersMS for thoughts on using PGO information during optOptimizeBools

Copy link
Member

@amanasifkhalid amanasifkhalid Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we ought to use it as a matter of principle, and identify cases where the PGO data fails us as motivation for introducing another profile repair call site somewhere midway through compilation. I don't think it's all that common for the profile data to be completely wrecked post-morph, and in the cases where it is, there is usually a bigger underlying problem we've yet to solve (i.e. method call counts for OSR methods), but I don't think that should block us from leveraging PGO data at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use PGO in optimize bools, then it should be used in if conversion too, given they are fairly interlinked.

I'd like to suggest doing that in a separate PR, as I suspect it'll require some back and too in deciding what the exact cut off points are. Whereas this PR should only be causing diffs to happen in return cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view if-conversion and this optimization as orthogonal optimizations: converting short circuiting conditionals into bitwise ops can be beneficial even on platforms that do not benefit from if conversion.

With that said, I am ok with not switching to utilize PGO information in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an issue here: #113080

(including if conversion, to save opening two issues)

@kunalspathak
Copy link
Member

@jonathandavies-arm - there are bunch of failures in CI. Do you mind taking a look?

}
else
{
return false;
GenTree* root = m_b3->firstStmt()->GetRootNode();
if (root->OperIsSimple())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this trying to test for? Why is this logic only necessary for unops/binops, and not any other node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_b3 is non null when block 2 first node is a return. In this case block 3 is also a return node. If block 3 returns a 0 then first condition (found in block 1) returns 0 when its condition is true. When we OR the 2 conditions together and NE them it will produce an incorrect result. So when block 3 returns 0 we need to inverse the first condition.

If we look at this function we can see what happens without this change.

[MethodImpl(MethodImplOptions.NoInlining)]
private static bool GtAndGt(int x, int y)
{
    return (x > 1 && y > 1);
}

It will have these blocks when it enters the optimize bools phase

Before PHASE Optimize bools

------------ BB01 [0000] [000..004) -> BB03(0.5),BB02(0.5) (cond), preds={} succs={BB02,BB03}

***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x002 )
N004 (  5,  6) [000003] -----+-----                         *  JTRUE     void   $VN.Void
N003 (  3,  4) [000002] J----+-N---                         \--*  LE        int    $100
N001 (  1,  1) [000000] -----+-----                            +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----+-----                            \--*  CNS_INT   int    1 $41


------------ BB02 [0001] [004..009) (return), preds={BB01} succs={}

***** BB02 [0001]
STMT00002 ( 0x004[E-] ... 0x008 )
N004 (  9,  6) [000009] -----+-----                         *  RETURN    int    $VN.Void
N003 (  8,  5) [000008] -----+-----                         \--*  GT        int    $101
N001 (  3,  2) [000006] -----+-----                            +--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
N002 (  1,  2) [000007] -----+-----                            \--*  CNS_INT   int    1 $41


------------ BB03 [0002] [009..00B) (return), preds={BB01} succs={}

***** BB03 [0002]
STMT00001 ( 0x009[E-] ... 0x00A )
N002 (  2,  3) [000005] -----+-----                         *  RETURN    int    $VN.Void
N001 (  1,  2) [000010] -----+-----                         \--*  CNS_INT   int    0 $40

After PHASE Optimize bools

------------ BB01 [0000] [000..009) (return), preds={} succs={}

This shows the block after optimize bools without this else setting reverseFirstCondition.

***** BB01 [0000]
STMT00002 ( 0x004[E-] ... 0x008 )
N010 ( 21, 14) [000009] -----+-----                         *  RETURN    int    $VN.Void
N009 ( 20, 13) [000013] -----------                         \--*  NE        int   
N007 ( 15, 10) [000011] -------N---                            +--*  OR        int   
N003 (  6,  4) [000002] -----+-N---                            |  +--*  LE        int    $100
N001 (  1,  1) [000000] -----+-----                            |  |  +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----+-----                            |  |  \--*  CNS_INT   int    1 $41
N006 (  8,  5) [000008] -----+-----                            |  \--*  GT        int    $101
N004 (  3,  2) [000006] -----+-----                            |     +--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
N005 (  1,  2) [000007] -----+-----                            |     \--*  CNS_INT   int    1 $41
N008 (  1,  2) [000012] -----------                            \--*  CNS_INT   int    0

And this show it with the else setting reverseFirstCondition

------------ BB01 [0000] [000..009) (return), preds={} succs={}

***** BB01 [0000]
STMT00002 ( 0x004[E-] ... 0x008 )
N010 ( 21, 14) [000009] -----+-----                         *  RETURN    int    $VN.Void
N009 ( 20, 13) [000013] -----------                         \--*  NE        int   
N007 ( 15, 10) [000011] -------N---                            +--*  AND       int   
N003 (  6,  4) [000002] -----+-N---                            |  +--*  GT        int   
N001 (  1,  1) [000000] -----+-----                            |  |  +--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80
N002 (  1,  2) [000001] -----+-----                            |  |  \--*  CNS_INT   int    1 $41
N006 (  8,  5) [000008] -----+-----                            |  \--*  GT        int    $101
N004 (  3,  2) [000006] -----+-----                            |     +--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
N005 (  1,  2) [000007] -----+-----                            |     \--*  CNS_INT   int    1 $41
N008 (  1,  2) [000012] -----------                            \--*  CNS_INT   int    0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I suggest replacing this root->OperIsSimple() by assert(root->OperIs(GT_RETURN)).

Also, I have noticed that the code is not set up to handle Swift returns correctly. This is preexisting, but can you please fix this while you are here? I do not think these optimizations should run for Swift returns since they do not validate anything about the error return.
The fix is just to remove GT_SIWFT_ERROR_RET in these lines:

if (!testTree2->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET))
{
return nullptr;
}
Statement* s3 = m_b3->firstStmt();
if (s3->GetPrevStmt() != s3)
{
return nullptr;
}
GenTree* testTree3 = s3->GetRootNode();
if (!testTree3->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with changing this to an assert is that from running optboolsreturn.dll I see that we sometimes see a GT_SELECT_INV. Unfortunately this doesn't happen in a test and I can't see what code is reaching this code path.

Copy link
Member

@jakobbotsch jakobbotsch Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code needs to be moved to happen after optOptimizeBoolsChkBlkCond has been called, since that is validating that the shape of the IR allows the optimization to kick in. You can move the existing checks on the targets of the blocks as well.

GenTree* cond2 = m_testInfo2.testTree->gtGetOp1();

if (m_b3 != nullptr)
{
GenTree* root = m_b3->firstStmt()->GetRootNode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GenTree* root = m_b3->firstStmt()->GetRootNode();
assert(m_b1->FalseTargetIs(m_b2) && m_b1->TrueTargetIs(m_b3));
GenTree* root = m_b3->firstStmt()->GetRootNode();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: simple range check is not branchless
6 participants