Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Consolidate layout passes into one phase #112004

Merged
merged 27 commits into from
Mar 3, 2025

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Jan 30, 2025

Part of #107749. This replaces our RPO layout and cold code motion phases with a single loop-aware RPO computation (with cold blocks ignored) that we feed into 3-opt as its initial layout. This has the nice property of needing to reorder the block list only once, and never leaving it in a temporarily invalid state (i.e. with EH regions broken up).

My initial plan was for this PR to be zero-diff since I split out all the invariant changes (like not reordering cold/handler blocks at all) into separate PRs, but I found it necessary to implement moving try regions around to avoid regressing layout quality. Suppose we have some initial layout that looks like this:

[BB01 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]

When reordering the block list, we want to only reorder within try regions to avoid breaking their contiguity. Suppose 3-opt wants the block list to look like BB01 -> BB03 -> BB04 -> BB05 -> BB02. If we only reorder within regions, we cannot place BB03 after BB01, so the layout remains as-is, with the cold block inline. And if BB05 were somewhere else in the initial layout, we wouldn't be able to move it after BB04 because they're in different regions.

We can be a bit clever and remember the last hot block in each region, so that EH region boundaries don't trip us up. However, this only enables better movement within regions, and nested regions end up sinking down the method. We end up with the following layout:

[BB01 (hot)]
[BB05 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]

The hot part of the main method body is compact, but the try region is interleaved with cold code. After adding support for try region movement, we can now move try regions up to their ideal successors, when doing so doesn't break EH nesting invariants:

[BB01 (hot)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]
[BB02 (cold)]

Thus, this change has some churn, but it looks mostly like a PerfScore win.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 30, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid added a commit that referenced this pull request Feb 6, 2025
Follow-up to #111989. Now that we only run one pass of 3-opt, we can remove some cruft needed to maintain state across 3-opt passes. This is a meek attempt to reduce the size of #112004 by separating out some of the no-diff changes.
@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid amanasifkhalid marked this pull request as ready for review February 21, 2025 16:13
@Copilot Copilot bot review requested due to automatic review settings February 21, 2025 16:13

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

if (!block->hasHndIndex() && (!block->isBBWeightCold(this) || block->IsFirst()))
{
// Set the block's ordinal.
block->bbPreorderNum = numHotBlocks;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to repurpose bbPreorderNum for the ordinal instead of bbPostorderNum, since we can modify the former during the loop-aware RPO computation without breaking it. I probably should've split this out into a separate PR, though I've created enough review burden with this work as-is.

//
void Compiler::fgRebuildEHRegions()
void Compiler::fgFindTryRegionEnds()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code isn't anything new. Since we no longer need to move cold EH blocks back in line after layout, we can bring back the EH repair logic I introduced in #108634. But now that we don't reorder handler regions, I've simplified this code to only reset try region entries in the main method body.

@@ -5345,7 +5232,8 @@ bool Compiler::ThreeOptLayout::CompactHotJumps()
// If we aren't sure which successor is hotter, and we already fall into one of them,
// do nothing.
BasicBlock* const unlikelyTarget = unlikelyEdge->getDestinationBlock();
if ((unlikelyEdge->getLikelihood() == 0.5) && (unlikelyTarget->bbPostorderNum == (i + 1)))
if ((unlikelyEdge->getLikelihood() == 0.5) && isCandidateBlock(unlikelyTarget) &&
(unlikelyTarget->bbPreorderNum == (i + 1)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to check if the unlikely target is in the candidate span of blocks before checking its ordinal. This isn't a correctness issue per se, but switching ordinals from bbPostorderNum to bbPreorderNum created churn here.

// If we moved this region within another region, recompute the try region end blocks.
if (parentIndex != EHblkDsc::NO_ENCLOSING_INDEX)
{
compiler->fgFindTryRegionEnds();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably do something cheaper than potentially iterating the whole main method body here, though this path seems cold enough to not noticeably affect TP diffs. I'll try something in a follow-up and see if it's worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't costly I would be tempted to leave it as is, unless you think handling extra constraints will be easy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you think handling extra constraints will be easy.

I've tried a few different shapes, and none of them are elegant, so I'm hesitant to complicate this any further. Since there's room to improve TP elsewhere, I'll leave this as is.

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs look like a PerfScore improvement overall, except for in coreclr_tests and sometimes libraries_tests. We're getting a little bit of TP back, but I think there's further room for improvement by templating away some of the hot EH-specific checks. Thanks!

@amanasifkhalid
Copy link
Member Author

ping @AndyAyersMS

// If we moved this region within another region, recompute the try region end blocks.
if (parentIndex != EHblkDsc::NO_ENCLOSING_INDEX)
{
compiler->fgFindTryRegionEnds();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't costly I would be tempted to leave it as is, unless you think handling extra constraints will be easy.

@amanasifkhalid amanasifkhalid merged commit 71772d1 into dotnet:main Mar 3, 2025
109 of 112 checks passed
@amanasifkhalid amanasifkhalid deleted the one-layout-phase branch March 3, 2025 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants