Benchmark Regressions from Profile Maintenance and Block Layout Changes #113108
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Starting from the top: currently, profile synthesis sets the weight of handler region entries to something close to zero, but it won't mark them as rarely run. We probably don't want to change this, because marking them as rarely run would block inlining within handlers, which matters for finally regions. I wasn't able to repro the regression with EgorBot, but I'll take a look locally.
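To make that tradeoff concrete, here's a minimal sketch (my own toy model, not RyuJIT code; the class, flag, and policy function are all invented for illustration) of giving a handler entry a near-zero weight without setting a rarely-run flag, so a hypothetical inlining policy that skips rarely-run blocks still fires inside the handler:

```python
# Toy model of the tradeoff: near-zero weight keeps layout/profile data honest,
# while leaving the rarely-run flag unset keeps inlining enabled in the handler.

EPSILON = 1e-3  # assumed near-zero weight for handler region entries

class Block:
    def __init__(self, name, weight, run_rarely=False):
        self.name = name
        self.weight = weight
        self.run_rarely = run_rarely

def can_inline_into(block):
    # Hypothetical inliner policy: rarely-run blocks aren't worth inlining into.
    return not block.run_rarely

finally_entry = Block("finally_entry", weight=EPSILON, run_rarely=False)

print(finally_entry.weight < 0.01)      # True: treated as cold by weight
print(can_inline_into(finally_entry))   # True: inlining still allowed
```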
I'm able to repro the regression on my Windows AMD machine, but the difference is quite inconsistent. From run to run, the difference is either just a nanosecond, or over 2x. For example:
I'm failing to find any codegen diffs along the call stack of this benchmark, and the potential fix would likely bring its own slew of regressions, so the return on pursuing these regressions likely isn't as high as for others that are directly related to layout churn. I'll keep this one in mind for later...
PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj regressed and then later improved due to flowgraph/profile churn. Its current regressed status is from #112060, and is tracked in #112657. This is also true for System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_FirstElementMatches(input: Array). I'll remove these from the above chart. |
System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512) improved on both platforms once late profile repair was enabled. On Viper Windows, it regressed again around 2/27/2025. I can't find an issue tracking this regression, but nothing in the commit range is flowgraph-related. |
I've annotated the most-regressed benchmarks with culprit PRs based on when the last notable regression occurred.
I looked at a few of the regressions from #103450, and for these cases, 3-opt is keen to make bottom-tested loops top-tested. For example, here's the initial layout for
Both loops are bottom-tested, and we fall out of the inner loop into the outer loop. 3-opt transforms this into the following:
It's the same case for
For hot loops, 3-opt finds it massively profitable to create fallthrough along a loop backedge, since the entry edges are considerably lower in weight, but this gives a worse loop shape overall. I think we'd be justified in inhibiting the creation of fallthrough along loop backedges if it would make the loop top-tested. A targeted implementation would be to detect loop backedges in
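For intuition about why the top-tested shape is worse per iteration, here's a toy simulation (my own, with assumed per-iteration branch counts; not the JIT's cost model): a bottom-tested loop ends each iteration with a single conditional backedge, while a top-tested loop pays the conditional exit test plus a jump from the bottom of the body back up to the test.

```python
def run_bottom_tested(n):
    """Bottom-tested: do { body } while (cond); one conditional branch/iteration."""
    branches, i = 0, 0
    while True:
        i += 1            # loop body
        branches += 1     # conditional test at the bottom (the backedge)
        if not (i < n):
            break
    return branches

def run_top_tested(n):
    """Top-tested: test at the head; body must branch back up each iteration."""
    branches, i = 0, 0
    while True:
        branches += 1     # conditional exit test at the top
        if not (i < n):
            break
        i += 1            # loop body
        branches += 1     # jump from the bottom of the body back to the test
    return branches

print(run_bottom_tested(10))  # 10 branches
print(run_top_tested(10))     # 21 branches
```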
By restricting 3-opt from creating fallthrough along loop backedges, and restricting hot jump compaction from considering loop exit edges (#113101), layout is looking better. For
I'm sure these two changes will incur pretty big diffs, but I think they're focused enough that (hopefully) the effects will be predictable. Checking if an edge is a loop exit/backedge requires iterating up to every loop edge in the method, so we don't want to do this lookup every time in
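One way to avoid that repeated scan is to collect the interesting edges once before layout and answer each query with an O(1) set lookup. A sketch with hypothetical data structures (the loop representation below is invented for illustration; it is not the JIT's):

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]  # (source block, destination block)

def collect_special_edges(loops) -> Tuple[FrozenSet[Edge], FrozenSet[Edge]]:
    """loops: iterable of (backedges, exit_edges) pairs, one per natural loop.
    Walk every loop's edges exactly once, up front."""
    backedges, exits = set(), set()
    for loop_backedges, loop_exits in loops:
        backedges.update(loop_backedges)
        exits.update(loop_exits)
    return frozenset(backedges), frozenset(exits)

loops = [
    # inner loop: backedge I2->I1, exit I2->O1
    ([("I2", "I1")], [("I2", "O1")]),
    # outer loop: backedge O2->I1, exit O2->X
    ([("O2", "I1")], [("O2", "X")]),
]
backedges, exits = collect_special_edges(loops)

# O(1) membership tests inside the layout algorithm's inner loop:
print(("I2", "I1") in backedges)  # True
print(("O2", "X") in exits)       # True
```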
With #113101, some of the 4-opt regressions have layouts that are further away from a locally-minimal cost. For example, without the change, this is what the layout of
With it, instead of
I probably shouldn't be trying to artificially inhibit 3-opt's cost model for such cases, like I am in #113101. For other 4-opt regressions, we have loops with unconditional backedges, and there doesn't seem to be an obviously optimal rotation of the loop. For example, from
In this case, both loop heads
Part of #113108. When converting an unconditional jump to a test block into a duplicate test block, fgOptimizeBranch flips the condition (unnecessarily -- I plan to fix this someday), which means we need to derive the new true edge's likelihood from the cloned false edge, and vice versa. Now that late profile repair is enabled, if we get this wrong, the flipped edge likelihoods will flatten loop weights, and inflate non-loop weights. I suspect fixing this will also revert some of the regressions from late profile repair.
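As an illustration of the bookkeeping (a toy model of the scenario described above, not fgOptimizeBranch itself): when the duplicated test's condition is flipped, the new true edge must take its likelihood from the original false edge, and vice versa. Skipping the swap inverts hot and cold, which is how loop weights get flattened.

```python
def duplicate_flipped_test(true_likelihood, false_likelihood, correct=True):
    """Return (new_true, new_false) likelihoods for a duplicated test whose
    condition has been flipped. The correct derivation swaps the two."""
    if correct:
        return false_likelihood, true_likelihood  # swap to match the flip
    return true_likelihood, false_likelihood      # buggy: no swap

# Original test: 99% likely to take the backedge (true), 1% to exit (false).
print(duplicate_flipped_test(0.99, 0.01, correct=True))   # (0.01, 0.99)

# The bug: the flipped duplicate still claims 99% for its (now exit-bound)
# true edge, so profile repair sees a hot exit and a cold backedge.
print(duplicate_flipped_test(0.99, 0.01, correct=False))  # (0.99, 0.01)
```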
Part of #107749. This is part of my broader goal of phasing out fgReorderBlocks entirely, but there are a few transformations we may want to do right before block layout. In particular, I noticed in #113108 (comment) that some regressed benchmarks have loops with unconditional backedges; in the absence of better loop inversion, we might want to run our unconditional-to-conditional branch duplication optimization right before layout to improve branch behavior in loops. We can put such optimizations in this pre-layout phase.
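To illustrate the intended effect of that branch duplication (a simulation with assumed branch counts, in the spirit of classic loop inversion; not the actual JIT transformation): duplicating the loop-top test at the unconditional jump site lets each iteration end in a single conditional backedge instead of a test plus a jump.

```python
def run_naive(n):
    """Loop lowered naively: top test, unconditional jump back each iteration."""
    branches, i = 0, 0
    while True:
        branches += 1          # conditional test at the loop top
        if not (i < n):
            break
        i += 1                 # loop body
        branches += 1          # unconditional jump back to the test
    return branches

def run_inverted(n):
    """After duplicating the test before the loop (loop inversion)."""
    branches, i = 0, 0
    branches += 1              # duplicated guard test before the loop
    if i < n:
        while True:
            i += 1             # loop body
            branches += 1      # single conditional backedge at the bottom
            if not (i < n):
                break
    return branches

print(run_naive(10))     # 21 branches
print(run_inverted(10))  # 11 branches
```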
Part of #107749. I've collated all the regression reports assigned to me since #103450 was merged. For now, I'm excluding reports on Tiger machine configs to skip JCC errata-related regressions.