-
Notifications
You must be signed in to change notification settings - Fork 5.1k
JIT: Don't clone or unroll cold loops #115744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Don't clone or unroll cold loops #115744
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prevents unnecessary JIT work by skipping loop unrolling and cloning for cold loops, as indicated by the updated logic in both functions.
- Added a cold loop check in the loop unrolling function in optimizer.cpp.
- Added the same cold loop check in the loop cloning function in loopcloning.cpp.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/coreclr/jit/optimizer.cpp | Added early exit for unrolling of cold loops |
src/coreclr/jit/loopcloning.cpp | Added early exit for cloning of cold loops |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
@dotnet/jit-contrib PTAL. Diffs show some size increases in cases where cloning previously removed a bounds check, and RBO later removed the second loop body; now, we leave the bounds check in-place. For PGO collections, there are few PerfScore diffs since we're targeting cold code, and we try to reestablish profile consistency right before loop opts. If we're ok with the behavior diverging, I'm ok with reenabling cloning/unrolling cold loops for non-PGO scenarios since we aren't as confident about what is actually cold. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some sort of a knob to replace all cold blocks with some minimal weight so none of the opts give up on them (with isRunRarely) and check that on large benchmarks. I suspect we loose a bit of perf by giving up on what we think is cold, while in fact it is not (e.g. unreliable static pgo, lack of context-oriented dynamic PGO that bakes weights for the first use during the start)
But the general question is: should we turn off opts for cold parts if the TP wins aren't too big?
I think that would be interesting to try. I think the simplest way to quickly try this is to make the threshold
If the optimization is typically size-increasing, and we are confident the code is cold, I think we ought to disable it regardless of the TP wins. Without hot/cold splitting, this means being able to place more hot code on each memory page. With my current splitting prototype, it also means being able to fit more hot code in each code heap. |
Leaving this open until after the Preview 5 snap, fyi |
/ba-g build timeouts, and jit-format broken |
Follow-up to #115734 (comment).