JIT: Move loop inversion to after loop recognition #115850

amanasifkhalid · 2025-05-21T21:11:05Z

Prerequisite to #113709. I expect diffs to go both ways: in some cases, loop canonicalization unlocks pattern-based loop inversion, while in other cases, we now recognize fewer loops because loop inversion no longer introduces new cycles before canonicalization.
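For readers less familiar with the transformation: loop inversion turns a top-tested `while` loop into a guarded, bottom-tested `do`-`while` loop. Each iteration then runs one conditional branch at the bottom instead of a test at the top plus an unconditional jump back. The sketch below shows the idea at the source level only. The JIT performs it on its flow graph, and `sumWhile` and `sumInverted` are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <vector>

// Top-tested loop: the condition is checked at the top of every iteration,
// and the end of the body jumps back unconditionally to the test.
long sumWhile(const std::vector<long>& a, std::size_t n) {
    long sum = 0;
    std::size_t i = 0;
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;
}

// Inverted form: a guard (a copy of the loop test) skips the loop entirely
// when it would not execute, and the loop itself is tested only at the bottom.
long sumInverted(const std::vector<long>& a, std::size_t n) {
    long sum = 0;
    std::size_t i = 0;
    if (i < n) {          // guard: duplicated loop test
        do {
            sum += a[i];
            i++;
        } while (i < n);  // bottom test: the loop's only back edge
    }
    return sum;
}
```

Both forms compute the same result for every `n`, including zero. The guard is what makes the bottom-tested shape safe when the loop would not run at all.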

dotnet-policy-service · 2025-05-21T21:11:50Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This PR moves the loop inversion phase to after loop recognition, adds immediate block compaction/removal for newly altered test blocks, and triggers a DFS rebuild with fresh loop analysis when any loops were inverted.

Add single-predecessor block compaction/removal in optInvertWhileLoop
Recompute the DFS tree and re-run loop finding after any loop inversions
Relocate the PHASE_INVERT_LOOPS call in the compilation pipeline
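Inverting a loop changes the flow graph: the test block is duplicated and the back edge moves. As a result, DFS and loop information computed before the inversion no longer describe the graph. The toy model below is a sketch built on our own assumptions, not RyuJIT code. It finds loop headers as the targets of DFS back edges and shows that the header moves after inversion. `Graph` and `findLoopHeaders` are hypothetical names.

```cpp
#include <set>
#include <vector>

// Toy flow graph: succs[b] lists the successors of block b; block 0 is the entry.
// Illustrative model only; it does not mirror the JIT's own DFS or loop structures.
using Graph = std::vector<std::vector<int>>;

namespace {
enum class Color { White, Gray, Black };

// Recursive DFS that records the targets of edges into blocks still on the DFS path.
void dfs(const Graph& g, int b, std::vector<Color>& color, std::set<int>& headers) {
    color[b] = Color::Gray;  // b is on the current DFS path
    for (int s : g[b]) {
        if (color[s] == Color::Gray) {
            headers.insert(s);  // retreating edge: s heads a cycle
        } else if (color[s] == Color::White) {
            dfs(g, s, color, headers);
        }
    }
    color[b] = Color::Black;  // all successors of b have been explored
}
}  // namespace

// Returns the targets of retreating edges found by a DFS from the entry block.
// In a reducible graph these are exactly the natural-loop headers.
std::set<int> findLoopHeaders(const Graph& g) {
    std::vector<Color> color(g.size(), Color::White);
    std::set<int> headers;
    dfs(g, 0, color, headers);
    return headers;
}
```

For a simple `while` loop, `{{1}, {2, 3}, {1}, {}}` (BB1 is the test, BB2 the body, BB3 the exit), this returns `{1}`. After inversion, `{{1}, {2, 3}, {4}, {}, {2, 3}}` (BB1 is now the guard and BB4 the duplicated bottom test), it returns `{2}`. The header has moved, so the analysis has to be recomputed rather than reused.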

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
optimizer.cpp	Inserted block compaction/removal and DFS invalidation
compiler.cpp	Moved the loop inversion phase to a later point in `compCompile`

Comments suppressed due to low confidence (1)

src/coreclr/jit/compiler.cpp:4668

Add targeted tests that verify the new phase ordering and ensure that both block compaction and removal occur as expected after loop inversion.

DoPhase(this, PHASE_INVERT_LOOPS, &Compiler::optInvertLoops);

src/coreclr/jit/optimizer.cpp

amanasifkhalid · 2025-05-22T21:57:35Z

The diffs will be hard to parse for this, so I'm looking more at metrics. Here are some metric diffs for aspnet on win-x64:

Base:

Loops found: 29252
Loops inverted: 10694
Loops cloned: 1885
Loops unrolled: 12
Loops IV widened: 3047
Widened IVs: 3047
Unused IVs removed: 4539
Loops downward counted: 1873
Loops strength reduced: 1702
RBO: 30708
Jump threadings: 9735

Diff:

Loops found: 29085 (-167)
Loops inverted: 9074 (-1620)
Loops cloned: 3596 (+1711)
Loops unrolled: 12
Loops IV widened: 2999 (-48)
Widened IVs: 2999 (-48)
Unused IVs removed: 4498 (-41)
Loops downward counted: 1863 (-10)
Loops strength reduced: 1693 (-9)
RBO: 33859 (+3151)
Jump threadings: 9774 (+39)

The metrics show that we invert fewer loops overall, but there are plenty of cases where we now invert loops we previously didn't. That unblocks other loop opts; in particular, we're doing a lot more cloning. The drop in loops found is due to loop inversion no longer introducing new cycles before loop recognition runs.
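Loop cloning, the optimization that grows most in these metrics, duplicates a loop. A fast copy runs without per-iteration checks (for example, bounds checks) when a single up-front test proves they cannot fail, and the original slow copy remains as a fallback. The source-level sketch below shows why each clone roughly doubles the loop's code. The function names are hypothetical, and the JIT works on its IR with its own cloning conditions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked element access, standing in for a per-iteration bounds check.
long checkedAt(const std::vector<long>& a, std::size_t i) {
    if (i >= a.size()) throw std::out_of_range("index out of range");
    return a[i];
}

// Cloned loop: one up-front test selects between two copies of the same loop.
long sumCloned(const std::vector<long>& a, std::size_t n) {
    long sum = 0;
    if (n <= a.size()) {
        // Fast copy: every index in [0, n) is known to be in range,
        // so the per-iteration check is dropped.
        for (std::size_t i = 0; i < n; i++) sum += a[i];
    } else {
        // Slow copy: the original loop with its checks intact;
        // it throws once i reaches a.size(), just as the uncloned loop would.
        for (std::size_t i = 0; i < n; i++) sum += checkedAt(a, i);
    }
    return sum;
}
```

The loop body now exists twice. Even when the fast path is taken almost every time, the method gets larger, which is one reason extra cloning shows up as code-size growth.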

PerfScore diffs are overwhelmingly negative in non-PGO collections. This might be caused by heuristic-derived profile weights for cloned loops inflating PerfScores, or by something else.

AndyAyersMS · 2025-05-27T15:08:38Z

Diffs

Assuming the diffs are largely cloning related, it appears that extra cloning is pretty costly. It is hard to know how much of it is really beneficial. I wish we had better heuristics.

amanasifkhalid · 2025-05-27T15:12:39Z

It is hard to know how much of it is really beneficial.

Right, because of this, I've decided to flip my ordering and enable graph-based loop inversion with the existing phase ordering. Locally, the diffs are slightly easier to triage. Once that's in, hopefully it'll be easier to triage the diffs on this PR and see if there's anything actionable.

amanasifkhalid · 2025-06-04T21:24:58Z

CI failures indicate we will need the fix in #113935 to proceed. @EgorBo are you able to revive that work?

EgorBo · 2025-06-06T12:17:35Z

CI failures indicate we will need the fix in #113935 to proceed. @EgorBo are you able to revive that work?

Ah, sure, let me revive it

amanasifkhalid · 2025-06-13T21:48:21Z

I'm removing fgRenumberBlocks while I'm here to avoid opening another PR, FYI.

amanasifkhalid · 2025-06-14T14:56:41Z

Diffs show yet another round of large size increases, though most of this seems to be driven by coreclr_tests. In particular, it looks like we're doing a lot more loop cloning in our HW intrinsics code:

Top method regressions (bytes):
        3902 (49.82 % of base) : 308137.dasm - CompareVectorWithZero:TestVector512Equality() (FullOpts)
        3902 (49.82 % of base) : 308162.dasm - CompareVectorWithZero:TestVector512Inequality() (FullOpts)
        3062 (107.21 % of base) : 323865.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        3062 (107.21 % of base) : 323985.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionUInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        3058 (107.07 % of base) : 321271.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        3058 (107.07 % of base) : 321391.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionUInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        3032 (93.18 % of base) : 321266.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionInt64:RunMaskingValueScenario():this (FullOpts)
        3032 (93.18 % of base) : 321386.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionUInt64:RunMaskingValueScenario():this (FullOpts)
        3028 (92.94 % of base) : 323860.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionInt64:RunMaskingValueScenario():this (FullOpts)
        3028 (92.94 % of base) : 323980.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionUInt64:RunMaskingValueScenario():this (FullOpts)
        2970 (100.00 % of base) : 321971.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2970 (100.00 % of base) : 322091.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionUInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2970 (100.00 % of base) : 323084.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplyInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2970 (100.00 % of base) : 323204.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplyUInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2902 (103.35 % of base) : 324099.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorUnaryOpTest__op_UnaryNegationInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2902 (103.35 % of base) : 324213.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorUnaryOpTest__op_UnaryNegationUInt64:RunBroadcastAndMaskingScenario():this (FullOpts)
        2888 (47.56 % of base) : 128125.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Tier0-FullOpts)
        2872 (82.62 % of base) : 322086.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionUInt64:RunMaskingValueScenario():this (FullOpts)
        2872 (82.62 % of base) : 323079.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplyInt64:RunMaskingValueScenario():this (FullOpts)
        2872 (82.62 % of base) : 323199.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplyUInt64:RunMaskingValueScenario():this (FullOpts)

Top method improvements (bytes):
        -526 (-8.20 % of base) : 321525.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_BitwiseAndSByte:RunMaskingValueScenario():this (FullOpts)
        -526 (-8.20 % of base) : 321755.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_BitwiseOrSByte:RunMaskingValueScenario():this (FullOpts)
        -526 (-8.20 % of base) : 322385.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_ExclusiveOrSByte:RunMaskingValueScenario():this (FullOpts)
        -524 (-8.19 % of base) : 321410.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_BitwiseAndByte:RunMaskingValueScenario():this (FullOpts)
        -524 (-8.19 % of base) : 321640.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_BitwiseOrByte:RunMaskingValueScenario():this (FullOpts)
        -524 (-8.19 % of base) : 322270.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_ExclusiveOrByte:RunMaskingValueScenario():this (FullOpts)
        -516 (-8.95 % of base) : 321874.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionByte:RunMaskingZeroScenario():this (FullOpts)
        -516 (-8.93 % of base) : 321994.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionSByte:RunMaskingZeroScenario():this (FullOpts)
        -492 (-16.12 % of base) : 299397.dasm - SmallLoop1:TestEntryPoint():int (FullOpts)
        -492 (-16.12 % of base) : 19722.dasm - SmallLoop1:TestEntryPoint():int (Tier0-FullOpts)
        -438 (-7.39 % of base) : 321870.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionByte:RunMaskingValueScenario():this (FullOpts)
        -438 (-7.37 % of base) : 321990.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionSByte:RunMaskingValueScenario():this (FullOpts)
        -434 (-7.66 % of base) : 321174.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionByte:RunMaskingZeroScenario():this (FullOpts)
        -434 (-7.64 % of base) : 321294.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_AdditionSByte:RunMaskingZeroScenario():this (FullOpts)
        -434 (-7.64 % of base) : 322987.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplyByte:RunMaskingZeroScenario():this (FullOpts)
        -434 (-7.61 % of base) : 323107.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_MultiplySByte:RunMaskingZeroScenario():this (FullOpts)
        -432 (-7.31 % of base) : 321922.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionInt16:RunMaskingZeroScenario():this (FullOpts)
        -432 (-7.31 % of base) : 322042.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_DivisionUInt16:RunMaskingZeroScenario():this (FullOpts)
        -426 (-7.51 % of base) : 323768.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionByte:RunMaskingZeroScenario():this (FullOpts)
        -426 (-7.49 % of base) : 323888.dasm - JIT.HardwareIntrinsics.General._Vector512_1.VectorBinaryOpTest__op_SubtractionSByte:RunMaskingZeroScenario():this (FullOpts)

Top method regressions (percentages):
          86 (358.33 % of base) : 30264.dasm - System.Runtime.Intrinsics.Vector64:Create(double):System.Runtime.Intrinsics.Vector64`1[double] (Instrumented Tier1)
          86 (358.33 % of base) : 39176.dasm - System.Runtime.Intrinsics.Vector64:Create(double):System.Runtime.Intrinsics.Vector64`1[double] (Tier1)
          86 (358.33 % of base) : 305146.dasm - System.Runtime.Intrinsics.Vector64:Create[double](double):System.Runtime.Intrinsics.Vector64`1[double] (FullOpts)
          86 (358.33 % of base) : 29956.dasm - System.Runtime.Intrinsics.Vector64:Create[double](double):System.Runtime.Intrinsics.Vector64`1[double] (Tier0-FullOpts)
         188 (348.15 % of base) : 65714.dasm - System.Runtime.Intrinsics.Vector64:Narrow(System.Runtime.Intrinsics.Vector64`1[long],System.Runtime.Intrinsics.Vector64`1[long]):System.Runtime.Intrinsics.Vector64`1[int] (Instrumented Tier1)
         188 (348.15 % of base) : 65777.dasm - System.Runtime.Intrinsics.Vector64:Narrow(System.Runtime.Intrinsics.Vector64`1[ulong],System.Runtime.Intrinsics.Vector64`1[ulong]):System.Runtime.Intrinsics.Vector64`1[uint] (Instrumented Tier1)
         146 (347.62 % of base) : 54822.dasm - System.Runtime.Intrinsics.Vector64`1[double]:op_Addition(System.Runtime.Intrinsics.Vector64`1[double],System.Runtime.Intrinsics.Vector64`1[double]):System.Runtime.Intrinsics.Vector64`1[double] (Tier1)
          40 (333.33 % of base) : 344367.dasm - SwitchTest:TestEntryPoint():int (FullOpts)
          40 (333.33 % of base) : 124230.dasm - SwitchTest:TestEntryPoint():int (Tier0-FullOpts)
          82 (292.86 % of base) : 30256.dasm - System.Runtime.Intrinsics.Vector64:Create(float):System.Runtime.Intrinsics.Vector64`1[float] (Instrumented Tier1)
          82 (292.86 % of base) : 39215.dasm - System.Runtime.Intrinsics.Vector64:Create(float):System.Runtime.Intrinsics.Vector64`1[float] (Tier1)
          82 (292.86 % of base) : 30262.dasm - System.Runtime.Intrinsics.Vector64:Create[float](float):System.Runtime.Intrinsics.Vector64`1[float] (Instrumented Tier1)
          82 (292.86 % of base) : 54956.dasm - System.Runtime.Intrinsics.Vector64:Create[float](float):System.Runtime.Intrinsics.Vector64`1[float] (Tier1)
         204 (291.43 % of base) : 65655.dasm - System.Runtime.Intrinsics.Vector64:Narrow(System.Runtime.Intrinsics.Vector64`1[double],System.Runtime.Intrinsics.Vector64`1[double]):System.Runtime.Intrinsics.Vector64`1[float] (Instrumented Tier1)
          68 (283.33 % of base) : 30252.dasm - System.Runtime.Intrinsics.Vector64:Create(int):System.Runtime.Intrinsics.Vector64`1[int] (Instrumented Tier1)
          68 (283.33 % of base) : 39238.dasm - System.Runtime.Intrinsics.Vector64:Create(uint):System.Runtime.Intrinsics.Vector64`1[uint] (Instrumented Tier1)
          68 (283.33 % of base) : 305145.dasm - System.Runtime.Intrinsics.Vector64:Create[int](int):System.Runtime.Intrinsics.Vector64`1[int] (FullOpts)
          68 (283.33 % of base) : 29952.dasm - System.Runtime.Intrinsics.Vector64:Create[int](int):System.Runtime.Intrinsics.Vector64`1[int] (Tier0-FullOpts)
          68 (283.33 % of base) : 55016.dasm - System.Runtime.Intrinsics.Vector64:Create[uint](uint):System.Runtime.Intrinsics.Vector64`1[uint] (Instrumented Tier1)
          68 (283.33 % of base) : 55288.dasm - System.Runtime.Intrinsics.Vector64:Create[uint](uint):System.Runtime.Intrinsics.Vector64`1[uint] (Tier1)

Diffs in our non-test collections, particularly the ones with Dynamic PGO enabled, are much less dramatic. Also, the TP improvement pays for #116017, which is nice. @AndyAyersMS are you ok with this going into Preview 6?

AndyAyersMS

Sure, let's take this.

amanasifkhalid · 2025-06-14T17:22:26Z

/ba-g unrelated wasm build failure, and a known issue

Move loop inversion to after loop recognition

a8f13aa

Copilot AI review requested due to automatic review settings May 21, 2025 21:11

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 21, 2025

dotnet-policy-service bot assigned amanasifkhalid May 21, 2025

Copilot AI reviewed May 21, 2025

View reviewed changes

src/coreclr/jit/optimizer.cpp (resolved)

src/coreclr/jit/optimizer.cpp (outdated, resolved)

build-analysis bot mentioned this pull request May 22, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

amanasifkhalid mentioned this pull request May 22, 2025

JIT: Unprotect handler entry after finally removal #115907

Merged

Merge branch 'main' into move-loop-inversion

5107d7e

amanasifkhalid mentioned this pull request May 27, 2025

JIT: Graph-based loop inversion #116017

Merged

Merge from main

2d94818

AndyAyersMS approved these changes Jun 4, 2025

View reviewed changes

build-analysis bot mentioned this pull request Jun 4, 2025

[linux-x64] [mono-aot] Test Runtime_101731.TestConvertToInt64NativeSingle(3.4028235E+38) returns exit code 22 #112557

Open

Merge branch 'main' into move-loop-inversion

c951e44

build-analysis bot mentioned this pull request Jun 13, 2025

System.Data.Common.Tests Assert failure on Linx x64 CI test run #108070

Open

amanasifkhalid added 2 commits June 13, 2025 17:46

Fix profile consistency

d669fad

Remove fgRenumberBlocks

81ce3f6

build-analysis bot mentioned this pull request Jun 14, 2025

Test failure: System.Reflection.Metadata.ApplyUpdateTest.TestGenericAddStaticField #115318

Open

AndyAyersMS approved these changes Jun 14, 2025

View reviewed changes

amanasifkhalid merged commit b146d75 into dotnet:main Jun 14, 2025
106 of 109 checks passed

amanasifkhalid deleted the move-loop-inversion branch June 14, 2025 17:22

DrewScoggins mentioned this pull request Jun 17, 2025

[Perf] Windows/x64: 52 Regressions on 6/14/2025 5:22:46 PM +00:00 #116754

Open

amanasifkhalid mentioned this pull request Jun 18, 2025

[Perf] Windows/x64: 38 Regressions on 1/21/2025 8:48:11 PM +00:00 #111912

Closed

AndyAyersMS mentioned this pull request Jun 19, 2025

[Perf] Linux/arm64: 5 Improvements on 6/14/2025 9:46:56 PM +00:00 dotnet/perf-autofiling-issues#57934

Closed

amanasifkhalid mentioned this pull request Jun 19, 2025

[Perf] Linux/x64: 48 Regressions on 2/21/2025 4:40:21 PM +00:00 #112913

Closed

AndyAyersMS mentioned this pull request Jun 19, 2025

[Perf] Linux/arm64: 20 Regressions on 6/14/2025 9:46:56 PM +00:00 dotnet/perf-autofiling-issues#57910

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

JIT: Move loop inversion to after loop recognition #115850

JIT: Move loop inversion to after loop recognition #115850

Uh oh!

amanasifkhalid commented May 21, 2025

Uh oh!

dotnet-policy-service bot commented May 21, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

amanasifkhalid commented May 22, 2025

Uh oh!

AndyAyersMS commented May 27, 2025

Uh oh!

amanasifkhalid commented May 27, 2025 •

edited

Loading

Uh oh!

amanasifkhalid commented Jun 4, 2025

Uh oh!

EgorBo commented Jun 6, 2025

Uh oh!

amanasifkhalid commented Jun 13, 2025

Uh oh!

amanasifkhalid commented Jun 14, 2025

Uh oh!

AndyAyersMS left a comment

Uh oh!

amanasifkhalid commented Jun 14, 2025

Uh oh!

Uh oh!

Uh oh!

JIT: Move loop inversion to after loop recognition #115850

JIT: Move loop inversion to after loop recognition #115850

Uh oh!

Conversation

amanasifkhalid commented May 21, 2025

Uh oh!

dotnet-policy-service bot commented May 21, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

amanasifkhalid commented May 22, 2025

Uh oh!

AndyAyersMS commented May 27, 2025

Uh oh!

amanasifkhalid commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amanasifkhalid commented Jun 4, 2025

Uh oh!

EgorBo commented Jun 6, 2025

Uh oh!

amanasifkhalid commented Jun 13, 2025

Uh oh!

amanasifkhalid commented Jun 14, 2025

Uh oh!

AndyAyersMS left a comment

Choose a reason for hiding this comment

Uh oh!

amanasifkhalid commented Jun 14, 2025

Uh oh!

Uh oh!

Uh oh!

amanasifkhalid commented May 27, 2025 •

edited

Loading