JIT: relax another check in conditional escape analysis #117295

AndyAyersMS · 2025-07-04T00:16:48Z

Proceed with CEA cloning even when the allocation site dominates the assignment to the enumerator var. The cloning will be a bit wasteful as the original code will become unreachable, but the code is set up to specialize the clone and so this is the less risky fix.

Also restore recognition of IEnumerable<T>.GetEnumerator as this gives a useful inlining boost in some cases.

Fixes #117204 cases that were not fixed by #117222.
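
To make the relaxed check concrete, here is a minimal toy sketch of the decision, not the actual `ObjectAllocator::CheckCanClone` code. The `FlowGraph` type, the bitset dominator computation, and the `checkCanCloneSketch` helper with its `relaxed` flag are all hypothetical names introduced for illustration; the real check has many other conditions that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy flow graph: preds[i] lists the predecessors of block i. Block 0 is the entry.
struct FlowGraph
{
    std::vector<std::vector<int>> preds;
};

// Iterative dominator computation over bitsets. Assumes fewer than 64 blocks
// and that every block is reachable from block 0.
std::vector<uint64_t> computeDominators(const FlowGraph& g)
{
    const size_t n = g.preds.size();
    assert(n > 0 && n < 64);
    const uint64_t all = (uint64_t(1) << n) - 1;

    std::vector<uint64_t> dom(n, all);
    dom[0] = 1; // the entry block is dominated only by itself

    bool changed = true;
    while (changed)
    {
        changed = false;
        for (size_t b = 1; b < n; b++)
        {
            // A block's dominators are itself plus whatever dominates all of its preds.
            uint64_t meet = all;
            for (int p : g.preds[b])
            {
                meet &= dom[static_cast<size_t>(p)];
            }
            const uint64_t next = meet | (uint64_t(1) << b);
            if (next != dom[b])
            {
                dom[b]  = next;
                changed = true;
            }
        }
    }
    return dom;
}

// True if block a dominates block b.
bool dominates(const std::vector<uint64_t>& dom, int a, int b)
{
    return ((dom[static_cast<size_t>(b)] >> a) & 1) != 0;
}

// Hypothetical model of the one condition this PR relaxes.
// relaxed == false models the old behavior (bail out when the allocation
// block dominates the block that assigns the enumerator var);
// relaxed == true models the new behavior (clone anyway).
bool checkCanCloneSketch(const FlowGraph& g, int allocBlock, int defBlock, bool relaxed)
{
    const std::vector<uint64_t> dom = computeDominators(g);
    if (dominates(dom, allocBlock, defBlock))
    {
        // The original (unspecialized) path will become unreachable after
        // cloning, so the clone is somewhat wasteful, but still correct.
        return relaxed;
    }
    return true;
}
```

In a diamond-shaped graph (0 to 1, 0 to 2, 1 to 3, 2 to 3), an allocation in block 0 dominates a def in block 3, so the old behavior rejects cloning and the relaxed behavior accepts it; an allocation in block 1 does not dominate block 3, so both behaviors accept it.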

dotnet-policy-service · 2025-07-04T00:17:38Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This PR relaxes a dominance check in the JIT’s conditional escape analysis so that cloning proceeds even when the allocation site dominates the enumerator assignment (the slow path becomes unreachable and is cleaned up later), and restores recognition of IEnumerable<T>.GetEnumerator as a named intrinsic to help inlining.

Allow cloning in ObjectAllocator::CheckCanClone despite dominance, updating comments and the diagnostic dump.
Add a new NI_System_Collections_Generic_IEnumerable_GetEnumerator intrinsic and wire it up in the importer.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| objectalloc.cpp | Removed the early `return false` on dominance, updated comments and `JITDUMP` text to indicate cloning happens anyway. |
| namedintrinsiclist.h | Introduced `NI_System_Collections_Generic_IEnumerable_GetEnumerator` and retitled the intrinsics comment block. |
| importercalls.cpp | Added a `lookupNamedIntrinsic` branch for ``IEnumerable`1.GetEnumerator``. |
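
As a rough illustration of what a name-based intrinsic lookup branch looks like, here is a self-contained sketch. It is not the code from importercalls.cpp: the function name `lookupNamedIntrinsicSketch`, its string-based signature, and the two-entry enum are assumptions made for this example, and the real `lookupNamedIntrinsic` covers many more namespaces and classes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, trimmed-down stand-in for the JIT's named-intrinsic enum.
enum NamedIntrinsic
{
    NI_Illegal,
    NI_System_Collections_Generic_IEnumerable_GetEnumerator,
};

// Simplified model of the lookup: match namespace, then class, then method.
NamedIntrinsic lookupNamedIntrinsicSketch(const char* namespaceName,
                                          const char* className,
                                          const char* methodName)
{
    assert(namespaceName != nullptr);
    assert(className != nullptr);
    assert(methodName != nullptr);

    if (strcmp(namespaceName, "System.Collections.Generic") == 0)
    {
        // Generic type names carry an arity suffix, hence "IEnumerable`1".
        if (strcmp(className, "IEnumerable`1") == 0)
        {
            if (strcmp(methodName, "GetEnumerator") == 0)
            {
                return NI_System_Collections_Generic_IEnumerable_GetEnumerator;
            }
        }
    }

    // Anything else is not a recognized intrinsic in this sketch.
    return NI_Illegal;
}
```

Only the exact triple of namespace, class, and method name maps to the new intrinsic; the non-generic `IEnumerable` and other methods on ``IEnumerable`1`` fall through to `NI_Illegal`.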

Comments suppressed due to low confidence (2)

src/coreclr/jit/objectalloc.cpp:3848

Add a unit test that exercises the case where the allocation site dominates the def block to verify the clone path is specialized correctly and the original path is removed.

    // For now we're going to just go ahead and clone, despite the

src/coreclr/jit/importercalls.cpp:10423

Consider adding a JIT or runtime test that verifies IEnumerable<T>.GetEnumerator is recognized as an intrinsic and inlined as expected.

                    else if (strcmp(className, "IEnumerable`1") == 0)

src/coreclr/jit/objectalloc.cpp

AndyAyersMS · 2025-07-04T00:18:28Z

@dotnet/jit-contrib PTAL

Locally this clears up all the Windows regressions except for System.Collections.IterateForEach<Int32>.ConcurrentBag(Size: 512). There the codegen looks better, so it may be JCC errata holding it back. Let's see what the perf lab thinks.

AndyAyersMS · 2025-07-04T01:20:40Z

Seems like one issue in ConcurrentBag is that the stack-allocated enumerator fields end up being marked DNER because they are EH live, so the inner loop is:

G_M13297_IG31:        ; offs=0x000210, size=0x001A, bbWeight=514.37, PerfScore 4757.97, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, loop=IG31, BB117 [0129], byref, isz

IN008f: 000210 mov      edx, dword ptr [V49 rbp-0x2C]
IN0090: 000213 mov      r8d, dword ptr [V49 rbp-0x2C]
IN0091: 000217 inc      r8d
IN0092: 00021A mov      dword ptr [V49 rbp-0x2C], r8d
IN0093: 00021E mov      edx, dword ptr [rax+4*rdx+0x10]
IN0094: 000222 mov      dword ptr [V01 rbp-0x1C], edx
IN0095: 000225 cmp      ecx, dword ptr [V49 rbp-0x2C]
IN0096: 000228 jg       SHORT G_M13297_IG31

If I remove the "suppress explicit zero init" logic here, we get better code and perf.

G_M13297_IG31:        ; offs=0x000210, size=0x0015, bbWeight=512.29, PerfScore 2689.51, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, loop=IG31, BB117 [0129], byref, isz

IN0090: 000210 lea      r8d, [rcx+0x01]
IN0091: 000214 mov      ecx, ecx
IN0092: 000216 mov      ecx, dword ptr [rax+4*rcx+0x10]
IN0093: 00021A mov      dword ptr [V01 rbp-0x1C], ecx
IN0094: 00021D cmp      edx, r8d
IN0095: 000220 mov      ecx, r8d
IN0096: 000223 jg       SHORT G_M13297_IG31

We still have the annoying intertwined variables (common for enumerators), but this at least avoids a lot of memory traffic.

I will likely look at this in a follow-on PR. It seems like in general we might want either to always do explicit zero init of stack allocations, or at least to do it for these enumerator cases.
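
The difference between the two loops above can be approximated in source form. The following is only an analogy, assuming that DNER (do-not-enregister) means every access to the enumerator's index field goes through its stack slot; `sumViaMemory` and `sumViaRegisters` are hypothetical names, and `volatile` is used purely to force the loads and stores, not because the JIT treats the field as volatile.

```cpp
#include <cstddef>
#include <vector>

// Models the first loop: the index lives in memory, so each iteration
// reloads it, stores the incremented value, and reloads it again for the
// loop test.
int sumViaMemory(const std::vector<int>& data)
{
    const int    count = static_cast<int>(data.size());
    volatile int index = 0; // stand-in for a do-not-enregister field
    int          sum   = 0;

    while (count > index) // load for the comparison
    {
        const int i = index; // load
        index       = i + 1; // store
        sum += data[static_cast<size_t>(i)];
    }
    return sum;
}

// Models the second loop: the index is an ordinary local that the compiler
// is free to keep in a register for the whole loop.
int sumViaRegisters(const std::vector<int>& data)
{
    const int count = static_cast<int>(data.size());
    int       index = 0;
    int       sum   = 0;

    while (count > index)
    {
        const int i = index;
        index       = i + 1;
        sum += data[static_cast<size_t>(i)];
    }
    return sum;
}
```

Both functions compute the same result; the only difference is how many memory accesses each iteration performs, which is the memory traffic the second disassembly avoids.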

For reference, the "baseline" inner loop (going back to before the GDV cleanup) is:

G_M13297_IG31:        ; offs=0x000224, size=0x001A, bbWeight=521.83, PerfScore 6131.48, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, BB107 [0116], byref, isz

IN0093: 000224 lea      edx, [rax+0x01]
IN0094: 000227 mov      dword ptr [rsi+0x14], edx
IN0095: 00022A cmp      eax, dword ptr [rcx+0x08]
IN0096: 00022D jae      SHORT G_M13297_IG37
IN0097: 00022F mov      eax, eax
IN0098: 000231 mov      eax, dword ptr [rcx+4*rax+0x10]
IN0099: 000235 mov      dword ptr [rsi+0x10], eax
IN009a: 000238 mov      eax, dword ptr [rsi+0x10]
IN009b: 00023B mov      dword ptr [V01 rbp-0x1C], eax

G_M13297_IG32:        ; offs=0x00023E, size=0x000C, bbWeight=522.83, PerfScore 4182.62, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, loop=IG31, BB02 [0001], byref, isz

IN009c: 00023E mov      eax, dword ptr [rsi+0x14]
IN009d: 000241 mov      rcx, gword ptr [rsi+0x08]
IN009e: 000245 cmp      eax, dword ptr [rcx+0x08]
IN009f: 000248 jl       SHORT G_M13297_IG31

Here the enumerator is on the heap. Somehow this performs better on my local box than the top-most version (which is what we have after this PR).

AndyAyersMS · 2025-07-06T16:07:49Z

@dotnet/jit-contrib ping

Copilot AI review requested due to automatic review settings July 4, 2025 00:16

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 4, 2025

dotnet-policy-service bot assigned AndyAyersMS Jul 4, 2025

Copilot AI reviewed Jul 4, 2025

Merge branch 'main' into Fix117204Part2

2220699
