JIT: relax another check in conditional escape analysis #117295
Conversation
Proceed with CEA cloning even when the allocation site dominates the assignment to the enumerator var. The cloning will be a bit wasteful as the original code will become unreachable, but the code is set up to specialize the clone and so this is the less risky fix. Also restore recognition of `IEnumerable<T>.GetEnumerator` as this gives a useful inlining boost in some cases. Fixes dotnet#117204 cases that were not fixed by dotnet#117222.
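To make the control-flow change concrete, here is a minimal, self-contained C++ sketch of the relaxed decision. The types and names below are illustrative stand-ins, not the actual `ObjectAllocator::CheckCanClone` code in objectalloc.cpp; only the dominance-handling shape is meant to match the description above.

```cpp
// Sketch only (hypothetical stand-ins for the real JIT types): previously the
// analysis bailed out of cloning when the allocation site dominated the block
// that assigns the enumerator variable; now it just notes the condition and
// clones anyway, relying on later phases to remove the unreachable original
// (un-specialized) path.
#include <cstdio>

struct BlockModel
{
    int rpoNum; // stand-in for whatever ordering a dominance query would use
};

// Stand-in for a dominance query over the flow graph.
static bool dominates(const BlockModel& a, const BlockModel& b)
{
    return a.rpoNum <= b.rpoNum;
}

static bool checkCanCloneSketch(const BlockModel& allocBlock, const BlockModel& enumeratorDefBlock)
{
    if (dominates(allocBlock, enumeratorDefBlock))
    {
        // Old behavior: return false and give up on conditional escape
        // analysis cloning. New behavior: keep going. The clone gets
        // specialized, the original path becomes unreachable and is cleaned
        // up later, so the extra cloning is wasteful but safe.
        printf("allocation site dominates the enumerator def; cloning anyway\n");
    }
    return true;
}

int main()
{
    BlockModel alloc{1};
    BlockModel def{2};
    printf("can clone: %d\n", checkCanCloneSketch(alloc, def));
    return 0;
}
```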
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull Request Overview
This PR relaxes a dominance check in the JIT's conditional escape analysis so that cloning proceeds even when the allocation site dominates the enumerator assignment (the slow path becomes unreachable and is cleaned up later), and restores recognition of `IEnumerable<T>.GetEnumerator` as a named intrinsic to help inlining.
- Allow cloning in `ObjectAllocator::CheckCanClone` despite dominance, updating comments and the diagnostic dump.
- Add a new `NI_System_Collections_Generic_IEnumerable_GetEnumerator` intrinsic and wire it up in the importer.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| objectalloc.cpp | Removed the early `return false` on dominance; updated comments and JITDUMP text to indicate cloning happens anyway. |
| namedintrinsiclist.h | Introduced `NI_System_Collections_Generic_IEnumerable_GetEnumerator` and retitled the intrinsics comment block. |
| importercalls.cpp | Added a `lookupNamedIntrinsic` branch for ``IEnumerable`1``.GetEnumerator. |
Comments suppressed due to low confidence (2)
src/coreclr/jit/objectalloc.cpp:3848
- Add a unit test that exercises the case where the allocation site dominates the def block to verify the clone path is specialized correctly and the original path is removed.
`// For now we're going to just go ahead and clone, despite the`
src/coreclr/jit/importercalls.cpp:10423
- Consider adding a JIT or runtime test that verifies `IEnumerable<T>.GetEnumerator` is recognized as an intrinsic and inlined as expected.
`` else if (strcmp(className, "IEnumerable`1") == 0) ``
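To illustrate the shape of the recognition this branch adds, here is a minimal, self-contained C++ sketch. The enum value mirrors the new named-intrinsic constant from this PR, but the function, types, and surrounding lookup structure are simplified stand-ins rather than the actual `lookupNamedIntrinsic` code in importercalls.cpp.

```cpp
// Simplified stand-in for the importer's string-based intrinsic lookup: the
// metadata class name encodes generic arity with a backtick, so
// IEnumerable<T> shows up as "IEnumerable`1".
#include <cstring>
#include <cstdio>

enum NamedIntrinsic
{
    NI_Illegal = 0,
    NI_System_Collections_Generic_IEnumerable_GetEnumerator,
};

static NamedIntrinsic lookupEnumeratorIntrinsic(const char* className, const char* methodName)
{
    if ((strcmp(className, "IEnumerable`1") == 0) && (strcmp(methodName, "GetEnumerator") == 0))
    {
        return NI_System_Collections_Generic_IEnumerable_GetEnumerator;
    }
    return NI_Illegal;
}

int main()
{
    // Recognizing the call lets the JIT treat it specially (here, to help inlining).
    printf("%d\n", lookupEnumeratorIntrinsic("IEnumerable`1", "GetEnumerator"));
    return 0;
}
```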
@dotnet/jit-contrib PTAL

Locally this clears up all the windows regressions except for …
Seems like one issue in

```
G_M13297_IG31: ; offs=0x000210, size=0x001A, bbWeight=514.37, PerfScore 4757.97, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, loop=IG31, BB117 [0129], byref, isz
IN008f: 000210 mov edx, dword ptr [V49 rbp-0x2C]
IN0090: 000213 mov r8d, dword ptr [V49 rbp-0x2C]
IN0091: 000217 inc r8d
IN0092: 00021A mov dword ptr [V49 rbp-0x2C], r8d
IN0093: 00021E mov edx, dword ptr [rax+4*rdx+0x10]
IN0094: 000222 mov dword ptr [V01 rbp-0x1C], edx
IN0095: 000225 cmp ecx, dword ptr [V49 rbp-0x2C]
IN0096: 000228 jg SHORT G_M13297_IG31
```

If I remove the "suppress explicit zero init" logic here, we get better code and perf:

```
G_M13297_IG31: ; offs=0x000210, size=0x0015, bbWeight=512.29, PerfScore 2689.51, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, loop=IG31, BB117 [0129], byref, isz
IN0090: 000210 lea r8d, [rcx+0x01]
IN0091: 000214 mov ecx, ecx
IN0092: 000216 mov ecx, dword ptr [rax+4*rcx+0x10]
IN0093: 00021A mov dword ptr [V01 rbp-0x1C], ecx
IN0094: 00021D cmp edx, r8d
IN0095: 000220 mov ecx, r8d
IN0096: 000223 jg SHORT G_M13297_IG31
```

We still have the annoying intertwined variables (common for enumerators), but this at least avoids a lot of memory traffic. I will likely look at this in a follow-on PR; it seems like in general we either might want to always do explicit zero init of stack allocations, or at least do it for these enumerator cases.

For reference, the "baseline" inner loop (going back to before the GDV cleanup), where the enumerator is on the heap, is:

```
G_M13297_IG31: ; offs=0x000224, size=0x001A, bbWeight=521.83, PerfScore 6131.48, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, BB107 [0116], byref, isz
IN0093: 000224 lea edx, [rax+0x01]
IN0094: 000227 mov dword ptr [rsi+0x14], edx
IN0095: 00022A cmp eax, dword ptr [rcx+0x08]
IN0096: 00022D jae SHORT G_M13297_IG37
IN0097: 00022F mov eax, eax
IN0098: 000231 mov eax, dword ptr [rcx+4*rax+0x10]
IN0099: 000235 mov dword ptr [rsi+0x10], eax
IN009a: 000238 mov eax, dword ptr [rsi+0x10]
IN009b: 00023B mov dword ptr [V01 rbp-0x1C], eax
G_M13297_IG32: ; offs=0x00023E, size=0x000C, bbWeight=522.83, PerfScore 4182.62, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, loop=IG31, BB02 [0001], byref, isz
IN009c: 00023E mov eax, dword ptr [rsi+0x14]
IN009d: 000241 mov rcx, gword ptr [rsi+0x08]
IN009e: 000245 cmp eax, dword ptr [rcx+0x08]
IN009f: 000248 jl SHORT G_M13297_IG31
```

Somehow this performs better than the top-most version (which is what we have after this PR) on my local box.
@dotnet/jit-contrib ping