Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: enhance escape analysis to understand local struct fields #113141

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Mar 4, 2025

Implement a very simplistic "field sensitive" analysis for gc structs where we model each struct as simply its gc field(s).

That is, given a gc struct G with GC field f, we model

G.f = ... 

as an assignment to G and

   = G.f

as a read from G. Since we know G itself cannot escape, "escaping" of G means G.f can escape.

Note this conflates the fields of G: either they all escape or none of them do. We could make this more granular but it's not clear that doing so is worthwhile, and it requires more up-front work to determine how to track each field's status.

If the struct or struct fields are only consumed locally, we may be able to prove the gc referents don't escape.

This is a subset/elaboration of #112543 that does not try and reason interprocedurally.

Contributes to #104936 / #108913
Fixes #113236 (once the right inlines happen)

Implement a very simplistic "field sensitive" analysis for `Span<T>`
where we model the span as simply its byref field.

If the span is only consumed locally, and the array does not otherwise
escape, then the array does not escape.

This is a subset of dotnet#112543 that does not try and reason interprocedurally.

Contributes to dotnet#104936 / dotnet#108913
@Copilot Copilot bot review requested due to automatic review settings March 4, 2025 22:37
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR adds new tests to the JIT for enhanced escape analysis concerning Span capture. It includes the introduction of a ref struct (SpanKeeper) for capturing spans and several new test cases and helper methods that exercise various span capture patterns.

  • Introduces a new ref struct (SpanKeeper) to model captured spans.
  • Adds new test methods (e.g., SpanCaptureArray1, SpanCaptureArrayT, SpanEscapeArrayOutParam2) for Span capture scenarios.
  • Includes supporting helper methods (e.g., Use(Span), TrashStack) to drive the tests.

Reviewed Changes

File Description
src/tests/JIT/opt/ObjectStackAllocation/ObjectStackAllocationTests.cs Added tests and helper methods to verify proper handling of Span captures in varying contexts

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

src/tests/JIT/opt/ObjectStackAllocation/ObjectStackAllocationTests.cs:519

  • The output parameter 'b' in SpanEscapeArrayOutParam2Helper is used before it is explicitly assigned. Consider initializing 'b' (e.g., using 'b = default;') before setting its fields to ensure proper assignment.
b.span = x;
@AndyAyersMS
Copy link
Member Author

@EgorBo @jakobbotsch PTAL
cc @dotnet/jit-contrib

Note for full-blown field-sensitive handling of structs we would need to build connection graph edges when fields are loaded; since we know loads from spans can't cause escapes we bypass that for now.

Comment on lines 1434 to 1436
newLayout = m_compiler->typGetBlkLayout(layout->GetSize());
JITDUMP("Changing layout of span V%02u to block\n", lclNum);
lclVarDsc->ChangeLayout(newLayout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this we want a layout with proper padding information, otherwise the promotion will be suboptimal compared to the original layout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok... maybe an "erase GC" method on a layout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even necessary for us to change the type here? Seems like the original byref type should be ok to now point to a stack-allocated thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok... maybe an "erase GC" method on a layout?

I think you can do something like:

ClassLayoutBuilder builder(lcl->GetLayout()->GetSize());
builder.CopyInfoFrom(0, lcl->GetLayout(), /* copyPadding */ true);
builder.SetGCPtrType(OFFSETOF__CORINFO_Span__reference / TARGET_POINTER_SIZE, TYP_I_IMPL);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I suppose if we generalized this PR to arbitrary structs we would indeed want something that only copies the padding information, so maybe introducing another parameter to CopyInfoFrom to ignore the GC information would be the way to go.

@jakobbotsch
Copy link
Member

Note for full-blown field-sensitive handling of structs we would need to build connection graph edges when fields are loaded; since we know loads from spans can't cause escapes we bypass that for now.

Maybe I misunderstand this, but what about e.g. MemoryMarshal.GetReference API? Or even ref span[0]?

Can the handling of this PR be generalized to treat any struct with references in it as one big reference that causes unification to happen between everything that read/wrote the struct?

@AndyAyersMS
Copy link
Member Author

Note for full-blown field-sensitive handling of structs we would need to build connection graph edges when fields are loaded; since we know loads from spans can't cause escapes we bypass that for now.

Maybe I misunderstand this, but what about e.g. MemoryMarshal.GetReference API? Or even ref span[0]?

Can the handling of this PR be generalized to treat any struct with references in it as one big reference that causes unification to happen between everything that read/wrote the struct?

Indeed, looks like we need to do some modelling of the loads. I think we can just conflate the struct with its fields. Let me see how this works out.

We conflate a struct with its fields, since we know the struct itself
can't escape. We rely on accesses not being able to walk from one local
to another. We try to be properly conservative around byrefs.
Comment on lines +787 to +793
// Allow a source GC_REF to match a custom GC_BYREF
//
if ((layout2->GetGCPtrType(i) == TYP_REF) && (layout1->GetGCPtrType(i) == TYP_BYREF) &&
layout1->IsCustomLayout())
{
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the definition of layout compatibility seems quite questionable to me. Why is it necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCStruct a;
a.obj = ..;   // doesn't escape

GCStruct b;
b.obj = ..;   // escapes

// Assignment is legal, but layouts are not compatible w/o the above
// a.obj must be BYREF as it may refer to stack or heap

a = b

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to retain the class handle in new layout, but didn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note allowing b = a would be incompatible with the results of escape analysis, which is why this is one-sided (source can be byref when dest is ref, but not vice-versa).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you saying that for the

object aObj = ...; // doesn't escape

object bObj = ...; // escapes

aObj = bObj;

case, today we will produce a STORE_LCL_VAR that stores a TYP_REF into a TYP_BYREF?

Copy link
Member Author

@AndyAyersMS AndyAyersMS Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will retype aObj appearances to be byrefs, as it will be "possibly stack pointing". And aObj itself in the local table.

This has been there since forever.

Copy link
Member

@jakobbotsch jakobbotsch Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it makes it somewhat more palatable. But it seems like we need to separate the notions of "user facing" layout compatibility, as used by e.g. Unsafe.BitCast, and "implementation-specific assignment compatibility" which is non-symmetric and where we rely on internal coreclr implementation details. As you probably know there could be a potential future where byrefs pointing at the method table would be illegal... I guess in that case we would need a third kind of GC pointer. Or we would have to switch object stack allocation to only kick in when it can reason about and change all accesses of the object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions on how to express this better. Perhaps the presence of a custom layout for the dest might be enough?

We could get here with custom layouts on both sides:

GCStruct a;
a.obj = ..;   // doesn't escape, may point at heap [byref]

GCStruct b;
b.obj = ..;   // doesn't escape, never points at heap [nongc]

a = b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just suggest to rename this to ClassLayout::AssignmentCompatible(ClassLayout* dest, ClassLayout* src) and allow the byref <- ref assignments generally. And then switch Unsafe.BitCast to a version that keeps the stricter behavior.
Should probably be a separate PR...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe BitCast doesn't even need any changes, I was under the assumption that it threw an exception on incompatible layouts, but it does not seem like it is the case)

{
gcType = TYPE_GC_BYREF;
}
SetGCPtr(startSlot + slot, gcType);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code feels very object stack allocation specific. Put it in objectalloc.cpp?

Comment on lines +10970 to +10973
// Get a layout like an existing layout, with all gc refs removed
ClassLayout* typGetNonGCLayout(ClassLayout* existingLayout);
// Get a layout like an existing layout, with all gc refs changed to byrefs
ClassLayout* typGetByrefLayout(ClassLayout* existingLayout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something that only stack allocation is ever going to use, so it would make more sense to me to keep it in objectalloc.h/cpp.

Comment on lines +463 to +487
if (prefix != nullptr)
{
char* newName = nullptr;
char* newShortName = nullptr;

if (layoutName != nullptr)
{
size_t len = strlen(prefix) + strlen(layoutName) + 1;
newName = m_compiler->getAllocator(CMK_DebugOnly).allocate<char>(len);
sprintf_s(newName, len, "%s%s", prefix, layoutShortName);
}

if (layoutShortName != nullptr)
{
size_t len = strlen(prefix) + strlen(layoutName) + 1;
newShortName = m_compiler->getAllocator(CMK_DebugOnly).allocate<char>(len);
sprintf_s(newShortName, len, "%s%s", prefix, layoutShortName);
}

SetName(newName, newShortName);
}
else
{
SetName(layoutName, layoutShortName);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can use printfAlloc.

@AndyAyersMS
Copy link
Member Author

Assert failure(PID 3416 [0x00000d58], Thread: 2092 [0x082c]): Assertion failed 'unreached' in 'ManagedSequential:LayoutClassObjectBaseIsManagedSequential()' during 'Allocate Objects' (IL size 36; hash 0xc0408010; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\objectalloc.cpp:1460
    Image: C:\h\w\A5540979\p\corerun.exe

Wonder if this is what I was seeing locally, somehow a managed type makes it to a UDIV.

@AndyAyersMS
Copy link
Member Author

We are trying to retype after stack allocation

Allocating V02 on the stack
...
Changing the type of V00 from ref to long
Changing the type of V02 from ref to long
...
STMT00009 ( ??? ... ??? )
               [000027] DA-X-------                         *  STORE_LCL_VAR int    V04 tmp3         
               [000015] ---X-------                         \--*  CAST      int <- long
               [000014] ---X-------                            \--*  SUB       long  
               [000011] ---X-------                               +--*  FIELD_ADDR byref  ManagedSequential+LayoutClassObjectBase:l1
               [000010] -----------                               |  \--*  LCL_VAR   ref    V00 loc0         
               [000013] -----------                               \--*  LCL_VAR   byref  V03 tmp2

and don't expect to see a cast.

@hez2010
Copy link
Contributor

hez2010 commented Mar 11, 2025

I see a regression where previously stack allocated array now becomes heap allocated (coreclr_tests), is it expected?

@@ -26,10 +26,10 @@
 ;  V15 tmp13        [V15,T14] (  2,  2   )   byref  ->  rbx         single-def "constrained call requires dereference for 'this' right before call"
 ;* V16 tmp14        [V16    ] (  0,  0   )   byref  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
 ;  V17 tmp15        [V17,T15] (  2,  2   )   byref  ->  rbx         single-def "constrained call requires dereference for 'this' right before call"
-;* V18 tmp16        [V18    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "NewArr temp" <MyInt[]>
-;  V19 tmp17        [V19,T13] (  4,  2   )    long  ->  rsi         class-hnd exact single-def "Inline stloc first use temp" <MyInt[]>
+;* V18 tmp16        [V18    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewArr temp" <MyInt[]>
+;  V19 tmp17        [V19,T13] (  4,  2   )     ref  ->  rsi         class-hnd exact single-def "Inline stloc first use temp" <MyInt[]>
 ;  V20 tmp18        [V20,T16] (  2,  2   )     ref  ->  rax         class-hnd single-def "Strict ordering of exceptions for Array store" <MyInt>
-;* V21 tmp19        [V21    ] (  0,  0   )    long  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
+;* V21 tmp19        [V21    ] (  0,  0   )   byref  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
 ;* V22 tmp20        [V22    ] (  0,  0   )   byref  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
 ;  V23 tmp21        [V23,T17] (  2,  2   )   byref  ->  rbx         single-def "constrained call requires dereference for 'this' right before call"
 ;* V24 tmp22        [V24    ] (  0,  0   )   byref  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
@@ -38,27 +38,22 @@
 ;  V27 tmp25        [V27,T19] (  2,  2   )   byref  ->  rbx         single-def "constrained call requires dereference for 'this' right before call"
 ;* V28 tmp26        [V28    ] (  0,  0   )   byref  ->  zero-ref    "constrained call requires dereference for 'this' right before call"
 ;  V29 tmp27        [V29,T20] (  2,  2   )   byref  ->  rbx         single-def "constrained call requires dereference for 'this' right before call"
-;  V30 tmp28        [V30    ] (  3,  1.50)  struct (24) [rsp+0x20]  do-not-enreg[XS] must-init addr-exposed "stack allocated array temp" <MyInt[]>
-;  V31 tmp29        [V31,T01] (  5,  5   )     int  ->  rax         "Single return block return value"
-;  V32 tmp30        [V32,T21] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
-;  V33 tmp31        [V33,T09] (  3,  3   )     ref  ->  rcx         single-def "arr expr"
-;  V34 tmp32        [V34,T22] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
-;  V35 tmp33        [V35,T10] (  3,  3   )     ref  ->  rcx         single-def "arr expr"
-;  V36 tmp34        [V36,T23] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
-;  V37 tmp35        [V37,T11] (  3,  3   )     ref  ->  rbx         single-def "arr expr"
-;  V38 tmp36        [V38,T12] (  3,  3   )     ref  ->  rbx         single-def "arr expr"
+;  V30 tmp28        [V30,T01] (  5,  5   )     int  ->  rax         "Single return block return value"
+;  V31 tmp29        [V31,T21] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
+;  V32 tmp30        [V32,T09] (  3,  3   )     ref  ->  rcx         single-def "arr expr"
+;  V33 tmp31        [V33,T22] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
+;  V34 tmp32        [V34,T10] (  3,  3   )     ref  ->  rcx         single-def "arr expr"
+;  V35 tmp33        [V35,T23] (  2,  2   )     ref  ->  rcx         single-def "argument with side effect"
+;  V36 tmp34        [V36,T11] (  3,  3   )     ref  ->  rbx         single-def "arr expr"
+;  V37 tmp35        [V37,T12] (  3,  3   )     ref  ->  rbx         single-def "arr expr"
 ;
-; Lcl frame size = 56
+; Lcl frame size = 40
 
 G_M56432_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rsi
        push     rbx
-       sub      rsp, 56
-       vxorps   xmm4, xmm4, xmm4
-       vmovdqa  xmmword ptr [rsp+0x20], xmm4
-       xor      eax, eax
-       mov      qword ptr [rsp+0x30], rax
-						;; size=23 bbWeight=1 PerfScore 5.83
+       sub      rsp, 40
+						;; size=6 bbWeight=1 PerfScore 2.25
 G_M56432_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      rcx, 0xD1FFAB1E      ; <unknown class>
        call     CORINFO_HELP_NEWSFAST
@@ -119,21 +114,28 @@ G_M56432_IG03:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        test     eax, eax
        jne      G_M56432_IG11
        mov      rcx, 0xD1FFAB1E      ; MyInt[]
-       mov      qword ptr [rsp+0x20], rcx
-       lea      rcx, [rsp+0x20]
-       mov      dword ptr [rcx+0x08], 1
-       lea      rsi, [rsp+0x20]
-       mov      rcx, 0xD1FFAB1E      ; <unknown method>
-       call     [System.Activator:CreateInstance[System.__Canon]():System.__Canon]
+       mov      edx, 1
+       call     CORINFO_HELP_NEWARR_1_OBJ
        ; gcrRegs +[rax]
        ; gcr arg pop 0
-       mov      gword ptr [rsi+0x10], rax
+       mov      rsi, rax
+       ; gcrRegs +[rsi]
+       mov      rcx, 0xD1FFAB1E      ; <unknown method>
+       call     [System.Activator:CreateInstance[System.__Canon]():System.__Canon]
+       ; gcr arg pop 0
+       lea      rcx, bword ptr [rsi+0x10]
+       ; byrRegs +[rcx]
+       mov      rdx, rax
+       ; gcrRegs +[rdx]
+       call     CORINFO_HELP_ASSIGN_REF
+       ; gcrRegs -[rax rdx]
+       ; byrRegs -[rcx]
        mov      rcx, gword ptr [rsi+0x10]
        ; gcrRegs +[rcx]
        mov      r11, 0xD1FFAB1E      ; code for <unknown method>
        mov      edx, 100
        call     [r11]<unknown method>
-       ; gcrRegs -[rax rcx]
+       ; gcrRegs -[rcx]
        ; gcr arg pop 0
        mov      rcx, gword ptr [rbx+0x10]
        ; gcrRegs +[rcx]
@@ -141,7 +143,7 @@ G_M56432_IG03:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        ; gcrRegs +[r8]
        xor      edx, edx
        call     CORINFO_HELP_ARRADDR_ST
-       ; gcrRegs -[rcx r8]
+       ; gcrRegs -[rcx rsi r8]
        ; gcr arg pop 0
        mov      rcx, gword ptr [rbx+0x10]
        ; gcrRegs +[rcx]
@@ -187,7 +189,7 @@ G_M56432_IG03:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        ; gcr arg pop 0
        mov      rcx, rbx
        ; gcrRegs +[rcx]
-						;; size=283 bbWeight=0.50 PerfScore 30.25
+						;; size=282 bbWeight=0.50 PerfScore 29.88
 G_M56432_IG04:        ; bbWeight=0.50, isz, extend
        mov      rdx, rsi
        ; gcrRegs +[rdx]
@@ -337,7 +339,7 @@ G_M56432_IG07:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        mov      eax, 4
 						;; size=95 bbWeight=0.50 PerfScore 10.50
 G_M56432_IG08:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
-       add      rsp, 56
+       add      rsp, 40
        pop      rbx
        pop      rsi
        ret      
@@ -379,7 +381,7 @@ G_M56432_IG09:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        mov      eax, 3
 						;; size=95 bbWeight=0.50 PerfScore 10.50
 G_M56432_IG10:        ; bbWeight=0.50, epilog, nogc, extend
-       add      rsp, 56
+       add      rsp, 40
        pop      rbx
        pop      rsi
        ret      
@@ -418,7 +420,7 @@ G_M56432_IG11:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        mov      eax, 2
 						;; size=81 bbWeight=0.50 PerfScore 7.50
 G_M56432_IG12:        ; bbWeight=0.50, epilog, nogc, extend
-       add      rsp, 56
+       add      rsp, 40
        pop      rbx
        pop      rsi
        ret      
@@ -457,7 +459,7 @@ G_M56432_IG13:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        mov      eax, 1
 						;; size=81 bbWeight=0.50 PerfScore 7.50
 G_M56432_IG14:        ; bbWeight=0.50, epilog, nogc, extend
-       add      rsp, 56
+       add      rsp, 40
        pop      rbx
        pop      rsi
        ret      
@@ -468,7 +470,7 @@ G_M56432_IG15:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 1098, prolog size 23, PerfScore 126.83, instruction count 214, allocated bytes for code 1098 (MethodHash=7754238f) for method test:TestEntryPoint():int (FullOpts)
+; Total bytes of code 1080, prolog size 6, PerfScore 122.88, instruction count 211, allocated bytes for code 1080 (MethodHash=7754238f) for method test:TestEntryPoint():int (FullOpts)

@AndyAyersMS
Copy link
Member Author

I see a regression where previously stack allocated array now becomes heap allocated (coreclr_tests), is it expected?

No. Have you looked into it? If not, I'll get to it in a day or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Escape Analysis: Elide an array allocation in BitConverter.GetBytes pattern
3 participants