[cDAC] Stack walk support more Frame types #112997

Open
wants to merge 67 commits into
base: main
from

Conversation

max-charlamb
Contributor

@max-charlamb max-charlamb commented Feb 27, 2025

Builds on #111759
Contributes to #110758

Adds cDAC stack walking support for the following Frame types:

  • TransitionFrame (Base class)
    • FramedMethodFrame
    • CLRToCOMMethodFrame
    • PInvokeCalliFrame
    • StubDispatchFrame
    • CallCountingHelperFrame
    • ExternalMethodFrame
    • DynamicHelperFrame
  • FuncEvalFrame
  • ResumableFrame (Base class)
    • RedirectedThreadFrame
  • FaultingExceptionFrame
  • HijackFrame

Expands cDAC Frame handling to be platform specific

In line with current cDAC stack walking support, only AMD64 and ARM64 are supported. Bringing up other architectures will require platform-specific frame handlers. This was done to exactly match the existing DAC stack walking behavior.

Callouts

  1. Data descriptor naming convention.
    • A FaultingExceptionFrame stores a T_CONTEXT directly. A ResumableFrame stores a pointer to a T_CONTEXT (PT_CONTEXT). To make this difference clear, I named the embedded struct TargetContext and the pointer TargetContextPtr. This distinction matters when reading the data descriptors: the former is a direct offset from the Frame's address, while the latter is a pointer read at an offset from the Frame's address.
    • So far, we have avoided adding logic to the IData classes. For that reason, the classes expose a TargetPointer instead of the struct directly. However, this naming convention is a bit confusing because both TargetPointers (TargetContext and TargetContextPtr) end up pointing to the beginning of a T_CONTEXT.
  2. Using reflection to read/write registers in a context.
    • Some data structures (CalleeSavedRegisters and HijackArgs) store a bag of register values. To avoid hardcoding the number and names of these registers, I have opted to use reflection to write the corresponding struct fields by their string names.
    • This isn't my favorite choice, but the alternatives are hardcoding these values on the cDAC side or creating a large switch statement in each IPlatformContext to avoid reflection.
  3. Some data descriptors (for example, CalleeSavedRegisters and HijackArgs) have very different representations on different platforms.
    • If this is common, maybe it would be better to have a separate platform-specific <arch>datadescriptor.h that is imported based on platform.
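
To illustrate the TargetContext versus TargetContextPtr distinction from callout 1, here is a minimal sketch. It is not the cDAC implementation: FakeTarget, ContextAddressSketch, and both method names are hypothetical, and target memory is modeled as a little-endian byte array indexed by address.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical stand-in for target process memory; not the cDAC Target API.
public sealed class FakeTarget
{
    private readonly byte[] _memory;

    public FakeTarget(byte[] memory) => _memory = memory;

    // Reads a little-endian 64-bit pointer stored at the given address.
    public ulong ReadPointer(ulong address)
        => BinaryPrimitives.ReadUInt64LittleEndian(_memory.AsSpan((int)address, 8));
}

public static class ContextAddressSketch
{
    // TargetContext case: the T_CONTEXT is embedded in the Frame, so its
    // address is the Frame address plus the field offset. No memory read.
    public static ulong FromEmbeddedContext(ulong frameAddress, ulong fieldOffset)
        => frameAddress + fieldOffset;

    // TargetContextPtr case: the Frame holds a PT_CONTEXT, so the field is
    // read from target memory to find where the T_CONTEXT lives.
    public static ulong FromContextPointer(FakeTarget target, ulong frameAddress, ulong fieldOffset)
        => target.ReadPointer(frameAddress + fieldOffset);
}
```

Both methods return the address of the start of a T_CONTEXT, which is why exposing both as a TargetPointer can be confusing: only the descriptor name says whether a dereference already happened.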

Testing

Tested using amd64 Windows. Verified byte for byte correctness with DAC stack walking.

How to produce call stacks with uncommon Frames

FuncEvalFrame

FuncEvalFrames are generated during debugger immediate execution.

Debug any program in VS, break, and call a blocking function (Console.ReadLine()). Then attach a second debugger non-invasively to walk the stack, or create a dump.
For example, with cdb: cdb -pv -p <processId>

HijackFrame

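using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

class HijackTest()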
{
    public volatile bool flag;
    public volatile int num;

    // Set breakpoint at ThreadSuspend::SuspendEE then step out and look at the main thread stack
    // (bu coreclr!ThreadSuspend::SuspendEE)
    // Note: HijackFrames are not used on Windows if CET is enabled. Either test on non-Windows
    // or disable CET by modifying Thread::AreShadowStacksEnabled to return false.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void Test()
    {
        // start other thread that will force a GC collection.
        Task.Run(Work);

        // run loop checking volatile variable to generate non-interruptible code.
        while (!flag)
        {
            TestLoop();
        }
    }
    public void Work()
    {
        Thread.Sleep(500);
        GC.Collect();
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public void TestLoop()
    {
        num++; num++; num++; num++; num++;
        num++; num++; num++; num++; num++;
        num++; num++; num++; num++; num++;
        num++; num++; num++; num++; num++;
        num++; num++; num++; num++; num++;
        num++; num++; num++; num++; num++;
    }
}

RedirectedThreadFrame (ResumableFrame)

using System;
using System.Runtime;
using System.Runtime.CompilerServices;
using System.Threading;

class RedirectedThreadFrame()
{
    public volatile bool flag;
    public volatile int num;

    // Configure WinDBG to break on clr exceptions
    // (sxe clr)
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void Test()
    {
        var cts = new CancellationTokenSource();
        cts.CancelAfter(500);
        
        // use ControlledExecution with a cancellation token to trigger a
        // thread abort with a try/catch
        ControlledExecution.Run(Work, cts.Token);
        while (!flag)
        {
            TestLoop();
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public void TestLoop()
    {
        for (int i = 0; i < 20; i++)
        {
            num++;
        }
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void Work()
    {
        try
        {
            while (!flag)
            {
                TestLoop();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

FaultingExceptionFrame

using System;
using System.Runtime;
using System.Runtime.CompilerServices;
using System.Threading;

class FaultingExceptionTest()
{
    public volatile bool flag;
    public volatile int num;

    // Set breakpoint on ThrowControlForThread
    // (bu coreclr!ThrowControlForThread)
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void Test()
    {
        Console.ReadLine();
        var cts = new CancellationTokenSource();
        cts.CancelAfter(500);
        ControlledExecution.Run(Work, cts.Token);
        
        while (!flag)
        {
            TestLoop();
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public void TestLoop()
    {
        for (int i = 0; i < 20; i++)
        {
            if (num > 10000)
            {
                // important to call another function here
                Console.WriteLine("num is greater than 10000");
            }
            num++;
        }
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void Work()
    {
        try
        {
            while (!flag)
            {
                TestLoop();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

Contributor

@Copilot Copilot AI left a comment

PR Overview

This PR extends cDAC stack walking support by introducing new frame types and platform‐specific logic for AMD64 and ARM64. Key changes include new implementations for frames such as FramedMethodFrame, FaultingExceptionFrame, FuncEvalFrame, and HijackFrame; updated platform handlers (ARM64 and AMD64); and corresponding updates to DataType, context handling, and documentation.

Reviewed Changes

| File | Description |
| --- | --- |
| Data/Frames/FramedMethodFrame.cs | Implements frame creation for FramedMethodFrame. |
| Data/Frames/FaultingExceptionFrame.cs | Adds FaultingExceptionFrame with conditional context pointer initialization. |
| Data/Frames/DebuggerEval.cs | Introduces DebuggerEval frame support. |
| Data/Frames/FuncEvalFrame.cs | Implements FuncEvalFrame for function evaluation. |
| Contracts/StackWalk/FrameHandling/IPlatformFrameHandler.cs | Declares interface for platform-specific frame handling methods. |
| Data/Frames/CalleeSavedRegisters.cs | Reads callee-saved registers into a dictionary. |
| Contracts/StackWalk/FrameHandling/FrameIterator.cs | Introduces frame iteration and type resolution logic. |
| Contracts/StackWalk/FrameHandling/ARM64FrameHandler.cs | Provides ARM64-specific frame handling logic. |
| Contracts/StackWalk/FrameHandling/AMD64FrameHandler.cs | Provides AMD64-specific frame handling logic. |
| Contracts/StackWalk/Context/ContextHolder.cs | Renames and updates context holder implementation. |
| Contracts/StackWalk/Context/ARM64Context.cs, AMD64Context.cs | Updates context structures with ToString overrides and public visibility. |
| docs/design/datacontracts/StackWalk.md | Updates documentation to reflect new frame and data contract definitions. |
| Abstractions/DataType.cs | Adds new DataType enum members for various frame types. |
| Contracts/StackWalk/Context/IPlatformAgnosticContext.cs | Updates context creation to use the new ContextHolder. |
| Contracts/StackWalk/FrameIterator.cs (removed) | Removes deprecated frame iterator implementation. |

Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.

@max-charlamb changed the title from "[cDAC] Stackwalking more Frame types" to "[cDAC] Stack walk support more Frame types" on Mar 4, 2025

Global variables used:
| Global Name | Type | Purpose |
| --- | --- | --- |
| For each FrameType `<frameType>`, `<frameType>##Identifier` | `FrameIdentifier` enum value | Identifier used to determine concrete type of Frames |

Member

There is a bunch of architecture specific goo that isn't just data reading in the files like AMD64FrameHandler.cs and ARM64FrameHandler.cs. That all needs to be documented.

Contributor Author

Added docs on each type of Frame Context updating mechanism and their platform specific details.

value = default;
return false;
}
value = new((ulong)field.GetValue(Context)!);
Member

This seems like it should use Unsafe.As to basically bitcast the boxed field value to an nuint instead of doing a managed cast. You'd need to make sure you're not doing an unsafe overread first though, which I think is like... if IsPrimitive check sizeof(X) >= sizeof(TargetNUInt)

Member

It's also convention for a managed TryXXX method to not throw on failure, I think? So in this case if the field's value were not convertible to ulong, we'd throw instead of returning false. I'm not sure if that's desirable for diagnostics scenarios, presumably we already have exception handlers for the environment where this is running...

Contributor Author

Agreed. This should not throw, we can let the caller decide to ignore or handle errors.

I reworked this to be less generic and only work for ulong and uint, the nuint values we care about on the target platform.

I'd appreciate if you could take another look for safety. I want to be particularly careful around this use of reflection.
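
For readers following this thread, here is a minimal sketch of the non-throwing shape discussed above, restricted to ulong and uint fields. It is not the code in this PR: SketchContext, RegisterReflectionSketch, and the field names are hypothetical, and exact-name lookup on public instance fields is an assumption.

```csharp
using System;
using System.Reflection;

// Hypothetical context struct for illustration; real contexts have many more fields.
public struct SketchContext
{
    public ulong Rbx;
    public uint Flags;
    public double NotARegister; // deliberately not a ulong/uint field
}

public static class RegisterReflectionSketch
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Reads a field by name, widened to ulong. Returns false (never throws)
    // when the field is missing or is not a ulong/uint.
    public static bool TryReadRegister(object boxedContext, string fieldName, out ulong value)
    {
        value = default;
        var field = boxedContext.GetType().GetField(fieldName, Flags);
        if (field is null)
            return false;

        switch (field.GetValue(boxedContext))
        {
            case ulong u64: value = u64; return true;
            case uint u32: value = u32; return true;
            default: return false;
        }
    }

    // Writes a field by name on a boxed context. Returns false when the field
    // is missing, has an unsupported type, or the value does not fit in a uint.
    public static bool TrySetRegister(object boxedContext, string fieldName, ulong value)
    {
        var field = boxedContext.GetType().GetField(fieldName, Flags);
        if (field is null)
            return false;

        if (field.FieldType == typeof(ulong))
        {
            field.SetValue(boxedContext, value);
            return true;
        }
        if (field.FieldType == typeof(uint) && value <= uint.MaxValue)
        {
            field.SetValue(boxedContext, (uint)value);
            return true;
        }
        return false;
    }
}
```

The context is passed as a boxed object because FieldInfo.SetValue on a boxed struct mutates the box. The caller unboxes afterwards to get the updated struct, and decides whether a false return is ignored or treated as an error.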

UpdateCalleeSavedRegistersFromOtherContext(otherContextHolder);

_holder.InstructionPointer = otherContextHolder.Context.InstructionPointer;
_holder.StackPointer = otherContextHolder.Context.StackPointer;
Member

FramePointer not being updated here caught my attention; is that because a 'software exception frame' is exception handling occurring within an outer frame and the frame pointer is shared? Do we want a comment? I can imagine this being obvious to someone who knows more about this area

Contributor Author

I'm not sure why the SEF does not update the FP. I created these handlers based off the runtime code and verified that the output is the same as the DAC.

For now, I added docs explaining what each Frame does to update the context, but I'd like to leave "why" out of scope for now.

Labels
area-Diagnostics-coreclr enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants