Enable new exception handling on win-x86 #115957

Merged
merged 9 commits into from
May 26, 2025
102 changes: 2 additions & 100 deletions docs/design/coreclr/botr/clr-abi.md
@@ -218,14 +218,10 @@ This section describes the conventions the JIT needs to follow when generating c

## Funclets

For all platforms except Windows/x86 on CoreCLR, all managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched.
For all platforms, managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched.

The only way to enter a handler funclet is via a call. In the case of an exception, the call is from the VM's EH subsystem as part of exception dispatch/unwind. In the non-exceptional case, this is called local unwind or a non-local exit. In C# this is accomplished by simply falling-through/out of a try body or an explicit goto. In IL this is always accomplished via a LEAVE opcode, within a try body, targeting an IL offset outside the try body. In such cases the call is from the JITed code of the parent function.

For Windows/x86 on CoreCLR, all handlers are generated within the method body, typically in lexical order. A nested try/catch is generated completely within the EH region in which it is nested. These handlers are essentially "in-line funclets", but they do not look like normal functions: they do not have a normal prolog or epilog, although they do have special entry/exit and register conventions. Also, nested handlers are not un-nested as for funclets: the code for a nested handler is generated within the handler in which it is nested.

For Windows/x86 on NativeAOT and Linux/x86, funclets are used just like on other platforms.

## Cloned finallys

RyuJIT attempts to speed the normal control flow by 'inlining' a called finally along the 'normal' control flow (i.e., leaving a try body in a non-exceptional manner via C# fall-through). This optimization is supported on all architectures.
@@ -234,7 +230,7 @@ RyuJIT attempts to speed the normal control flow by 'inlining' a called finally

In order to have proper forward progress and `Thread.Abort` semantics, there are restrictions on where a call-to-finally can be, and what the call site must look like. The return address can **NOT** be in the corresponding try body (otherwise the VM would think the finally protects itself). The return address **MUST** be within any outer protected region (so exceptions from the finally body are properly handled).

JIT64, and RyuJIT for non-x86, creates something similar to a jump island: a block of code outside the try body that calls the finally and then branches to the final target of the leave/non-local-exit. This jump island is then marked in the EH tables as if it were a cloned finally. The cloned finally clause prevents a Thread.Abort from firing before entering the handler. By having the return address outside of the try body we satisfy the other constraint.
RyuJIT creates something similar to a jump island: a block of code outside the try body that calls the finally and then branches to the final target of the leave/non-local-exit. This jump island is then marked in the EH tables as if it were a cloned finally. The cloned finally clause prevents a Thread.Abort from firing before entering the handler. By having the return address outside of the try body we satisfy the other constraint.
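
The two placement constraints on a call-to-finally return address can be sketched as a small predicate. This is only an illustration: `CodeRange` and `isValidCallFinallyReturnAddress` are hypothetical names, not JIT or VM APIs, and regions are modeled as half-open native offset ranges.

```cpp
#include <cassert>

// Hypothetical half-open range [start, end) of native code offsets.
struct CodeRange {
    unsigned start;
    unsigned end;
    bool contains(unsigned offset) const { return offset >= start && offset < end; }
};

// Returns true when the return address of a call-to-finally satisfies both
// constraints described above:
//   1. it is NOT inside the try body that the finally protects, and
//   2. it IS inside the enclosing protected region, when one exists.
// Pass nullptr for outerRegion when the try is not nested in another region.
inline bool isValidCallFinallyReturnAddress(unsigned returnAddress,
                                            const CodeRange& tryBody,
                                            const CodeRange* outerRegion) {
    if (tryBody.contains(returnAddress)) {
        return false; // the VM would think the finally protects itself
    }
    if (outerRegion != nullptr && !outerRegion->contains(returnAddress)) {
        return false; // exceptions from the finally would escape the outer region
    }
    return true;
}
```

A jump island placed after the try body but still inside the outer region passes this check, which is the property the paragraph above relies on.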

## ThreadAbortException considerations

@@ -388,100 +384,6 @@ Any register value changes made in the funclet are lost. If a funclet wants to m

Funclets are not required to preserve non-volatile registers.

## Windows/x86 EH considerations

The Windows/x86 model is somewhat different from the non-Windows/x86 model. Windows/x86-specific concerns are mentioned here.

### catch / filter-handler regions

When leaving a `catch` or `filter-handler` region, the JIT calls the helper `CORINFO_JIT_ENDCATCH` (implemented in the VM by the `JIT_EndCatch` function) before transferring control to the target location. The code to call to `CORINFO_JIT_ENDCATCH` is within the catch region itself.

### finally / fault regions

"finally" clauses are invoked in the non-exceptional code by the generated JIT code, and in the exceptional case by the VM. "fault" clauses are only executed in exceptional cases by the VM.

On entry to the finally or fault, the top of the stack is the address that should be jumped to on exit from the finally, using a "pop eax; jmp eax" sequence. A simple 'ret' could be used, but we avoid it to avoid potentially creating an unbalanced processor call/ret buffer stack, and messing up call/ret prediction.

There are no register or other stack arguments to a 'finally' or 'fault'.

### ShadowSP slots

X86 exception handlers (e.g., catch, finally) do not establish their own frames. They don't (really) have prologs and epilogs. However, they do use the stack, and need to restore the stack pointer of the enclosing exception handling region when the handler completes executing.

To implement this requirement, for any function with EH, we create a frame-local variable to store a stack of "Shadow SP" values, or ShadowSP slots. In the JIT, the local var is called lvaShadowSPslotsVar, and in dumps it is called "EHSlots". The variable is created in lvaMarkLocalVars() and is sized as follows:
1. 1 slot is reserved for the VM (for ICodeManager::FixContext(ppEndRegion)).
2. 1 slot for each handler nesting level (total: ehMaxHndNestingCount).
3. 1 slot for a filter (we do this even if there aren't any filters; size optimization opportunity to not do this if there are no filters?)
4. 1 slot for zero termination

Note that since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could possibly be invoked at the same time. For example, for:

```cs
try {
...
} catch {
try {
...
} catch {
...
}
}
```

When the inner 'catch' is running, the outer 'catch' is also conceptually "on the stack", or in the middle of execution. So the maximum handler nesting count would be 2.
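
The sizing rule from the removed text can be checked with a few lines of arithmetic. This is a minimal sketch; `shadowSpSlotCount` and `shadowSpBytes` are illustrative names that do not exist in the JIT, and the only inputs are the four slot categories listed above and the 4-byte x86 slot size.

```cpp
#include <cassert>
#include <cstddef>

// Size of one stack slot on x86, as stated in the text.
constexpr std::size_t kX86SlotSize = 4;

// Slot count per the list above: 1 reserved for the VM, 1 per handler nesting
// level, 1 for a filter (always allocated), and 1 for zero termination.
constexpr std::size_t shadowSpSlotCount(std::size_t maxHandlerNesting) {
    return 1 + maxHandlerNesting + 1 + 1;
}

constexpr std::size_t shadowSpBytes(std::size_t maxHandlerNesting) {
    return shadowSpSlotCount(maxHandlerNesting) * kX86SlotSize;
}

// A function with EH has at least one handler level, giving the 16-byte minimum.
static_assert(shadowSpBytes(1) == 16, "minimum size for a function with EH");
// The nested catch example above has a maximum handler nesting count of 2.
static_assert(shadowSpBytes(2) == 20, "two nesting levels");
```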

The ShadowSP slots are filled in from the highest address downwards to the lowest address. The highest slot is reserved. The first address with a zero is a zero terminator. So, we always zero terminate by setting the second-to-highest slot to zero in the function prolog (if we didn't zero initialize all locals anyway).

When calling a finally, we set the appropriate level to 0xFC (aka "finally call") and zero terminate the next-lower address.

Thus, calling a finally from JIT generated code looks like:

```asm
mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written
mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC
push G_M52300_IG07
jmp SHORT G_M52300_IG04
```

In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'call' wouldn't work.

The code this finally returns to looks like this:

```asm
mov dword ptr [L_02+0x8 ebp-0CH], 0
jmp SHORT G_M52300_IG05
```

In this case, it zeros out the ShadowSP slot that it previously set to 0xFC, then jumps to the address that is the actual target of the leave from the finally.
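
The write ordering in the two assembly snippets can be modeled with a small array. This is a simplified sketch, not VM or JIT code: `ShadowSpSlots` and its members are hypothetical, index 0 stands for the highest-addressed (reserved) slot, and larger indices stand for lower addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker written to a slot while a finally is being called ("finally call").
constexpr std::uint32_t kFinallyCallMarker = 0xFC;

// Hypothetical model of the ShadowSP slot array.
// slots[0] is the reserved slot at the highest address; slots[i + 1] is
// nesting level i, at progressively lower addresses.
struct ShadowSpSlots {
    std::vector<std::uint32_t> slots;

    explicit ShadowSpSlots(std::size_t count) : slots(count, 0) {}

    // Mirrors the call sequence: zero-terminate the next-lower slot first,
    // then publish the 0xFC marker for this nesting level.
    void beginFinallyCall(std::size_t level) {
        slots.at(level + 2) = 0;
        slots.at(level + 1) = kFinallyCallMarker;
    }

    // Mirrors the "end finally restore": clear the marker after the finally returns.
    void endFinallyCall(std::size_t level) {
        slots.at(level + 1) = 0;
    }

    // Walks from the first non-reserved slot down to the zero terminator.
    std::size_t activeDepth() const {
        std::size_t depth = 0;
        for (std::size_t i = 1; i < slots.size() && slots[i] != 0; ++i) {
            ++depth;
        }
        return depth;
    }
};
```

Writing the terminator before the marker means a walker that observes the marker always finds a zero below it, which matches the ordering comment in the first snippet.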

The JIT does this "end finally restore" by creating a GT_END_LFIN tree node, with the appropriate EH region ID as an operand, that generates this code.

In the case of an exceptional 'finally' invocation, the VM sets up the 'return address' to whatever address it wants the JIT to return to.

For catch handlers, the VM is completely in control of filling and reading the ShadowSP slots; the JIT just makes sure there is enough space.

### ShadowSP slots frame location

The ShadowSP slots are required to live in a very particular location, reported via the GC info header. Note that the GC info header does not contain an actual pointer or offset to the ShadowSP slots variable. Instead, the VM calculates the location from other data that does exist in the GC info header, as a negative offset from the EBP frame pointer (which must be established in functions with EH) using the function `GetFirstBaseSPslotPtr()` / `GetStartShadowSPSlotsOffset()`. The VM thus assumes the following frame layout:

1. callee-saved registers <= EBP points to the top of this range
2. GS cookie
3. 1 slot if localloc is used (Saved localloc SP?)
4. 1 slot for CORINFO_GENERICS_CTXT_FROM_PARAMTYPEARG -- assumed for any function with EH, to avoid adding a flag to the GC info about whether it exists or not.
5. ShadowSP slots

(note, these don't have to be in this order for this calculation, but they possibly do need to be in this order for other calculations.) See also `GetEndShadowSPSlotsOffset()`.

The VM walks the ShadowSP slots in the function `GetHandlerFrameInfo()`, and sets it in various functions such as `EECodeManager::FixContext()`.

### JIT implementation: finally

An aside on the JIT implementation for x86.

The JIT creates BBJ_CALLFINALLY/BBJ_ALWAYS pairs for calling the 'finally' clause. The BBJ_CALLFINALLY block will have a series of CORINFO_JIT_ENDCATCH calls appended at the end, if we need to "leave" a series of nested catches before calling the finally handler (due to a single 'leave' opcode attempting to leave multiple levels of different types of handlers). Then, a GT_END_LFIN statement with EH region ID as an argument is added to the step block where the finally returns to. This is used to generate code to zero out the appropriate level of the ShadowSP slot array after the finally has been executed and the final EH nesting depth is known. The BBJ_CALLFINALLY block itself generates the code to insert the 0xFC value into the ShadowSP slot array. If the 'finally' is invoked by the VM, in exceptional cases, then the VM itself updates the ShadowSP slot array before invoking the 'finally'.

At the end of a finally or filter, a GT_RETFILT is inserted. For a finally, this is a TYP_VOID which is just a placeholder. For a filter, it takes an argument which evaluates to the return value from the filter. On legacy JIT, this tree triggers the generation of both the return value load (for filters) and the "funclet" exit sequence, which is either a "pop eax; jmp eax" for a finally, or a "ret" for a filter. When processing the BBJ_EHFINALLYRET or BBJ_EHFILTERRET block itself (at the end of code generation for the block), nothing is generated. In RyuJIT, the GT_RETFILT only loads up the return value (for filters) and does nothing for finally, and the block type processing after all the tree processing triggers the exit sequence to be generated. There is no real difference between these, except to centralize all "exit sequence" generation in the same place.

# EH Info, GC Info, and Hot & Cold Splitting

All GC info offsets and EH info offsets treat the function and funclets as if it was one big method body. Thus all offsets are relative to the start of the main method. Funclets are assumed to always be at the end of (after) all of the main function code. Thus if the main function has any cold code, all funclets must be cold. Or conversely, if there is any hot funclet code, all of the main method must be hot.
5 changes: 1 addition & 4 deletions src/coreclr/clr.featuredefines.props
@@ -2,6 +2,7 @@
<PropertyGroup>
<FeatureCoreCLR>true</FeatureCoreCLR>
<FeaturePerfTracing>true</FeaturePerfTracing>
<FeatureEHFunclets>true</FeatureEHFunclets>
<ProfilingSupportedBuild>true</ProfilingSupportedBuild>
</PropertyGroup>

@@ -22,10 +23,6 @@
<FeatureObjCMarshal>true</FeatureObjCMarshal>
</PropertyGroup>

<PropertyGroup Condition="!('$(TargetsWindows)' == 'true' AND '$(Platform)' == 'x86')">
<FeatureEHFunclets>true</FeatureEHFunclets>
</PropertyGroup>

<PropertyGroup Condition="('$(Platform)' == 'x64' OR '$(Platform)' == 'arm64') AND ('$(Configuration)' == 'debug' OR '$(Configuration)' == 'checked')">
<FeatureInterpreter>true</FeatureInterpreter>
</PropertyGroup>
4 changes: 1 addition & 3 deletions src/coreclr/clrdefinitions.cmake
@@ -206,9 +206,7 @@ if(CLR_CMAKE_TARGET_WIN32)
endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386)
endif(CLR_CMAKE_TARGET_WIN32)

if (NOT CLR_CMAKE_TARGET_ARCH_I386 OR NOT CLR_CMAKE_TARGET_WIN32)
add_compile_definitions($<$<NOT:$<BOOL:$<TARGET_PROPERTY:IGNORE_DEFAULT_TARGET_ARCH>>>:FEATURE_EH_FUNCLETS>)
endif (NOT CLR_CMAKE_TARGET_ARCH_I386 OR NOT CLR_CMAKE_TARGET_WIN32)
add_compile_definitions($<$<NOT:$<BOOL:$<TARGET_PROPERTY:IGNORE_DEFAULT_TARGET_ARCH>>>:FEATURE_EH_FUNCLETS>)

if (CLR_CMAKE_TARGET_WIN32 AND (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64))
add_definitions(-DFEATURE_SPECIAL_USER_MODE_APC)
5 changes: 5 additions & 0 deletions src/coreclr/inc/gc_unwind_x86.h
@@ -409,4 +409,9 @@ bool IsInNoGCRegion(hdrInfo * infoPtr,
PTR_CBYTE table,
unsigned curOffset);

unsigned FindFirstInterruptiblePoint(hdrInfo * infoPtr,
PTR_CBYTE table,
unsigned offs,
unsigned endOffs);

#endif // _UNWIND_X86_H
10 changes: 5 additions & 5 deletions src/coreclr/inc/jiteeversionguid.h
@@ -37,11 +37,11 @@

#include <minipal/guid.h>

constexpr GUID JITEEVersionIdentifier = { /* 7ce8764d-ac60-4e05-a6e4-448c1eb8cf35 */
0x7ce8764d,
0xac60,
0x4e05,
{0xa6, 0xe4, 0x44, 0x8c, 0x1e, 0xb8, 0xcf, 0x35}
constexpr GUID JITEEVersionIdentifier = { /* cb23fc5b-e31d-49cb-b4e0-b5666496b4fe */
0xcb23fc5b,
0xe31d,
0x49cb,
{0xb4, 0xe0, 0xb5, 0x66, 0x64, 0x96, 0xb4, 0xfe}
};

#endif // JIT_EE_VERSIONING_GUID_H
11 changes: 9 additions & 2 deletions src/coreclr/inc/readytorun.h
@@ -19,10 +19,15 @@
// src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h
// If you update this, ensure you run `git grep MINIMUM_READYTORUN_MAJOR_VERSION`
// and handle pending work.
#define READYTORUN_MAJOR_VERSION 13
#define READYTORUN_MINOR_VERSION 0x0001
#define READYTORUN_MAJOR_VERSION 14
#define READYTORUN_MINOR_VERSION 0x0000

// Remove the x86 special case once the general minimum version is bumped
#ifdef TARGET_X86
#define MINIMUM_READYTORUN_MAJOR_VERSION 14
#else
#define MINIMUM_READYTORUN_MAJOR_VERSION 13
#endif

// R2R Version 2.1 adds the InliningInfo section
// R2R Version 2.2 adds the ProfileDataInfo section
@@ -43,6 +48,8 @@
// R2R Version 13 removes usage of PSPSym, changes ABI for funclets to match NativeAOT, changes register for
// exception parameter on AMD64, and redefines generics instance context stack slot in GCInfo v4
// to be SP/FP relative
// R2R Version 13.1 added long/ulong to float helper calls
// R2R Version 14 changed x86 code generation to use funclets

struct READYTORUN_CORE_HEADER
{
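
The effect of the x86-specific minimum version can be sketched as a range check. The constants come from the defines in this file's diff; the acceptance rule itself (minimum <= image major <= current major) is an assumption about how a loader would use them, and `isAcceptedImageVersion` is a hypothetical name, not a runtime function.

```cpp
#include <cassert>

// Current major version after this change (READYTORUN_MAJOR_VERSION).
constexpr unsigned kReadyToRunMajorVersion = 14;

// Mirrors the #ifdef TARGET_X86 block: x86 requires version 14 because its
// code generation now uses funclets; other targets still accept version 13.
constexpr unsigned minimumReadyToRunMajorVersion(bool targetIsX86) {
    return targetIsX86 ? 14u : 13u;
}

// Assumed acceptance rule: the image's major version must fall between the
// per-target minimum and the current major version, inclusive.
constexpr bool isAcceptedImageVersion(unsigned imageMajor, bool targetIsX86) {
    return imageMajor >= minimumReadyToRunMajorVersion(targetIsX86) &&
           imageMajor <= kReadyToRunMajorVersion;
}
```

Under this assumption, a version 13 image compiled before the change would be rejected on x86 but still accepted on other targets.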
6 changes: 3 additions & 3 deletions src/coreclr/jit/emitinl.h
@@ -585,15 +585,15 @@ inline bool insIsCMOV(instruction ins)
* false. Returns the final result of the callback.
*/
template <typename Callback>
bool emitter::emitGenNoGCLst(Callback& cb, bool skipAllPrologsAndEpilogs /* = false */)
bool emitter::emitGenNoGCLst(Callback& cb, bool skipMainPrologsAndEpilogs /* = false */)
{
for (insGroup* ig = emitIGlist; ig; ig = ig->igNext)
{
if (skipAllPrologsAndEpilogs)
if (skipMainPrologsAndEpilogs)
{
if (ig == emitPrologIG)
continue;
if (ig->igFlags & (IGF_EPILOG | IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG))
if (ig->igFlags & IGF_EPILOG)
continue;
}
if ((ig->igFlags & IGF_NOGCINTERRUPT) && ig->igSize > 0)
2 changes: 1 addition & 1 deletion src/coreclr/jit/emitpub.h
@@ -43,7 +43,7 @@ unsigned emitEndCodeGen(Compiler* comp,
unsigned emitGetEpilogCnt();

template <typename Callback>
bool emitGenNoGCLst(Callback& cb, bool skipAllPrologsAndEpilogs = false);
bool emitGenNoGCLst(Callback& cb, bool skipMainPrologsAndEpilogs = false);

void emitBegProlog();
unsigned emitGetPrologOffsetEstimate();
2 changes: 1 addition & 1 deletion src/coreclr/jit/gcencode.cpp
@@ -2223,7 +2223,7 @@ size_t GCInfo::gcMakeRegPtrTable(BYTE* dest, int mask, const InfoHdr& header, un
if (header.noGCRegionCnt != 0)
{
NoGCRegionEncoder encoder(mask != 0 ? dest : NULL);
compiler->GetEmitter()->emitGenNoGCLst(encoder, /* skipAllPrologsAndEpilogs = */ true);
compiler->GetEmitter()->emitGenNoGCLst(encoder, /* skipMainPrologsAndEpilogs = */ true);
totalSize += encoder.totalSize;
if (mask != 0)
dest += encoder.totalSize;
2 changes: 1 addition & 1 deletion src/coreclr/jit/gcinfo.cpp
@@ -585,7 +585,7 @@ void GCInfo::gcCountForHeader(UNALIGNED unsigned int* pUntrackedCount,
if (compiler->codeGen->GetInterruptible())
{
NoGCRegionCounter counter;
compiler->GetEmitter()->emitGenNoGCLst(counter, /* skipAllPrologsAndEpilogs = */ true);
compiler->GetEmitter()->emitGenNoGCLst(counter, /* skipMainPrologsAndEpilogs = */ true);
noGCRegionCount = counter.noGCRegionCount;
}
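
The behavioral change behind the `skipMainPrologsAndEpilogs` rename can be illustrated with a stand-alone model of the filtering loop in `emitGenNoGCLst`. Everything here is hypothetical: `Group`, the flag constants, and `countReportedNoGcGroups` are stand-ins for `insGroup`, the `IGF_*` flags, and the callback, and the sketch assumes the main-epilog and funclet flags are mutually exclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for IGF_EPILOG, IGF_FUNCLET_PROLOG, IGF_FUNCLET_EPILOG, IGF_NOGCINTERRUPT.
constexpr unsigned kEpilog        = 1u << 0;
constexpr unsigned kFuncletProlog = 1u << 1;
constexpr unsigned kFuncletEpilog = 1u << 2;
constexpr unsigned kNoGcInterrupt = 1u << 3;

// Simplified instruction group: flags, code size, and whether it is the main prolog.
struct Group {
    unsigned flags;
    unsigned size;
    bool isMainProlog;
};

// Counts the groups that would be handed to the callback as no-GC regions.
// With the new condition only the main prolog and main epilogs are skipped,
// so non-empty funclet prologs and epilogs are now reported.
inline std::size_t countReportedNoGcGroups(const std::vector<Group>& groups,
                                           bool skipMainPrologsAndEpilogs) {
    std::size_t reported = 0;
    for (const Group& g : groups) {
        if (skipMainPrologsAndEpilogs) {
            if (g.isMainProlog) continue;
            if (g.flags & kEpilog) continue;
        }
        if ((g.flags & kNoGcInterrupt) && g.size > 0) {
            ++reported;
        }
    }
    return reported;
}
```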

3 changes: 0 additions & 3 deletions src/coreclr/jit/targetx86.h
@@ -52,9 +52,6 @@
// target
#define FEATURE_EH 1 // To aid platform bring-up, eliminate exceptional EH clauses (catch, filter,
// filter-handler, fault) and directly execute 'finally' clauses.
#if !defined(UNIX_X86_ABI)
#define FEATURE_EH_WINDOWS_X86 1 // Enable support for SEH regions
#endif
#define ETW_EBP_FRAMED 1 // if 1 we cannot use EBP as a scratch register and must create EBP based
// frames for most methods
#define CSE_CONSTS 1 // Enable if we want to CSE constants
4 changes: 2 additions & 2 deletions src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h
@@ -11,8 +11,8 @@ struct ReadyToRunHeaderConstants
{
static const uint32_t Signature = 0x00525452; // 'RTR'

static const uint32_t CurrentMajorVersion = 13;
static const uint32_t CurrentMinorVersion = 1;
static const uint32_t CurrentMajorVersion = 14;
static const uint32_t CurrentMinorVersion = 0;
};

struct ReadyToRunHeader
4 changes: 2 additions & 2 deletions src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs
@@ -15,8 +15,8 @@ internal struct ReadyToRunHeaderConstants
{
public const uint Signature = 0x00525452; // 'RTR'

public const ushort CurrentMajorVersion = 13;
public const ushort CurrentMinorVersion = 1;
public const ushort CurrentMajorVersion = 14;
public const ushort CurrentMinorVersion = 0;
}
#if READYTORUN
#pragma warning disable 0169
20 changes: 0 additions & 20 deletions src/coreclr/vm/eetwain.cpp
@@ -1290,16 +1290,6 @@ bool EECodeManager::EnumGcRefs( PREGDISPLAY pRD,
if (relOffsetOverride != NO_OVERRIDE_OFFSET)
{
// We've been given an override offset for GC Info
#ifdef _DEBUG
GcInfoDecoder _gcInfoDecoder(
gcInfoToken,
DECODE_CODE_LENGTH
);

// We only use override offset for wantsReportOnlyLeaf
_ASSERTE(_gcInfoDecoder.WantsReportOnlyLeaf());
#endif // _DEBUG

curOffs = relOffsetOverride;

#ifdef TARGET_ARM
@@ -2505,16 +2495,6 @@ bool InterpreterCodeManager::EnumGcRefs(PREGDISPLAY pContext,
if (relOffsetOverride != NO_OVERRIDE_OFFSET)
{
// We've been given an override offset for GC Info
#ifdef _DEBUG
InterpreterGcInfoDecoder _gcInfoDecoder(
gcInfoToken,
DECODE_CODE_LENGTH
);

// We only use override offset for wantsReportOnlyLeaf
_ASSERTE(_gcInfoDecoder.WantsReportOnlyLeaf());
#endif // _DEBUG

curOffs = relOffsetOverride;

LOG((LF_GCINFO, LL_INFO1000, "Adjusted GC reporting offset to provided override offset. Now reporting GC refs for %s at offset %04x.\n",