Add CoreCLR support for android GC bridge #116310

Merged
merged 29 commits into from
Jul 2, 2025

Conversation

BrzVlad
Member

@BrzVlad BrzVlad commented Jun 4, 2025

This change adds runtime support for the GCBridge api described in #115506 and to be used on android. It includes most of initial work from #114184.

When the GCBridge feature is used, JavaMarshal.Initialize is called at the start of the application. This provides the runtime with a native callback (markCrossReferences) to be called during a GC. During the GC, we compute the set of strongly connected components (SCCs) containing bridge objects that are dead in the .NET space. These SCCs are passed to the callback so that the .NET Android implementation can reproduce the links between the Java counterparts in order to determine whether each .NET object needs to be collected or not. (The constraint is that the C# peer keeps the Java peer alive and vice versa. We make no effort to handle finalization, so a resurrected C# object can have its Java peer collected.) Once the .NET Android runtime has done the Java collection, it reports back to the runtime with the list of bridge objects that can be freed, along with the previously passed SCC-related pointers to be freed.

A bridge object is an object that can have a Java peer. The CoreCLR runtime has no insight into this; the only thing it understands are cross reference handles. These are GCHandles that have an additional pointer associated with them, so additional information related to the Java peer can be attached. Objects that have a cross reference handle allocated will always survive the current GC, because we can't collect them until we get permission from the Java world. Once the cross reference GCHandle is freed, the associated object becomes ordinary, detached from any Java peer, and it is free for collection in the .NET heap.
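As a rough illustration of the paragraph above, a cross reference handle can be pictured as a handle entry that carries one extra context pointer, with unmarked targets promoted at the end of marking. This is a minimal sketch only: `Object`, `CrossRefHandle` and `HandleTable` are hypothetical names and do not correspond to the CoreCLR handle table code.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these names come from the CoreCLR sources.
struct Object {
    bool marked = false;  // set by the (simulated) mark phase
};

struct CrossRefHandle {
    Object* target;   // the bridge object
    void*   context;  // extra pointer the client attaches (e.g. Java peer data)
};

class HandleTable {
public:
    int create(Object* target, void* context) {
        assert(target != nullptr);
        int id = next_id_++;
        handles_[id] = CrossRefHandle{target, context};
        return id;
    }

    // After this, the object is ordinary again and can be collected normally.
    void free(int id) { handles_.erase(id); }

    // End of mark phase: every unmarked target is recorded as a "dead in .NET"
    // bridge object and then promoted so it survives the current collection.
    std::vector<Object*> promote_unmarked_targets() {
        std::vector<Object*> dead_bridge_objects;
        for (auto& entry : handles_) {
            Object* target = entry.second.target;
            if (!target->marked) {
                dead_bridge_objects.push_back(target);
                target->marked = true;
            }
        }
        return dead_bridge_objects;
    }

private:
    int next_id_ = 1;
    std::unordered_map<int, CrossRefHandle> handles_;
};
```

In this sketch the returned list plays the role of the initial set of dead bridge objects that the SCC computation starts from.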

At the end of the mark phase, during GC, we iterate over all cross reference handles. When we encounter a handle whose target hasn't yet been marked, we add it to a list (these objects will have to be marked so they remain alive after this collection, given we need to probe the Java world first). Once we have obtained the set of dead bridge objects, we apply the Tarjan algorithm (ported directly from Mono's implementation). This algorithm operates on the dead object graph reachable from the initial set of dead bridge objects. In order to implement this secondary scanning mechanism, for objects that we reach, we hijack the object header with a ScanData that contains all information relevant to the SCC building algorithm. Once we have finished building the SCCs, while still in the GC, we call back into .NET Android via TriggerClientBridgeProcessing, which ends up calling the mark cross reference callback provided by JavaMarshal.Initialize. This callback has to dispatch the necessary work to another thread, since it needs to return quickly for the C# GC to continue its execution.
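For readers unfamiliar with the SCC step, the following is a textbook recursive Tarjan sketch, not the ported Mono code. It assumes objects are numbered `0..n-1`, that `edges[i]` lists the objects referenced by object `i`, and it keeps per-node state in side arrays rather than in a hijacked object header. `TarjanScc` and `SccResult` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct SccResult {
    std::vector<int> component_of;  // node -> SCC id, -1 if never reached
    int count = 0;                  // number of SCCs found
};

class TarjanScc {
public:
    explicit TarjanScc(const std::vector<std::vector<int>>& edges)
        : edges_(edges),
          index_(edges.size(), -1),
          low_(edges.size(), 0),
          on_stack_(edges.size(), false) {
        result_.component_of.assign(edges.size(), -1);
    }

    // roots: the initial set of dead bridge objects.
    SccResult run(const std::vector<int>& roots) {
        for (int r : roots) {
            if (index_[r] == -1) visit(r);
        }
        return result_;
    }

private:
    void visit(int v) {
        index_[v] = low_[v] = next_index_++;
        stack_.push_back(v);
        on_stack_[v] = true;

        for (int w : edges_[v]) {
            if (index_[w] == -1) {
                visit(w);  // tree edge: recurse, then inherit the low-link
                low_[v] = std::min(low_[v], low_[w]);
            } else if (on_stack_[w]) {
                low_[v] = std::min(low_[v], index_[w]);  // edge back into the stack
            }
        }

        if (low_[v] == index_[v]) {  // v is the root of an SCC: pop it off
            int w;
            do {
                w = stack_.back();
                stack_.pop_back();
                on_stack_[w] = false;
                result_.component_of[w] = result_.count;
            } while (w != v);
            ++result_.count;
        }
    }

    const std::vector<std::vector<int>>& edges_;
    std::vector<int> index_;
    std::vector<int> low_;
    std::vector<bool> on_stack_;
    std::vector<int> stack_;
    int next_index_ = 0;
    SccResult result_;
};
```

Only nodes reachable from the roots are visited, which mirrors the statement that the algorithm operates on the dead object graph reachable from the dead bridge objects. Recursion is used here for brevity; a production implementation would more likely use explicit stacks to bound native stack usage.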

Because the world gets resumed without having decided yet whether the bridge objects will be alive or dead, for weak references, we would need to wait for the java bridge processing to finish before we can resolve the Target. Aside from the general problem of resurrecting a C# peer that has the Java Peer collected, this mechanism will be used internally by the .NET Android in order to manually manage liveness of these bridge objects, in the scenario of calling Dispose on an object. This synchronisation will be used at the core of .NET Android Runtime interop. In order to implement this, weak refs for bridge objects are not nulled during GC (these objects are promoted during collection) but rather at the finishing stage of bridge processing. This change is conservative and adds bridge waiting only for WeakReference, not when using GCHandle, following the existing approach in COM.
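The waiting behaviour described above can be sketched with a condition variable: reading a bridge weak reference blocks while bridge processing is in progress, and the reference is cleared at the finishing stage rather than during the GC. This is an illustrative model under assumed names (`BridgeGate`, `BridgeWeakRef`), not the runtime's WeakReference implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

struct Object {};

// Tracks whether bridge processing is currently running.
class BridgeGate {
public:
    void begin_processing() {
        std::lock_guard<std::mutex> lock(mutex_);
        in_progress_ = true;
    }
    void finish_processing() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            in_progress_ = false;
        }
        idle_.notify_all();
    }
    void wait_until_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return !in_progress_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable idle_;
    bool in_progress_ = false;
};

// A weak reference to a bridge object.
class BridgeWeakRef {
public:
    BridgeWeakRef(BridgeGate& gate, Object* target)
        : gate_(gate), target_(target) {}

    // Blocks until any in-flight bridge processing has finished, so the
    // caller never observes an object whose fate is still undecided.
    Object* target() {
        gate_.wait_until_idle();
        return target_.load();
    }

    // Called at the finishing stage for objects reported as dead.
    void clear() { target_.store(nullptr); }

private:
    BridgeGate& gate_;
    std::atomic<Object*> target_;
};
```

A GCHandle-style accessor in this model would simply skip `wait_until_idle()`, which corresponds to the conservative choice of adding the wait only for WeakReference.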

This PR adds a few tests to the runtime tests. The tests have a native counterpart that acts as the client bridge; it does not do a Java collection, just a random sleep instead. The test creates a set of objects with certain links between them, creates weak refs to the BridgeObjects and then doesn't reference anything else. Depending on the built graph, it expects a certain number of SCCs and cross refs constructed by the Tarjan algorithm, and then reports all bridge objects as alive or dead. The test also checks that the Target for all the weak refs is the expected one.

The gcbridge doesn't consume much memory. A collection for a heavier app can end with hundreds of SCCs and xrefs. For such a scenario, the gcbridge is expected to consume hundreds of KBs. Most of this memory is represented by data for ScanData, ColorData and stacks used by tarjan algorithm. These data structures have their capacity increased when necessary, so for most collections there is no new memory allocation, the existing storage is reused. For a few other data structures, like xrefs arrays and data allocated to be passed to the bridge client, new allocations from scratch happen at each collection. While the gc bridge can end up consuming hundreds of KB for heavy scenarios, maybe a few MB in extreme theoretical cases, less than 10% of this memory is expected to be allocated during collection, the rest should be reused.
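The reuse pattern described above (capacity grows when needed, storage is kept between collections) might look roughly like the pool below. This is a sketch under assumptions: `ScanDataPool` and the fields of `ScanData` are invented for illustration, and entries are addressed by index so that growth of the backing vector cannot invalidate them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-object scan state; the real ScanData layout is not shown in the PR text.
struct ScanData {
    int index = -1;
    int low_index = -1;
};

class ScanDataPool {
public:
    // Hands out the next slot, growing the backing storage only when every
    // previously created slot is already in use.
    std::size_t alloc() {
        if (used_ == storage_.size()) {
            storage_.emplace_back();
        }
        storage_[used_] = ScanData{};  // reset a recycled slot
        return used_++;
    }

    ScanData& at(std::size_t slot) { return storage_[slot]; }

    // Called at the end of a collection: forget the contents, keep the memory.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
    std::size_t capacity() const { return storage_.capacity(); }

private:
    std::vector<ScanData> storage_;
    std::size_t used_ = 0;
};
```

With this shape, a second collection that touches no more objects than the first performs no new allocation for the pool, which is the behaviour the paragraph describes for ScanData and ColorData.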

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 4, 2025
@BrzVlad BrzVlad added area-Interop-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 4, 2025
BrzVlad added 13 commits June 12, 2025 09:54
From Aaron's implementation
Checking if the object is promoted was validating the next object header in debug builds. During bridge tarjan computation, we patch the object header for some objects in order to store data used by the bridge algorithm, so we need to disable this validation.
HANDLE_MAX_INTERNAL_TYPES value

new instead of malloc

assert for allocation failure

Reuse memory for ColorData and ScanData between collections. We still do alloc/free for other type of data, for example for arrays representing edges between SCCs.

Actually print class name when enabling tarjan bridge logs.

Add separate IsPromoted method to the gc interface

Rename TriggerGCBridge to TriggerClientBridgeProcessing to be more specific about what it is doing.
@jkotas
Member

jkotas commented Jun 26, 2025

/azp run coreclr-release-outerloop-nightly

Azure Pipelines successfully started running 1 pipeline(s).

@Maoni0
Member

Maoni0 commented Jun 27, 2025

sorry, I didn't get a chance to look at the commits related to my feedback till now. the only comment I have is please move the GetHighPrecisionTimeStamp impl to gccommon.cpp so all the code in the gc dir can share it instead of multiple files defining their own duplicated copies. see log_init_error_to_host as an example.

@vitek-karas
Member

@Maoni0 @jkotas - is this ready? Could it be approved and merged?

Member

@jkotas jkotas left a comment

It would be best for @Maoni0 and @AaronRobinsonMSFT to sign-off on this one, but they are oof currently. If there is any additional feedback, it can be incorporated once they are back.

@mangod9
Member

mangod9 commented Jun 30, 2025

/azp run runtime-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@mangod9
Member

mangod9 commented Jun 30, 2025

I have just triggered an outerloop run given the surface area of this change. if that is looking good we can certainly merge and do subsequent changes subsequently if needed

@mangod9
Member

mangod9 commented Jul 1, 2025

outerloop is good. Assume browser-wasm are known issues?

@BrzVlad
Member Author

BrzVlad commented Jul 1, 2025

Yes, failures are unrelated

@BrzVlad BrzVlad merged commit a90680f into dotnet:main Jul 2, 2025
155 of 169 checks passed
jonathanpeppers added a commit to dotnet/android that referenced this pull request Jul 16, 2025
Replaces: #10185
Implements: dotnet/runtime#115506
Builds on top of: dotnet/runtime#116310

### Description

This PR implements GC bridge for CoreCLR using the `JavaMarshal` APIs
introduced in dotnet/runtime#116310. The code
in this PR is CoreCLR specific and while it will build with other
runtimes on .NET 10, the `ManagedValueManager` class will throw on any
runtime other than CoreCLR. In the future, the same GC bridge
mechanism should be also supported by Native AOT, so this code might
be reused for that platform as well at some point.

The code of the GC bridge is placed in 3 main locations:

#### `ManagedValueManager.cs`

Code in this class interfaces with the `JavaMarshal` APIs.

This class keeps a dictionary mapping between .NET and Java objects
(`RegisteredInstances`).

The class carefully manages the lifetimes of the bridge objects and
their associated native memory ("GC bridge context" -
`HandleContext`). The implementation follows these rules:

* Do not access the `Target` of the reference tracking `GCHandles`.
  Doing this could cause a race condition with the GC collecting
  handles in background thread. Instead, always access the peers via
  `WeakReference<IJavaPeerable>.Target` which blocks if there is an
  ongoing bridge processing.
* Do not modify the `RegisteredInstances` during _bridge processing_.
  Doing this would require taking a lock on `RegisteredInstances`
  which might already be locked in some other method of
  `ManagedValueManager` called from another thread (for example the
  `AddPeer` method) which might be blocked waiting on
  `WeakReference<IJavaPeerable>.Target` to return. For this reason, we
  have a queue of known dead weak references stored in the
  `RegisteredInstances` method which we fill at the end of bridge
  processing. This queue needs to be periodically emptied before calls
  to `AddPeer` and others to make sure that weak references stored in
  `RegisteredInstances` aren't leaking.
* Do not trust the context pointers coming from the GC to be
  `HandleContext*`. Anyone can call the
  `JavaMarshal.CreateReferenceTrackingHandle(...)` method and "poison"
  the contexts that will be passed to us by the GC with pointers to
  memory we don't own and can't guarantee the size of the memory or
  even the fact that the memory won't be freed before the GC passes
  the pointer to our bridge processing callback. We keep a static
  dictionary of all the contexts and their associated GCHandles in
  `HandleContext` to validate the pointers we receive and also to map
  the contexts to their corresponding handles before calling
  `JavaMarshal.FinishBridgeProcessing`.
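The third rule above (never trust an incoming context pointer) amounts to looking the pointer up in a registry of contexts we created before ever dereferencing it. The actual implementation lives in C# in `ManagedValueManager.cs`; the C++ below is only a language-neutral sketch of the idea, and `ContextRegistry`, `java_peer_id` and the method names are assumptions made for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical context layout; the real HandleContext fields are not shown here.
struct HandleContext {
    int java_peer_id;
};

class ContextRegistry {
public:
    // Allocates a context we own and remembers which GC handle it belongs to.
    HandleContext* create(std::intptr_t gc_handle, int java_peer_id) {
        auto owned = std::make_unique<HandleContext>(HandleContext{java_peer_id});
        HandleContext* raw = owned.get();
        std::lock_guard<std::mutex> lock(mutex_);
        known_.emplace(raw, Entry{std::move(owned), gc_handle});
        return raw;
    }

    // Validates a pointer received from the GC. The pointer is only compared,
    // never dereferenced, so a foreign or dangling pointer is harmless.
    std::optional<std::intptr_t> try_get_handle(const void* ctx) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = known_.find(ctx);
        if (it == known_.end()) return std::nullopt;
        return it->second.gc_handle;
    }

    void destroy(const void* ctx) {
        std::lock_guard<std::mutex> lock(mutex_);
        known_.erase(ctx);
    }

private:
    struct Entry {
        std::unique_ptr<HandleContext> context;
        std::intptr_t gc_handle;
    };
    mutable std::mutex mutex_;
    std::unordered_map<const void*, Entry> known_;
};
```

The same lookup also yields the handle that belongs to each validated context, which is the mapping the bullet says is needed before calling `JavaMarshal.FinishBridgeProcessing`.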

#### `gc-bridge.hh+cc`

This static class contains the main callback for the GC bridge
(`GCBridge::mark_cross_references`). The GC expects this method to
return immediately and do all the bridge processing in a separate
thread. The input to this method is a pointer
(`MarkCrossReferencesArgs *args`) which needs to be later passed to
`JavaMarshal.FinishBridgeProcessing(...)` in order to be freed.

This class also contains a background thread which waits for the next
bridge processing event using a `std::binary_semaphore`. Once the
thread is signaled that there is a new bridge processing event, it uses a
`BridgeProcessing` object to process it.

#### `bridge-processing.hh+cc`

The `BridgeProcessing` class implements processing of a single bridge
processing event. The code in this class is based on the Mono bridge
processing algorithm implemented in
[`src/native/mono/monodroid/osbridge.cc`][0].

[0]: https://github.com/dotnet/android/blob/8c0ca82e407761fac7ea959fa4d4819fa6e4eeac/src/native/mono/monodroid/osbridge.cc

Co-authored-by: Jonathan Peppers <jonathan.peppers@microsoft.com>
Co-authored-by: Marek Habersack <grendel@twistedcode.net>
Maoni0 added a commit that referenced this pull request Jul 22, 2025
#116310 breaks the backward compatibility for the standalone GC - it can no longer work with previous versions of the runtime on both windows and linux. this is easily reproducible by just loading a standalone GC dll built from main with say 8.0 runtime because it will fail as soon as it hits FinalizeLoad when coreclr is trying to load the standalone dll.

both msvc and clang put the 2 methods, both named IsPromoted next to each other instead of maintaining the order as declared. renaming the new one to IsPromoted2 worked. but I also just made IsPromoted2 on IGCHeapInternal instead since it's only used by the GC side.
@github-actions github-actions bot locked and limited conversation to collaborators Aug 2, 2025
8 participants