Skip to content

Remove canceled AsyncOperations from channel queues #116021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 2, 2025

Conversation

stephentoub
Copy link
Member

Fixes #761

Canceled {WaitTo}Read/WriteAsync operations currently become nops but still sit in their queue until they're dequeued by a corresponding write or read that's trying to fulfill one of the waiters. This was done intentionally, as it left the code simpler and it was deemed unlikely to be an issue to have these objects sitting around for a bit. But it has since proven disruptive for some applications, which experience a temporary but meaningful "leak". To address that, this PR switches the queues from being array-based to being doubly-linked lists, with cancellation not only marking the operation as canceled but also removing it from the corresponding list.

Doing so required adding a few fields to the operation instance, so compensating changes were also made to try to reduce the size of the instance. This includes propagating a change that had been made to ManualResetValueTaskSourceCore to also apply here: rather than having a field for ExecutionContext and a separate field for a SynchronizationContext/TaskScheduler, use just a single object field and allocate a tuple when both are needed. Needing to store an ExecutionContext is rare, so this is almost always a win. I also utilize some more .NET Core APIs, in particular an overload of CancellationToken.Register that passes in the relevant cancellation token in order to avoid needing to store that same token on the instance. The net result is, at least on .NET Core, with this change the AsyncOperation instance actually gets a bit smaller than it was previously.

As new null checks were being added, I also switched the whole project over to using is / is not rather than == / !=.

As part of this change, to help flow a bit of state about the purpose of each instance, I refactored the AsyncOperation type in order to have concrete types per scenario (e.g. ReadAsync vs WriteAsync vs WaitToReadAsync vs WaitToWriteAsync) and also pushed a bunch of members from the generic down to a non-generic base, which should help a bit further with throughput but also Native AOT size.

All in all, this should address the memory issue while also cleaning up the code a bit and having a small positive impact on overall performance. Running our microbenchmarks locally shows things to be net neutral or slightly beneficial.

Example (noting that this never tries to write anything, which would clear out all of the referenced instances):

using System.Threading.Channels;

Channel<int> c = Channel.CreateUnbounded<int>();

for (int i = 0; ; i++)
{
    CancellationTokenSource cts = new();
    var vt = c.Reader.ReadAsync(cts.Token);
    cts.Cancel();
    await ((Task)vt.AsTask()).ConfigureAwait(ConfigureAwaitOptions.SuppressThrowing);

    if (i % 100_000 == 0)
    {
        Console.WriteLine($"Working set: {Environment.WorkingSet:N0}b");
    }
}

Before:

Working set: 22,728,704b
Working set: 111,177,728b
Working set: 182,833,152b
Working set: 261,984,256b
Working set: 328,019,968b
Working set: 399,495,168b
Working set: 469,082,112b
Working set: 547,909,632b
Working set: 615,636,992b
Working set: 682,627,072b
Working set: 756,797,440b
Working set: 832,098,304b
Working set: 902,123,520b
Working set: 979,820,544b
Working set: 1,046,126,592b
Working set: 1,113,554,944b
Working set: 1,180,667,904b
Working set: 1,253,609,472b
Working set: 1,319,989,248b
Working set: 1,386,434,560b

After:

Working set: 23,281,664b
Working set: 36,900,864b
Working set: 37,466,112b
Working set: 37,691,392b
Working set: 37,699,584b
Working set: 37,707,776b
Working set: 37,707,776b
Working set: 37,707,776b
Working set: 37,728,256b
Working set: 37,744,640b
Working set: 37,761,024b
Working set: 37,769,216b
Working set: 37,769,216b
Working set: 37,769,216b
Working set: 37,769,216b
Working set: 37,781,504b
Working set: 37,785,600b
Working set: 37,797,888b
Working set: 37,814,272b
Working set: 37,834,752b

@stephentoub stephentoub added this to the 10.0.0 milestone May 27, 2025
@Copilot Copilot AI review requested due to automatic review settings May 27, 2025 16:39
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@stephentoub stephentoub requested a review from Copilot May 27, 2025 16:45
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a memory leak by removing canceled AsyncOperations from channel queues and refactors various channel types to use a linked list–based approach rather than an array-based deque. Key changes include replacing the deque fields with linked-list heads for blocked and waiting operations, propagating cancellation callbacks via a new helper pattern, and updating invariant assertions (e.g. using “is not null”) consistently across the channel implementations.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
UnboundedPriorityChannel.cs Replaces deque-based blocked reader/waiter storage with linked-list head fields and updates invariant checks.
UnboundedChannel.cs Similar refactoring as UnboundedPrioritizedChannel, updating blocked reader/waiter handling and null checks.
SingleConsumerUnboundedChannel.cs Migrates AsyncOperation types to concrete blockage types and adjusts cancellation callbacks.
BoundedChannel.cs Adopts the linked-list pattern for blocked operations and waiters with corresponding invariant and completion changes.
ChannelUtilities.* Introduces helper methods (TryDequeue, Enqueue, Remove, etc.) for managing the newly used circular doubly-linked lists.
AsyncOperation.netstandard.cs / netcoreapp.cs Updates the AsyncOperation implementation to align with the new cancellation and work item queuing patterns.
SingleProducerSingleConsumerQueue.cs Updates Debug.Assert checks to the new “is not null” pattern.
Comments suppressed due to low confidence (2)

src/libraries/System.Threading.Channels/src/System/Threading/Channels/UnboundedPriorityChannel.cs:332

  • Ensure that the backing field used in the lazy initialization of the CancellationCallbackDelegate is clearly defined and consistently used across similar channel types to avoid potential misinitialization in future refactorings.
private Action<object?, CancellationToken> CancellationCallbackDelegate => field ??= (state, cancellationToken) => { ... }

src/libraries/System.Threading.Channels/src/System/Threading/Channels/BoundedChannel.cs:692

  • Review the locking strategy in the cancellation callback to ensure that all modifications to the blocked/waiting linked lists are safely synchronized without introducing deadlocks; the current approach appears consistent but should be validated under concurrent load.
ChannelUtilities.UnsafeQueueUserWorkItem(static state => { lock (state.Key.SyncObj) { switch (state.Value) { ... } } }, new KeyValuePair<BoundedChannel<T>, AsyncOperation>(this, op));

@PranavSenthilnathan
Copy link
Member

LGTM apart from the last comment

Canceled {WaitTo}Read/WriteAsync operations currently become nops but still sit in their queue until they're dequeued by a corresponding write or read that's trying to fulfill one of the waiters. This was done intentionally, as it left the code simpler and it was deemed unlikely to be an issue to have these objects sitting around for a bit. But it has since proven disruptive for some applications, which experience a temporary but meaningful "leak". To address that, this PR switches the queues from being array-based to being doubly-linked lists, with cancellation not only marking the operation as canceled but also removing it from the corresponding list.

Doing so required adding a few fields to the operation instance, so compensating changes were also made to try to reduce the size of the instance. This includes propagating a change that had been made to ManualResetValueTaskSourceCore to also apply here: rather than having a field for ExecutionContext and a separate field for a SynchronizationContext/TaskScheduler, use just a single object field and allocate a tuple when both are needed. Needing to store an ExecutionContext is rare, so this is almost always a win. I also utilize some more .NET Core APIs, in particular an overload of CancellationToken.Register that passes in the relevant cancellation token in order to avoid needing to store that same token on the instance. The net result is, at least on .NET Core, with this change the AsyncOperation instance actually gets a bit smaller than it was previously.

As new null checks were being added, I also switched the whole project over to using is / is not rather than == / !=.

As part of this change, to help flow a bit of state about the purpose of each instance, I refactored the AsyncOperation type in order to have concrete types per scenario (e.g. ReadAsync vs WriteAsync vs WaitToReadAsync vs WaitToWriteAsync) and also pushed a bunch of members from the generic down to a non-generic base, which should help a bit further with throughput but also Native AOT size.

All in all, this should address the memory issue while also cleaning up the code a bit and having a small positive impact on overall performance. Running our microbenchmarks locally shows things to be net neutral or slightly beneficial.
@stephentoub stephentoub force-pushed the cleanupcancellation branch from f1febfe to 2416856 Compare June 2, 2025 13:07
@stephentoub
Copy link
Member Author

LGTM apart from the last comment

Can you please approve it?

@stephentoub stephentoub disabled auto-merge June 2, 2025 15:48
@stephentoub
Copy link
Member Author

/ba-g unrelated infra crash

@stephentoub stephentoub merged commit 2d5a2ee into dotnet:main Jun 2, 2025
84 of 86 checks passed
@stephentoub stephentoub deleted the cleanupcancellation branch June 2, 2025 15:59
@willdean
Copy link
Contributor

willdean commented Jun 2, 2025

... fixes a memory leak ...

We always knew we'd persuade you eventually!

@stephentoub
Copy link
Member Author

... fixes a memory leak ...

We always knew we'd persuade you eventually!

Copilot's words, not mine ;-)

My words:
"But it has since proven disruptive for some applications, which experience a temporary but meaningful "leak"."

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible CancellationTokenRegistration leak in AsyncOperation<T>
4 participants