QuicConnection.ConnectAsync() connection failure causes memory leak. #113030

Open
xlievo opened this issue Mar 1, 2025 · 7 comments
Labels
area-System.Net.Quic untriaged New issue has not been triaged by the area owner

Comments

@xlievo

xlievo commented Mar 1, 2025

Description

In an automatic reconnection scenario, the following code runs on a timer. If the connection attempt fails, it may cause a memory leak, possibly because underlying handles are not released.

try
{
    var quicCon = await QuicConnection.ConnectAsync(QuicOptions, cancel.Token);
}
catch
{
    // Connection failed; the next timer tick will try again.
}

Reproduction Steps

internal class Program
{
    static async Task Main()
    {
        await Task.Run(async () =>
        {
            if (!QuicConnection.IsSupported)
            {
                throw new PlatformNotSupportedException("QUIC protocol is not supported on this platform.");
            }

            CancellationTokenSource cancel = new();
            using var pingTimer = new PeriodicTimer(TimeSpan.FromSeconds(1));

            while (await pingTimer.WaitForNextTickAsync(cancel.Token))
            {
                try
                {
                    await using var quicCon = await QuicConnection.ConnectAsync(new()
                    {
                        RemoteEndPoint = new IPEndPoint(IPAddress.Loopback, 9999),
                        DefaultCloseErrorCode = 0x0A,
                        DefaultStreamErrorCode = 0x0B,
                        ClientAuthenticationOptions = new SslClientAuthenticationOptions
                        {
                            ApplicationProtocols = [SslApplicationProtocol.Http3],
                            RemoteCertificateValidationCallback = (sender, certificate, chain, errors) => true,
                            TargetHost = "localhost"
                        },
                    }, cancel.Token);

                    break;
                }
                catch
                {
                    // Connection failed; the next timer tick will try again.
                }
            }
        });

        Console.WriteLine("exit;");
        Console.ReadLine();
    }
}

Expected behavior

Memory usage stays stable across repeated failed connection attempts.

Actual behavior

Memory usage keeps growing, with no sign of being reclaimed.

Regression?

No response

Known Workarounds

No response

Configuration

.NET 9.0, win-x64, published with AOT

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 1, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 1, 2025
@huoyaoyuan huoyaoyuan added area-System.Net.Quic and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 1, 2025
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@xlievo
Author

xlievo commented Mar 1, 2025

QuicConnection.ConnectAsync() also cannot be used on Android (#111019). These are two separate issues, but what I mainly want to know is how they are progressing and which future version might include the fixes.

@rzikm rzikm self-assigned this Mar 3, 2025
@rzikm
Member

rzikm commented Mar 3, 2025

VS shows this

Image

At the same time, looking at the QUIC_PERF_COUNTER_CONN_ACTIVE perf counter, it seems like we are closing the connections properly on our end, so this seems like a bug in MsQuic itself. cc: @nibanks

Can you open an issue in the microsoft/msquic repo?

@rzikm rzikm removed their assignment Mar 3, 2025
@nibanks

nibanks commented Mar 3, 2025

You'll notice that this function allocates memory from a pool (i.e., lookaside list):

DATAPATH_RX_IO_BLOCK*
CxPlatSocketAllocRxIoBlock(
    _In_ CXPLAT_SOCKET_PROC* SocketProc
    )
{
    CXPLAT_DATAPATH_PARTITION* DatapathProc = SocketProc->DatapathProc;
    DATAPATH_RX_IO_BLOCK* IoBlock;
    CXPLAT_POOL* OwningPool;

    if (SocketProc->Parent->UseRio) {
        OwningPool = &DatapathProc->RioRecvPool;
    } else {
        OwningPool = &DatapathProc->RecvDatagramPool.Base;
    }

    IoBlock = CxPlatPoolAlloc(OwningPool);

    if (IoBlock != NULL) {
        IoBlock->Route.State = RouteResolved;
        IoBlock->OwningPool = OwningPool;
        IoBlock->ReferenceCount = 0;
        IoBlock->SocketProc = SocketProc;
    }

    return IoBlock;
}

You'll be getting this from a global, per-processor pool, DatapathProc->RecvDatagramPool.Base. This means the memory is held on to on purpose, to optimize future datapath allocations.

In other words, this is by design.
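
To make the "held on to by design" behavior concrete, here is a minimal sketch of a free-list (lookaside-style) pool. It is not MsQuic's CXPLAT_POOL: the names Pool, pool_alloc, pool_free, pool_drain and simulate_attempts are hypothetical and exist only for this illustration. The point it shows is that returning a block to such a pool does not return it to the heap, so the process keeps the memory even after every caller is done with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-list pool for illustration only; this is NOT MsQuic's
 * CXPLAT_POOL, just the general lookaside-list idea. */
typedef struct PoolEntry {
    struct PoolEntry *next;
} PoolEntry;

typedef struct {
    PoolEntry *free_list;  /* blocks handed back by callers, kept for reuse */
    size_t block_size;     /* size of every block served by this pool */
    size_t cached;         /* blocks currently sitting on free_list */
    size_t heap_allocs;    /* times the pool had to fall back to malloc */
} Pool;

void pool_init(Pool *p, size_t block_size) {
    p->free_list = NULL;
    /* A cached block must be large enough to hold the free-list link. */
    p->block_size = block_size < sizeof(PoolEntry) ? sizeof(PoolEntry) : block_size;
    p->cached = 0;
    p->heap_allocs = 0;
}

void *pool_alloc(Pool *p) {
    if (p->free_list != NULL) {          /* fast path: reuse a cached block */
        PoolEntry *e = p->free_list;
        p->free_list = e->next;
        p->cached--;
        return e;
    }
    void *block = malloc(p->block_size); /* slow path: grow from the heap */
    if (block != NULL) {
        p->heap_allocs++;
    }
    return block;
}

/* "Freeing" only pushes the block onto the free list; the process keeps it. */
void pool_free(Pool *p, void *block) {
    PoolEntry *e = block;
    e->next = p->free_list;
    p->free_list = e;
    p->cached++;
}

/* Releases every cached block back to the heap; returns how many were freed. */
size_t pool_drain(Pool *p) {
    size_t freed = 0;
    while (p->free_list != NULL) {
        PoolEntry *e = p->free_list;
        p->free_list = e->next;
        free(e);
        freed++;
    }
    p->cached = 0;
    return freed;
}

/* Simulates `attempts` failed connects, each borrowing `blocks_per_attempt`
 * blocks and returning all of them. Returns 0 on success, -1 if malloc fails. */
int simulate_attempts(Pool *p, int attempts, int blocks_per_attempt) {
    void *held[16];
    assert(blocks_per_attempt >= 0 && blocks_per_attempt <= 16);
    for (int a = 0; a < attempts; a++) {
        int got = 0;
        while (got < blocks_per_attempt) {
            void *b = pool_alloc(p);
            if (b == NULL) break;
            held[got++] = b;
        }
        int failed = got < blocks_per_attempt;
        while (got > 0) {
            pool_free(p, held[--got]);   /* hand every block back to the pool */
        }
        if (failed) return -1;
    }
    return 0;
}
```

With this sketch, calling simulate_attempts(&p, 1000, 4) on a freshly initialized pool leaves p.heap_allocs and p.cached both at 4: the memory is retained, but it does not grow with the number of attempts as long as every block is handed back.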

@nibanks

nibanks commented Mar 3, 2025

I will also say that in more recent versions of msquic.dll we've added support for CxPlatAddDynamicPoolAllocator, which adds logic to periodically prune these pools, so they will eventually drain (8 allocs are pruned per second) if they are not being used. I think this logic is only in main right now.
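
As a rough illustration of what periodic pruning means for an idle pool, the sketch below frees a bounded number of cached blocks per tick. It is not the CxPlatAddDynamicPoolAllocator implementation: IdlePool, idle_pool_fill, idle_pool_prune and ticks_to_drain are hypothetical names, the limit of 8 per tick is taken from the figure quoted above, and the assumption that one tick equals one second is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Blocks released per prune tick. The value 8 mirrors the figure quoted in
 * the comment above; the tick period itself is an assumption of this sketch. */
#define PRUNE_PER_TICK 8

typedef struct Node {
    struct Node *next;
} Node;

/* Hypothetical idle pool: only the cached (unused) blocks are modeled. */
typedef struct {
    Node *free_list;
    size_t cached;
} IdlePool;

/* Caches up to `n` freshly allocated blocks; returns how many were cached. */
size_t idle_pool_fill(IdlePool *p, size_t n, size_t block_size) {
    size_t added = 0;
    if (block_size < sizeof(Node)) {
        block_size = sizeof(Node);
    }
    while (added < n) {
        Node *node = malloc(block_size);
        if (node == NULL) break;
        node->next = p->free_list;
        p->free_list = node;
        p->cached++;
        added++;
    }
    return added;
}

/* Frees at most `max_blocks` cached blocks; returns how many were freed. */
size_t idle_pool_prune(IdlePool *p, size_t max_blocks) {
    size_t freed = 0;
    while (freed < max_blocks && p->free_list != NULL) {
        Node *node = p->free_list;
        p->free_list = node->next;
        free(node);
        p->cached--;
        freed++;
    }
    /* Invariant: the counter and the list agree about emptiness. */
    assert((p->cached == 0) == (p->free_list == NULL));
    return freed;
}

/* Runs prune ticks until the pool is empty; returns the number of ticks. */
size_t ticks_to_drain(IdlePool *p) {
    size_t ticks = 0;
    while (p->cached > 0) {
        idle_pool_prune(p, PRUNE_PER_TICK);
        ticks++;
    }
    return ticks;
}
```

Under these assumptions, an idle pool holding 8000 cached blocks (an arbitrary illustrative count) needs 1000 ticks to drain, so pruning releases memory gradually rather than right after the connections close.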

@rzikm
Member

rzikm commented Mar 3, 2025

> In other words, this is by design.

I get that there is a pool, but the private bytes of the process grew from some 30 MB to close to 1 GB, which seems a bit excessive to me.

Is there an upper limit on the pool? The two heap snapshots I took show a difference of 8000+ live instances allocated from that location.

@rzikm
Member

rzikm commented Mar 3, 2025

What I mean is that even with a pool, the app should reach a steady state where the private bytes no longer increase (modulo noise), but the memory of the repro application keeps growing. That makes me suspect that the instances are not being returned to the pool correctly, or that some other bug is at play.
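
The steady-state argument can be checked with a small counting model. This is only a sketch: heap_allocs_after is a hypothetical function, it tracks counters rather than real memory, and the numbers of blocks used and lost per attempt are made up for illustration, not measured from MsQuic.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many blocks a pool must obtain from the heap over `attempts`
 * sequential connection attempts. Each attempt borrows `blocks_per_attempt`
 * blocks and hands back all but `lost_per_attempt` of them. Purely a counting
 * model with hypothetical parameters; no real memory is allocated. */
size_t heap_allocs_after(size_t attempts, size_t blocks_per_attempt,
                         size_t lost_per_attempt) {
    size_t cached = 0;       /* blocks sitting unused in the pool */
    size_t heap_allocs = 0;  /* blocks the pool had to get from the heap */

    assert(lost_per_attempt <= blocks_per_attempt);

    for (size_t i = 0; i < attempts; i++) {
        /* Serve as much as possible from the cache, the rest from the heap. */
        size_t from_cache = cached < blocks_per_attempt ? cached : blocks_per_attempt;
        cached -= from_cache;
        heap_allocs += blocks_per_attempt - from_cache;

        /* The attempt ends; blocks that are not lost go back to the pool. */
        cached += blocks_per_attempt - lost_per_attempt;
    }
    return heap_allocs;
}
```

In this model, heap_allocs_after(8000, 4, 0) stays at 4 no matter how many attempts run, which is the plateau a correctly working pool would show. Losing just one block per attempt, heap_allocs_after(8000, 4, 1), yields 8003 heap allocations, that is, growth proportional to the number of attempts. The model does not say which case MsQuic is in; it only shows why unbounded growth is not explained by pooling alone.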
