Skip to content

_kubernetesService.CleanupResourcesAsync Taking surprisingly long time in tests. #8424

Closed
@afscrome

Description

@afscrome

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Whilst working on #8423, the two tests I added are taking a surprisingly long time. In these tests I started up the shortest possible container / executable (i.e. run echo SOMETHING then exist), waited for them to exit and asserted on the logs output. I didn't expect these to take 10-15 seconds.
Image

After some more digging, these tests are spending large amounts of time in _kubernetesService.StopServerAsync. In the case of the container resource, the method times out. For the executable, the clean up does complete, but seems to get caught up on could not remove process's standard error file and could not remove process's standard output file.

Expected Behavior

Since the resources have all fully exited by the time CleanupResourcesAsync is called, I wouldn't expect too much clean-up to be required - particularly in the case of the executable.

I'd also expect some kind of error to be logged if clean-up failed - particularly in the context of automated tests as a failure to clean-up resources in one test, could lead to a later test to fail due to the lack of clean-up.

Steps To Reproduce

Run either of the following two tests. (They're essentially the same, except one works with containers, and the other with executables)

    [Fact]
    public async Task FinalExecutableLogsAreCaptured()
    {
        var builder = DistributedApplicationTestingBuilder.Create();
        builder.Services.AddLogging(logging => logging
               .SetMinimumLevel(LogLevel.Debug)
               .AddFakeLogging()
               .AddXunit(output));

        var executable = builder.AddExecutable("shortlived", "pwsh", ".")
           .WithArgs("-Command", "Write-Host \"Where We're Going, We Don't Need Roads\"");

        FakeLogCollector? logCollector = null;
        await using (var app = await builder.BuildAsync())
        {
            logCollector = app.Services.GetFakeLogCollector();

            await app.StartAsync();
            await app.WaitForHealthyAsync(executable)
                .WaitAsync(TimeSpan.FromSeconds(10));
        }

        var resourceLogs = logCollector.GetSnapshot().Where(x => x.Category == $"{builder.Environment.ApplicationName}.Resources.{executable.Resource.Name}");
        var finalLog = resourceLogs.Last();

        Assert.EndsWith("Where We're Going, We Don't Need Roads", finalLog.Message);
    }

    [Fact]
    [RequiresDocker]
    public async Task FinalContainerLogsAreCaptured()
    {
        var builder = DistributedApplicationTestingBuilder.Create();
        builder.Services.AddLogging(logging => logging
               .SetMinimumLevel(LogLevel.Debug)
               .AddFakeLogging()
               .AddXunit(output));

        var container = builder.AddContainer("shortlived", "mcr.microsoft.com/dotnet/runtime")
           .WithEntrypoint("sh")
           .WithArgs("-c", "echo \"So Long, Partner\"");

        FakeLogCollector? logCollector = null;
        await using (var app = await builder.BuildAsync())
        {
            logCollector = app.Services.GetFakeLogCollector();

            await app.StartAsync();
            await app.ResourceNotifications.WaitForResourceAsync(container.Resource.Name, KnownResourceStates.Exited)
                .WaitAsync(TimeSpan.FromSeconds(10));
        }

        var resourceLogs = logCollector.GetSnapshot().Where(x => x.Category == $"{builder.Environment.ApplicationName}.Resources.{container.Resource.Name}");
        var finalLog = resourceLogs.Last();

        Assert.EndsWith("So Long, Partner", finalLog.Message);
    }

Exceptions (if any)

Executable

...
| [2025-03-30T21:59:04] Aspire.Hosting.Dcp.DcpExecutor Debug: Starting _kubernetesService.CleanupResourcesAsync
| [2025-03-30T21:59:04] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status       {"OldStatus": "Running", "NewStatus": "CleaningResources"}
| [2025-03-30T21:59:07] Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler Error: could not remove process's standard output file  {"Executable": {"name":"shortlived-tupmvdws"}, "Reconciliation": 8, "path": "C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_out_4cf6218c-1ad2-4212-bdb5-e056b024b623", "error": "remove C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_out_4cf6218c-1ad2-4212-bdb5-e056b024b623: The process cannot access the file because it is being used by another process.\ncontext deadline exceeded"}
| [2025-03-30T21:59:10] Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler Error: could not remove process's standard error file   {"Executable": {"name":"shortlived-tupmvdws"}, "Reconciliation": 8, "path": "C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_err_4cf6218c-1ad2-4212-bdb5-e056b024b623", "error": "remove C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_err_4cf6218c-1ad2-4212-bdb5-e056b024b623: The process cannot access the file because it is being used by another process.\ncontext deadline exceeded"}
| [2025-03-30T21:59:11] Aspire.Hosting.Dcp.DcpExecutor Debug: Finished _kubernetesService.CleanupResourcesAsync

Cotnainer

...
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.DcpExecutor Debug: Starting _kubernetesService.CleanupResourcesAsync
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status       {"OldStatus": "Running", "NewStatus": "CleaningResources"}
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler Information: no more Container resources are being watched, cancelling container watch       {"Container": {"name":"shortlived-dhhcjwvq"}, "Reconciliation": 11}
| [2025-03-30T21:59:17] Aspire.Hosting.Dcp.dcpproc.monitor Information: monitor process exited, shutting down   {"pid": 16352}
| [2025-03-30T21:59:26] Aspire.Hosting.Dcp.DcpExecutor Debug: Timed out stopping DCP Executor
| System.Threading.Tasks.TaskCanceledException: A task was canceled.
|    at Polly.Retry.RetryResilienceStrategy`1.ExecuteCore[TState](Func`3 callback, ResilienceContext context, TState state)
|    at Polly.Outcome`1.GetResultOrRethrow()
|    at Polly.ResiliencePipeline.ExecuteAsync[TResult](Func`2 callback, CancellationToken cancellationToken)
|    at Aspire.Hosting.Dcp.KubernetesService.CleanupResourcesAsync(CancellationToken cancellationToken) in S:\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 384
|    at Aspire.Hosting.Dcp.DcpExecutor.StopAsync(CancellationToken cancellationToken) in S:\aspire\src\Aspire.Hosting\Dcp\DcpExecutor.cs:line 180

.NET Version info

No response

Anything else?

Note, for the above logs, I did add some extra logging to DcpExecutor to clearly see the errors & prove the method that is taking the time.

        try
        {
            if (_options.Value.WaitForResourceCleanup)
            {
+                _logger.LogDebug("Starting _kubernetesService.CleanupResourcesAsync");
                await _kubernetesService.CleanupResourcesAsync(cancellationToken).ConfigureAwait(false);
+                _logger.LogDebug("Finished _kubernetesService.CleanupResourcesAsync");
            }

            // ...
        }
        catch (OperationCanceledException ex)
        {
+            _logger.LogDebug(ex, "Timed out stopping DCP Executor");
-            // Ignore.
        }

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions