Description
Is there an existing issue for this?
- I have searched the existing issues
Describe the bug
Whilst working on #8423, the two tests I added are taking a surprisingly long time. In these tests I started up the shortest possible container / executable (i.e. run echo SOMETHING
then exist), waited for them to exit and asserted on the logs output. I didn't expect these to take 10-15 seconds.
After some more digging, these tests are spending large amounts of time in _kubernetesService.StopServerAsync
. In the case of the container resource, the method times out. For the executable, the clean up does complete, but seems to get caught up on could not remove process's standard error file
and could not remove process's standard output file
.
Expected Behavior
Since the resources have all fully exited by the time CleanupResourcesAsync
is called, I wouldn't expect too much clean-up to be required - particularly in the case of the executable.
I'd also expect some kind of error to be logged if clean-up failed - particularly in the context of automated tests as a failure to clean-up resources in one test, could lead to a later test to fail due to the lack of clean-up.
Steps To Reproduce
Run either of the following two tests. (They're essentially the same, except one works with containers, and the other with executables)
[Fact]
public async Task FinalExecutableLogsAreCaptured()
{
var builder = DistributedApplicationTestingBuilder.Create();
builder.Services.AddLogging(logging => logging
.SetMinimumLevel(LogLevel.Debug)
.AddFakeLogging()
.AddXunit(output));
var executable = builder.AddExecutable("shortlived", "pwsh", ".")
.WithArgs("-Command", "Write-Host \"Where We're Going, We Don't Need Roads\"");
FakeLogCollector? logCollector = null;
await using (var app = await builder.BuildAsync())
{
logCollector = app.Services.GetFakeLogCollector();
await app.StartAsync();
await app.WaitForHealthyAsync(executable)
.WaitAsync(TimeSpan.FromSeconds(10));
}
var resourceLogs = logCollector.GetSnapshot().Where(x => x.Category == $"{builder.Environment.ApplicationName}.Resources.{executable.Resource.Name}");
var finalLog = resourceLogs.Last();
Assert.EndsWith("Where We're Going, We Don't Need Roads", finalLog.Message);
}
[Fact]
[RequiresDocker]
public async Task FinalContainerLogsAreCaptured()
{
var builder = DistributedApplicationTestingBuilder.Create();
builder.Services.AddLogging(logging => logging
.SetMinimumLevel(LogLevel.Debug)
.AddFakeLogging()
.AddXunit(output));
var container = builder.AddContainer("shortlived", "mcr.microsoft.com/dotnet/runtime")
.WithEntrypoint("sh")
.WithArgs("-c", "echo \"So Long, Partner\"");
FakeLogCollector? logCollector = null;
await using (var app = await builder.BuildAsync())
{
logCollector = app.Services.GetFakeLogCollector();
await app.StartAsync();
await app.ResourceNotifications.WaitForResourceAsync(container.Resource.Name, KnownResourceStates.Exited)
.WaitAsync(TimeSpan.FromSeconds(10));
}
var resourceLogs = logCollector.GetSnapshot().Where(x => x.Category == $"{builder.Environment.ApplicationName}.Resources.{container.Resource.Name}");
var finalLog = resourceLogs.Last();
Assert.EndsWith("So Long, Partner", finalLog.Message);
}
Exceptions (if any)
Executable
...
| [2025-03-30T21:59:04] Aspire.Hosting.Dcp.DcpExecutor Debug: Starting _kubernetesService.CleanupResourcesAsync
| [2025-03-30T21:59:04] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status {"OldStatus": "Running", "NewStatus": "CleaningResources"}
| [2025-03-30T21:59:07] Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler Error: could not remove process's standard output file {"Executable": {"name":"shortlived-tupmvdws"}, "Reconciliation": 8, "path": "C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_out_4cf6218c-1ad2-4212-bdb5-e056b024b623", "error": "remove C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_out_4cf6218c-1ad2-4212-bdb5-e056b024b623: The process cannot access the file because it is being used by another process.\ncontext deadline exceeded"}
| [2025-03-30T21:59:10] Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler Error: could not remove process's standard error file {"Executable": {"name":"shortlived-tupmvdws"}, "Reconciliation": 8, "path": "C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_err_4cf6218c-1ad2-4212-bdb5-e056b024b623", "error": "remove C:\\Users\\alex\\AppData\\Local\\Temp\\aspire.mln2bwdd.mur\\shortlived-tupmvdws_err_4cf6218c-1ad2-4212-bdb5-e056b024b623: The process cannot access the file because it is being used by another process.\ncontext deadline exceeded"}
| [2025-03-30T21:59:11] Aspire.Hosting.Dcp.DcpExecutor Debug: Finished _kubernetesService.CleanupResourcesAsync
Cotnainer
...
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.DcpExecutor Debug: Starting _kubernetesService.CleanupResourcesAsync
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status {"OldStatus": "Running", "NewStatus": "CleaningResources"}
| [2025-03-30T21:59:16] Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler Information: no more Container resources are being watched, cancelling container watch {"Container": {"name":"shortlived-dhhcjwvq"}, "Reconciliation": 11}
| [2025-03-30T21:59:17] Aspire.Hosting.Dcp.dcpproc.monitor Information: monitor process exited, shutting down {"pid": 16352}
| [2025-03-30T21:59:26] Aspire.Hosting.Dcp.DcpExecutor Debug: Timed out stopping DCP Executor
| System.Threading.Tasks.TaskCanceledException: A task was canceled.
| at Polly.Retry.RetryResilienceStrategy`1.ExecuteCore[TState](Func`3 callback, ResilienceContext context, TState state)
| at Polly.Outcome`1.GetResultOrRethrow()
| at Polly.ResiliencePipeline.ExecuteAsync[TResult](Func`2 callback, CancellationToken cancellationToken)
| at Aspire.Hosting.Dcp.KubernetesService.CleanupResourcesAsync(CancellationToken cancellationToken) in S:\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 384
| at Aspire.Hosting.Dcp.DcpExecutor.StopAsync(CancellationToken cancellationToken) in S:\aspire\src\Aspire.Hosting\Dcp\DcpExecutor.cs:line 180
.NET Version info
No response
Anything else?
Note, for the above logs, I did add some extra logging to DcpExecutor
to clearly see the errors & prove the method that is taking the time.
try
{
if (_options.Value.WaitForResourceCleanup)
{
+ _logger.LogDebug("Starting _kubernetesService.CleanupResourcesAsync");
await _kubernetesService.CleanupResourcesAsync(cancellationToken).ConfigureAwait(false);
+ _logger.LogDebug("Finished _kubernetesService.CleanupResourcesAsync");
}
// ...
}
catch (OperationCanceledException ex)
{
+ _logger.LogDebug(ex, "Timed out stopping DCP Executor");
- // Ignore.
}