Skip to content

Context cancellation causes a goroutine leak (possibly intermittent) #3218

Open
@pmalek

Description

@pmalek

Problem statement

When running our integration tests suite with https://github.com/uber-go/goleak, it sometimes reports a goroutine leak in:

 manager_envtest_test.go:84: found unexpected goroutines:
          [Goroutine 12886 in state sync.WaitGroup.Wait, with sync.runtime_SemacquireWaitGroup on top of the stack:
          sync.runtime_SemacquireWaitGroup(0xc000cdc830?)
          	/home/runner/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.1.linux-amd64/src/runtime/sema.go:110 +0x25
          sync.(*WaitGroup).Wait(0xc000cdc830)
          	/home/runner/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.1.linux-amd64/src/sync/waitgroup.go:118 +0x89
          sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func3.(*runnableGroup).StopAndWait.3.2()
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:307 +0x99
          created by sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func3.(*runnableGroup).StopAndWait.3 in goroutine 12666
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:304 +0x213
           Goroutine 12791 in state chan send, with sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1 on top of the stack:
          sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc00115d940)
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:227 +0x1f8
          created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 12615
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:210 +0x245
           Goroutine 12838 in state chan send, with sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1 on top of the stack:
          sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc00115dd00)
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:227 +0x1f8
          created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 12615
          	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:210 +0x245
          ]

After quick glance at the code, it would seem that controllerManager passes its errChan to its runnableGroups.

When runnables fail, they propagate the error through that error channel in

.

If I'm looking at this correctly, when context is done, the errChan is not drained in

case <-ctx.Done():
// We are done
return nil
and thus runnable groups cannot send their errors in .

Proposed solution

Drain the cm.errChan in case <-ctx.Done() branch at

select {
case <-ctx.Done():
// We are done
return nil
case err := <-cm.errChan:
// Error starting or running a runnable
return err
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions