Document that pods hanging in Terminating when the DRA driver is truly gone is WAI (re: 129402 discussion)

If the DRA driver is well and truly gone, then despite all the retry and reconciliation loops, a pod will stay stuck in Terminating until its NodeUnprepareResources call completes without error, which is (currently) impossible without a kubelet connection to the driver.
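The gating described above can be illustrated with a toy model. This is only a sketch of the described behavior, not kubelet code: `Pod`, `DriverUnavailable`, `unprepare_via_driver`, and `try_finish_termination` are hypothetical names invented for illustration, and the bounded retry count stands in for the kubelet's ongoing retries.

```python
from dataclasses import dataclass


class DriverUnavailable(Exception):
    """Hypothetical error: there is no connection to the DRA driver."""


@dataclass
class Pod:
    name: str
    phase: str = "Terminating"


def unprepare_via_driver(driver_connected: bool) -> None:
    # Stand-in for the NodeUnprepareResources call; it can only succeed
    # when a connection to the driver exists.
    if not driver_connected:
        raise DriverUnavailable("no connection to DRA driver")


def try_finish_termination(pod: Pod, driver_connected: bool, max_attempts: int = 3) -> bool:
    """Retry the unprepare step; the pod leaves Terminating only on success."""
    for _ in range(max_attempts):
        try:
            unprepare_via_driver(driver_connected)
        except DriverUnavailable:
            continue  # retrying cannot help while the driver is gone
        pod.phase = "Deleted"
        return True
    return False  # still Terminating after every attempt


stuck = Pod("uses-missing-driver")
print(try_finish_termination(stuck, driver_connected=False), stuck.phase)
```

In this model no number of retries changes the outcome while the driver is absent, which is the situation the issue asks to have documented.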

This is also true for networking plugins (as discussed in https://github.com/kubernetes/kubernetes/issues/129402#issuecomment-2578651690) and for volumes/CSI drivers (https://github.com/kubernetes/kubernetes/issues/129402#issuecomment-2579086354), which have external services that handle the cleanup asynchronously and are sometimes not tracked by the pod phase. Device plugins don't have this issue, though they are at risk of leaving resources behind (per https://github.com/kubernetes/kubernetes/issues/129402#issuecomment-2653348951).

This issue is to document this behavior as it pertains to DRA, describe how it is WAI (working as intended), and list the remediation steps available to a cluster administrator.

Document that pods hanging in terminating if DRA driver is truly gone is WAI (re: 129402 discussion) #51012
