Description
If the DRA driver is permanently gone, then despite all the retry and reconciliation loops, a pod stays stuck in Terminating until its NodeUnprepareResources call completes without error, which is (currently) impossible without a kubelet connection to the driver.
The same is true for networking plugins (as discussed in kubernetes/kubernetes#129402 (comment)) and for volumes/CSI drivers (kubernetes/kubernetes#129402 (comment)), which have external services that handle the cleanup asynchronously, and sometimes without it being tracked by the pod phase. Device plugins don't have this issue (though they are at risk of leaving resources behind, per kubernetes/kubernetes#129402 (comment)).
This issue is to document this behavior as it pertains to DRA, explain why it is working as intended (WAI), and describe the remediation steps available to a cluster administrator.
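To make the stuck-in-Terminating behavior concrete, here is a minimal toy model of it. It is not kubelet code: `Pod`, `DriverUnavailable`, `try_finish_termination`, `driver_gone`, and `driver_back` are all hypothetical names invented for illustration, and the only thing the sketch assumes is what is described above, namely that a pod cannot finish terminating until the unprepare call succeeds.

```python
from dataclasses import dataclass
from typing import Callable


class DriverUnavailable(Exception):
    """Hypothetical error: the driver cannot be reached."""


@dataclass
class Pod:
    name: str
    claims_prepared: bool = True  # resources still need to be unprepared
    terminating: bool = True      # pod has been deleted but is not yet gone


def try_finish_termination(pod: Pod, unprepare: Callable[[str], None]) -> bool:
    """One reconciliation pass (toy model).

    The pod only leaves Terminating once unprepare() returns without error.
    Returns True if the pod finished terminating on this pass.
    """
    if pod.claims_prepared:
        try:
            unprepare(pod.name)
        except DriverUnavailable:
            return False  # nothing else to do; retried on the next pass
        pod.claims_prepared = False
    pod.terminating = False
    return True


def driver_gone(pod_name: str) -> None:
    # Stand-in for an unprepare call when the driver is gone.
    raise DriverUnavailable(f"no connection to driver for {pod_name}")


def driver_back(pod_name: str) -> None:
    # Stand-in for a successful unprepare call.
    return None


if __name__ == "__main__":
    pod = Pod("demo")
    for _ in range(3):
        try_finish_termination(pod, driver_gone)
    print("terminating after retries:", pod.terminating)
    try_finish_termination(pod, driver_back)
    print("terminating after driver returns:", pod.terminating)
```

In this model, no number of retries helps while the driver is unreachable; the pod is released only once the unprepare call can succeed.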