Enhancement Description
DRA drivers may encounter errors such that the NodePrepareResources gRPC call to the driver can never succeed for the devices that kube-scheduler allocated to a pod. Currently, pods in that state are retried forever, wasting CPU cycles in the kubelet and the DRA driver. This proposal describes a method to break that cycle of retries that are known to fail.
/sig node
/wg device-management
/assign @nojnhuh
/cc @pohly @lauralorenz @SergeyKanzhelev
- One-line enhancement description (can be used as a release note): DRA: Handle permanent driver allocation failures
- Kubernetes Enhancement Proposal: [in progress]
- Discussion Link:
- Primary contact (assignee): @nojnhuh
- Responsible SIGs: SIG Node
- Enhancement target (which target equals to which milestone):
  - Alpha release target (x.y): 1.34
  - Beta release target (x.y):
  - Stable release target (x.y):
- Alpha
  - KEP (k/enhancements) update PR(s):
  - Code (k/k) update PR(s):
  - Docs (k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.