Description
Observed Behavior:
We have a Karpenter NodePool with disruption budgets defined, including a scheduled disruption window for the Underutilized reason. However, we’re observing that nodes are being disrupted outside the defined schedule.
Additionally, nodes running a single high-memory pod (e.g., 24Gi requested out of 32Gi allocatable) are still being marked as underutilized rather than being treated as fully utilized or at least non-empty.
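If budgets only limit how many nodes can be disrupted while they are active (which is how we currently read the docs), the behavior above may be working as designed, and restricting Underutilized disruption to the 04:00-08:00 UTC window would require an additional zero-node budget covering the remaining hours. A minimal sketch of that pattern, assuming a nodes: '0' budget blocks the listed reasons while its schedule is active:

disruption:
  budgets:
    # Allow up to 10% of nodes to be disrupted for Underutilized
    # during the 04:00-08:00 UTC window.
    - nodes: 10%
      reasons: [Underutilized]
      schedule: 0 4 * * *
      duration: 4h
    # Block Underutilized disruption for the remaining 20 hours.
    - nodes: '0'
      reasons: [Underutilized]
      schedule: 0 8 * * *
      duration: 20h

If a scheduled budget is instead meant to gate disruption to its window, the logs below would indicate a bug.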
Karpenter Logs:
{"level":"INFO","time":"2025-06-22T17:36:58.825Z","logger":"controller","message":"disrupting node(s)","reason":"underutilized","decision":"delete","disrupted-node-count":1,"pod-count":1}
{"level":"INFO","time":"2025-06-22T17:36:59.679Z","logger":"controller","message":"tainted node","taint.Key":"karpenter.sh/disrupted"}
{"level":"INFO","time":"2025-06-22T17:43:35.214Z","logger":"controller","message":"deleted nodeclaim","NodeClaim":{"name":"<masked>"}}
Expected Behavior:
- Nodes should be disrupted for the Underutilized reason only during the scheduled window.
- Nodes with a single high-memory pod (e.g., 75% of node memory requested) should not be marked underutilized, or the criteria for underutilization should be clearly documented (a possible workaround is sketched below).
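As a possible interim workaround for the second point (our assumption, not a confirmed fix), annotating the high-memory pod with karpenter.sh/do-not-disrupt should keep Karpenter from voluntarily disrupting the node it runs on. A minimal sketch, with a hypothetical pod name and placeholder image:

apiVersion: v1
kind: Pod
metadata:
  name: high-memory-workload        # hypothetical name for illustration
  annotations:
    karpenter.sh/do-not-disrupt: 'true'
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: 'true'
      effect: NoSchedule
  containers:
    - name: app
      image: example.com/app:latest  # placeholder image
      resources:
        requests:
          memory: 24Gi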
Reproduction Steps (Please include YAML):
Here is a simplified version of the NodePool configuration:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-g6
spec:
  disruption:
    budgets:
      - duration: 4h
        nodes: 10%
        reasons: [Drifted]
        schedule: 0 4 1-7 1,4,7,10 1
      - duration: 4h
        nodes: 10%
        reasons: [Underutilized]
        schedule: 0 4 * * *
      - nodes: 70%
        reasons: [Empty]
    consolidateAfter: 30m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: '1000'
    memory: 5000Gi
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: al2023
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: [amd64]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: [g6.2xlarge]
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
      taints:
        - effect: NoSchedule
          key: nvidia.com/gpu
          value: 'true'
  weight: 25
Versions:
- Chart Version: 1.5.0
- Kubernetes Version (kubectl version): 1.32 (EKS)
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment