Description
Observed Behavior:
When deploying 100+ StatefulSets concurrently, each with its own PVC, Karpenter schedules pods onto nodes that have already reached the EBS volume attachment limit (27).
Even with WaitForFirstConsumer volume binding and the EBS CSI driver in place, AWS returns this error:
AttachVolume.Attach failed for volume "pvc-xxxx":
rpc error: code = Internal desc = Could not attach volume "vol-xxxx" to node "i-xxxx":
operation error EC2: AttachVolume, api error AttachmentLimitExceeded:
You may attach up to 27 devices for this instance type.
This results in some pods being stuck in the Init phase and repeated volume attach failures.
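For reference, commands along these lines can be used to compare per-node volume attachments against the attach capacity the EBS CSI driver advertises (illustrative; <node-name> is a placeholder):

# Count VolumeAttachments per node; over-packed nodes exceed the 27-device limit
kubectl get volumeattachments -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c | sort -rn

# Attach capacity advertised by the EBS CSI driver for a given node
kubectl get csinode <node-name> -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'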
Expected Behavior:
Karpenter should:
1. Respect EBS attach limits based on the EC2 instance type.
2. Avoid scheduling StatefulSet pods with PVCs onto a node that is already at or near its EBS device limit.
3. Optionally expose this logic via limits or annotations in EC2NodeClass (a purely illustrative sketch follows this list).
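To illustrate (3), here is a made-up sketch of the kind of knob that would cover our case. This annotation does not exist in Karpenter today; the key is invented purely for illustration:

# Hypothetical sketch only: this annotation key does not exist in Karpenter today.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: loadtest-class
  annotations:
    # Made-up key: keep N attachment slots free (root volume / ENIs) instead of
    # packing PVC-backed pods right up to the instance's EBS attach limit.
    example.karpenter.k8s.aws/reserved-volume-attachments: "2"
spec:
  # ...existing EC2NodeClass spec unchanged...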
Reproduction Steps (Please include YAML):
Reproducer script used to generate and deploy the StatefulSets (verification commands follow the script):
#!/bin/bash
# Clean up old StatefulSets
kubectl delete statefulsets --all --namespace test-namespace
# Clean up previously generated YAMLs
rm -f sts-loadtest-*.yaml
# Generate YAML files for 100 StatefulSets
for i in $(seq -w 101 200); do
cat <<EOF > sts-loadtest-$i.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-loadtest-$i
  namespace: test-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sts-loadtest-$i
      type: loadtest
  serviceName: "sts-loadtest-$i"
  template:
    metadata:
      labels:
        app: sts-loadtest-$i
        type: loadtest
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["sleep", "3600"]
          volumeMounts:
            - name: data
              mountPath: /data
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - arm64
                  - key: karpenter.sh/nodepool
                    operator: In
                    values:
                      - loadtest
      tolerations:
        - key: "karpenter.sh/nodepool"
          operator: "Equal"
          value: "loadtest"
          effect: "NoSchedule"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: loadtest-storage-gp3
        resources:
          requests:
            storage: 1Gi
EOF
done
# Apply all the generated StatefulSets in parallel
ls sts-loadtest-*.yaml | xargs -P 10 -n 1 kubectl apply -f
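After the apply, something like the following can be used to watch the rollout and surface the attach failures (commands are illustrative):

# Watch the StatefulSet pods; once a node runs out of attachment slots, new pods get stuck
kubectl get pods -n test-namespace -w

# Surface the AttachVolume failures behind the stuck pods
# (plain "kubectl get events -n test-namespace" shows the same events without the filter)
kubectl get events -n test-namespace --field-selector reason=FailedAttachVolume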
NodePool YAML:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: loadtest-pool
spec:
  limits:
    cpu: 1000
  disruption:
    budgets:
      - nodes: "1"
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: loadtest-class
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r6g.8xlarge"]
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute
StorageClass YAML:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: loadtest-storage-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: gp3
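With the pool pinned to r6g.8xlarge and each pod carrying exactly one PVC, the over-packing is easy to see by counting scheduled pods per node, for example:

# Pods per node; nodes scheduled beyond the attach limit are the ones hitting AttachmentLimitExceeded
kubectl get pods -n test-namespace -o wide --no-headers | awk '{print $7}' | sort | uniq -c | sort -rn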
Versions:
• Karpenter Version: v1.5.0 (latest); also reproduced with v1.3.3
• Kubernetes Version: v1.31 (EKS)
• EC2 Instance Type: r6g.8xlarge (EBS volume attach limit: 27)
• StorageClass: gp3, volumeBindingMode: WaitForFirstConsumer
• Pod Type: StatefulSet
• Pod Count Tested: 100
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment