
Karpenter exceeds EBS attach limits when scheduling StatefulSet pods concurrently #2287

Open
@rajeshcodez

Description

Observed Behavior:
When 100+ StatefulSets, each with its own PVC, are deployed concurrently, Karpenter schedules pods onto nodes that have already reached the instance type's EBS volume attachment limit (27).

Even with WaitForFirstConsumer volume binding and the EBS CSI driver in place, AWS returns this error:

AttachVolume.Attach failed for volume "pvc-xxxx": 
rpc error: code = Internal desc = Could not attach volume "vol-xxxx" to node "i-xxxx": 
operation error EC2: AttachVolume, api error AttachmentLimitExceeded: 
You may attach up to 27 devices for this instance type.

This leaves several pods stuck in the Init phase with volume attach failures.
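
A quick way to confirm the over-subscription is to count CSI VolumeAttachments per node once the failures start; this is purely an illustrative check (node names are whatever Karpenter happened to provision):

kubectl get volumeattachments.storage.k8s.io \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c | sort -rn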

Expected Behavior:
Karpenter should:
1. Respect EBS attach limits based on the EC2 instance type (the EBS CSI node plugin already advertises this limit; see the CSINode check below).
2. Avoid scheduling StatefulSet pods with PVCs onto a node that is already at or near its EBS device limit.
3. Optionally expose this logic via limits or annotations in EC2NodeClass.
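
For context on point 1: the per-instance-type limit is already advertised in-cluster by the EBS CSI node plugin through the CSINode object, so the data needed for smarter bin-packing exists. A minimal check (the node name is a placeholder for one of the provisioned nodes):

kubectl get csinode <node-name> \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

On the nodes in this setup, the reported count should roughly line up with the 27-device limit quoted in the AttachVolume error.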

Reproduction Steps (Please include YAML):

Reproducer script used to generate and deploy the StatefulSets:

#!/bin/bash

# Clean up old StatefulSets 
kubectl delete statefulsets --all --namespace test-namespace

# Clean up previously generated YAMLs
rm -f sts-loadtest-*.yaml

# Generate YAML files for 100 StatefulSets
for i in $(seq -w 101 200); do
cat <<EOF > sts-loadtest-$i.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-loadtest-$i
  namespace: test-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sts-loadtest-$i
      type: loadtest
  serviceName: "sts-loadtest-$i"
  template:
    metadata:
      labels:
        app: sts-loadtest-$i
        type: loadtest
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - arm64
              - key: karpenter.sh/nodepool
                operator: In
                values:
                - loadtest
      tolerations:
      - key: "karpenter.sh/nodepool"
        operator: "Equal"
        value: "loadtest"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: loadtest-storage-gp3
      resources:
        requests:
          storage: 1Gi
EOF
done

# Apply all the generated StatefulSets in parallel
ls sts-loadtest-*.yaml | xargs -P 10 -n 1 kubectl apply -f
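
Once the script has run and Karpenter has scaled out, the symptoms can be watched with standard kubectl queries (illustrative only; pods stuck in Init still report phase Pending):

# Pods stuck waiting on their volume
kubectl get pods -n test-namespace --field-selector=status.phase=Pending

# Attach failures surfaced as events
kubectl get events -n test-namespace --field-selector reason=FailedAttachVolume --sort-by=.lastTimestamp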

NodePool YAML:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: loadtest
spec:
  limits:
    cpu: 1000
  disruption:
    budgets:
      - nodes: "1"
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: loadtest-class
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r6g.8xlarge"]
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: loadtest-storage-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: gp3
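
Not a fix for the scheduling gap itself, but as a stop-gap the limit the CSI driver advertises can be pinned lower so nodes keep some headroom; a sketch assuming the upstream aws-ebs-csi-driver Helm chart (the value name node.volumeAttachLimit comes from that chart, and 26 is an arbitrary choice):

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.volumeAttachLimit=26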

Versions:
• Karpenter Version: v1.5.0 (latest); also tried v1.3.3
• Kubernetes Version: v1.31 (EKS)
• EC2 Instance Type: r6g.8xlarge (EBS volume attach limit: 27)
• StorageClass: gp3, volumeBindingMode: WaitForFirstConsumer
• Pod Type: StatefulSet
• Pod Count Tested: 100

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
