
Karpenter exceeds EBS attach limits when scheduling StatefulSet pods concurrently #2287

Open
@rajeshcodez

Description

Observed Behavior:
When 100+ StatefulSets, each with its own PVC, are deployed concurrently, Karpenter schedules pods onto nodes that have already reached the instance type's EBS volume attachment limit (27).

Even with WaitForFirstConsumer volume binding and the EBS CSI driver in place, AWS returns this error:

AttachVolume.Attach failed for volume "pvc-xxxx": 
rpc error: code = Internal desc = Could not attach volume "vol-xxxx" to node "i-xxxx": 
operation error EC2: AttachVolume, api error AttachmentLimitExceeded: 
You may attach up to 27 devices for this instance type.

This leaves several pods stuck in the Init phase with volume attach failures.
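
A quick way to confirm the over-subscription is to count CSI VolumeAttachments per node once the failures start; this is purely an illustrative check (node names are whatever Karpenter happened to provision):

kubectl get volumeattachments.storage.k8s.io \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c | sort -rn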

Expected Behavior:
Karpenter should:
1. Respect EBS attach limits based on the EC2 instance type (the EBS CSI node plugin already advertises this limit; see the CSINode check below).
2. Avoid scheduling StatefulSet pods with PVCs onto a node that is already at or near its EBS device limit.
3. Optionally expose this logic via limits or annotations in EC2NodeClass.
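
For context on point 1: the per-instance-type limit is already advertised in-cluster by the EBS CSI node plugin through the CSINode object, so the data needed for smarter bin-packing exists. A minimal check (the node name is a placeholder for one of the provisioned nodes):

kubectl get csinode <node-name> \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

On the nodes in this setup, the reported count should roughly line up with the 27-device limit quoted in the AttachVolume error.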

Reproduction Steps (Please include YAML):

Reproducer script used to generate and deploy the StatefulSets:

#!/bin/bash

# Clean up old StatefulSets 
kubectl delete statefulsets --all --namespace test-namespace

# Clean up previously generated YAMLs
rm -f sts-loadtest-*.yaml

# Generate YAML files for 100 StatefulSets
for i in $(seq -w 101 200); do
cat <<EOF > sts-loadtest-$i.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-loadtest-$i
  namespace: test-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sts-loadtest-$i
      type: loadtest
  serviceName: "sts-loadtest-$i"
  template:
    metadata:
      labels:
        app: sts-loadtest-$i
        type: loadtest
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - arm64
              - key: karpenter.sh/nodepool
                operator: In
                values:
                - loadtest
      tolerations:
      - key: "karpenter.sh/nodepool"
        operator: "Equal"
        value: "loadtest"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: loadtest-storage-gp3
      resources:
        requests:
          storage: 1Gi
EOF
done

# Apply all the generated StatefulSets in parallel
ls sts-loadtest-*.yaml | xargs -P 10 -n 1 kubectl apply -f
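
Once the script has run and Karpenter has scaled out, the symptoms can be watched with standard kubectl queries (illustrative only; pods stuck in Init still report phase Pending):

# Pods stuck waiting on their volume
kubectl get pods -n test-namespace --field-selector=status.phase=Pending

# Attach failures surfaced as events
kubectl get events -n test-namespace --field-selector reason=FailedAttachVolume --sort-by=.lastTimestamp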

NodePool YAML:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: loadtest
spec:
  limits:
    cpu: 1000
  disruption:
    budgets:
      - nodes: "1"
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: loadtest-class
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r6g.8xlarge"]
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: loadtest-storage-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: gp3
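
Not a fix for the scheduling gap itself, but as a stop-gap the limit the CSI driver advertises can be pinned lower so nodes keep some headroom; a sketch assuming the upstream aws-ebs-csi-driver Helm chart (the value name node.volumeAttachLimit comes from that chart, and 26 is an arbitrary choice):

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.volumeAttachLimit=26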

Versions:
• Karpenter Version: v1.5.0 (latest); also tried v1.3.3
• Kubernetes Version: v1.31 (EKS)
• EC2 Instance Type: r6g.8xlarge (EBS volume attach limit: 27)
• StorageClass: gp3, volumeBindingMode: WaitForFirstConsumer
• Pod Type: StatefulSet
• Pod Count Tested: 100

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
