EBS volume created but not linked to PV/PVC #2512

@samox73

/kind bug

What happened?

Provisioning an io2 volume while the account is close to the quota "Storage for Provisioned IOPS SSD (io2) volumes, in TiB" (L-09BD8365) causes the following error:

E0605 06:26:48.679755       1 handlers.go:85] "Error from AWS API" err="api
error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of
30 TiB in this region. Please contact AWS Support to request an Elastic Block
Store service limit increase."

We saw this error for two days while waiting for a quota increase. Once the quota was raised to 60 TiB, the controller stopped failing and linked the EBS volume to the PV/PVC correctly.

What you expected to happen?

The controller should calculate the limits correctly and fail robustly.

How to reproduce it (as minimally and precisely as possible)?

Since I cannot arbitrarily set the quota for L-09BD8365, I will just list what we did:

  • Have the io2 TiB limit at 30TiB for the account
  • Provision the following io2 volumes:
    • 1x 900GiB
    • 3x 1100GiB
    • 2x 7700GiB
      This leads to a utilization of ~19 TiB (19,600 GiB)
  • Try to provision another 7700GiB io2 volume
  • Find the error in the events of the PVC or the controller logs
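
To make the numbers in the steps above easier to check, here is a small sketch of the arithmetic. It assumes the listed volumes are the only io2 storage counted against the quota in the region, which this report does not confirm, and that 1 TiB = 1024 GiB. The variable names are purely illustrative.

```python
GIB_PER_TIB = 1024

# Volume sizes in GiB, taken from the reproduction steps.
existing_gib = [900] + [1100] * 3 + [7700] * 2
requested_gib = 7700
quota_tib = 30  # value of L-09BD8365 at the time of the error

# Assumption: nothing else in the region counts against the io2 quota.
used_gib = sum(existing_gib)
after_gib = used_gib + requested_gib

used_tib = used_gib / GIB_PER_TIB
after_tib = after_gib / GIB_PER_TIB

print(f"in use:        {used_gib} GiB = {used_tib:.2f} TiB")
print(f"after request: {after_gib} GiB = {after_tib:.2f} TiB")
print(f"fits in quota: {after_tib <= quota_tib}")
```

Under these assumptions the additional 7700GiB volume would bring the total to about 26.66 TiB, which is still below the 30 TiB quota, so the VolumeLimitExceeded error is unexpected.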

Anything else we need to know?:

Relevant logs:
2025-06-05 08:26:06.431 I0605 06:26:06.431598       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:06.449 I0605 06:26:06.449257       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=0
2025-06-05 08:26:06.449 E0605 06:26:06.449306       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:06.449 I0605 06:26:06.449327       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:07.450 I0605 06:26:07.450324       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:07.462 I0605 06:26:07.462566       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=1
2025-06-05 08:26:07.462 E0605 06:26:07.462594       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:07.462 I0605 06:26:07.462639       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:09.463 I0605 06:26:09.463468       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:09.477 I0605 06:26:09.477309       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=2
2025-06-05 08:26:09.477 E0605 06:26:09.477342       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:09.477 I0605 06:26:09.477366       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:13.478 I0605 06:26:13.478427       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:13.489 I0605 06:26:13.489602       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=3
2025-06-05 08:26:13.489 E0605 06:26:13.489633       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:13.489 I0605 06:26:13.489685       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:21.490 I0605 06:26:21.490638       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:32.594 E0605 06:26:32.594502       1 driver.go:108] "GRPC error" err="rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition"
2025-06-05 08:26:32.594 I0605 06:26:32.594847       1 controller.go:1094] "Final error received, removing PVC from claims in progress" claimUID="c8f34cdd-edc1-4f77-8d9f-59bf73365069"
2025-06-05 08:26:32.594 I0605 06:26:32.594867       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=4
2025-06-05 08:26:32.594 E0605 06:26:32.594894       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition" logger="UnhandledError"
2025-06-05 08:26:32.594 I0605 06:26:32.594946       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition"
2025-06-05 08:26:48.595 I0605 06:26:48.595544       1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:48.679 E0605 06:26:48.679755       1 handlers.go:85] "Error from AWS API" err="api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase."
2025-06-05 08:26:48.679 E0605 06:26:48.679865       1 driver.go:108] "GRPC error" err="rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 400, RequestID: 183d1dbb-1564-4ff1-9fc6-16340eb2cbf6, api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase."
2025-06-05 08:26:48.680 I0605 06:26:48.680236       1 controller.go:1094] "Final error received, removing PVC from claims in progress" claimUID="c8f34cdd-edc1-4f77-8d9f-59bf73365069"
2025-06-05 08:26:48.680 I0605 06:26:48.680258       1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=5
2025-06-05 08:26:48.680 E0605 06:26:48.680298       1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 400, RequestID: 183d1dbb-1564-4ff1-9fc6-16340eb2cbf6, api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase." logger="UnhandledError"

Environment

  • Kubernetes version (use kubectl version):
    Client Version: v1.33.1
    Kustomize Version: v5.6.0
    Server Version: v1.32.5-eks-5d4a308
    
  • Driver version: v1.38.1
  • Helm chart version: 2.38.1
  • Helm chart values:
    controller:
      serviceAccount:
        create: false
        name: ********
      logLevel: 3
      replicaCount: 1
      region: eu-central-1
    node:
      tolerateAllTaints: false
      tolerations: {}
      volumeAttachLimit: 24
      enableMetrics: true
    

We set tolerateAllTaints to false because some of our clusters include Fargate nodes.
