/kind bug
What happened?
Provisioning an io2 volume close to the limit for "Storage for Provisioned IOPS SSD (io2) volumes, in TiB" (quota code L-09BD8365) causes:

```
E0605 06:26:48.679755 1 handlers.go:85] "Error from AWS API" err="api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase."
```
We got this error for two days while waiting for a quota increase. Once the quota was raised to 60 TiB, the controller stopped failing and bound the EBS volume to the PV/PVC correctly.
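As a sketch, the quota value and the current io2 usage can be cross-checked like this (commands assume the AWS CLI v2 with region/credentials configured; this was not part of our original debugging):

```sh
# Current value of the "Storage for Provisioned IOPS SSD (io2) volumes,
# in TiB" quota (L-09BD8365):
aws service-quotas get-service-quota \
  --service-code ebs --quota-code L-09BD8365 \
  --query 'Quota.Value'

# Total provisioned io2 storage in the region, in GiB
# (divide by 1024 to compare against the TiB quota):
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=io2 \
  --query 'sum(Volumes[].Size)'
```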
What you expected to happen?
The controller should calculate the limits correctly and fail robustly.
How to reproduce it (as minimally and precisely as possible)?
Since I cannot arbitrarily set the quota for L-09BD8365, I will just list what we had done:
- Have the io2 storage limit at 30 TiB for the account.
- Provision the following io2 volumes:
  - 1x 900 GiB
  - 3x 1100 GiB
  - 2x 7700 GiB

  This leads to a utilization of ~19 TiB.
- Try to provision another 7700 GiB io2 volume. Note that ~19 TiB + ~7.5 TiB ≈ 26.7 TiB, which is still below the 30 TiB quota.
- Find the error in the events of the PVC or in the controller logs.
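For completeness, here is a minimal manifest sketch that should exercise the same code path. The StorageClass name `cnpg-io2` is taken from the logs below; the `iops` value and PVC name are illustrative assumptions, not our exact manifests:

```sh
# Hypothetical repro manifests; adjust iops and size to your setup.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cnpg-io2
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: io2
  iops: "3000"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: io2-repro
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: cnpg-io2
  resources:
    requests:
      storage: 7700Gi
EOF
```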
Anything else we need to know?:
Relevant logs:

```
2025-06-05 08:26:06.431 I0605 06:26:06.431598 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:06.449 I0605 06:26:06.449257 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=0
2025-06-05 08:26:06.449 E0605 06:26:06.449306 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:06.449 I0605 06:26:06.449327 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:07.450 I0605 06:26:07.450324 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:07.462 I0605 06:26:07.462566 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=1
2025-06-05 08:26:07.462 E0605 06:26:07.462594 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:07.462 I0605 06:26:07.462639 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:09.463 I0605 06:26:09.463468 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:09.477 I0605 06:26:09.477309 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=2
2025-06-05 08:26:09.477 E0605 06:26:09.477342 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:09.477 I0605 06:26:09.477366 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:13.478 I0605 06:26:13.478427 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:13.489 I0605 06:26:13.489602 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=3
2025-06-05 08:26:13.489 E0605 06:26:13.489633 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal" logger="UnhandledError"
2025-06-05 08:26:13.489 I0605 06:26:13.489685 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": error generating accessibility requirements: no topology key found on CSINode ip-10-252-48-139.eu-central-1.compute.internal"
2025-06-05 08:26:21.490 I0605 06:26:21.490638 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:32.594 E0605 06:26:32.594502 1 driver.go:108] "GRPC error" err="rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition"
2025-06-05 08:26:32.594 I0605 06:26:32.594847 1 controller.go:1094] "Final error received, removing PVC from claims in progress" claimUID="c8f34cdd-edc1-4f77-8d9f-59bf73365069"
2025-06-05 08:26:32.594 I0605 06:26:32.594867 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=4
2025-06-05 08:26:32.594 E0605 06:26:32.594894 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition" logger="UnhandledError"
2025-06-05 08:26:32.594 I0605 06:26:32.594946 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": timed out waiting for volume to create: timed out waiting for the condition"
2025-06-05 08:26:48.595 I0605 06:26:48.595544 1 event.go:389] "Event occurred" object="history-infra/history-cnpg-15" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"history-infra/history-cnpg-15\""
2025-06-05 08:26:48.679 E0605 06:26:48.679755 1 handlers.go:85] "Error from AWS API" err="api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase."
2025-06-05 08:26:48.679 E0605 06:26:48.679865 1 driver.go:108] "GRPC error" err="rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 400, RequestID: 183d1dbb-1564-4ff1-9fc6-16340eb2cbf6, api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase."
2025-06-05 08:26:48.680 I0605 06:26:48.680236 1 controller.go:1094] "Final error received, removing PVC from claims in progress" claimUID="c8f34cdd-edc1-4f77-8d9f-59bf73365069"
2025-06-05 08:26:48.680 I0605 06:26:48.680258 1 controller.go:951] "Retrying syncing claim" key="c8f34cdd-edc1-4f77-8d9f-59bf73365069" failures=5
2025-06-05 08:26:48.680 E0605 06:26:48.680298 1 controller.go:974] "Unhandled Error" err="error syncing claim \"c8f34cdd-edc1-4f77-8d9f-59bf73365069\": failed to provision volume with StorageClass \"cnpg-io2\": rpc error: code = Internal desc = Could not create volume \"pvc-c8f34cdd-edc1-4f77-8d9f-59bf73365069\": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 400, RequestID: 183d1dbb-1564-4ff1-9fc6-16340eb2cbf6, api error VolumeLimitExceeded: You have exceeded your maximum io2 storage limit of 30 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase." logger="UnhandledError"
```

Environment
- Kubernetes version (use `kubectl version`): Client Version: v1.33.1, Kustomize Version: v5.6.0, Server Version: v1.32.5-eks-5d4a308
- Driver version: v1.38.1
- Helm chart version: 2.38.1
- Helm chart values:

```yaml
controller:
  serviceAccount:
    create: false
    name: ********
  logLevel: 3
  replicaCount: 1
  region: eu-central-1
node:
  tolerateAllTaints: false
  tolerations: {}
  volumeAttachLimit: 24
  enableMetrics: true
```
We don't tolerate all taints because we have some Fargate nodes in our clusters.