Skip to content

Commit 9c6ddfd

Browse files
authored
Merge pull request #5353 from gnufied/bump-recovery-ga
Target GA for recover from Volume Expansion failure feature
2 parents b164fd1 + 4e06845 commit 9c6ddfd

File tree

3 files changed

+56
-23
lines changed

3 files changed

+56
-23
lines changed

keps/prod-readiness/sig-storage/1790.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,5 @@ alpha:
33
approver: "@deads2k"
44
beta:
55
approver: "@deads2k"
6+
stable:
7+
approver: "@deads2k"

keps/sig-storage/1790-recover-resize-failure/README.md

Lines changed: 51 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@
1010
- [Non-Goals](#non-goals)
1111
- [Proposal](#proposal)
1212
- [Making allocatedResourceStatus not change unnecessarily for every error in 1.31](#making-allocatedresourcestatus-not-change-unnecessarily-for-every-error-in-131)
13+
- [Handling of RWX volumes that don't require node expansion](#handling-of-rwx-volumes-that-dont-require-node-expansion)
1314
- [Making resizeStatus more general in v1.28](#making-resizestatus-more-general-in-v128)
1415
- [User flow stories](#user-flow-stories)
1516
- [Case 0 (default PVC creation):](#case-0-default-pvc-creation)
@@ -27,6 +28,7 @@
2728
- [Graduation Criteria](#graduation-criteria)
2829
- [Alpha](#alpha)
2930
- [Beta](#beta)
31+
- [GA](#ga)
3032
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
3133
- [Version Skew Strategy](#version-skew-strategy)
3234
- [Monitoring](#monitoring)
@@ -140,6 +142,17 @@ This will allow external-resizer to recover safely from node expansion failures
140142

141143
![New flow kubelet](./Expanding volume - Kubelet Loop.png)
142144

145+
### Handling of RWX volumes that don't require node expansion
146+
147+
There are CSI drivers which return `NodeExpansionRequired: false` after `ControllerExpandVolume` is finished, but in kubelet to handle the case of node-expansion of RWX volumes, we usually MUST call `NodeExpandVolume` on each node where volume is attached, even if `ControllerExpandVolume` is finished and no node-expansion is required. This special case *only* applies to RWX volumes.
148+
149+
To avoid calling `NodeExpandVolume` for such volumes, we are proposing that we add an annotation to PVC called - `volume.kubernetes.io/node-expansion-not-required` when `ControllerExpandVolume` is finished and `NodeExpansionRequired` is set to `false` - https://github.com/kubernetes-csi/external-resizer/pull/496/files .
150+
151+
In kubelet if a RWX volume has this annotation, no `NodeExpandVolume` will be called and and updated size of the volume will simply be recorded in actual state of the world.This is implemented in - https://github.com/kubernetes/kubernetes/pull/131907 .
152+
153+
The proposed mechanism is fully backward compatible and only applies to RWX volumes.
154+
155+
143156
### Making resizeStatus more general in v1.28
144157

145158
After [some discussion](https://github.com/kubernetes/kubernetes/pull/116335#issuecomment-1624566731) with sig-storage folks, we are proposing that we rename `pvc.Status.ResizeStatus` to `pvc.Status.AllocatedResourceStatus` and make it a map.
@@ -330,9 +343,15 @@ There are already e2e tests running that verify correctness of this feature - ht
330343
- There are already e2e tests running that verify correctness of this feature -
331344
https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-gce-cos-alpha-features
332345

346+
#### GA
347+
348+
- The feature has been extensively tested in CI and deployed with drivers in real world. For example - https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/charts/latest/azuredisk-csi-driver/templates/csi-azuredisk-controller.yaml#L166
349+
- The test grid already has tests for the feature.
350+
333351
### Upgrade / Downgrade Strategy
334352

335-
Not Applicable
353+
- For the case of older kubelet running with newer resizer, the kubelet must handle the newer fields introduced by this feature even if feature-gate is not enabled. Having said that, if kubelet is older and has no notion of this feature at all and `api-server` and `external-resizer` is newer, it is possible that kubelet will not properly update `allocatedResourceStatus` after expansion is finished on the node. This is a known limitation and if user wants to keep running older version of kubelet (older than v1.31 as of this writing and hence unsupported), then they MUST not update external-resizer.
354+
- In general `external-resizer` should only be upgraded(and `RecoverVolumeExpansionFailure` enabled), after `kubelet`, `api-server` and `kube-controller-manager` already has this feature enabled.
336355

337356
### Version Skew Strategy
338357

@@ -413,9 +432,15 @@ after expansion is complete even with older kubelet. No recovery from expansion
413432
### Monitoring requirements
414433

415434
###### How can an operator determine if the feature is in use by workloads?**
416-
Any volume that has been recovered will emit a metric: `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`.
417-
418-
435+
436+
The Recovery feature as such designed does not require if external operator must be able to observe
437+
if the feature is in-use by the workloads. The reason is, proposed changes already enhance
438+
user experience of resizing workflow by providing additional states such as `allocatedResourceStatus`
439+
and `allocatedResources` in pvc's status and there is nothing special about recovery in itself.
440+
441+
Operators can already observe if volume's are being resized by using `pvc.Status.Conditions` and other
442+
metrics documented below.
443+
419444
###### How can someone using this feature know that it is working for their instance?
420445

421446
Since feature requires user interaction, reducing the size of the PVC is only supported
@@ -425,13 +450,17 @@ if this feature-gate is enabled.
425450

426451
- [X] Metrics
427452
- controller expansion operation duration:
428-
- Metric name: storage_operation_duration_seconds{operation_name=expand_volume}
453+
- Metric name: csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume}
429454
- [Optional] Aggregation method: percentile
430-
- Components exposing the metric: kube-controller-manager
455+
- Components exposing the metric: external-resizer
431456
- controller expansion operation errors:
432-
- Metric name: storage_operation_errors_total{operation_name=expand_volume}
457+
- Metric name: csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume, grpc_status_code!="OK"}
433458
- [Optional] Aggregation method: cumulative counter
434-
- Components exposing the metric: kube-controller-manager
459+
- Components exposing the metric: external-resizer
460+
- CSI node expansion operation durations:
461+
- Metric name: csi_operations_seconds{method_name="/csi.v1.Controller/NodeExpandVolume}
462+
- [Optional] Aggregation method: cumulative counter
463+
- Components exposing the metric: kubelet
435464
- node expansion operation duration:
436465
- Metric name: storage_operation_duration_seconds{operation_name=volume_fs_resize}
437466
- [Optional] Aggregation method: percentile
@@ -446,18 +475,14 @@ if this feature-gate is enabled.
446475
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
447476

448477
After this feature is rolled out, there should not be any increase in 95-99 percentile of
449-
both `expand_volume` and `volume_fs_resize` durations. Also the error rate should not increase for
450-
`storage_operation_errors_total` metric.
451-
452-
###### Are there any missing metrics that would be useful to have to improve observability if this feature?
453-
454-
We are planning to add new counter metrics that will record success and failure of recovery operations.
455-
In cases where recovery fails, the counter will forever be increasing until an admin action resolves the error.
478+
both `csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume"}` and `storage_operation_duration_seconds{operation_name=volume_fs_resize}` durations.
479+
480+
Also the error rate should not increase for `csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume"}` metric.
456481

457-
Tentative name of metric is - `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`
482+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
458483

459-
The reason of using PV name as a label is - we do not expect this feature to be used in a cluster very often
460-
and hence it should be okay to use name of PVs that were recovered this way.
484+
We think that existing resizing related metrics are sufficient and we do as such think we need to introduce new
485+
metrics for tracking this feature.
461486

462487
### Dependencies
463488

@@ -474,7 +499,7 @@ previous answers based on experience in the field._
474499

475500
###### Will enabling / using this feature result in any new API calls
476501

477-
Potentially yes. If user expands a PVC and expansion fails on the node, then
502+
Yes - if user recovers a volume from failed expansion. When user expands a PVC and expansion fails on the node, then
478503
expansion controller in control-plane must intervene and verify if it is safe
479504
to retry expansion on the kubelet.This requires round-trips between kubelet
480505
and control-plane and hence more API calls.
@@ -484,9 +509,14 @@ previous answers based on experience in the field._
484509

485510
###### Will enabling / using this feature result in any new calls to cloud provider
486511

487-
Potentially yes. Since expansion operation is idempotent and expansion controller
512+
Since expansion operation is idempotent and expansion controller
488513
must verify if it is safe to retry expansion with lower values, there could be
489-
additional calls to CSI drivers (and hence cloudproviders).
514+
additional calls to CSI drivers (and hence cloudproviders), when user attempts recovery from
515+
failed expansion.
516+
517+
On the other hand - previously we retried expansion of failed volumes infinitiely and hence incurred
518+
significant api usage of both k8s and cloudprovider. This enhancement significantly reduces this by
519+
allowing recovery and slower pace of retries for infeasible expansion operations.
490520

491521
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
492522

keps/sig-storage/1790-recover-resize-failure/kep.yaml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,12 +21,13 @@ see-also:
2121
replaces:
2222
superseded-by:
2323

24-
latest-milestone: "v1.32"
25-
stage: "alpha"
24+
latest-milestone: "v1.34"
25+
stage: "stable"
2626

2727
milestone:
2828
alpha: "v1.23"
2929
beta: "v1.32"
30+
stable: "v1.34"
3031

3132
feature-gates:
3233
- name: RecoverVolumeExpansionFailure

0 commit comments

Comments
 (0)