You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: keps/sig-storage/1790-recover-resize-failure/README.md
+51-21Lines changed: 51 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -10,6 +10,7 @@
10
10
-[Non-Goals](#non-goals)
11
11
-[Proposal](#proposal)
12
12
-[Making allocatedResourceStatus not change unnecessarily for every error in 1.31](#making-allocatedresourcestatus-not-change-unnecessarily-for-every-error-in-131)
13
+
-[Handling of RWX volumes that don't require node expansion](#handling-of-rwx-volumes-that-dont-require-node-expansion)
13
14
-[Making resizeStatus more general in v1.28](#making-resizestatus-more-general-in-v128)
### Handling of RWX volumes that don't require node expansion
146
+
147
+
There are CSI drivers which return `NodeExpansionRequired: false` after `ControllerExpandVolume` is finished, but in kubelet to handle the case of node-expansion of RWX volumes, we usually MUST call `NodeExpandVolume` on each node where volume is attached, even if `ControllerExpandVolume` is finished and no node-expansion is required. This special case *only* applies to RWX volumes.
148
+
149
+
To avoid calling `NodeExpandVolume` for such volumes, we are proposing that we add an annotation to PVC called - `volume.kubernetes.io/node-expansion-not-required` when `ControllerExpandVolume` is finished and `NodeExpansionRequired` is set to `false` - https://github.com/kubernetes-csi/external-resizer/pull/496/files .
150
+
151
+
In kubelet if a RWX volume has this annotation, no `NodeExpandVolume` will be called and and updated size of the volume will simply be recorded in actual state of the world.This is implemented in - https://github.com/kubernetes/kubernetes/pull/131907 .
152
+
153
+
The proposed mechanism is fully backward compatible and only applies to RWX volumes.
154
+
155
+
143
156
### Making resizeStatus more general in v1.28
144
157
145
158
After [some discussion](https://github.com/kubernetes/kubernetes/pull/116335#issuecomment-1624566731) with sig-storage folks, we are proposing that we rename `pvc.Status.ResizeStatus` to `pvc.Status.AllocatedResourceStatus` and make it a map.
@@ -330,9 +343,15 @@ There are already e2e tests running that verify correctness of this feature - ht
330
343
- There are already e2e tests running that verify correctness of this feature -
- The feature has been extensively tested in CI and deployed with drivers in real world. For example - https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/charts/latest/azuredisk-csi-driver/templates/csi-azuredisk-controller.yaml#L166
349
+
- The test grid already has tests for the feature.
350
+
333
351
### Upgrade / Downgrade Strategy
334
352
335
-
Not Applicable
353
+
- For the case of older kubelet running with newer resizer, the kubelet must handle the newer fields introduced by this feature even if feature-gate is not enabled. Having said that, if kubelet is older and has no notion of this feature at all and `api-server` and `external-resizer` is newer, it is possible that kubelet will not properly update `allocatedResourceStatus` after expansion is finished on the node. This is a known limitation and if user wants to keep running older version of kubelet (older than v1.31 as of this writing and hence unsupported), then they MUST not update external-resizer.
354
+
- In general `external-resizer` should only be upgraded(and `RecoverVolumeExpansionFailure` enabled), after `kubelet`, `api-server` and `kube-controller-manager` already has this feature enabled.
336
355
337
356
### Version Skew Strategy
338
357
@@ -413,9 +432,15 @@ after expansion is complete even with older kubelet. No recovery from expansion
413
432
### Monitoring requirements
414
433
415
434
###### How can an operator determine if the feature is in use by workloads?**
416
-
Any volume that has been recovered will emit a metric: `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`.
417
-
418
-
435
+
436
+
The Recovery feature as such designed does not require if external operator must be able to observe
437
+
if the feature is in-use by the workloads. The reason is, proposed changes already enhance
438
+
user experience of resizing workflow by providing additional states such as `allocatedResourceStatus`
439
+
and `allocatedResources` in pvc's status and there is nothing special about recovery in itself.
440
+
441
+
Operators can already observe if volume's are being resized by using `pvc.Status.Conditions` and other
442
+
metrics documented below.
443
+
419
444
###### How can someone using this feature know that it is working for their instance?
420
445
421
446
Since feature requires user interaction, reducing the size of the PVC is only supported
@@ -425,13 +450,17 @@ if this feature-gate is enabled.
@@ -446,18 +475,14 @@ if this feature-gate is enabled.
446
475
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
447
476
448
477
After this feature is rolled out, there should not be any increase in 95-99 percentile of
449
-
both `expand_volume` and `volume_fs_resize` durations. Also the error rate should not increase for
450
-
`storage_operation_errors_total` metric.
451
-
452
-
###### Are there any missing metrics that would be useful to have to improve observability if this feature?
453
-
454
-
We are planning to add new counter metrics that will record success and failure of recovery operations.
455
-
In cases where recovery fails, the counter will forever be increasing until an admin action resolves the error.
478
+
both `csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume"}` and `storage_operation_duration_seconds{operation_name=volume_fs_resize}` durations.
479
+
480
+
Also the error rate should not increase for `csi_sidecar_operations_seconds{method_name="/csi.v1.Controller/ControllerExpandVolume"}` metric.
456
481
457
-
Tentative name of metric is - `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`
482
+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
458
483
459
-
The reason of using PV name as a label is - we do not expect this feature to be used in a cluster very often
460
-
and hence it should be okay to use name of PVs that were recovered this way.
484
+
We think that existing resizing related metrics are sufficient and we do as such think we need to introduce new
485
+
metrics for tracking this feature.
461
486
462
487
### Dependencies
463
488
@@ -474,7 +499,7 @@ previous answers based on experience in the field._
474
499
475
500
###### Will enabling / using this feature result in any new API calls
476
501
477
-
Potentially yes. If user expands a PVC and expansion fails on the node, then
502
+
Yes - if user recovers a volume from failed expansion. When user expands a PVC and expansion fails on the node, then
478
503
expansion controller in control-plane must intervene and verify if it is safe
479
504
to retry expansion on the kubelet.This requires round-trips between kubelet
480
505
and control-plane and hence more API calls.
@@ -484,9 +509,14 @@ previous answers based on experience in the field._
484
509
485
510
###### Will enabling / using this feature result in any new calls to cloud provider
486
511
487
-
Potentially yes. Since expansion operation is idempotent and expansion controller
512
+
Since expansion operation is idempotent and expansion controller
488
513
must verify if it is safe to retry expansion with lower values, there could be
489
-
additional calls to CSI drivers (and hence cloudproviders).
514
+
additional calls to CSI drivers (and hence cloudproviders), when user attempts recovery from
515
+
failed expansion.
516
+
517
+
On the other hand - previously we retried expansion of failed volumes infinitiely and hence incurred
518
+
significant api usage of both k8s and cloudprovider. This enhancement significantly reduces this by
519
+
allowing recovery and slower pace of retries for infeasible expansion operations.
490
520
491
521
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
0 commit comments