-
Notifications
You must be signed in to change notification settings - Fork 1.5k
KEP-961: Bump Statefulset MaxUnavailable to beta (built on top of #4474) #5228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-961: Bump Statefulset MaxUnavailable to beta (built on top of #4474) #5228
Conversation
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: Lucas Severo Alves <lseveroa@redhat.com>
/assign @janetkuo |
cluster required to make on upgrade, in order to maintain previous behavior? | ||
- This is a new field, it will only be enabled after an upgrade. To maintain the | ||
previous behavior, you can disable the feature gate manually in Beta or leave | ||
the field unconfigured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can discuss the case of when existing clusters already use this feature in alpha before upgrade.
Also, discuss what happens if the feature gate was disabled, and the field was set, before upgrading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added both cases, PTAL
If their StatefulSet rollingUpdate section has the field maxUnavailable specified with | ||
a value different than 1. | ||
The below command should show maxUnavailable value: | ||
a value different than 1. While in alpha and beta, the feature-gate needs to be enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still set it to 1 when this feature is enabled, right? Perhaps they just need to check the existence of the maxUnavailable field, and that the feature gate is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, it can be both. During rollout though you won't see a behavioral change if it is set to 1, but you will see it if it set to anything else. As an example, LWS sets the value even if the feature flag was disabled. The only way we could tell it wasn't enabled by default was by testing it with a value larger than one.
Co-authored-by: Janet Kuo <chiachenk@google.com>
Co-authored-by: Janet Kuo <chiachenk@google.com>
Co-authored-by: Janet Kuo <chiachenk@google.com>
Co-authored-by: Janet Kuo <chiachenk@google.com>
/label tide/merge-method-squash |
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
PRR shadow: Taking a look before @wojtek-t. @soltysh calls out here that there are a lot of TODO, bugs and metrics to add for alpha -> beta generation. I even see unresolved threads about existing bugs that have blocked the promotion before. I see that PRs are up for these. As part of https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/5241-beta-featuregate-promotion-requirements#summary, a requirement for features to be promoted to beta is that metrics, features and bugs are all complete. I think the PRR seems fine to me but I want to call out that all these items must be complete before the feature gate gets turned on. My main concern is beta requirements. This seems like a pretty hefty list to achieve in a few weeks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor comments from me - other than that it LGTM from PRR pov and is waiting for SIG approval from @soltysh
|
||
Administrators should monitor the following metrics to assess the need for a rollback: | ||
|
||
- `statefulset_unavailability_violation`: Ths metric reflects the number of times the maxUnavailable condition is violated (i.e. spec.replicas - availableReplicas > maxUnavailable). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure that this metric will get added before enabling the feature by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
I think that's the best path forward. I will personally ensure we DON'T enable the feature gate before all the missing bits pointed out here are addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
from sig-apps pov
/hold
for @wojtek-t PRR approval
The PRR looks fine - please just ensure that the following happen before FG is turned on to beta:
/approve PRR |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Edwinhr716, soltysh, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Supersedes #4474
Metrics PR: kubernetes/kubernetes#130951