[release-3.6] Add an interface to query downgrade status #19439
Comments
Thanks @fuweid for raising this discussion. I think it'd be better to add an interface for users to query the downgrade status anytime they want.
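As a rough sketch of what such a query interface could look like (all names and fields below are hypothetical placeholders, not the API that was eventually merged):

```go
package downgrade

import "context"

// DowngradeStatus is a hypothetical shape for the query result: whether a
// downgrade job is still in progress and which version it targets.
type DowngradeStatus struct {
	Enabled       bool   // true while a downgrade job is still active
	TargetVersion string // e.g. "3.5.0"; empty when no downgrade is enabled
}

// DowngradeStatusQuerier is what an admin client would call at any time to
// confirm whether the downgrade process has finished.
type DowngradeStatusQuerier interface {
	DowngradeStatus(ctx context.Context) (*DowngradeStatus, error)
}
```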
I was considering removing the AUTO Downgrade Cancellation. The original thought was that since users explicitly start/enable the downgrade process, they should explicitly stop/cancel it as well. However, we should be fine as long as we provide an interface for users to query the downgrade status, as mentioned above. Also, AUTO Downgrade Cancellation has some benefit, as it automatically stops/cancels the downgrade for users after completion. Please let me know your thoughts. thx
I think it is better to keep the AUTO Downgrade Cancellation, because in real-world use cases most downgraded clusters would not be upgraded again immediately. Adding an extra manual step just increases ops overhead and is more prone to errors if that step is forgotten.
Sounds good to me. It's more useful for the admin to confirm that the downgrade process has already finished.
@fuweid Do you have bandwidth to add the downgrade query API? We need to backport it to release-3.6. |
Sure. Self-assigned |
@ivanvc @jmhbnz Once this feature gets done, I think we should release. Hopefully this feature can be done this week or early next week (@fuweid I just added the label "priority/important-soon" to this feature, please feel free to let me know if you need any help), and then we can release.
Closed by #19456
Bug report criteria
What happened?
We don't have an API to indicate when the downgrade process is complete. Cluster administrators must manually check the cluster, storage, and server versions from the member status. Once they confirm that all members have reached the target version, they can consider the downgrade process finished. If administrators do not manually cancel the downgrade and instead upgrade the members immediately after the downgrade, they may encounter an issue where the target version is never reached.
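For illustration, the manual check today looks roughly like the sketch below, using the Go clientv3 Status call. The endpoints and target version are placeholders, and it only inspects each member's server version; the cluster and storage versions would still need to be confirmed separately.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints and target minor version for a three-member cluster.
	endpoints := []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"}
	target := "3.5"

	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints, DialTimeout: 5 * time.Second})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	done := true
	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		st, err := cli.Status(ctx, ep)
		cancel()
		if err != nil {
			panic(err)
		}
		// st.Version is the member's reported server version.
		fmt.Printf("%s: server version %s\n", ep, st.Version)
		if !strings.HasPrefix(st.Version, target) {
			done = false
		}
	}
	fmt.Println("all members at target server version:", done)
}
```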
Here is an example with a three-member cluster:
By default, the leader cancels the downgrade process once all the members are at the target version.
etcd/server/etcdserver/server.go, lines 2394 to 2411 (at commit eb7607b)
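The monitor referenced above behaves roughly like the following simplified sketch. This is an illustrative reconstruction, not the verbatim server.go code; the helper callbacks and the check interval are stand-ins.

```go
package downgrade

import "time"

// monitorDowngrade illustrates the leader-side downgrade monitor: it
// periodically compares the lowest server version reported by the members
// against the downgrade target and, once they match, auto-cancels the
// downgrade job ("AUTO Downgrade Cancellation").
func monitorDowngrade(
	isLeader func() bool,
	downgradeTarget func() string, // "" when no downgrade is enabled
	membersMinVersion func() string, // lowest server version across members
	cancelDowngrade func(),
	stop <-chan struct{},
) {
	ticker := time.NewTicker(5 * time.Second) // check interval is a placeholder
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			target := downgradeTarget()
			if !isLeader() || target == "" {
				continue
			}
			if membersMinVersion() == target {
				// Every member has reached the target version, so the
				// downgrade is considered finished and is cancelled for
				// the administrator automatically.
				cancelDowngrade()
				return
			}
		case <-stop:
			return
		}
	}
}
```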
If the leader doesn't cancel the downgrade in time before T3, then after T3 all members will remain at cluster version v3.5.0, and the upgrade process will not complete until the administrator manually cancels the downgrade process. This scenario is uncommon in real-world use cases (upgrading immediately after a downgrade), but we encountered this issue in a robustness test case (#19306).
Maybe we should consider removing that monitor and forcing the administrator to cancel the downgrade process when it's finished.
ping @ahrtr @siyuanfoundation @serathius @wenjiaswe
What did you expect to happen?
After T3, all members can upgrade to v3.6.0.
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output