-
Notifications
You must be signed in to change notification settings - Fork 292
CP-28406: improve logging and error checking in clustering functions #3614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-28406: improve logging and error checking in clustering functions #3614
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good PR, this cleanup is necessary. Unless Gabor has any quibbles, I think this can be merged.
Xapi_clustering.Daemon.disable ~__context | ||
| Result.Error error -> | ||
D.warn "Error occurred during Cluster.destroy"; | ||
handle_error error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, we were unnecessarily duplicating code.
Xapi_clustering.Daemon.disable ~__context | ||
| Result.Error error -> | ||
warn "Error occurred during Cluster_host.force_destroy"; | ||
handle_error error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also nicely done!
ocaml/xapi/xapi_clustering.ml
Outdated
) srs | ||
then raise Api_errors.(Server_error (cluster_stack_in_use, [ cluster_stack ])) | ||
|
||
let daemon_enabled = ref false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a helper to query this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually ignore that, the code is good to merge
Just realized that we do need to prevent Clusters without cluster hosts, otherwise pool autojoin won't work, and in fact a Cluster object without cluster hosts is unusable. |
We should merge #3607 first and then this one needs to be rebased. |
Locking_helpers provides a nice list of active locks as part of `xe host-get-thread-diagnostics`, which helps debugging. Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This is quite useful for tracing what Xapi_cluster_host methods are called, and together with the previous commit for tracking deadlocks. Signed-off-by: Edwin Török <edvin.torok@citrix.com>
…ustering daemon is down Cluster.destroy did not work if you destroyed the last cluster host (like some of the unit tests actually did). Cluster_host.destroy on the last node is special: there is no cluster after you leave (leave would fail in reality), so just destroy the cluster. Refactor all 3 destroy operations into one, choose automatically based on number of hosts. Could've introduced a new API error to forbid destroying the last cluster host, but it is better if XAPI is able to automatically do the right thing than to tell the user it should call some other API instead. Also with Cluster_host.force_destroy you could have already ended up in a situation where you have no cluster hosts and want to destroy the cluster, which would hang indefinitely because the daemon was stopped. We always try to enable the daemon on startup, so keep track on whether we think it should be running, and if we know we stopped it then just raise an error when trying to do an RPC. This is useful in debugging situations where we try to send RPCs too early too (e.g. before we started the daemon). Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Rebased on latest branch. |
This was useful in debugging problems with @minishrink PR.