[From REQ-477] Cherry-pick of shutdown changes #3473
5ca00cb to b769833
Removed the commit 'Remove scripts/examples/python/shutdown.py': it is self-contained and will require a synchronised xenserver-specs PR, so we'll make it separately later.
Is this ready to go? Do we need to clean up the history a bit?
b769833 to 36d113b
Tidied up the commits a bit, and we've reviewed and tested this in the feature branch. It might be useful if someone else takes a quick look, as this affects the shutdown code of XAPI and is not feature flagged. SM has implemented various workarounds for unmounts getting stuck on shutdown (like introducing another service that runs after xapi is shut down and unmounts the SR).
Just noticed that it's missing this commit: 4f497c8
36d113b to 3143f43
(public_name xapi-client)
(libraries (
  mtime
  mtime.clock.os
Do you need to update the opam file as well?
let tasks_str = tasks |> List.map Ref.really_pretty_and_small |> String.concat "," in
D.info "Waiting for tasks timed out on %s" tasks_str;
false
| _ ->
I think we should run ocp-indent on this file after these changes.
You can still unplug the PBDs when the machines are suspended. I've checked that, and fixed the handling of paused VMs.
b621150 to 0d5c7db
Please double check. Aside from that, the current fixup version looks good to me.
It might be useful to cancel the tasks as the Python script did (the other XXX comment): e.g. if you were attempting to migrate a VM, you probably can't hard shut it down. I will try to fix that.
Thanks
The |
Thanks
a1aca01 to ac5ad51
Thanks, squashed the fixups.
Btw, I saw your discussion on Slack and I was looking at the docs and the code. What you do is no different from how we use the current operations in other parts of xapi. It is unfortunate, but we cannot really do much about it with reasonable effort.
Feel free to merge when appropriate.
This needs to be merged after #3470.
I am halfway through that PR.
I've pushed a fixup for the indentation. I think that was the only outstanding action on this PR.
Yes, I think this is fine. Once #3470 is ready and merged, we can merge this as well. Feel free to squash the fixup commits.
If the HA daemon is not running it is safe to unplug the statefile VDI. We need to allow unplugging the statefile VDI if we want to unplug all PBDs on shutdown.

Note: this fixes shutdown on HA slaves, but not on the HA pool master. There the metadata VDI will stay plugged: CA-276993.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
We need to unplug the PBDs before shutdown/reboot, while we still have functional networking. We should not rely only on systemd to unmount the filesystems, because SM would also expect PBD.unplug to get called on clean shutdown/reboot. For some SRs (like GFS2) we also need to perform additional cleanup operations that can only be done after all the SRs are unplugged.

The unplug must be done after HA is disabled, otherwise the statefile might still be using it.

There are 3 ways to shut down:
* stop xapi-domains.service, xe host-disable, xe host-shutdown
* stop from XC, which shuts down the VMs and then does the host shutdown
* xe host-shutdown directly with VMs running (not prevented, but the docs say you shouldn't do this)

Call the unplug in both places. This is a best-effort unplug, i.e. unplug as many PBDs as we can and ignore errors.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
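To illustrate the "best-effort" behaviour described in this commit message, here is a minimal OCaml sketch. It is not the code from this PR: the `pbd` type, the `unplug` callback and the function name `unplug_all_best_effort` are hypothetical stand-ins, and the real implementation goes through xapi's own client and logging modules. The only point it shows is that a failure on one PBD is logged and skipped so the remaining PBDs are still attempted.

```ocaml
(* Hypothetical stand-in for a PBD reference; xapi uses its own Ref type. *)
type pbd = string

(* Try to unplug every PBD in order. A failure is logged and ignored so that
   the remaining PBDs are still attempted. Returns the PBDs that could not
   be unplugged, which a caller may log or simply drop. *)
let unplug_all_best_effort ~(unplug : pbd -> unit) (pbds : pbd list) : pbd list =
  List.filter
    (fun pbd ->
      try
        unplug pbd ;
        false (* unplugged fine: not part of the failed list *)
      with e ->
        Printf.eprintf "Ignoring failure to unplug %s: %s\n" pbd
          (Printexc.to_string e) ;
        true (* keep it in the failed list *))
    pbds
```

Returning the list of failures rather than raising keeps the shutdown path moving while still leaving something to log.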
We were trying to detach the metadata VDI while it was still in use.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>

Slaves might still be trying to write to the master DB via RPC. If we turned off HA and the redo log in order to detach the static VDIs and unplug all PBDs, then we must not allow more writes to the DB.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>

…timeout

For CP-24693.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>

Changes made to cherry-pick:
- Part of the xapi_pbd.ml change was clustering-specific/dependent, will have to go in separately
- update Stdext to Xapi_stdext_pervasives

Signed-off-by: Callum McIntyre <callum.mcintyre@citrix.com>
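As a rough illustration of "must not allow more writes to the DB", the sketch below guards every write with a flag that is set once shutdown has passed the point where the redo log is off. All names here (`Db_writes_blocked`, `writes_blocked`, `block_writes`, `guarded_write`) are hypothetical and not taken from xapi; the actual change may block writes at a different layer, and a real multi-threaded server would also need proper synchronisation around the flag.

```ocaml
(* Hypothetical exception raised when a write arrives too late. *)
exception Db_writes_blocked

(* Hypothetical flag: set once HA and the redo log have been turned off.
   A plain ref is used for brevity; real code would need locking. *)
let writes_blocked = ref false

let block_writes () = writes_blocked := true

(* Wrap each DB write so that late requests, e.g. from slaves over RPC,
   are refused instead of modifying a DB that is no longer persisted. *)
let guarded_write (write : unit -> 'a) : 'a =
  if !writes_blocked then raise Db_writes_blocked else write ()
```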
7ab9c94 to 10d7fbb
I've copied the code written by @edwintorok in an older PR into vm_evacuation.ml.

Changes made for cherry-pick:
- Removed the change to ocaml/xapi/xapi_host_helpers.mli, will need to be fixed on a rebase (added in the iscsi changes)
- Tweaked the datamodel changes to put them in datamodel_host.ml

Don't do prechecks in Host.prepare_for_poweroff

At that point we've already reached the point of no return: it's best if we carry on to evacuate and cleanly shut down our VMs and HA.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Callum McIntyre <callum.mcintyre@citrix.com>
Signed-off-by: Gabor Igloi <gabor.igloi@citrix.com>
10d7fbb to b5cded1
**Do not merge before #3470: that PR adds an mli file that I need to change here, so this will need a fix after rebasing once it merges.** This builds because that mli file doesn't exist on master; once it's there I'll need to add the change to it. The file is added over a few commits in #3470 though, so it's more straightforward to rebase later.
Again, I'm adding this one despite the rebase and small fix needed, because the changes are significant and I would appreciate feedback! It needed a fair bit of manual tweaking around pbd.ml (untangling clustering-related changes from the bug fixes) and the datamodel due to the split.