Implement Smart Host Reboot Feature (suspend and resume resident VMs during reboot) #6750

Closed
olivierlambert opened this issue Mar 27, 2023 · 9 comments · Fixed by #6795


@olivierlambert
Member

Context

Currently, when a user tries to reboot a host using host.reboot, the system attempts to live migrate VMs to other available hosts. However, if live migration is not possible (e.g., due to a lack of shared storage), the operation fails with a "no hosts available" message.

Objective

To enhance the user experience and provide a seamless reboot option, we aim to implement a new feature called "Smart Host Reboot." This feature will offer two alternatives when live migration is not possible:

  1. Force reboot: This existing option shuts down all VMs before rebooting the host. We can keep the existing "Force Reboot" button for this.
  2. Smart reboot: This new option will suspend all VMs, reboot the host, and then resume all VMs. This ensures minimal disruption for the VMs during the host reboot process.

Proposed Changes

In the UI, we will intercept the "no hosts available" error and present the user with the two reboot alternatives mentioned above. The smart reboot process will involve the following steps:

  1. Suspend all VMs on the host.
  2. Reboot the host.
  3. Resume all VMs after the host reboot is complete.

This will allow for a more seamless and user-friendly reboot experience, especially when live migration is not a viable option.
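The proposed flow (try the normal reboot, catch the "no hosts available" failure, offer force or smart reboot, and for smart reboot suspend, reboot, then resume) can be sketched roughly as below. This is a minimal, self-contained Python sketch under stated assumptions, not Xen Orchestra's actual implementation: `HostClient`, `NoHostsAvailable`, `reboot_with_migration`, and the other method names are hypothetical stand-ins for whatever API the real code calls, and the user's choice in the UI is modeled as a plain string.

```python
from dataclasses import dataclass, field


class NoHostsAvailable(Exception):
    """Hypothetical error raised when resident VMs cannot be live migrated."""


@dataclass
class HostClient:
    """Hypothetical in-memory stand-in for a host API (not the real XAPI/XO client)."""
    running_vms: list
    can_migrate: bool = False
    calls: list = field(default_factory=list)  # records every action, in order

    def reboot_with_migration(self):
        # Current behavior: migrate VMs away, fail if no other host can take them.
        if not self.can_migrate:
            raise NoHostsAvailable("no hosts available")
        self.calls.append(("migrate_and_reboot",))

    def force_reboot(self):
        # Existing "Force Reboot": shut down all VMs, then reboot the host.
        self.calls.append(("shutdown_all_and_reboot",))

    def suspend(self, vm):
        self.calls.append(("suspend", vm))

    def reboot(self):
        self.calls.append(("reboot",))

    def resume(self, vm):
        self.calls.append(("resume", vm))


def smart_reboot(host):
    # 1. Suspend all VMs, remembering which ones must be brought back.
    suspended = list(host.running_vms)
    for vm in suspended:
        host.suspend(vm)
    # 2. Reboot the host (a real implementation must wait until it is back up).
    host.reboot()
    # 3. Resume the VMs that were suspended in step 1.
    for vm in suspended:
        host.resume(vm)
    return suspended


def reboot_host(host, fallback):
    """Try a normal reboot; on "no hosts available", apply the user's choice."""
    try:
        host.reboot_with_migration()
        return "migrated"
    except NoHostsAvailable:
        if fallback == "force":
            host.force_reboot()
            return "forced"
        if fallback == "smart":
            smart_reboot(host)
            return "smart"
        raise  # user cancelled: surface the original error
```

For example, `reboot_host(HostClient(running_vms=["vm-a", "vm-b"]), "smart")` records the suspends, one reboot, then the resumes in the same order. The key design point is that the list of suspended VMs has to be kept by whatever process survives the reboot, which is exactly why the XOA case below is a problem.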

⚠️ If the XOA VM is running on the host, this won't work (nothing will be left to resume the VMs). We need to think about a warning message.

@olivierlambert
Member Author

@marcungeschikts we need to estimate the workload and then schedule it for the next release if it's doable.

@vincentparrett

" If there's the XOA VM running on the host, it won't work (nobody will be there to resume the VMs)."

This will be a problem, since with a single server it's quite likely the XOA VM is on that server. Not sure what you could do about that. Perhaps don't suspend the XOA VM; just shut it down so it starts on boot (since it's likely set up that way)?

@olivierlambert
Member Author

If XOA is restarted, the "task" of resuming VMs will be interrupted. Having a single server (or even no shared storage) doesn't mean it's the only machine in your entire environment.

The case where you have only one host is less obvious (maybe by writing something in Other config? This needs to be discussed internally).

@vincentparrett

Just to confirm my understanding: option 1 would shut down the running VMs cleanly?

@HPPinata

@vincentparrett If the guest tools are installed, it works like any other clean shutdown (`xe vm-shutdown uuid=`, clean shutdown in XO, etc.): the guest is notified and shuts down cleanly as normal. I haven't tested whether the Force Reboot option uses shorter timeouts (I think the default in XAPI is 1200 seconds), but I have not encountered any issues using this option myself.

@vincentparrett

vincentparrett commented Mar 27, 2023

I just tested this: a force reboot on a server with mostly Windows VMs and a few Linux VMs, all with the tools installed.

I watched as the Linux VMs shut down cleanly, then XOA, and then the host stopped responding a few seconds later (not long enough for all the Windows VMs to shut down).

Edit: I confirmed from the Windows event log that the Windows VMs did not shut down cleanly.

@vincentparrett

@olivierlambert Should I log a separate issue regarding Windows VMs not shutting down cleanly during a force reboot? All the Windows VMs have the XCP-ng tools installed.

@olivierlambert
Member Author

Yes, we can look into why, starting with the XAPI timeouts and such.

@julien-f julien-f changed the title Implement Smart Host Reboot Feature Implement Smart Host Reboot Feature (suspend and resume resident VMs during reboot) Apr 3, 2023
@julien-f
Member

julien-f commented Apr 3, 2023

Case: detect and ignore (not suspend) XOA because it has auto power on.
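Selecting which VMs to suspend under this rule could look like the sketch below. It assumes each VM record exposes its power state, whether it is the host's control domain, and an auto power on flag. In XCP-ng this flag is commonly set as `auto_poweron` in the VM's `other_config`, but treat the exact field name and its `"true"` string value as assumptions here. `VmRecord`, `has_auto_poweron`, and `vms_to_suspend` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class VmRecord:
    """Simplified VM record; field names are illustrative assumptions."""
    name: str
    power_state: str  # e.g. "Running", "Halted", "Suspended"
    is_control_domain: bool = False
    other_config: dict = field(default_factory=dict)


def has_auto_poweron(vm):
    # Assumption: the flag is stored as the string "true" in other_config.
    return vm.other_config.get("auto_poweron") == "true"


def vms_to_suspend(vms):
    """Return running guest VMs to suspend, skipping dom0 and auto power on VMs (e.g. XOA)."""
    return [
        vm
        for vm in vms
        if vm.power_state == "Running"
        and not vm.is_control_domain
        and not has_auto_poweron(vm)
    ]
```

With a control domain, an XOA VM flagged for auto power on, a running `web01`, and a halted VM, only `web01` is selected. Skipping XOA relies on it coming back by itself after the host boots; whatever drives the resume step still has to survive the reboot, which is the concern raised earlier in the thread.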

julien-f added a commit that referenced this issue Apr 20, 2023
pdonias added a commit that referenced this issue Apr 25, 2023
Fixes #6750
See https://xcp-ng.org/forum/topic/7136

Suspend resident VMs, restart host and resume VMs
pdonias added a commit that referenced this issue Apr 26, 2023
Fixes #6750
See https://xcp-ng.org/forum/topic/7136
See #6791

Suspend resident VMs, restart host and resume VMs