Smart Reboot Feature Doesn't Automatically Resume VMs. #7194

Closed
ajpri opened this issue Nov 21, 2023 · 11 comments · Fixed by #7231

Comments

@ajpri

ajpri commented Nov 21, 2023

Describe the bug

With Xen Orchestra Community Edition (commit 5fe53), the "Smart Reboot" feature suspends the VMs and reboots the host, but once the host is back up, the resident VMs are not resumed automatically. I have not been able to get this feature to work.

System Specs:

  • Host has a Ryzen 9 3900X, although behavior has also been observed with an Intel Core i7-6700/i5-6500.
  • SR is an NFS mount from TrueNAS Core.
  • VM for this specific test/bug report is Fedora Server 36 with Management agent 8.2.0-2, although the behavior has also been observed with other VMs (Debian, Ubuntu).
  • My Xen Orchestra is NOT on my XCP-ng pool, it's a VM on the TrueNAS host.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Smart Reboot in Host
  2. Observe VMs suspend
  3. Observe host reboot
  4. VMs will not resume

Expected behavior
VMs resume automatically shortly after the host has rebooted.

Screenshots
For the most part, my VMs are the same.
[Two screenshots of the VM configuration]

Hosts also have a very similar configuration:
[Two screenshots of the host configuration]

Environment (please provide the following information):

  • Node: 18.18.2
  • Hypervisor: XCP-ng 8.2, fully up to date
@Danp2
Collaborator

Danp2 commented Nov 29, 2023

@ajpri You may want to review this PR and this thread.

Bottom line is that it won't work if the XO VM resides on the host being rebooted.

@ajpri
Author

ajpri commented Nov 29, 2023

@Danp2 That is not the case here. My XO VM is not a VM in XCP-ng.
The VMs are suspending and the host is rebooting, but once the host is back online after the reboot, the VMs aren't being resumed.

@Danp2
Collaborator

Danp2 commented Nov 29, 2023

@ajpri Ok... I missed that detail when I skimmed your OP. AFAIK, the XO instance is supposed to resume the VMs once the host has rebooted. Have you checked the XO logs for any obvious errors? Is your XO up-to-date?

@ajpri
Author

ajpri commented Dec 1, 2023

@Danp2
XO was updated to 2dcb5, and the issue is still occurring. Nothing in Settings/Logs.

@ajpri
Author

ajpri commented Dec 1, 2023

Decided to make a video of it. https://youtu.be/n87jG5Vyk40

Rebooting host "104 - Lonely Star" with one resident VM, "124 - WithoutYou". I am able to manually resume the VM at the 4:30 mark. The test happened at ~7:55pm. The other errors showing up were unrelated.

@Danp2
Collaborator

Danp2 commented Dec 1, 2023

> Nothing in Settings/Logs

You will need to check the system logs. This is the command I typically use: journalctl -u xo-server -f -n 50

@ajpri
Author

ajpri commented Dec 1, 2023

Dec 01 12:49:12 livewire xo-server[48176]: XapiError: HOST_STILL_BOOTING()
Dec 01 12:49:12 livewire xo-server[48176]:     at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/_XapiError.mjs:16:12)
Dec 01 12:49:12 livewire xo-server[48176]:     at default (file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/_getTaskResult.mjs:11:29)
Dec 01 12:49:12 livewire xo-server[48176]:     at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/index.mjs:999:24)
Dec 01 12:49:12 livewire xo-server[48176]:     at file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/index.mjs:1033:14
Dec 01 12:49:12 livewire xo-server[48176]:     at Array.forEach (<anonymous>)
Dec 01 12:49:12 livewire xo-server[48176]:     at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/index.mjs:1023:12)
Dec 01 12:49:12 livewire xo-server[48176]:     at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202311301752/packages/xen-api/index.mjs:1196:14) {
Dec 01 12:49:12 livewire xo-server[48176]:   code: 'HOST_STILL_BOOTING',
Dec 01 12:49:12 livewire xo-server[48176]:   params: [],
Dec 01 12:49:12 livewire xo-server[48176]:   call: undefined,
Dec 01 12:49:12 livewire xo-server[48176]:   url: undefined,
Dec 01 12:49:12 livewire xo-server[48176]:   task: task {
Dec 01 12:49:12 livewire xo-server[48176]:     uuid: 'c7ffafbf-5d96-9d29-be6b-77f2977d3ec0',
Dec 01 12:49:12 livewire xo-server[48176]:     name_label: 'Async.host.enable',
Dec 01 12:49:12 livewire xo-server[48176]:     name_description: '',
Dec 01 12:49:12 livewire xo-server[48176]:     allowed_operations: [],
Dec 01 12:49:12 livewire xo-server[48176]:     current_operations: {},
Dec 01 12:49:12 livewire xo-server[48176]:     created: '20231201T18:49:11Z',
Dec 01 12:49:12 livewire xo-server[48176]:     finished: '20231201T18:49:12Z',
Dec 01 12:49:12 livewire xo-server[48176]:     status: 'failure',
Dec 01 12:49:12 livewire xo-server[48176]:     resident_on: 'OpaqueRef:d971ee55-aaa1-4257-81af-c771473a8a86',
Dec 01 12:49:12 livewire xo-server[48176]:     progress: 1,
Dec 01 12:49:12 livewire xo-server[48176]:     type: '<none/>',
Dec 01 12:49:12 livewire xo-server[48176]:     result: '',
Dec 01 12:49:12 livewire xo-server[48176]:     error_info: [ 'HOST_STILL_BOOTING' ],
Dec 01 12:49:12 livewire xo-server[48176]:     other_config: {},
Dec 01 12:49:12 livewire xo-server[48176]:     subtask_of: 'OpaqueRef:NULL',
Dec 01 12:49:12 livewire xo-server[48176]:     subtasks: [],
Dec 01 12:49:12 livewire xo-server[48176]:     backtrace: '(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 8415))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))'
Dec 01 12:49:12 livewire xo-server[48176]:   }
Dec 01 12:49:12 livewire xo-server[48176]: }

The host is enabled a moment after this error. Perhaps host.enable is being called too early? I also tried a "normal" reboot with no resident VMs running and saw no errors or abnormal behavior.

@olivierlambert
Member

Thanks for the feedback. I think it should retry, and I don't know why it doesn't. Pinging @julien-f
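
For readers following along, here is a minimal sketch of the kind of retry being discussed: keep calling an operation while it fails with the HOST_STILL_BOOTING code seen in the log above, and rethrow any other error. This is not the actual change merged for this issue. The names retryOnCode and makeFakeEnableHost, and the options code, retries and delayMs, are hypothetical and exist only for this illustration; the fake function stands in for the real host.enable call.

```javascript
// Hypothetical helper (not part of Xen Orchestra): retry an async call
// while it fails with one specific error code.
async function retryOnCode(fn, { code, retries = 10, delayMs = 5000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      // Rethrow anything that is not the expected transient error,
      // and give up once the retry budget is spent.
      if (error.code !== code || attempt >= retries) {
        throw error
      }
      // Wait before the next attempt so the host has time to finish booting.
      await new Promise(resolve => setTimeout(resolve, delayMs))
    }
  }
}

// Stand-in for the real host.enable call (assumption for this sketch):
// rejects with HOST_STILL_BOOTING `failures` times, then resolves with
// the total number of calls made.
function makeFakeEnableHost(failures) {
  let calls = 0
  return async function enableHost() {
    calls++
    if (calls <= failures) {
      const error = new Error('HOST_STILL_BOOTING()')
      error.code = 'HOST_STILL_BOOTING'
      throw error
    }
    return calls
  }
}

// Usage: the fake host is "still booting" for the first two attempts.
retryOnCode(makeFakeEnableHost(2), { code: 'HOST_STILL_BOOTING', delayMs: 10 })
  .then(calls => console.log(`host enabled after ${calls} attempts`))
// prints "host enabled after 3 attempts"
```

In a smart reboot flow, resuming the suspended VMs would only start once a call like this has succeeded, which matches the observation above that the host becomes enabled a moment after the error.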

@julien-f
Member

julien-f commented Dec 8, 2023

@ajpri Can you try the branch host_smartReboot-STILL_BOOTING and tell me whether it fixes this issue?

@ajpri
Author

ajpri commented Dec 8, 2023

@julien-f Sure! Will test over the weekend.

@ajpri
Author

ajpri commented Dec 10, 2023

@julien-f @olivierlambert
Installed the December patches with Smart Reboot. Worked perfectly! The branch fixes the issue.
