Skip to content

Conversation

@gangj
Copy link
Contributor

@gangj gangj commented Dec 3, 2025

During rolling pool upgrade (RPU), RestartVM guidance should only be cleared when a VM restarts on a host that has been updated to match the coordinator's software version. Previously, the guidance was cleared whenever a VM restarted, regardless of the host's update status.

This commit ensures that RestartVM guidance persists until the VM restarts on an up-to-date host, this provides accurate feedback to administrators about which VMs still need restarting after RPU.

Also adds unit tests covering 6 scenarios:

  • VM restart on updated vs old host (via xenopsd)
  • VM halt on updated vs old host (via force_state_reset)
  • Suspended VM resume on updated vs old host

@gangj gangj force-pushed the private/gangj/CA-420533 branch 2 times, most recently from cff51af to ba48b75 Compare December 3, 2025 14:36
remove_pending_guidance ~__context ~self ~value:`restart_device_model ;
remove_pending_guidance ~__context ~self ~value:`restart_vm
(* Only remove RestartVM guidance if host is up-to-date with coordinator *)
let resident_on = Db.VM.get_resident_on ~__context ~self in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the branch of "state = `Halted". If the VM can't start on a host whose running xapi version lower than the one on the coordinator, this will not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a great question!
After a VM is Halted on an up-to-date host, does it still required for it to be started on an up-to-date host to clear the RestartVM guidance?
I think it should be yes? Then we can't clear the RestartVM guidance here at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the VM can't start on a host whose running xapi version lower than the one on the coordinator

I think it can only reboot in this case, where the host has is "older". That is where we should not clear the restart guidance. Starting on such a host is not allowed, so if a VM becomes halted, then the guidance can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the change here should be removed, while we keep only the conditional in xapi_xenops.ml that handles the start and reboot case.

Copy link
Contributor Author

@gangj gangj Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @robhoes

So I think the change here should be removed

Do you mean we should not change xapi_vm_lifecycle.ml this time, and keep it always remove `restart_vm guidance?

Starting on such a host is not allowed

What if the VM is halted on an old host and then will the VM be allowed to start on the same old host again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should not change xapi_vm_lifecycle.ml this time, and keep it always remove `restart_vm guidance?

Yes I think so: for the case where the power_state will be halted.

What if the VM is halted on an old host and then will the VM be allowed to start on the same old host again?

No, because VM.start_on calls Helpers.Checks.RPU.assert_host_has_highest_version_in_pool in message_forwarding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I see, thank you!

@gangj gangj force-pushed the private/gangj/CA-420533 branch from ba48b75 to 6db176c Compare December 5, 2025 06:43
During rolling pool upgrade (RPU), RestartVM guidance should only be
cleared when a VM restarts on a host that has been updated to match
the coordinator's software version. Previously, the guidance was
cleared whenever a VM restarted, regardless of the host's update
status.

This commit ensures that RestartVM guidance persists until the VM
restarts on an up-to-date host, this provides accurate feedback to
administrators about which VMs still need restarting after RPU.

Also adds unit tests covering 6 scenarios:
* VM start on updated vs old host (via xenopsd)
* Suspended VM resume on updated vs old host
* VM halt on updated vs old host (via force_state_reset)

Signed-off-by: Gang Ji <gang.ji@cloud.com>
@gangj gangj force-pushed the private/gangj/CA-420533 branch from 6db176c to f905d64 Compare December 5, 2025 10:00
@gangj gangj added this pull request to the merge queue Dec 5, 2025
Merged via the queue into xapi-project:master with commit fb2a1ab Dec 5, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants