
CP-49665: VM anti-affinity support for host evacuation - implemented with Psq #5652

Conversation

gangj
Contributor

@gangj gangj commented May 24, 2024

Host evacuation planning with anti-affinity support is carried out in 3 phases:

  1. Try to get a "spread evenly" plan for the anti-affinity VMs; done if
    all the remaining VMs can be planned using binpacking, otherwise continue.
  2. Try to get a "no breach" plan for the anti-affinity VMs; done if all the
    remaining VMs can be planned using binpacking, otherwise continue.
  3. Carry out a binpack plan ignoring VM anti-affinity.
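The phase ordering above can be sketched as follows. This is a minimal illustration, not the real implementation: the phase functions here are hypothetical stand-ins, each returning Some plan on success and None on failure, with later phases running only when earlier ones fail.

```ocaml
(* A minimal sketch of the 3-phase fallback. The phase functions are
   hypothetical: the real code in xapi_ha_vm_failover.ml differs. *)
type plan = (string * string) list (* (vm, host) pairs *)

let spread_evenly_plan () : plan option = None (* phase 1: no plan found *)
let no_breach_plan () : plan option = Some [("vm1", "host2")]
let binpack_plan () : plan = [("vm1", "host1")] (* phase 3 always succeeds *)

(* Try each phase in order, falling through to the next on failure. *)
let compute_plan () : plan =
  match spread_evenly_plan () with
  | Some p ->
      p
  | None -> (
    match no_breach_plan () with Some p -> p | None -> binpack_plan ()
  )

let () = assert (compute_plan () = [("vm1", "host2")])
```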

@gangj gangj changed the title CP-49665: VM anti-affinity support for host evacuation - implement with Psq CP-49665: VM anti-affinity support for host evacuation - implemented with Psq May 24, 2024
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from 1de8b31 to ec11a05 Compare May 27, 2024 05:44
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from ec11a05 to dd9d98d Compare May 28, 2024 05:10
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch 3 times, most recently from 7910a4b to 965098a Compare May 29, 2024 07:46
@gangj gangj requested a review from minglumlu May 29, 2024 08:22
Gang Ji added 3 commits May 30, 2024 19:01
Add func host_to_vm_count_map to be used, rename RefMap to HostMap

Signed-off-by: Gang Ji <gang.ji@citrix.com>
fixup review comments

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from 965098a to 487b289 Compare May 30, 2024 11:37
@gangj
Contributor Author

gangj commented May 30, 2024

Rebased to include commit c0e5dc4 from master, so the commit to support more than 3 hosts in the unit tests is no longer needed.

@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from 487b289 to 4529230 Compare May 30, 2024 11:45
Member

@robhoes robhoes left a comment


I think this is a lot clearer than the previous PR, now that the state is tracked with PSQ and without the complicated state-update mechanism. Also thank you for adding elaborate comments in the code. I personally would have chosen different names for some of the values, but the comments make things clearer. It's worth considering Colin's comments, but otherwise I'm happy with this now.

| Some ((host, (_, h_size)), rest_hosts) -> (
    match vm_size <= h_size with
    | true ->
        Some (host, hosts_psq)
Member


The only time hosts_psq is not equal to the pool_state is when the "minimum" host from the PSQ does not have enough memory to run the VM. In this case, that host is removed from the PSQ and the smaller PSQ is eventually returned to compute_spread_evenly_plan. The latter then continues with the next VM in the list, which will never get a chance to be planned onto the host removed above.

This relies on the fact that the list of VMs is sorted from small to large, such that any following VM will certainly not fit on hosts that have been removed from the pool_state. So this is an optimisation for performance, not necessary for correctness. Is that right?

Member


This relies on the fact that the list of VMs is sorted from small to large...

Yes. And the check is only on the VM size. For example, in the previous PR it also checked can_boot_on; with that check this optimisation would not work, because even if the current VM couldn't boot on a host, a bigger VM might still be able to.

Contributor Author


Yes, this is only an optimisation: a host that is smaller than the current VM doesn't need to be considered for the next (larger) VM either.
I've added some comments to make this clear.
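A list-based stand-in (not the real Psq module; all names here are illustrative) can show why dropping too-small hosts is safe when VMs are planned from small to large: once a host fails to fit the current VM, it cannot fit any later, larger VM, so the shrunk queue may be reused.

```ocaml
(* Hosts sorted by selection priority, each paired with its free memory.
   select_host drops hosts that are too small for the current VM; since
   VMs are planned from small to large, a dropped host can never fit a
   later VM either, so the shrunk list is safe to carry forward. *)
let rec select_host vm_size hosts =
  match hosts with
  | [] ->
      None
  | (host, h_size) :: rest_hosts ->
      if vm_size <= h_size then
        Some (host, hosts) (* host fits: keep the current queue as-is *)
      else
        select_host vm_size rest_hosts (* too small: drop permanently *)

let () =
  let hosts = [("h1", 512); ("h2", 2048); ("h3", 4096)] in
  assert (select_host 1024 hosts = Some ("h2", [("h2", 2048); ("h3", 4096)])) ;
  assert (select_host 8192 hosts = None)
```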

Contributor Author

@gangj gangj May 31, 2024


I personally would have chosen different names for some of the values, but the comments make things clearer.

@robhoes, would you please point out those names and suggest better alternatives? I think that would be helpful not only for this PR but also for later ones, thanks~

in

let ( let* ) o f = match o with None -> f None | p -> p in
(let* _no_plan =
Member


This is quite confusing, as the syntax looks like the plan is just always thrown away...

Member


It means the phase failed to produce a plan, so it has to continue with the next step. Otherwise, the plan is returned directly.
This is the opposite of the usual pattern, where success means continuing.
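To illustrate the inverted binder, here is a small self-contained sketch (the phase functions are made up for the example). The custom `let*` continues into its body only when the previous phase returned None, and short-circuits with the first Some plan it sees:

```ocaml
(* Inverted let*: continue only on None (no plan yet); a Some plan
   short-circuits and is returned as-is. Phase functions are mock-ups. *)
let ( let* ) o f = match o with None -> f None | p -> p

let phase1 () = None (* "spread evenly" found no plan *)
let phase2 () = Some "no-breach plan"
let phase3 () = Some "binpack plan"

let result =
  let* _no_plan = phase1 () in
  let* _no_plan = phase2 () in
  phase3 ()

(* phase2 succeeded, so phase3 is never evaluated *)
let () = assert (result = Some "no-breach plan")
```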

ocaml/xapi/xapi_ha_vm_failover.ml (outdated thread, resolved)
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from 4529230 to f01ac08 Compare May 31, 2024 06:42
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from f01ac08 to 089c0fd Compare May 31, 2024 09:01
Gang Ji added 3 commits May 31, 2024 17:07
Host evacuation plan with anti-affinity support will be carried out in 3
phases:

1. Try to get a "spread evenly" plan for anti-affinity VMs, done if
   all the rest VMs got planned using binpack, otherwise continue.
2. Try to get a "no breach" plan for anti-affinity VMs, done if all the
   rest VMs got planned using binpack, otherwise continue.
3. Carry out a binpack plan ignoring VM anti-affinity.

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Add "groups" "power_state" for VM, vm set_resident_on host

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
@gangj gangj force-pushed the private/gangj/CP-48625.Psq branch from 089c0fd to c626cdb Compare May 31, 2024 09:08
@minglumlu minglumlu merged commit d9c6535 into xapi-project:feature/vm-anti-affinity Jun 3, 2024
14 checks passed