Skip to content
This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

In multinode mode, disallow deleting vmgroup which are not empty #1214

Closed
wants to merge 18 commits into from

Conversation

pshahzeb
Copy link
Contributor

@pshahzeb pshahzeb commented May 3, 2017

If vmgroup contains VMs, do now allow vmgroup rm. This is due to limitation of ESX host doesn't have information about VM residing on another host. This limitation only happens in MultiNode mode, but to keep consistency, we are enforcing this in both SingleNode and MultiNode.

Following are the test.
Two esx hosts, configured in multinode and each of them have one VMs. They are added to a shared vmgroup.

ESX1:

[root@sc2-rdops-vm06-dhcp-219-119:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py vmgroup create --name sharedVMGroup --vm-list ubuntu-VM0.1 --default-datastore sharedVmfs-0
vmgroup 'sharedVMGroup' is created. Do not forget to run 'vmgroup vm add' to add vm to vmgroup.
[root@sc2-rdops-vm06-dhcp-219-119:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py vmgroup ls
Uuid                                  Name           Description                Default_datastore  VM_list
------------------------------------  -------------  -------------------------  -----------------  ------------
11111111-1111-1111-1111-111111111111  _DEFAULT       This is a default vmgroup  _VM_DS
b44b173a-cbe6-4d05-ae3c-8c90e3287558  sharedVMGroup                             sharedVmfs-0       ubuntu-VM0.1

ESX2

[root@sc2-rdops-vm06-dhcp-209-223:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py vmgroup vm add --name sharedVMGroup --vm-list ubuntu-VM0.2
vmgroup vm add succeeded

[root@sc2-rdops-vm06-dhcp-209-223:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py vmgroup vm ls --name sharedVMGroup
Uuid                                  Name
------------------------------------  ------------
564d1a2a-8d8e-1c4f-a1b6-dacf99d3e2b0  ubuntu-VM0.1
564db3f0-b9e9-d036-51f2-c83d54edd602  ubuntu-VM0.2

[root@sc2-rdops-vm06-dhcp-209-223:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py vmgroup rm --name sharedVMGroup
Cannot delete non empty Vmgroup sharedVMGroup in Multinode mode. Remove VMs from the Vmgroup before deleting it.

Fixes #1188

# check if vms that are a part of this tenant have any volumes mounted.
# If they have, can't delete the tenant.
if tenant.vms:
logging.info("_tenant_rm. VMs in tenant are %s", tenant.vms)

# in multinode mode, we cannot find VM residing on another host
# so a gracefull way to admin know about the error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: graceful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix the inline comment something like in multinode mode, we cannot find VM residing on another host so stopping tenant deletion gracefully

@@ -87,6 +88,7 @@ class ErrorCode:
ErrorCode.TENANT_SET_ACCESS_PRIVILEGES_FAILED : "Vmgroup {0} set access privileges on datastore {1} failed with err: {2}",
ErrorCode.TENANT_GET_FAILED : "Get vmgroup {0} failed",
ErrorCode.TENANT_NAME_INVALID : "Vmgroup name {0} is invalid, only {1} is allowed",
ErrorCode.TENANT_NOT_EMPTY : "Cannot delete non empty Vmgroup {0} in Multinode mode. Remove VMs from the Vmgroup before deleting it.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Vmgroup/vmgroup/g

Copy link
Contributor

@govint govint May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which also becomes stop containers in VMs before moving from vmgroup.

To which vmgroup should the VMs be moved to?

Pls. update documentation to specify the steps to do this.

Beginning to think these x-vmgroup (cross vmgroup) movements of VMs are actually parallels to vmotion!! If a VM has to be moved from one VM group to another we chould possibly do this (definitely not this PR):

  1. Add access to the currently used datastores to the destination vmgroup (option to --retain current location of disks) - its like vmotion where the VM and its containers can continue using the same datastores as before. This is doable now.

  2. Or, the VM is moved to the new VM group and the disks are migrated by initiating a storage vmotion of the in use disks (independent persistent disks cannot be migrated for now, may be with FCD this will be doable later).

Copy link
Contributor

@govint govint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls. document the steps for a user (from container level) what they must do before removing a vmgroup.

vmgroups are logical collections with physical entities - VMs and datastores. And movements between vmgroups can be impactful right from stopping user workloads (services, apps) and must be done very rarely with tested approaches.

This PR needs tests for the proposed approach to handling VM migration out of a vmgroup. Or make a separate PR for the doc changes.

@pshahzeb
Copy link
Contributor Author

pshahzeb commented May 3, 2017

@govint makes sense.
I can make a separate PR for the steps to handle VM migration out of a vmgroup.

@pshahzeb
Copy link
Contributor Author

pshahzeb commented May 4, 2017

Did the doc update here itself.
@govint @shuklanirdesh82 @shaominchen Please take a look

@pshahzeb pshahzeb requested a review from shaominchen May 4, 2017 02:59
Copy link
Contributor

@shaominchen shaominchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have some concern about the consistency: with this change, the behavior between SingleNode and MultiNode will be inconsistent: In SingleNode we do allow deleting non-empty vm-group, but in MultiNode we don't. For now with current design we don't seem to have better solutions.

@@ -283,6 +283,11 @@ Uuid Name Description Defa
11111111-1111-1111-1111-111111111111 _DEFAULT This is a default vmgroup
```

In multinode mode, VMs from different hosts can be a part of a vmgroup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the terms in document consistent:

  1. multinode => MultiNode
  2. vmgroup => vm-group

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. We moved from vm-group to vmgroup everywhere.

@@ -283,6 +283,11 @@ Uuid Name Description Defa
11111111-1111-1111-1111-111111111111 _DEFAULT This is a default vmgroup
```

In multinode mode, VMs from different hosts can be a part of a vmgroup.
When in this mode, vmgroup which has member VMs in it cannot be directly deleted.
First remove remove the VMs individually from the vmgroup using admin cli
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: duplicate "remove"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

they belong is not permitted. Make sure no volumes are attached.
To do so:
1. Get the list of containers running. (docker ps)
2. If the container has any dvs volume mounted (docker inspect container_name), stop the container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: dvs => vDVS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -559,22 +559,29 @@ def _tenant_rm(name, remove_volumes=False):
error_info = generate_error_info(ErrorCode.TENANT_NOT_EXIST, name)
return error_info

error_info, auth_mgr = get_auth_mgr_object()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove this empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -87,6 +88,7 @@ class ErrorCode:
ErrorCode.TENANT_SET_ACCESS_PRIVILEGES_FAILED : "Vmgroup {0} set access privileges on datastore {1} failed with err: {2}",
ErrorCode.TENANT_GET_FAILED : "Get vmgroup {0} failed",
ErrorCode.TENANT_NAME_INVALID : "Vmgroup name {0} is invalid, only {1} is allowed",
ErrorCode.TENANT_NOT_EMPTY : "Cannot delete non empty vmgroup {0} in Multinode mode. Remove VMs from the vmgroup before deleting it.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: non empty => non-empty; Multinode => MultiNode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@pshahzeb
Copy link
Contributor Author

pshahzeb commented May 4, 2017

@shaominchen I agree. Once we have some central control, we can get rid of such strict checks.

Copy link
Contributor

@shaominchen shaominchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -283,6 +283,11 @@ Uuid Name Description Defa
11111111-1111-1111-1111-111111111111 _DEFAULT This is a default vmgroup
```

In MultiNode mode, VMs from different hosts can be a part of a vmgroup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we are using "vmgroup" and "vm-group" in the same document - better to use same term consistently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In some places it was still vm-group. Replaced it with vmgroup

Copy link
Contributor

@shuklanirdesh82 shuklanirdesh82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor/nits. code looks OK to me.

Please supply comment before merging it.

@@ -283,6 +283,11 @@ Uuid Name Description Defa
11111111-1111-1111-1111-111111111111 _DEFAULT This is a default vmgroup
```

In MultiNode mode, VMs from different hosts can be a part of a vmgroup.
When in this mode, vmgroup which has member VMs in it cannot be directly deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cannot be directly deleted => cannot be deleted directly.

To do so:
1. Get the list of containers running. (docker ps)
2. If the container has any vDVS volume mounted (docker inspect container_name), stop the container.
3. Ensure that the dvs volumes have status detached (docker volume inspect)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: dvs => vDVS, let's stick with the same name to avoid confusion or may be just vsphere driver volumes to keep it simple.

* Update README.md

* Update README.md
error_info = generate_error_info(ErrorCode.TENANT_NOT_EMPTY, name)
logging.error(error_info.msg)
return error_info

error_info = vmdk_utils.check_volumes_mounted(tenant.vms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L578 is dead code right? Should be remvoed.

# check if vms that are a part of this tenant have any volumes mounted.
# If they have, can't delete the tenant.
if tenant.vms:
logging.info("_tenant_rm. VMs in tenant are %s", tenant.vms)

# in multinode mode, we cannot find VM residing on another host so stopping
# tenant deletion gracefully
if auth_mgr.mode == auth_data.DBMode.MultiNode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only in MultiNode? Keep it consistent and also simplifies code by avoiding unnecessary ifs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same comment above. I will take over and address this.

Shaomin Chen and others added 4 commits May 5, 2017 11:53
* Rescanning refCounts if first attempt during driver init fails

1. If the attempt during first refCounting during driver init fails, create a timer and
redo the refCounting after a delay.

2. Refcounting with single map and synchronization between mounts and refcounting thread.
Skipping unmounts until refcount has been successful.
exponential backoff delay to schedule a refcount reattempt.
* Revert "Rescanning refCounts if first attempt during driver init fails (#1178)"

This reverts commit 40ab3be.

* Revert "Update README.md [SKIP CI] (#1222)"

This reverts commit 9da7475.
Copy link
Contributor

@pdhamdhere pdhamdhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pdhamdhere
Copy link
Contributor

@pshahzeb Please fix address CI failures and merge.

@shaominchen
Copy link
Contributor

@pdhamdhere I've taken over this PR. Some test cases are broken due to the change that a tenant to be removed must be empty. I'm fixing this right now.

@shaominchen
Copy link
Contributor

@pshahzeb @shuklanirdesh82 Can you please take a quick look at my latest commits, which fixed the broken tests?

d47694d
3bac992

@shuklanirdesh82
Copy link
Contributor

@shaominchen Please rebase with master correctly .. this PR's diff is showing unwanted files.

@shaominchen
Copy link
Contributor

@shuklanirdesh82 It's a little bit tricky to fix the wrong commit history due to rebase. I will simply create a new PR for this.

shaominchen pushed a commit that referenced this pull request May 9, 2017
…h ESXs

(See original PR #1214 for the details of review comments)
@shaominchen
Copy link
Contributor

Replaced with PR #1231

@shaominchen shaominchen closed this May 9, 2017
shuklanirdesh82 pushed a commit that referenced this pull request May 10, 2017
…h ESXs (#1231)

* Fix issue #1188: vmgroup is not removable when vms are added from both ESXs

(See original PR #1214 for the details of review comments)

* Trivial: add a new line at the end of the file

* Addressed Nirdesh's comments.
@shuklanirdesh82 shuklanirdesh82 deleted the issue_1188.pshahzeb branch May 13, 2017 13:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants