This repository has been archived by the owner on Feb 4, 2022. It is now read-only.

VMDK files get deleted when node is deleted #55

Closed
benweet opened this issue Sep 26, 2019 · 3 comments

Comments

benweet commented Sep 26, 2019

We're using the vSphere cloud provider on a Rancher-provisioned cluster (vSphere 6.5 and Kubernetes 1.14.6). Here is the RKE configuration we use:

cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    virtual_center:
      pcc-178-32-194-122.ovh.com:
        user: admin
        password: ********
        datacenters: pcc-178-32-194-122_datacenter3372
    workspace:
      server: pcc-178-32-194-122.ovh.com
      folder: "/"
      default-datastore: pcc-008769
      datacenter: pcc-178-32-194-122_datacenter3372

I managed to set up dynamic provisioning to automatically create persistent volumes with the following storage class:

> kubectl get storageclass vsphere-pcc-008769 -o yaml
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    field.cattle.io/creatorId: user-vpp9j
    storageclass.beta.kubernetes.io/is-default-class: "true"
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2019-09-25T10:34:46Z"
  labels:
    cattle.io/creator: norman
  name: vsphere-pcc-008769
  resourceVersion: "1061432"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/vsphere-pcc-008769
  uid: 1866a708-df80-11e9-a284-005056b766cc
parameters:
  datastore: pcc-008769
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate

The datastore "pcc-008769" is an NFS 3 shared storage. The persistent volumes are backed by VMDK files automatically created in the kubevols folder inside the datastore. (I don't know where this folder name comes from?)

When I delete a worker node VM, either through kubectl/Rancher or vSphere, all the VMDK files created by this VM seem to get deleted as well, which is quite unexpected. The stored data is lost and unrecoverable, and of course the pods are unable to restart on another worker node:

> kubectl describe pod gitlab-gitaly-0 -n gitlab 
...
Events:
  Type     Reason              Age                 From                         Message
  ----     ------              ----                ----                         -------
  Normal   Scheduled           9m4s                default-scheduler            Successfully assigned gitlab/gitlab-gitaly-0 to k8s-usine-worker-1
  Warning  FailedAttachVolume  9m4s                attachdetach-controller      Multi-Attach error for volume "pvc-4e2b85e5-de10-11e9-a284-005056b766cc" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedAttachVolume  35s (x9 over 3m)    attachdetach-controller      AttachVolume.Attach failed for volume "pvc-4e2b85e5-de10-11e9-a284-005056b766cc" : File []/vmfs/volumes/30035193-32e26c39/kubevols/kubernetes-dynamic-pvc-4e2b85e5-de10-11e9-a284-005056b766cc.vmdk was not found
  Warning  FailedMount         14s (x4 over 7m1s)  kubelet, k8s-usine-worker-1  Unable to mount volumes for pod "gitlab-gitaly-0_gitlab(a0455137-e06a-11e9-a284-005056b766cc)": timeout expired waiting for volumes to attach or mount for pod "gitlab"/"gitlab-gitaly-0". list of unmounted volumes=[repo-data]. list of unattached volumes=[repo-data gitaly-config gitaly-secrets init-gitaly-secrets etc-ssl-certs custom-ca-certificates default-token-8b7xt]

Is this the expected behavior? Is there a way we can prevent VMDK deletion?

Also, I've managed to create FCD VMDK files using the vSphere Python SDK; these are not tied to any VM. Is it possible to configure the cloud provider to create FCDs?
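For reference, a minimal pyVmomi sketch of that FCD creation (vCenter host and datastore taken from the configuration above; credentials, disk name and size are placeholders):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Connect to vCenter (credentials are placeholders).
ctx = ssl._create_unverified_context()
si = SmartConnect(host="pcc-178-32-194-122.ovh.com", user="admin",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Locate the target datastore by name.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
datastore = next(ds for ds in view.view if ds.name == "pcc-008769")
view.DestroyView()

# Describe a 10 GiB thin-provisioned First Class Disk (Managed Virtual Disk).
spec = vim.vslm.CreateSpec()
spec.name = "example-fcd"
spec.capacityInMB = 10 * 1024
spec.backingSpec = vim.vslm.CreateSpec.DiskFileBackingSpec()
spec.backingSpec.datastore = datastore
spec.backingSpec.provisioningType = "thin"

# Create the disk; the resulting VMDK is not tied to any VM.
WaitForTask(content.vStorageObjectManager.CreateDisk_Task(spec))
Disconnect(si)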

divyenpatel (Contributor) commented

@benweet This issue is resolved with vSphere 6.7 U3 and the CSI driver.

With vSphere 6.7 U3 we have released a bunch of new APIs which allow provisioning volumes as First Class Disks (Managed Virtual Disks).

With these new APIs, internally when a volume is attached to the node VM, we attach the disk with the keepAfterDeleteVm control flag set to true, so when the node VM is deleted the disk remains on the datastore.
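
For anyone creating FCDs directly with the Python SDK, the same flag can be set on the create spec (a sketch, assuming the vim.vslm.CreateSpec usage shown earlier in this thread):

spec = vim.vslm.CreateSpec()
# keepAfterDeleteVm keeps the disk on the datastore when the VM it is attached to is deleted.
spec.keepAfterDeleteVm = True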

Please refer to https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-51D308C7-ECFE-4C04-AD56-64B6E00A6548.html for more details.

New CSI driver for vSphere is available at https://github.com/kubernetes-sigs/vsphere-csi-driver

divyenpatel (Contributor) commented

> Is this the expected behavior? Is there a way we can prevent VMDK deletion?

To prevent this with the in-tree provider, the recommendation is to drain the node, let all pods be rescheduled on another node, let all volumes be detached from the node, and then finally delete the node VM.
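
Roughly (node name taken from the events above; drain flags as of Kubernetes 1.14):

> kubectl drain k8s-usine-worker-1 --ignore-daemonsets --delete-local-data
> kubectl get volumeattachments        # wait until none reference the node
> kubectl delete node k8s-usine-worker-1

and only then delete the VM in vSphere.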


benweet commented Sep 28, 2019

Thanks for the clarification. Closing this.
