This repository has been archived by the owner on Feb 4, 2022. It is now read-only.

VMDK files get deleted when node is deleted #55

Closed
benweet opened this issue Sep 26, 2019 · 3 comments

Comments

benweet commented Sep 26, 2019

We're using the vSphere cloud provider on a Rancher-provisioned cluster (vSphere 6.5 and Kubernetes 1.14.6). Here is the RKE configuration we use:

cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    virtual_center:
      pcc-178-32-194-122.ovh.com:
        user: admin
        password: ********
        datacenters: pcc-178-32-194-122_datacenter3372
    workspace:
      server: pcc-178-32-194-122.ovh.com
      folder: "/"
      default-datastore: pcc-008769
      datacenter: pcc-178-32-194-122_datacenter3372

I managed to set up dynamic provisioning to automatically create persistent volumes with the following storage class:

> kubectl get storageclass vsphere-pcc-008769 -o yaml
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    field.cattle.io/creatorId: user-vpp9j
    storageclass.beta.kubernetes.io/is-default-class: "true"
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2019-09-25T10:34:46Z"
  labels:
    cattle.io/creator: norman
  name: vsphere-pcc-008769
  resourceVersion: "1061432"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/vsphere-pcc-008769
  uid: 1866a708-df80-11e9-a284-005056b766cc
parameters:
  datastore: pcc-008769
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate

The datastore "pcc-008769" is an NFS 3 shared storage. The persistent volumes are backed by VMDK files automatically created in the kubevols folder inside the datastore. (I don't know where this folder name comes from?)

When I delete a worker node VM, either through kubectl/Rancher or vSphere, all the VMDK files created by this VM seem to get deleted as well, which is quite unexpected. The stored data is lost and unrecoverable, and of course the pods are unable to restart on another worker node:

> kubectl describe pod gitlab-gitaly-0 -n gitlab 
...
Events:
  Type     Reason              Age                 From                         Message
  ----     ------              ----                ----                         -------
  Normal   Scheduled           9m4s                default-scheduler            Successfully assigned gitlab/gitlab-gitaly-0 to k8s-usine-worker-1
  Warning  FailedAttachVolume  9m4s                attachdetach-controller      Multi-Attach error for volume "pvc-4e2b85e5-de10-11e9-a284-005056b766cc" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedAttachVolume  35s (x9 over 3m)    attachdetach-controller      AttachVolume.Attach failed for volume "pvc-4e2b85e5-de10-11e9-a284-005056b766cc" : File []/vmfs/volumes/30035193-32e26c39/kubevols/kubernetes-dynamic-pvc-4e2b85e5-de10-11e9-a284-005056b766cc.vmdk was not found
  Warning  FailedMount         14s (x4 over 7m1s)  kubelet, k8s-usine-worker-1  Unable to mount volumes for pod "gitlab-gitaly-0_gitlab(a0455137-e06a-11e9-a284-005056b766cc)": timeout expired waiting for volumes to attach or mount for pod "gitlab"/"gitlab-gitaly-0". list of unmounted volumes=[repo-data]. list of unattached volumes=[repo-data gitaly-config gitaly-secrets init-gitaly-secrets etc-ssl-certs custom-ca-certificates default-token-8b7xt]

Is this the expected behavior? Is there a way we can prevent VMDK deletion?

Also, I've managed to create FCD VMDK files using the vSphere Python SDK; these are not tied to any VM. Is it possible to configure the cloud provider to create FCDs?
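For reference, a minimal pyVmomi sketch of that FCD creation (vCenter host and datastore taken from the configuration above; credentials, disk name and size are placeholders):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Connect to vCenter (credentials are placeholders).
ctx = ssl._create_unverified_context()
si = SmartConnect(host="pcc-178-32-194-122.ovh.com", user="admin",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Locate the target datastore by name.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
datastore = next(ds for ds in view.view if ds.name == "pcc-008769")
view.DestroyView()

# Describe a 10 GiB thin-provisioned First Class Disk (Managed Virtual Disk).
spec = vim.vslm.CreateSpec()
spec.name = "example-fcd"
spec.capacityInMB = 10 * 1024
spec.backingSpec = vim.vslm.CreateSpec.DiskFileBackingSpec()
spec.backingSpec.datastore = datastore
spec.backingSpec.provisioningType = "thin"

# Create the disk; the resulting VMDK is not tied to any VM.
WaitForTask(content.vStorageObjectManager.CreateDisk_Task(spec))
Disconnect(si)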

divyenpatel (Contributor) commented

@benweet This issue is resolved with vSphere 6.7 U3 and the CSI driver.

With vSphere 6.7 U3 we have released a bunch of new APIs which allow provisioning volumes as First Class Disks (Managed Virtual Disks).

With these new APIs, internally when a volume is attached to the node VM, we attach the disk with the keepAfterDeleteVm control flag set to true, so when the node VM is deleted the disk remains on the datastore.
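
For anyone creating FCDs directly with the Python SDK, the same flag can be set on the create spec (a sketch, assuming the vim.vslm.CreateSpec usage shown earlier in this thread):

spec = vim.vslm.CreateSpec()
# keepAfterDeleteVm keeps the disk on the datastore when the VM it is attached to is deleted.
spec.keepAfterDeleteVm = True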

Please refer to https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-51D308C7-ECFE-4C04-AD56-64B6E00A6548.html for more details.

New CSI driver for vSphere is available at https://github.com/kubernetes-sigs/vsphere-csi-driver

divyenpatel (Contributor) commented

> Is this the expected behavior? Is there a way we can prevent VMDK deletion?

To prevent this with the in-tree provider, the recommendation is to drain the node, let all pods be rescheduled on another node, let all volumes be detached from the node, and then finally delete the node VM.
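
Roughly (node name taken from the events above; drain flags as of Kubernetes 1.14):

> kubectl drain k8s-usine-worker-1 --ignore-daemonsets --delete-local-data
> kubectl get volumeattachments        # wait until none reference the node
> kubectl delete node k8s-usine-worker-1

and only then delete the VM in vSphere.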


benweet commented Sep 28, 2019

Thanks for the clarification. Closing this.
