This repository has been archived by the owner on Sep 7, 2022. It is now read-only.

Unable to start kubelet after adding vsphere.conf file #501

Closed
GajaHebbar opened this issue Aug 6, 2018 · 13 comments

Comments

@GajaHebbar

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I am trying to configure/use a VMware datastore as a volume (create a static VMDK and/or create volumes dynamically) as per https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/policy-based-mgmt.html

When I follow
https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html

to configure the user and vsphere.conf for k8s v1.10.4 (the steps for version 1.9 and above), the kubelet service fails to start, and no further operations can be performed, such as kubectl create, get pods, or get nodes.

What you expected to happen:
After configuring vsphere.conf, the kubelet should start and operations such as kubectl create should work.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version
    Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: vSphere 6.5

  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"

  • Kernel (e.g. uname -a):

Linux barnda129.inblrlab.avaya.com 3.10.0-862.3.2.el7.x86_64

  • Install tools:
  • Others:

Created vsphere.conf in /etc/kubernetes.

disk.EnableUUID is set to true on both the master and worker nodes.

added

--cloud-provider=vsphere
--cloud-config=/etc/kubernetes/vsphere.conf

in
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-apiserver.yaml

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf" (at location /etc/systemd/system/kubelet.service.d/10-kubeadm.conf )

in master node

in worker node

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

at location /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Attached vsphere.conf: vsphere.docx
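For reference, a minimal vsphere.conf sketch in the format described in the linked documentation for Kubernetes 1.9+ (every address, name, and credential below is a placeholder, not a value from this setup):

[Global]
user = "administrator@vsphere.local"
password = "changeme"
port = "443"
# set to 1 when the vCenter uses a self-signed certificate
insecure-flag = "1"

[VirtualCenter "10.0.0.1"]
datacenters = "Datacenter1"

[Workspace]
server = "10.0.0.1"
datacenter = "Datacenter1"
default-datastore = "Datastore1"
folder = "kubernetes"

[Disk]
scsicontrollertype = pvscsi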

Error Trace

Jul 31 12:36:11 barnda129 kubelet: E0731 12:36:11.688844 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.686611 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://10.133.132.129:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.688701 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.689943 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused

Please let me know what is missing here.

@embano1

embano1 commented Aug 17, 2018

Can you please try to also deploy vsphere.conf on the workers and add the --cloud-config= parameter? We ran into the same issue, and even though it's documented that the conf is not needed on the workers, omitting it seems to break the kubelet.

@GajaHebbar

I have done that before opening the issue here. That also didn't work.

I have mentioned it in the issue; please refer to the worker node configuration:

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

@embano1

embano1 commented Aug 22, 2018

I have done that before opening the issue here. That also didn't work.

Sorry for not being clear. What I meant is to also pass vsphere.conf as the --cloud-config parameter to each kubelet. It looks like you don't do that currently:

in worker node
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

However, looking at the logs, it seems that communication between the API server and the kubelet is blocked, or the API server is not reachable. Is everything working as expected on the control plane?

@GajaHebbar

Ok, that was not done. Will try that.

@GajaHebbar

- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

added the above in
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf

on the master node,

and on the worker node added

- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf,
then ran systemctl daemon-reload followed by systemctl restart kubelet.service on both the worker and the master,

which results in the error:

Aug 22 14:46:51 barnda129 kubelet: E0822 14:46:51.272118 9409 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused

If I remove
- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf
from /etc/kubernetes/manifests/kube-apiserver.yaml and kube-controller-manager.yaml, I don't see any error, but while creating a pod with the example given at https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/persistent-vols-claims.html I get:

Aug 22 15:29:25 barnda135 kubelet: I0822 15:29:25.112493 23148 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "f9ed41a6-a5f1-11e8-94ea-005056b3208e")
Aug 22 15:29:25 barnda135 kubelet: E0822 15:29:25.117281 23148 nestedpendingoperations.go:267] Operation for ""kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test"" failed. No retries permitted until 2018-08-22 15:29:57.117211852 +0530 IST m=+237.602265226 (durationBeforeRetry 32s). Error: "Volume not attached according to node status for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "f9ed41a6-a5f1-11e8-94ea-005056b3208e") "

@divyenpatel

@GajaHebbar It looks like the API server is not starting correctly after you add the flags

- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

Please check the manifest file for the API server, and make sure /etc/kubernetes/ is mounted into the API server pod.

For a Kubernetes cluster deployed using kubeadm, /etc/kubernetes is generally not accessible to system pods.

You may need to move the vsphere.conf file to /etc/kubernetes/pki/ or another accessible directory.
Please refer to the manifest files posted at https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f
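For illustration, one way to expose the file to the API server pod is a hostPath mount in /etc/kubernetes/manifests/kube-apiserver.yaml, roughly like this (a sketch only; the volume name vsphere-config is invented here, and the paths should point to wherever vsphere.conf actually lives):

spec:
  containers:
  - command:
    - kube-apiserver
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/vsphere.conf
    # (other existing flags unchanged)
    volumeMounts:
    # (existing mounts unchanged)
    - mountPath: /etc/kubernetes/vsphere.conf
      name: vsphere-config   # hypothetical volume name
      readOnly: true
  volumes:
  # (existing volumes unchanged)
  - name: vsphere-config     # hypothetical volume name
    hostPath:
      path: /etc/kubernetes/vsphere.conf
      type: FileOrCreate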

@neeraj23

Hi @divyenpatel,
I am working with @GajaHebbar on this. We tried the configuration mentioned here: https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f, but after creating the pod we are encountering this error: "Invalid configuration for device '0'."

The logs are as follows:

Aug 24 18:55:29 barnda135 kubelet: I0824 18:55:29.670835 5815 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "e2b75b77-a7a0-11e8-9476-005056b3208e")
Aug 24 18:55:29 barnda135 kubelet: E0824 18:55:29.675404 5815 nestedpendingoperations.go:267] Operation for ""kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test"" failed. No retries permitted until 2018-08-24 18:57:31.67535422 +0530 IST m=+90113.195561718 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "e2b75b77-a7a0-11e8-9476-005056b3208e") "
Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.545966 5815 kubelet.go:1640] Unable to mount volumes for pod "pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)": timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]; skipping pod
Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.546053 5815 pod_workers.go:186] Error syncing pod e2b75b77-a7a0-11e8-9476-005056b3208e ("pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]

@divyenpatel

@neeraj23 @GajaHebbar Have you set the disk.enableUUID=1 flag on all your node VMs?

The disk UUID on the node VMs must be enabled: the disk.EnableUUID value must be set to True. This step is necessary so that the VMDK always presents a consistent UUID to the VM, thus allowing the disk to be mounted properly. For each of the virtual machine nodes that will be participating in the cluster, follow the steps below using govc.

Find Node VM Paths

govc ls /datacenter/vm/<vm-folder-name>

Set disk.EnableUUID to true for all VMs.

govc vm.change -e="disk.enableUUID=1" -vm='VM Path'

Note: If the Kubernetes node VMs are created from a template VM, then disk.EnableUUID=1 can be set on the template VM. VMs cloned from this template will automatically inherit this property.
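If there are many node VMs, the two govc commands above can be combined, for example (a sketch; <vm-folder-name> is the same placeholder as above, and the guest only sees the new value at its next power-on):

govc ls /datacenter/vm/<vm-folder-name> | xargs -I{} govc vm.change -e="disk.enableUUID=1" -vm="{}"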

@divyenpatel

@neeraj23 @GajaHebbar Do you see the PVC bound to the PV? Are you using the PVC in the Pod spec?
Can you provide kubectl describe output for the PV, PVC, and Pod? We need to see the events section of the kubectl describe output for the failures.
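For example (the PV and pod names are taken from the logs above; the PVC name is a placeholder since it is not shown in this thread):

kubectl describe pv pv0001
kubectl describe pvc <pvc-name>
kubectl describe pod pvpod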

@neeraj23

Hi @divyenpatel, the VMs already have disk.enableUUID=1 set.
I have created the PV, PVC, and pod using these three files:
vpshere-volume-pvcpod.yaml.txt
vsphere-volume-pv.yaml.txt
vsphere-volume-pvc.yaml.txt

The PVC and PV are shown to be in the Bound state, but I am not able to start a pod using them.
The describe output for the PV, PVC, and pod is as follows:
describe pod.txt
describe pv.txt
describe pvc.txt

@neeraj23

neeraj23 commented Sep 4, 2018

I tried to create a pod using a vSphere volume in another setup, using this yaml file:

test-pod.yaml.txt

But I get the error saying "Invalid configuration for device '0'."
The output of kubectl describe pod is as follows:

describe pod.txt

@divyenpatel

I see you have the following volumePath:

volumePath: "[/Bangalore/datastore/10.133.132.83_DS1] volume/test.vmdk"

In the above path, are Bangalore and datastore datastore folders? If not, you have an incorrect volumePath.

It should be as shown below.

If the datastore sharedVmfs-0 is under the datastore folder DatastoreFolder (here kubevols is the directory in the datastore in which the vmdk is present):

volumePath: "[DatastoreFolder/sharedVmfs-0] kubevols/test.vmdk"

If the datastore sharedVmfs-0 is under the root / folder:

volumePath: "[sharedVmfs-0] kubevols/test.vmdk"

We recently updated the instructions for configuring the vSphere Cloud Provider. Can you please follow them and make sure vsphere.conf is correctly configured? https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html
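For example, a PV spec using the second form above might look roughly like this (a sketch only; the capacity, fsType, and reclaim policy are placeholder values):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  vsphereVolume:
    volumePath: "[sharedVmfs-0] kubevols/test.vmdk"
    fsType: ext4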

@GajaHebbar

@divyenpatel Looked into the system; there were issues with the datastore, which was not accessible from the VMs running the Kubernetes cluster. After re-configuring that and using the new vsphere.conf format provided at https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html, this issue is fixed.
