Stuck on Pending and CrashLoopBackOff #12

Closed
Aethrexal opened this issue Aug 17, 2021 · 23 comments

@Aethrexal

More issues haha.
I followed the guide, and I showed my yaml in the other issue I opened.
When I run the command to get all pods in all namespaces, this is the result:

NAMESPACE        NAME                                              READY   STATUS             RESTARTS   AGE
kube-system      coredns-7448499f4d-5zwgx                          0/1     Pending            0          18m
kube-system      hcloud-cloud-controller-manager-9546b6cc6-8wgrs   1/1     Running            0          17m
kube-system      hcloud-csi-controller-0                           0/5     Pending            0          17m
kube-system      hcloud-csi-node-bb5pw                             2/3     CrashLoopBackOff   9          17m
kube-system      hcloud-csi-node-bgqfx                             2/3     CrashLoopBackOff   9          17m
kube-system      hcloud-csi-node-ht5d7                             2/3     CrashLoopBackOff   9          17m
kube-system      hcloud-csi-node-nzbw4                             2/3     CrashLoopBackOff   9          17m
kube-system      hcloud-csi-node-vhlkg                             2/3     CrashLoopBackOff   9          17m
kube-system      hcloud-csi-node-znmzx                             2/3     CrashLoopBackOff   9          17m
system-upgrade   system-upgrade-controller-677965cc4d-cdrvp        0/1     Pending            0          17m

It stays like that, and it's the same if I install cert-manager: it just stays Pending. The output in the code block above is from a newly created cluster; the very first one I created after fixing the last issue has been like this since it was created.

When I run the command to describe the pods, this is the message most of them show (give or take a few differences, like the ready counts):

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  5m38s  default-scheduler  0/6 nodes are available: 6 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  5m36s  default-scheduler  0/6 nodes are available: 6 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.

Not really sure how to fix this.
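For reference, the commands in question would be something like this (the pod name is just one taken from the listing above):

kubectl get pods --all-namespaces
kubectl describe pod coredns-7448499f4d-5zwgx -n kube-system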

Another question: I've seen k3s used with an external database. Is that something I still need to set up with this approach?
I'm still fairly new to all of this 😅

@vitobotta
Owner

No, you don't need any external dependency, as the tool installs K3s with etcd as the datastore. As for the issue, what do you see in the cloud controller manager pod's log?
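Something along these lines should show them (the deployment name is taken from your pod listing above):

kubectl logs -n kube-system deployment/hcloud-cloud-controller-manager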

@Aethrexal
Author

Oooh, I see! I'm also a bit confused about the persistent volume part haha.

Most of the pods have the events I posted above; some others have these:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m24s                  default-scheduler  Successfully assigned kube-system/hcloud-csi-node-rnsbh to gs-cpx21-pool-small-worker2
  Normal   Pulling    4m20s                  kubelet            Pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.3.0"
  Normal   Created    4m18s                  kubelet            Created container csi-node-driver-registrar
  Normal   Started    4m18s                  kubelet            Started container csi-node-driver-registrar
  Normal   Pulled     4m18s                  kubelet            Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.3.0" in 2.147625184s
  Normal   Pulled     4m15s                  kubelet            Successfully pulled image "hetznercloud/hcloud-csi-driver:1.5.3" in 3.623080946s
  Normal   Pulling    4m14s                  kubelet            Pulling image "quay.io/k8scsi/livenessprobe:v1.1.0"
  Normal   Created    4m13s                  kubelet            Created container liveness-probe
  Normal   Pulled     4m13s                  kubelet            Successfully pulled image "quay.io/k8scsi/livenessprobe:v1.1.0" in 1.800363141s
  Normal   Started    4m12s                  kubelet            Started container liveness-probe
  Normal   Killing    3m55s                  kubelet            Container hcloud-csi-driver failed liveness probe, will be restarted
  Normal   Pulling    3m55s (x2 over 4m18s)  kubelet            Pulling image "hetznercloud/hcloud-csi-driver:1.5.3"
  Normal   Started    3m54s (x2 over 4m14s)  kubelet            Started container hcloud-csi-driver
  Normal   Created    3m54s (x2 over 4m14s)  kubelet            Created container hcloud-csi-driver
  Normal   Pulled     3m54s                  kubelet            Successfully pulled image "hetznercloud/hcloud-csi-driver:1.5.3" in 1.038516926s
  Warning  Unhealthy  3m39s (x8 over 4m3s)   kubelet            Liveness probe failed: Get "http://10.244.2.2:9808/healthz": dial tcp 10.244.2.2:9808: connect: connection refused

When I run kubectl describe node node-name-master1, it shows this:

  Warning  FailedToCreateRoute      6m23s                   route_controller  Could not create route f6366050-ff07-4e1e-ab6b-d53b92071f3e 10.244.0.0/24 for node gs-cpx11-master1 after 219.475534ms: hcloud/CreateRoute: hcops/AllServersCache.ByName: gs-cpx11-master1 hcops/AllServersCache.getCache: not found
  Warning  FailedToCreateRoute      2m42s (x22 over 6m11s)  route_controller  (combined from similar events): Could not create route f6366050-ff07-4e1e-ab6b-d53b92071f3e 10.244.0.0/24 for node gs-cpx11-master1 after 1.470676778s: hcloud/CreateRoute: hcops/AllServersCache.ByName: gs-cpx11-master1 hcops/AllServersCache.getCache: not found

And it's the same for the workers.
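For reference, a quick way to check whether that uninitialized taint is still on the nodes would be something like this (node name taken from the events above):

kubectl describe node gs-cpx11-master1 | grep -A3 Taints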

@vitobotta
Owner

I think you're having the same problem someone else reported in #7

Which OS are you using and how did you install Ruby? Can you please try with the Docker image instead of the gem directly? See instructions in the README.

@Aethrexal
Author

I get the same errors when I run it using Docker.

I'm running Manjaro, which is based on Arch (I use Arch btw), sorry not sorry, had to 😆.
I installed Ruby using pacman, same with RubyGems.
I then couldn't install hetzner-k3s directly because I got an error about 'request'. So I went to the gem's website and installed all the dependencies listed there (http, sshkey, etc.) one by one. One of them also gave me an error because of the Ruby version, so I got RVM and installed Ruby 2.6.0, and after that I installed everything.

Once that was done, I ran the hetzner-k3s command and it all started without any errors (except those about the masters, which the README says to ignore when setting up HA).

@vitobotta
Owner

Well, this is weird. With Docker it should just work :p But I'm not sure I understand: did you get it working in the end, or do you still have the same problem after installing Ruby etc. with RVM?

Did you just try to update an existing cluster or did you try creating a new one?

@Aethrexal
Author

Nope, it's still not working. The logs I sent earlier (45 min ago) are from a newly created project; after that I haven't tried again, just been trying to figure it out.
I'm not sure what it could be, but I do have some issues with the gems: each time I close the konsole and open it again, I have to reinstall all the gems.
But that doesn't explain why the Docker command produces a cluster with the same errors.

@vitobotta
Owner

I don't understand why you're having problems with Docker, but anyway, can you try installing Ruby as described in #7 (comment)?

The reason I added the Docker image is exactly so that you wouldn't have to deal with Ruby 🤔
I'll try the latest image again just in case.
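By the way, if you have to reinstall the gems every time you open a new terminal, that usually just means the RVM ruby isn't set as your default. Assuming the 2.6.0 install you mentioned, something like this should make it stick:

rvm use 2.6.0 --default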

@Aethrexal
Author

Sure, I'll try that. Here's my config again in the meantime, just in case.

---
hetzner_token: <TOKEN>
cluster_name: GS
kubeconfig_path: "./kubeconfig"
k3s_version: v1.21.3+k3s1
ssh_key_path: "~/.ssh/id_rsa.pub"
verify_host_key: false
location: nbg1
masters:
  instance_type: cpx11
  instance_count: 3
worker_node_pools:
- name: small
  instance_type: cpx21
  instance_count: 3

@vitobotta
Owner

If this is the config you used with the Docker image, you should set the kubeconfig path to /cluster/kubeconfig as mentioned in the README. Or is this the config you are using with the gem directly?
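In other words, when running via the Docker image the kubeconfig path should point inside the mounted /cluster directory, i.e. something like:

kubeconfig_path: "/cluster/kubeconfig"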

@Aethrexal
Author

It's the config I've used with both. I missed the part about /cluster/kubeconfig, but I'll try that before working out the equivalents of #7 (comment), since the steps differ between Arch and Ubuntu.

@vitobotta
Owner

Hey, I got the same problem with your config, and I suspect it has to do with the uppercase cluster name, since that's used to generate the names of the resources. I'm now trying with a lowercase name to see if that's the problem. Can you try as well? Maybe with a new project so you start clean.

@vitobotta
Owner

Yes, with your config but a lowercase cluster name it works fine. I'll add validation to enforce lowercase letters. Please try that and let me know.

@Aethrexal
Author

Oh lol, alright, I'll try that first then.

@vitobotta
Owner

So it seems that the Hetzner Cloud Controller Manager looks up servers by lowercase name, so it doesn't find servers that have uppercase characters in their names.
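So on your side the fix should just be a lowercase name in the config, e.g.:

cluster_name: gs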

@Aethrexal
Author

Eyy they're all running now!

God dammit 🤣 Last night I even thought about the uppercase letters when I was staring at the config, trying to see if I had messed up somewhere, but I just thought "Nah, it can't be that easy" and left it. Thanks for the help 😄

About the persistent volumes: do I just follow the guide in the Hetzner CSI repo, excluding the installation since that was done automatically by this tool?

@vitobotta
Owner

Yeah, I'm surprised too that the cloud controller doesn't like uppercase characters :D I just released 0.3.7 with more validation on the cluster name, and I'm about to push the Docker image v0.3.7 as well.

As for the CSI, you don't need to do anything; you're ready to go with creating volumes. The single storage class hcloud-volumes is the default, so you can just create volumes either specifying that storage class or leaving it unspecified.

@vitobotta
Owner

@Rinnray Do you mind giving the latest Docker image v0.3.7 a try?

@Aethrexal
Author

The single storage class hcloud-volumes is the default, so you can just create volumes either specifying that storage class or leaving it unspecified.

Oh so I just create a volume normally on Hetzner and that's it?

Do you mind giving the latest Docker image v0.3.7 a try?

Sure I'll do it in a minute.

@vitobotta
Owner

You don't have to create it yourself, if that's what you mean; you just create a normal Kubernetes persistent volume claim resource and the volume will be created automatically :)
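A minimal claim would look roughly like this (the name and size are just examples):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes   # optional, since it's the default class
  resources:
    requests:
      storage: 10Gi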

@Aethrexal
Author

Ooooh even easier then 😄

I just tested the Docker image and got the warning about using lowercase, so that works nicely!

@vitobotta
Owner

Perfect, thanks for your help with this! :) I guess I can close now if all looks good?

@Aethrexal
Author

Yep! All the issues I had are now resolved; now I just need to figure out why I can't access Rancher haha.
Thanks for the help! :D

@vitobotta
Owner

Np. Have fun :)
