This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

[BUG] - Kubeadm init fails with timeout #46

Closed
shutupflanders opened this issue Sep 3, 2021 · 4 comments

@shutupflanders

Describe the bug
I'm trying to provision a cluster using the example code in the README, but it always times out when getting to kubeadm init:

Unfortunately, an error has occurred: timed out waiting for the condition

journalctl -xeu kubelet returns a load of different errors:

eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "condor-default-6be8751bc7f10eb7-controller-0" not found

kubelet.go:2163] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.240.0.3:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/condor-default-6be8751bc7f10eb7-controller-0?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

kubelet_node_status.go:563] Failed to set some node status fields: failed to validate nodeIP: node IP: "10.240.0.3" not found in the host's network interfaces
trace.go:205] Trace[1635771837]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46 (03-Sep-2021 13:40:04.283) (total time: 30000ms):

reflector.go:138] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.240.0.3:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dcondor-default-6be8751bc7f10eb7-controller-0&limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.240.0.3:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://10.240.0.3:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.240.0.3:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/condor-default-6be8751bc7f10eb7-controller-0?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

# With loads of these in between
kubelet.go:2243] node "condor-default-6be8751bc7f10eb7-controller-0" not found

My module configuration is:

module "condor" {
  source  = "vultr/condor/vultr"
  version = "1.2.0"
  cluster_vultr_api_key = var.vultr_api_key
  provisioner_public_key = chomp(file("~/.ssh/id_rsa.pub"))
  cluster_region = "lhr"
}

The API key is set and whitelisted, and I can connect to all 3 nodes fine.
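
Since I can get onto the nodes, here's roughly what I can check on the controller (just a sketch; the node IP, hostname, and port come straight from the kubelet logs above):

# run on condor-default-6be8751bc7f10eb7-controller-0
ip -br addr show                          # is 10.240.0.3 actually assigned to any interface?
ss -tlnp | grep 6443                      # is anything listening on the API server port?
curl -k https://10.240.0.3:6443/healthz   # does the API server answer locally?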

To Reproduce
Steps to reproduce the behavior:

  1. Use the module code above, then run terraform init and terraform apply (sketched below)
  2. Wait 4-5 minutes
  3. See the error
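
For completeness, the whole reproduction is just this (assuming the module block above lives in main.tf and the key is passed in via Terraform's standard TF_VAR_ environment variable mechanism):

export TF_VAR_vultr_api_key="..."   # variable name taken from var.vultr_api_key above
terraform init
terraform apply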

Expected behavior
For the cluster to be provisioned

Screenshots
Screenshot from 2021-09-03 14-50-59

Screenshot from 2021-09-03 14-51-56

Desktop (please complete the following information where applicable):

  • OS: Ubuntu 20.04
  • Version: Terraform v1.0.5, terraform-vultr-condor v1.2.0

Additional context

Any help will be greatly appreciated; I must be going wrong somewhere if other people have got this working out of the box.
Thanks

@shutupflanders shutupflanders added the bug Something isn't working label Sep 3, 2021
@ddymko

ddymko commented Sep 3, 2021

@shutupflanders thanks for bringing this up.

We'll look into this for you

@Oogy
Contributor

Oogy commented Sep 3, 2021

Hello @shutupflanders,

We've just released v2 of this module, which has been in development for some time. There are a number of new features and improvements (see the CHANGELOG), but also some new requirements and dependencies, so please take a look at the new README and let us know if you have any questions.

Thanks!

@shutupflanders
Author

That's handy!

I'll try again tonight and let you know how I get on.

Thanks

@shutupflanders
Author

Hey @Oogy,

Thanks for the new release, that worked great (maybe make the part about needing k0sctl installed a bit more prominent in the docs though; I spent about 45 minutes debugging before spotting that!)
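
For anyone else who hits this, the check that would have saved me the time is just confirming the binary is on PATH before applying (rough sketch; k0sctl binaries are published on the k0sproject/k0sctl GitHub releases page):

command -v k0sctl || echo "k0sctl not found, install it first"
k0sctl version   # assuming the usual version subcommand is available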

The only issue I've had is when destroying the cluster: I had to run destroy twice due to:

│ Error: error destroying private network (79c01eec-e27f-4110-9325-fb42c2230b5b): {"error":"Network is attached to multiple servers.","status":400}
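
For reference, the workaround was just running destroy a second time:

terraform destroy   # first run fails on the private network with the error above
terraform destroy   # second run completes once the instances are gone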

Not a massive issue, but might be worth looking into.
Happy for you to mark this issue as resolved when ready.
Cheers
