This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

[BUG] - Kubeadm init fails with timeout #46

Closed
shutupflanders opened this issue Sep 3, 2021 · 4 comments

@shutupflanders

Describe the bug
I'm trying to provision a cluster using the example code in the README, but it always times out when getting to kubeadm init:

Unfortunately, an error has occurred: timed out waiting for the condition

journalctl -xeu kubelet returns a load of different errors:

eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "condor-default-6be8751bc7f10eb7-controller-0" not found

kubelet.go:2163] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.240.0.3:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/condor-default-6be8751bc7f10eb7-controller-0?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

kubelet_node_status.go:563] Failed to set some node status fields: failed to validate nodeIP: node IP: "10.240.0.3" not found in the host's network interfaces
trace.go:205] Trace[1635771837]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46 (03-Sep-2021 13:40:04.283) (total time: 30000ms):

reflector.go:138] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.240.0.3:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dcondor-default-6be8751bc7f10eb7-controller-0&limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.240.0.3:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://10.240.0.3:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.240.0.3:6443: i/o timeout

controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.240.0.3:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/condor-default-6be8751bc7f10eb7-controller-0?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

# With loads of these in between
kubelet.go:2243] node "condor-default-6be8751bc7f10eb7-controller-0" not found

My module configuration is:

module "condor" {
  source  = "vultr/condor/vultr"
  version = "1.2.0"
  cluster_vultr_api_key = var.vultr_api_key
  provisioner_public_key = chomp(file("~/.ssh/id_rsa.pub"))
  cluster_region = "lhr"
}

The API key is set and whitelisted, and I can connect to all 3 nodes fine.
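
Since I can get onto the nodes, here's roughly what I can check on the controller (just a sketch; the node IP, hostname, and port come straight from the kubelet logs above):

# run on condor-default-6be8751bc7f10eb7-controller-0
ip -br addr show                          # is 10.240.0.3 actually assigned to any interface?
ss -tlnp | grep 6443                      # is anything listening on the API server port?
curl -k https://10.240.0.3:6443/healthz   # does the API server answer locally?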

To Reproduce
Steps to reproduce the behavior:

  1. Use the module code above, then run terraform init and terraform apply (sketched below)
  2. Wait 4-5 minutes
  3. See the error
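
For completeness, the whole reproduction is just this (assuming the module block above lives in main.tf and the key is passed in via Terraform's standard TF_VAR_ environment variable mechanism):

export TF_VAR_vultr_api_key="..."   # variable name taken from var.vultr_api_key above
terraform init
terraform apply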

Expected behavior
For the cluster to be provisioned

Screenshots
Screenshot from 2021-09-03 14-50-59

Screenshot from 2021-09-03 14-51-56

Desktop (please complete the following information where applicable):

  • OS: Ubuntu 20.04
  • Version: Terraform v1.0.5, terraform-vultr-condor v1.2.0

Additional context

Any help will be greatly appreciated; I must be going wrong somewhere if other people have got this working out of the box.
Thanks

@shutupflanders shutupflanders added the bug Something isn't working label Sep 3, 2021
@ddymko

ddymko commented Sep 3, 2021

@shutupflanders thanks for bringing this up.

We'll look into this for you

@Oogy
Contributor

Oogy commented Sep 3, 2021

Hello @shutupflanders,

We've just released v2 of this module, which has been in development for some time. There are a number of new features and improvements (see the CHANGELOG), but also some new requirements and dependencies, so please take a look at the new README and let us know if you have any questions.

Thanks!

@shutupflanders
Author

That's handy!

I'll try again tonight and let you know how I get on.

Thanks

@shutupflanders
Author

Hey @Oogy,

Thanks for the new release, that worked great (maybe make the part about needing k0sctl installed a bit more prominent in the docs though; I spent about 45 minutes debugging before spotting that!)
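
For anyone else who hits this, the check that would have saved me the time is just confirming the binary is on PATH before applying (rough sketch; k0sctl binaries are published on the k0sproject/k0sctl GitHub releases page):

command -v k0sctl || echo "k0sctl not found, install it first"
k0sctl version   # assuming the usual version subcommand is available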

The only issue I've had is when destroying the cluster: I had to run destroy twice due to:

│ Error: error destroying private network (79c01eec-e27f-4110-9325-fb42c2230b5b): {"error":"Network is attached to multiple servers.","status":400}
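
For reference, the workaround was just running destroy a second time:

terraform destroy   # first run fails on the private network with the error above
terraform destroy   # second run completes once the instances are gone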

Not a massive issue, but might be worth looking into.
Happy for you to mark this issue as resolved when ready.
Cheers
