Crashing weave-net pod when adding node to k8 cluster without supplying network-CIDR #3758
Comments
Please check the logs to see why the second container, weave-npc, is crashing for you.
I recreated the issue, here is the description from the crashing weave-net container:
There are many other errors in the logs like the one above - they all say 'timeout' with the same address/port listed. And here is the log from the crashing weave container:
The 10.96.0.1:443 address/port combo maps to my service/kubernetes. And here is the description of that service:
And here is the description of the crashing weave pod - just in case.
For comparison, here is the description of the other weave-net pod, which is reporting 2/2 Running:
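As an aside, the 10.96.0.1 address reported above is the first IP in kubeadm's default service CIDR, 10.96.0.0/12, which is why it maps to the kubernetes service; note it sits outside the 10.32.0.0/12 pod range discussed later in this issue. A minimal sketch of checking CIDR membership with plain bash arithmetic (the helper names are my own, nothing cluster-specific is assumed):

```shell
#!/bin/bash
# Check whether an IPv4 address falls inside a CIDR range using bash arithmetic.
# The addresses below are the ones mentioned in this thread.
ip2int() { local a b c d; IFS=. read -r a b c d <<< "$1"; echo $(( (a<<24)|(b<<16)|(c<<8)|d )); }
in_cidr() {  # usage: in_cidr IP NET/PREFIX
  local ip net prefix mask
  ip=$(ip2int "$1")
  net=$(ip2int "${2%/*}")
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
in_cidr 10.96.0.1 10.96.0.0/12 && echo "in service CIDR" || echo "not in service CIDR"
in_cidr 10.96.0.1 10.32.0.0/12 && echo "in pod CIDR"     || echo "not in pod CIDR"
```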
The above errors from the weave container logs indicate that the service IP 10.96.0.1:443 is not reachable. Is kube-proxy running on that node?
Yes, kube-proxy is running on the node - Node-1. Here is the log:
And the events for the kube-proxy pod on Node-1:
Looking at the kube-proxy logs I see iptables mentioned. Could this have anything to do with the fact that I am running all my VMs on an Ubuntu 19.10 system? I read that Weave only likes iptables 1.6, and 19.10 has 1.8.
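For context on that question: on Ubuntu 19.10 the iptables 1.8 userland defaults to the nf_tables backend, and `iptables --version` prints the backend in parentheses. A small sketch of telling the two apart from that version string (the sample strings are typical Ubuntu outputs I'm assuming, not output from the reporter's machines):

```shell
#!/bin/sh
# Classify an `iptables --version` string as nft-backed or legacy.
backend() {
  case "$1" in
    *nf_tables*) echo "nft" ;;
    *)           echo "legacy" ;;
  esac
}
backend "iptables v1.8.3 (nf_tables)"   # assumed typical Ubuntu 19.10 output
backend "iptables v1.6.1"               # assumed typical Ubuntu 18.04 output
```

If the nft backend does turn out to be the problem, Debian/Ubuntu hosts can be switched back with `update-alternatives --set iptables /usr/sbin/iptables-legacy`.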
It's a prerequisite for the weave-net pods to be able to reach the Kubernetes API server to even start, as you have noticed. So the real problem is not with weave-net but with the service proxy. You need to debug and ensure the service IP is reachable from the node.
I'm not sure why weave-npc can't see kube-proxy. Kube proxy and all the pods associated with it are running. Are there any other logs that would help? I'm new to k8 and Weave so I am not sure what all needs checking. |
you can do:
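The command elided above is presumably a simple TCP connectivity test toward the API server's service IP (10.96.0.1:443 from earlier in the thread). One way to sketch it without telnet, using bash's /dev/tcp (to be run from the affected node; the address comes from this thread, it is not a universal default):

```shell
#!/bin/bash
# Probe a TCP endpoint with a timeout; prints "reachable" or "unreachable".
tcp_check() {  # usage: tcp_check HOST PORT
  if timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
tcp_check 10.96.0.1 443   # the kubernetes service IP:port from this thread
```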
Thanks for the response. I telnetted from node-1 to the kubernetes service; 443 is the port listed as open when I check the service.
did it say the following?
If yes, the connection is fine.
If no, I have no immediate explanation for what could cause that. Do you have firewall rules enabled?
No, it just tried and never connected. I only let it sit for maybe 20 seconds - I think it would have connected by then if things were OK. I have a firewall, but I don't think this would touch the firewall at all, since the 10.x.x.x addresses are a virtual network hosted on my Linux box, which runs my kube cluster across 3 VMs. Plus, all of this works if I supply the CIDR flag when running the initial init command. So I don't need this fixed, since I can just remake my cluster, supply the CIDR, and have everything work. Maybe a note can be added to the setup process that supplying the CIDR could help people who run into this - for whatever reason.
If Weave Net does not require a CIDR, but in some cases it does (other than https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-things-to-watch-out-for), then this is breaking a UX contract, and it is better to understand the reason.
When you specify a CIDR for kubeadm init, it goes to the
If you don't specify a CIDR, possibly traffic is not getting masqueraded (SNATed). And if traffic is not going through, then it means a wrong source IP address is perhaps being picked and is unroutable. This sounds similar to kubernetes/kubeadm#102. Are you using a host with multiple interfaces?
yes, the master node has multiple interfaces:
And here are the interfaces for the Linux host which is hosting the Master and Worker nodes through VirtualBox:
This also sounds like #3363 |
What you expected to happen?
To not have to supply a
--pod-network-cidr=10.32.0.0/12
command when setting up a Weave network when using kubeadm init. For the weave-net pod to remain stable when adding a node to the cluster.

What happened?
When I set up a k8 cluster using
kubeadm init --apiserver-advertise-address=192.168.1.31
and add one node, the newly created weave-net pod enters a CrashLoop when setting up the 2nd container. This does not allow the new node to exit the NotReady state. The weave-net pod for the master node looks healthy and has 2/2 RUNNING the entire time.
How to reproduce it?
NOTE - The k8 master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 box.
1. kubeadm reset on all nodes and the master
2. Remove the /etc/cni/net.d and $HOME/.kube/config folders
3. kubeadm init --apiserver-advertise-address=192.168.1.31
4. kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy Weave
5. kubectl get pods --all-namespaces
At this point you can monitor the pods being created. If the issue occurs, the 2nd container in the newest weave-net pod will start crashing and not come online - which keeps the node in a NotReady state.
Anything else we need to know?
I have recreated this issue a few times now to debug - the reproduction rate is not 100% (it has happened to me about 4 out of 5 times with the above steps).
NOTE - with
--pod-network-cidr=10.32.0.0/12
added to my init command when creating the cluster, I have not had this issue (4 out of 4 attempts); I see all pods/containers get created as expected.

I opened an issue with K8 thinking this was just a documentation issue (I did not see the CIDR command in the K8 setup docs or in the Weave docs). I am opening an issue here since we did not see a CIDR address supplied in the log files when reproducing this bug, but saw one once I got a working cluster up.
One time before trying Weave, I set up a Calico network for my cluster, but kept seeing crashing pods with that, so I moved to Weave.
Versions:
KubeCtl:
Weave:
Using the Weave CNI plugin for Kubernetes
Docker:
uname -a
Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Logs:
The kube-proxy output once I have setup the master (before adding the node that has the crashing net pod):
And the output once I added the one node and started seeing the crashing net pod:
And for comparison, here is the output after I add the one node when I supply the CIDR in the init command: