
Crashing weave-net pod when adding node to k8 cluster without supplying network-CIDR #3758

Open
ryan-g2 opened this issue Jan 24, 2020 · 16 comments


@ryan-g2

ryan-g2 commented Jan 24, 2020

What you expected to happen?

Not to have to supply --pod-network-cidr=10.32.0.0/12 when setting up a Weave network with kubeadm init, and for the weave-net pod to remain stable when adding a node to the cluster.

What happened?

When I set up a k8s cluster using kubeadm init --apiserver-advertise-address=192.168.1.31 and add one node, the newly created weave-net pod enters a CrashLoop when starting its second container. This keeps the new node from leaving the NotReady state.

The weave-net pod for the master node looks healthy and has 2/2 RUNNING the entire time.

How to reproduce it?

NOTE - The k8s master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 box.

  1. Tear down existing k8 cluster to get to square 1
    • drain and delete all nodes
    • kubeadm reset on all nodes and master
    • On master: delete /etc/cni/net.d and $HOME/.kube/config folders.
  2. On master - run kubeadm init --apiserver-advertise-address=192.168.1.31
    • Run the commands kubeadm prints at the end to set up the kubeconfig correctly (mkdir....)
    • Run kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy weave
  3. Wait for all pods to correctly come online
  4. Add one node to the cluster with the join command from the kubeadm output on the master.
  5. On master - run kubectl get pods --all-namespaces

At this point you can monitor the pods being created. If the issue occurs, the second container in the newest weave-net pod will start crashing and never come online, which keeps the node in a NotReady state.
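For reference, a rough way to watch this happen (standard kubectl commands; pod and node names will differ per cluster):

# on the master, watch kube-system pods as the node joins
kubectl get pods -n kube-system -o wide -w
# the weave-net pod on the new node shows 1/2 Ready and CrashLoopBackOff,
# and the node itself stays NotReady until the CNI pod is healthy
kubectl get nodes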

Anything else we need to know?

I have recreated this issue a few times now to debug it - the reproduction rate is not 100% (it has happened about 4 out of 5 times with the above steps).

NOTE - when adding --pod-network-cidr=10.32.0.0/12 to my init command when creating the cluster, I have not had this issue in 4 out of 4 attempts. I see all pods/containers come up as expected.
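For reference, the workaround init command looks roughly like this (advertise address and CIDR as above):

kubeadm init --apiserver-advertise-address=192.168.1.31 --pod-network-cidr=10.32.0.0/12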

I opened an issue with Kubernetes thinking this was just a documentation issue (I did not see the CIDR flag in the Kubernetes setup docs or in the Weave docs). Opening an issue here since we did not see a CIDR address supplied in the log files when reproducing this bug, but saw one once I got a working cluster up.

One time before trying Weave I set up a Calico network for my cluster, but kept seeing crashing pods with that, so I moved to Weave.

Versions:

KubeCtl:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Weave

Using the Weave CNI plugin for Kubernetes

Docker:

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

uname -a

Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Logs:

The kube-proxy ConfigMap output once I have set up the master (before adding the node that has the crashing net pod):

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And the output once I added the one node and started seeing the crashing net pod:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And for comparison, here is the output after I add the one node when I supply the CIDR in the init command:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.32.0.0/12
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:27:20Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "242"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 10ce9974-f193-4b5c-9efb-78c0317746d2
@murali-reddy
Contributor

When you specify --pod-network-cidr=10.32.0.0/12 to kubeadm init, the specified CIDR is passed on to kube-proxy, which helps kube-proxy know what is internal and what is external traffic. That should not in any way affect weave-net pods or any other pods.

Please check the logs to see why the second container, which is weave-npc, is crashing for you.
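For example, something like this (substitute the actual pod name scheduled on the affected node):

# find the weave-net pod on the new node
kubectl get pods -n kube-system -l name=weave-net -o wide
# logs of the second (weave-npc) container and the first (weave) container
kubectl logs -n kube-system <weave-net-pod-name> weave-npc
kubectl logs -n kube-system <weave-net-pod-name> weave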

@ryan-g2
Author

ryan-g2 commented Jan 28, 2020

I recreated the issue. Here is an error from the weave-npc container of the crashing weave-net pod:

ERROR: logging before flag.Parse: E0128 00:36:32.053295 24752 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

There are many other errors in the logs like the one above - they all say 'timeout' with the same address/port listed.

And here is the log from the crashing weave container:

kubemaster@kubemaster:~/git_repo/test$ kubectl logs -n kube-system pod/weave-net-qhgz5 weave
FATA: 2020/01/28 00:29:09.098127 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers

The 10.96.0.1:443 address/port combo maps to my service/kubernetes. And here is the description of that service:

Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations:
Selector:
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 192.168.1.31:6443
Session Affinity: None
Events: :

And here is the description of the crashing weave pod - just in case.

Name: weave-net-qhgz5
Namespace: kube-system
Priority: 0
Node: kube-node-1/192.168.0.11
Start Time: Mon, 27 Jan 2020 16:27:16 -0800
Labels: controller-revision-hash=7f54576664
name=weave-net
pod-template-generation=1
Annotations:
Status: Running
IP: 192.168.0.11
IPs:
IP: 192.168.0.11
Controlled By: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://8256c6077ed0b2cf2eefb5d3a359500c87e998140a3043cd7e79f8b9ebade9df
Image: docker.io/weaveworks/weave-kube:2.6.0
Image ID: docker-pullable://weaveworks/weave-kube@sha256:e4a3a5b9bf605a7ff5ad5473c7493d7e30cbd1ed14c9c2630a4e409b4dbfab1c
Port:
Host Port:
Command:
/home/weave/launch.sh
State: Running
Started: Mon, 27 Jan 2020 16:28:38 -0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 27 Jan 2020 16:27:51 -0800
Finished: Mon, 27 Jan 2020 16:28:22 -0800
Ready: False
Restart Count: 2
Requests:
cpu: 10m
Readiness: http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID: docker://e128a80d16db155238c1ce17382de7b68790f9a13942056a023672558b87071e
Image: docker.io/weaveworks/weave-npc:2.6.0
Image ID: docker-pullable://weaveworks/weave-npc@sha256:985de9ff201677a85ce78703c515466fe45c9c73da6ee21821e89d902c21daf8
Port:
Host Port:
State: Running
Started: Mon, 27 Jan 2020 16:27:39 -0800
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
weavedb:
Type: HostPath (bare host directory volume)
Path: /var/lib/weave
HostPathType:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-nbhwn:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-nbhwn
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule

Events:

Type Reason Age From Message


Normal Scheduled 108s default-scheduler Successfully assigned kube-system/weave-net-qhgz5 to kube-node-1
Normal Pulled 89s kubelet, kube-node-1 Container image "docker.io/weaveworks/weave-npc:2.6.0" already present on machine
Normal Created 85s kubelet, kube-node-1 Created container weave-npc
Normal Started 83s kubelet, kube-node-1 Started container weave-npc
Warning BackOff 40s kubelet, kube-node-1 Back-off restarting failed container
Normal Pulled 29s (x3 over 95s) kubelet, kube-node-1 Container image "docker.io/weaveworks/weave-kube:2.6.0" already present on machine
Normal Created 26s (x3 over 91s) kubelet, kube-node-1 Created container weave
Normal Started 24s (x3 over 89s) kubelet, kube-node-1 Started container weave
Warning Unhealthy 6s (x6 over 76s) kubelet, kube-node-1 Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused

For comparison, here is the description of the other weave-net pod, which is reporting 2/2 Running:

Name: weave-net-gn9vq
Namespace: kube-system
Priority: 0
Node: kubemaster/192.168.0.10
Start Time: Mon, 27 Jan 2020 16:25:20 -0800
Labels: controller-revision-hash=7f54576664
name=weave-net
pod-template-generation=1
Annotations:
Status: Running
IP: 192.168.0.10
IPs:
IP: 192.168.0.10
Controlled By: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://47bc973aa1a9360519bacc4e449102b95e54ea29ceffbe356ad681cd2b33e93e
Image: docker.io/weaveworks/weave-kube:2.6.0
Image ID: docker-pullable://weaveworks/weave-kube@sha256:e4a3a5b9bf605a7ff5ad5473c7493d7e30cbd1ed14c9c2630a4e409b4dbfab1c
Port:
Host Port:
Command:
/home/weave/launch.sh
State: Running
Started: Mon, 27 Jan 2020 16:25:29 -0800
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Readiness: http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID: docker://dc826b837d21f0165bc4b4a7f0aaa45f020991a8ae0ad36d36e664d9e4b08e22
Image: docker.io/weaveworks/weave-npc:2.6.0
Image ID: docker-pullable://weaveworks/weave-npc@sha256:985de9ff201677a85ce78703c515466fe45c9c73da6ee21821e89d902c21daf8
Port:
Host Port:
State: Running
Started: Mon, 27 Jan 2020 16:25:33 -0800
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
weavedb:
Type: HostPath (bare host directory volume)
Path: /var/lib/weave
HostPathType:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-nbhwn:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-nbhwn
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule

Events:
Type Reason Age From Message


Normal Scheduled 23m default-scheduler Successfully assigned kube-system/weave-net-gn9vq to kubemaster
Normal Pulled 23m kubelet, kubemaster Container image "docker.io/weaveworks/weave-kube:2.6.0" already present on machine
Normal Created 23m kubelet, kubemaster Created container weave
Normal Started 23m kubelet, kubemaster Started container weave
Normal Pulled 23m kubelet, kubemaster Container image "docker.io/weaveworks/weave-npc:2.6.0" already present on machine
Normal Created 23m kubelet, kubemaster Created container weave-npc
Normal Started 23m kubelet, kubemaster Started container weave-npc
Warning Unhealthy 23m (x2 over 23m) kubelet, kubemaster Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused

@murali-reddy
Contributor

github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

FATA: 2020/01/28 00:29:09.098127 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers

The above errors from the weave container logs indicate that the service IP 10.96.0.1 is not accessible on the node; as this is a fatal condition, the weave-net pod shuts down. You need to debug why services are not accessible. Do you have kube-proxy running on the node? Check for any errors in the kube-proxy logs.
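For example (generic commands; substitute the kube-proxy pod scheduled on the affected node):

# confirm a kube-proxy pod is running on the node, then check its logs for errors
kubectl get pods -n kube-system -o wide | grep kube-proxy
kubectl logs -n kube-system <kube-proxy-pod-name>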

@ryan-g2
Author

ryan-g2 commented Jan 28, 2020

Yes, kube-proxy is running on the node - Node-1. Here is the log:

kubemaster@kubemaster:~/git_repo/test$ kubectl logs -n kube-system pod/kube-proxy-7d82k
W0128 05:05:17.169516 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0128 05:05:17.328464 1 node.go:135] Successfully retrieved node IP: 192.168.0.11
I0128 05:05:17.328527 1 server_others.go:145] Using iptables Proxier.
W0128 05:05:17.328913 1 proxier.go:286] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0128 05:05:17.329278 1 server.go:571] Version: v1.17.2
I0128 05:05:17.330095 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0128 05:05:17.331975 1 config.go:313] Starting service config controller
I0128 05:05:17.332004 1 shared_informer.go:197] Waiting for caches to sync for service config
I0128 05:05:17.332154 1 config.go:131] Starting endpoints config controller
I0128 05:05:17.332172 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0128 05:05:17.432301 1 shared_informer.go:204] Caches are synced for endpoints config
I0128 05:05:17.432495 1 shared_informer.go:204] Caches are synced for service config

And the events for the kube-proxy pod on Node-1:

Events:

Type Reason Age From Message


Normal Scheduled 3m59s default-scheduler Successfully assigned kube-system/kube-proxy-7d82k to kube-node-1
Normal Pulled 3m50s kubelet, kube-node-1 Container image "k8s.gcr.io/kube-proxy:v1.17.2" already present on machine
Normal Created 3m45s kubelet, kube-node-1 Created container kube-proxy
Normal Started 3m42s kubelet, kube-node-1 Started container kube-proxy

@ryan-g2
Author

ryan-g2 commented Jan 28, 2020

Looking at the kube-proxy logs I see iptables mentioned. Could this have anything to do with the fact that I am running all my VMs on an Ubuntu 19.10 system? I read that Weave only likes iptables 1.6 and 19.10 has 1.8.
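In case it helps, a rough way to check which iptables variant a node is using (assuming Ubuntu's alternatives system manages iptables):

# prints e.g. "iptables v1.8.x (nf_tables)" or "(legacy)"
iptables --version
# on Ubuntu/Debian, shows whether iptables points at the nft or legacy backend
sudo update-alternatives --display iptables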

@murali-reddy
Contributor

I read that Weave only likes iptables 1.6 and 19.10 has 1.8.

It is a prerequisite for the weave-net pods to be able to reach the Kubernetes API server in order to even start, as you have noticed. So the real problem is not with weave-net but with the service proxy.

you need to debug and ensure kubernetes service IP 10.96.0.1 is accessible from the node

@ryan-g2
Author

ryan-g2 commented Jan 28, 2020

I'm not sure why weave-npc can't see kube-proxy.

Kube proxy and all the pods associated with it are running. Are there any other logs that would help? I'm new to k8 and Weave so I am not sure what all needs checking.

@neolit123

you need to debug and ensure kubernetes service IP 10.96.0.1 is accessible from the node

you can do:
kubectl get svc kubernetes
which should give you the IP / port of the kubernetes service.

telnet <ip> <port>
will then tell you whether the node has connectivity to the service.
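if telnet is not installed on the node, a rough equivalent (assuming curl is available) is:

# any TLS/HTTP response, even 401/403, means the service IP is reachable
curl -k https://10.96.0.1:443/version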

@ryan-g2
Author

ryan-g2 commented Feb 1, 2020

Thanks for the response.

I telnetted from node-1 to the kubernetes service with telnet 10.96.0.1 443 and telnet 10.96.0.1 6443, and both attempts just sat there trying with no response.

443 is the port seen open when I use kubectl get svc kubernetes.

@neolit123

did it say the following?

Trying 10.96.0.1...
Connected to 10.96.0.1.

if yes, the connection is fine.
this verifies that the node has connectivity.

@neolit123

if no, i have no immediate explanation for what could cause that. do you have firewall rules enabled?

@ryan-g2
Author

ryan-g2 commented Feb 2, 2020

No, it just tried and never connected. I only let it sit maybe less than 20 seconds - I think it would have connected by then if things were ok.

I have a firewall, but I don't think this would touch the firewall at all since the 10.x.x.x addresses are a virtual network hosted on my linux box which has my kube cluster running on it with 3 VMs.

Plus, all this works if I supply the CIDR command when running the initial init command. So I don't need this fixed since I can just remake my cluster, supply the CIDR and have everything work. Maybe a note can be added to the setup process that supplying the CIDR could help people who run into this - for whatever reason.

@neolit123

Maybe a note can be added to the setup process that supplying the CIDR could help people who run into this - for whatever reason.

if Weave Net does not require a CIDR, but in some cases it does (other than https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-things-to-watch-out-for), then this breaks a UX contract and it is better to understand the reason.

@murali-reddy
Contributor

Plus, all this works if I supply the CIDR command when running the initial init command.

When you specify a CIDR for kubeadm init, it goes to the --cluster-cidr argument of kube-proxy (please see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/). This helps kube-proxy figure out what is internal traffic and what is external traffic.

If you don't specify a CIDR, possibly traffic is not getting masqueraded (not masqueraded, i.e. not SNAT'ed). And if traffic is not going through, it may mean a wrong source IP address is being picked that is unroutable.

This sounds similar to kubernetes/kubeadm#102. Are you using a host with multiple interfaces?
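As a quick check, something like this shows whether the CIDR actually reached kube-proxy (the ConfigMap name matches the dumps earlier in this issue):

# an empty clusterCIDR means kube-proxy cannot tell internal from external traffic
kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR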

@ryan-g2
Author

ryan-g2 commented Feb 14, 2020

yes, the master node has multiple interfaces:

datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet6 fe80::10b7:87ff:fe21:f821 prefixlen 64 scopeid 0x20
ether 12:b7:87:21:f8:21 txqueuelen 1000 (Ethernet)
RX packets 6788 bytes 498949 (498.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2603 bytes 226461 (226.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:d5:e3:f5:16 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.10 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::9bdb:ca2a:31ee:beba prefixlen 64 scopeid 0x20
ether 08:00:27:ab:c8:91 txqueuelen 1000 (Ethernet)
RX packets 1119999 bytes 634571114 (634.5 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 395661 bytes 145955863 (145.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.31 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::a00:27ff:fe61:3d37 prefixlen 64 scopeid 0x20
ether 08:00:27:61:3d:37 txqueuelen 1000 (Ethernet)
RX packets 3591560 bytes 337031073 (337.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3193847 bytes 1423071293 (1.4 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 195097965 bytes 28953233045 (28.9 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 195097965 bytes 28953233045 (28.9 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwe-bridge: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet6 fe80::d083:e0ff:fe25:5337 prefixlen 64 scopeid 0x20
ether d2:83:e0:25:53:37 txqueuelen 0 (Ethernet)
RX packets 6385 bytes 558005 (558.0 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4089 bytes 357758 (357.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwe-datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet6 fe80::d483:37ff:fee3:ef7d prefixlen 64 scopeid 0x20
ether d6:83:37:e3:ef:7d txqueuelen 0 (Ethernet)
RX packets 4089 bytes 357758 (357.7 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6385 bytes 558005 (558.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwepl7096ee8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet6 fe80::e434:64ff:febe:f8ea prefixlen 64 scopeid 0x20
ether e6:34:64:be:f8:ea txqueuelen 0 (Ethernet)
RX packets 659 bytes 55426 (55.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 746 bytes 207812 (207.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethweplecda175: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet6 fe80::583f:ecff:fe34:67c8 prefixlen 64 scopeid 0x20
ether 5a:3f:ec:34:67:c8 txqueuelen 0 (Ethernet)
RX packets 649 bytes 54957 (54.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 737 bytes 207364 (207.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vxlan-6784: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65535
inet6 fe80::4850:93ff:fe79:a9d0 prefixlen 64 scopeid 0x20
ether 4a:50:93:79:a9:d0 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

weave: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet 10.32.0.1 netmask 255.240.0.0 broadcast 10.47.255.255
inet6 fe80::48ee:bcff:fe1d:6319 prefixlen 64 scopeid 0x20
ether 4a:ee:bc:1d:63:19 txqueuelen 1000 (Ethernet)
RX packets 4836959 bytes 329922997 (329.9 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5229000 bytes 1453684418 (1.4 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

And here are the interfaces for the Linux host which is hosting the Master and Worker nodes through VirtualBox:

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:0b:d5:50:03 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.30 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::153:d098:f582:d701 prefixlen 64 scopeid 0x20
ether 18:03:73:1e:27:a9 txqueuelen 1000 (Ethernet)
RX packets 3645726 bytes 3221355692 (3.2 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1503963 bytes 196857989 (196.8 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 27871 bytes 2507853 (2.5 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 27871 bytes 2507853 (2.5 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vboxnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.30 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::800:27ff:fe00:0 prefixlen 64 scopeid 0x20
ether 0a:00:27:00:00:00 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 15743 bytes 1366878 (1.3 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

@ryan-g2
Author

ryan-g2 commented Feb 14, 2020

This also sounds like #3363
