Routing of Flannel traffic over wireguard #139

jowenn opened this issue Jun 1, 2018 · 21 comments · May be fixed by #178

jowenn commented Jun 1, 2018

It is nice that there is wireguard, but it is not used for inter-pod communication.

Flannel by default uses the device with the default route (when run on the host); I'm not sure how it is handled when run as a pod.

At least I could verify that my communication between two pods on two nodes is not encrypted, because it does not use the wg0 interface.

What I did:
I created a cluster
hetzner-kube cluster create ..... --ha-enabled -w 3

I launched 3 nginx containers (one container scheduled per node)
kubectl run nginx1 --image=nginx --replicas=3 --port=80

On each node (directly on the host) I started
tcpdump -nnvvSSs 1514 -i eth0 | grep GET

I ran a single interactive container with curl:
kubectl run curl1 --image=radial/busyboxplus:curl -i -t --rm
In this container I started
curl http://10.244...../MYTEST1 >/dev/null

For every connection between the busybox and a container on another node I got a nice
GET /MYTEST1 HTTP/1.1 output from tcpdump

Therefore the traffic is not going flannel -> wg0 -> encryption -> eth0 and so on, but is going directly flannel -> eth0, which means the traffic is not encrypted.

I also got some other GET requests, for instance when kubectl connected a terminal to a pod, so it appears that not all control-plane traffic is encrypted either. The exception is etcd, which really seems to use wg0.

@pierreozoux (Contributor)

Yes, noticed that too.

I think what is happening is that wireguard is created, then kubeadm is configured to use the wireguard IPs, and then when flannel starts, it discovers the interface on its own and chooses eth0.

I had to deploy my cluster with kubeadm (an additional worker was failing), and I used this config for flannel on top of wireguard:
https://git.indie.host/indiehost/standard/blob/master/kube-flannel.yml#L74-85

Hope it helps!


jowenn commented Jun 10, 2018

I've tried removing everything flannel-related and used your file with kubectl apply and with kubectl create; in both cases the result is that the pods are in a crash loop. From the logs I can see that the post-startup command errors out with an "Address already in use" message.

@pierreozoux (Contributor)

Yeah, maybe you have to clean some folders too, and interfaces.

I've used these commands when I used kubeadm, to reset a node:

systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down

Then reboot the node; it might help, but be careful, I'm not sure the node will join the cluster again. Try this at your own risk, on a staging cluster.


xetys commented Jun 17, 2018

Your approach generates new keys every time the network restarts. That would leave the other nodes without the new public key. Isn't there a way to just specify which network interface flannel should use?


simonkern commented Jun 26, 2018

@pierreozoux's solution looks quite similar to flannel's official wireguard extension:
https://github.com/coreos/flannel/blob/master/dist/extension-wireguard

The docs say this about PreStartupCommand, which includes the key-generation command:

Command to run before allocating a network to this host
The stdout of the process is captured and passed to the stdin of the SubnetAdd/Remove commands.

See: https://github.com/coreos/flannel/blob/master/Documentation/extension.md
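For context, a rough sketch of what such an extension backend configuration looks like in flannel's net-conf.json (paraphrased, not copied verbatim from the official file; the interface name, listen port and network below are illustrative):

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "extension",
    "PreStartupCommand": "wg genkey | tee privatekey | wg pubkey",
    "PostStartupCommand": "ip link del flannel-wg 2>/dev/null; ip link add flannel-wg type wireguard && wg set flannel-wg listen-port 51820 private-key privatekey && ip addr add ${SUBNET%/*}/32 dev flannel-wg && ip link set flannel-wg up && ip route add $NETWORK dev flannel-wg",
    "ShutdownCommand": "ip link del flannel-wg",
    "SubnetAddCommand": "read PUBLICKEY; wg set flannel-wg peer $PUBLICKEY endpoint $PUBLIC_IP:51820 allowed-ips $SUBNET",
    "SubnetRemoveCommand": "read PUBLICKEY; wg set flannel-wg peer $PUBLICKEY remove"
  }
}

This is also where the quoted doc sentence matters: the public key printed by PreStartupCommand is captured and fed to SubnetAddCommand/SubnetRemoveCommand on stdin, which is why those commands start with "read PUBLICKEY".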

xetys added this to To do in 1.0 via automation Jun 29, 2018
@pierreozoux (Contributor)

@simonkern yes sorry, I should have linked to official doc instead of my folder :)


monofone commented Aug 8, 2018

There is an --iface option to flanneld which takes an interface name. By patching this into the flannel manifest and setting it to wg0, this should already work.
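A minimal sketch of what that could look like in the kube-flannel DaemonSet manifest (abridged; the image tag is just an example from that era):

spec:
  template:
    spec:
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.10.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=wg0

Both --iface=wg0 and --iface wg0 (as two separate args) should be accepted by flanneld.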


voron commented Aug 17, 2018

@monofone

There is an --iface option to flanneld which takes an interface name.

I tried --iface wg0 without success; no internal traffic worked. Unfortunately I didn't dig into it.

@monofone

Yes, that's right, I experienced this behavior too.


voron commented Aug 20, 2018

Looks like I found a work-around:

  • Add --iface wg0 (2 lines) to the flannel command line in the kube-system flannel DaemonSet, then kill all flannel pods to apply:
echo 'spec:
  template:
    spec:
      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface
        - wg0
        name: kube-flannel'|kubectl -n kube-system patch ds kube-flannel-ds --patch "$(cat)"
kubectl -n kube-system delete pod -l 'app=flannel'
  • Disable TX checksum offload for the flannel.1 interface on all servers via
ethtool -K flannel.1 tx off

This is just a PoC to test; we'll need to make the checksum offload setting permanent if this solution gets implemented.


xetys commented Aug 23, 2018

I like your approach. Could you explain what's happening here?

About the permanent solution, is systemd a good way to do that? That's what I would do here if this solution is sustainable.


voron commented Aug 23, 2018

I like your approach. Could you explain what's happening here?

As @monofone mentioned, it is enough to pass the --iface wg0 arg to flanneld to get flannel to work via wg0 instead of the default eth0. flanneld reports wg0's IP to k8s, and then all nodes peer using node metadata annotations from k8s. ICMP already works at that point, but TCP is really, really buggy. I didn't investigate UDP though.

I started to research the TCP problem using this article as a base and got tcpdump -v checksum mismatch errors when the service didn't respond, and normal, rare TCP connections without checksum mismatches when the service responded as expected. The same high TcpInCsumErrors values showed up in nstat inside the pod. All this pointed me to a checksum offload problem. I started to disable TCP checksum offload on all adapters, but it looks like flannel.1 alone is enough. Here is a similar issue with Azure. It looks to me like a (kernel/driver) checksum offload bug with flannel vxlan encapsulation (everything inside UDP) inside wireguard encapsulation (again everything inside UDP).
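For anyone who wants to reproduce the diagnosis, roughly the commands involved (a sketch; interface and counter names assume the defaults mentioned above):

# look for checksum mismatches on the vxlan interface
tcpdump -v -i flannel.1 tcp 2>&1 | grep -i 'incorrect'
# TCP checksum error counter (run inside an affected pod)
nstat -az TcpInCsumErrors
# inspect, then disable, TX checksum offload on the host
ethtool -k flannel.1 | grep tx-checksumming
ethtool -K flannel.1 tx off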

I got near 1500 Mbit with iperf3 without wireguard and near 900 Mbit with wireguard in my performance tests on cx11 instances. IMHO the flannel TCP checksum offload doesn't affect speed noticeably, especially compared to wireguard encryption.

About the permanent solution, is systemd a good way to do that?

I was thinking about something like the PostStartupCommand from the flannel extensions, but it looks like flannel doesn't support such an option with the built-in backends, while extension backends are not recommended for production use. flanneld also re-creates its interface on some configuration changes and so on. Thus a udev hook is possible, either with a RUN script or a systemd event, as you asked. Here is the systemd option:

  • Create the file /etc/udev/rules.d/71-flannel.rules with a line like
SUBSYSTEM=="net", ACTION=="add", KERNEL=="flannel.*", TAG+="systemd", ENV{SYSTEMD_WANTS}="flannel-created@%k.service"
  • Create the systemd unit /etc/systemd/system/flannel-created@.service:
[Unit]
Description=Disable TX checksum offload on flannel interface
[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K %I tx off
  • Reload via systemctl daemon-reload and systemctl restart systemd-udevd.service

It did the job for me
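A quick way to check it took effect after a reboot (a sketch, assuming the default interface name):

ethtool -k flannel.1 | grep tx-checksumming
# expected output: tx-checksumming: off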


xetys commented Aug 24, 2018

Does this persist across reboots, given the "oneshot" type?


voron commented Aug 24, 2018

It runs on every flannel.* interface creation. Thus it persists across reboots too, as flannel starts on every k8s node boot and creates its interface.


xetys commented Aug 24, 2018

Then I think that is great material for a PR, WDYT?

voron linked a pull request Aug 25, 2018 that will close this issue

quorak commented Sep 23, 2018

Hey guys,

I just read through the issue here. Do I understand correctly that the traffic between the nodes is not encrypted, as opposed to #128?

Maybe we should state this explicitly in the repo's README for as long as this isn't covered. It might be a deal breaker for some evaluators.

best


segator commented Feb 24, 2019

Did you try hostgw instead of vxlan over wireguard?
The performance should be better, I think.

xetys moved this from To do to In progress in 1.0 Mar 31, 2019

ghost commented May 21, 2020

I have the same question: currently the inter-node traffic is not encrypted? So, for example, if my webapp queries my mysql backend on a second node, the data can be sniffed?


renanqts commented May 15, 2021

hostgw

It didn't work because it creates a route pointing the pod CIDR to the Wireguard IPs as a gateway, but Wireguard needs the CIDR in AllowedIPs.


segator commented May 16, 2021

hostgw

It didn't work because it creates a route pointing the pod CIDR to the Wireguard IPs as a gateway, but Wireguard needs the CIDR in AllowedIPs.

Yes, and you can configure the CIDR in the allowed IPs on each WG node; I've had it working since a year ago :)
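For illustration, roughly what that looks like on one node (the public key, endpoint and subnets below are placeholders, not from a real setup):

# allow the peer's wireguard address plus the pod subnet hosted behind it
wg set wg0 peer <PEER_PUBLIC_KEY> endpoint <PEER_PUBLIC_IP>:51820 allowed-ips 10.0.1.2/32,10.244.1.0/24

# or the equivalent in /etc/wireguard/wg0.conf:
# [Peer]
# PublicKey = <PEER_PUBLIC_KEY>
# Endpoint = <PEER_PUBLIC_IP>:51820
# AllowedIPs = 10.0.1.2/32, 10.244.1.0/24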


renanqts commented May 16, 2021

With hostgw?
I saw this in the logs when I tried:

Replacing existing route to 10.42.0.0/24 via 10.42.0.0 dev index 17 with 10.42.0.0/24 via 10.253.3.1 dev index 4

Forget it, it works even with this log :D
