Tailscale and Calico netfilter packet marks conflict with each other #591

danderson · 2020-07-23T18:56:54Z

Calico is a network addon for Kubernetes, which implements both connectivity within the cluster and policy enforcement stuff.

Calico takes the upper 16 bits of the netfilter packet mark for itself (aka 0xffff0000). This conflicts with Tailscale's use of 0x40000 and 0x80000, so trying to use Calico and Tailscale on the same Kubernetes machines will probably break Tailscale, or Calico, or both.

Likely this is just a question of documenting that the two cannot run together, unless you run tailscaled in a k8s pod without host network access (which is still useful as a "VPN hosted in k8s" subnet router), or reconfigure Calico to use different mark bits (which Calico supports).

Alternatively we could try to help ourselves to lower bits that don't conflict with Calico or other known users. There are no currently-known uses of the lower 8 bits of the packet mark, although people seem to steer clear of those as belonging to the local sysadmin for their own uses.
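
To make the overlap concrete, here's a minimal Python sketch that ANDs each mark value against Calico's default mask. The mask and bit values are the ones quoted above; the `overlap` helper is just an illustrative name. A nonzero result means both sides claim that bit.

```python
# Calico's default mark mask (upper 16 bits), as quoted in this issue.
CALICO_DEFAULT_MASK = 0xFFFF0000

# Mark bits Tailscale used when this issue was filed.
TAILSCALE_BITS = (0x40000, 0x80000)

# Low byte, conventionally left to the local sysadmin.
LOW_BYTE = 0x000000FF


def overlap(a: int, b: int) -> int:
    """Return the bits set in both masks; 0 means no conflict."""
    return a & b


for bit in TAILSCALE_BITS:
    print(f"{bit:#x} overlaps Calico in {overlap(bit, CALICO_DEFAULT_MASK):#x}")
print(f"low byte overlaps Calico in {overlap(LOW_BYTE, CALICO_DEFAULT_MASK):#x}")
```

Both Tailscale bits fall entirely inside Calico's default range, while the low byte does not touch it at all.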

cc @bradfitz @apenwarr

apenwarr · 2020-07-23T19:14:51Z

I really think we should consider more heavily the possibility of wideband masks. The only time those conflict is when two different apps try to tag the same packet. I don't know how often that happens, but I suspect it might not be a lot, because if two apps think they're entitled to a particular packet, you're probably screwed for other reasons.

-- Avery Pennarun // CEO @ Tailscale

danderson · 2020-07-23T20:49:58Z

I really don't want to put us in the camp of "uninstall all other networking software, or else". At a minimum, that will make us straight up incompatible with all Kubernetes-related software (all of which uses the bitmasking style on ~all packets flowing through the machine), and we'll have to deal with a myriad of support requests because people will keep trying to make it work on k8s no matter what we do. On the other hand, with bitmasking I believe we can make it work, because aside from the blanket marking k8s and friends don't actually care about our packets (but will drop/mishandle them if their mark isn't present), so the behaviors aren't in conflict.

You could argue that this is an indictment of how packet processing works on linux, and I'd agree with you. But breaking all k8s-related uses isn't something we should do lightly.
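
A rough Python sketch of the difference between the "bitmasking style" and blanket marking. It assumes each app owns a disjoint field of the 32-bit mark and updates it with iptables' `MARK --set-xmark value/mask` semantics (zero the masked bits, then XOR in the value). `APP_A_MASK`, `TS_MASK`, and `set_field` are hypothetical names chosen for illustration, not Tailscale's or Calico's actual layout. A full 32-bit overwrite, which is what a plain `meta mark set 0x40000` does, is shown for contrast.

```python
MASK32 = 0xFFFFFFFF  # netfilter marks are 32-bit


def set_field(mark: int, value: int, mask: int) -> int:
    """Zero the bits in `mask`, then XOR in `value`.

    Mirrors iptables' `MARK --set-xmark value/mask`; truncated to 32 bits
    because Python ints are unbounded.
    """
    return ((mark & ~mask) ^ value) & MASK32


# Hypothetical layout for illustration: two apps owning disjoint fields.
APP_A_MASK = 0xFF000000
TS_MASK = 0x00FF0000  # the byte holding Tailscale's 0x40000 and 0x80000

mark = 0
mark = set_field(mark, 0x01000000, APP_A_MASK)  # app A tags the packet
mark = set_field(mark, 0x00040000, TS_MASK)     # Tailscale tags the same packet
print(f"{mark:#010x}")  # both tags survive in their own fields

# Blanket marking: overwrite all 32 bits, as a plain `meta mark set 0x40000` does.
clobbered = set_field(0x01000000, 0x00040000, MASK32)
print(f"{clobbered:#010x}")  # app A's tag is gone
```

With disjoint fields, each app can still read back its own tag by masking; the full overwrite silently erases whatever the other app had set.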

apenwarr · 2020-07-23T20:53:08Z

I think we're talking past each other about how this works. I'm not proposing making us incompatible with all other software, but I think there's a way where we can make them all cooperate, *except* in the one condition where a single packet gets tagged by multiple apps (which will probably mean it's broken at other layers anyway). Might need to discuss in a meeting instead.

danderson · 2020-07-23T20:56:31Z

Yup, sounds good. I'll put it on the agenda for an eng meeting.

DentonGentry · 2021-07-31T03:23:01Z

Since this bug was filed, I believe we've moved ACL rules to a new table ID unlikely to conflict. We use FWMARK for:

- packets destined to Tailscale's own control server and related infrastructure, in net/netns/netns_linux.go
- wgengine/router/router_linux.go:addNetfilterBase4()

danopia · 2021-11-16T13:59:25Z

Hi, I just encountered this conflict as a user, when trying to configure a single host as both a Calico Kubernetes node and a Tailscale 1.16.2 exit node. I haven't had issues with other aspects of the nodes, just Tailscale exit routing.

A subset of nft rules from the conflicted node, showing 0x40000 reuse:

```
# nft list ruleset | grep 0x40000
    iifname "tailscale0" counter packets 893 bytes 53580 meta mark set 0x40000
    mark 0x40000 counter packets 893 bytes 53580 accept
    iifname "tailscale0" counter packets 39 bytes 3120 meta mark set 0x40000
    mark 0x40000 counter packets 39 bytes 3120 accept
    mark 0x40000 counter packets 0 bytes 0 masquerade
    mark 0x40000 counter packets 0 bytes 0 masquerade
    iifname "cali*"  counter packets 0 bytes 0 meta mark set mark or 0x40000
    mark and 0x40000 == 0x40000 fib saddr . mark . iif oif 0 counter packets 0 bytes 0 drop
    mark and 0x40000 == 0x0 counter packets 2586 bytes 536822 jump cali-from-host-endpoint
    iifname "cali*"  counter packets 0 bytes 0 meta mark set mark or 0x40000
    mark and 0x40000 == 0x40000 fib saddr . mark . iif oif 0 counter packets 0 bytes 0 drop
    mark and 0x40000 == 0x0 counter packets 2094 bytes 309770 jump cali-from-host-endpoint
```
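
A simplified Python model of the rules above: each match expression from the dump becomes a predicate on the 32-bit mark. It ignores which chain and hook each rule sits in (grep dropped that context) and assumes the cali* packet carried no other mark bits beforehand, so it only shows that the two sides' expressions can't tell each other's 0x40000 apart, not the exact packet path. The function names are hypothetical.

```python
TS_BIT = 0x40000


def ts_exact_match(mark: int) -> bool:
    """`mark 0x40000`: the whole mark must equal 0x40000."""
    return mark == TS_BIT


def calico_bit_set(mark: int) -> bool:
    """`mark and 0x40000 == 0x40000`: the rule that leads to the fib check and drop."""
    return (mark & TS_BIT) == TS_BIT


def calico_bit_clear(mark: int) -> bool:
    """`mark and 0x40000 == 0x0`: the rule that jumps to cali-from-host-endpoint."""
    return (mark & TS_BIT) == 0


# Packet from tailscale0 after `meta mark set 0x40000` (whole mark overwritten).
from_tailscale = TS_BIT
# Packet from cali* after `meta mark set mark or 0x40000`, assuming no prior bits.
from_calico = 0x0 | TS_BIT

print(calico_bit_set(from_tailscale), calico_bit_clear(from_tailscale))
print(ts_exact_match(from_calico))
print(from_tailscale == from_calico)
```

Under these assumptions, a Tailscale-marked packet passes Calico's "bit set" test and fails its "bit clear" test, and a Calico-marked packet passes Tailscale's exact match: identical marks with different intended meanings.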

As mentioned already in this thread, Calico supports remapping its marks. I added this envvar to my Calico daemonset, which restored proper exit-routing behavior:

```yaml
- name: FELIX_IPTABLESMARKMASK
  value: "0xff00ff00"
```
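
One way to see why 0xff00ff00 works, as a Python sketch: start from all 32 bits and clear both the byte holding Tailscale's 0x40000 and 0x80000 (treating 0x00ff0000 as Tailscale's byte is an assumption based on the bits mentioned in this thread) and the low byte left to the sysadmin. The constant names are hypothetical, and this doesn't model how Felix allocates individual bits within the mask.

```python
MASK32 = 0xFFFFFFFF
TAILSCALE_FIELD = 0x00FF0000    # assumed: byte containing Tailscale's 0x40000 and 0x80000
SYSADMIN_LOW_BYTE = 0x000000FF  # conventionally left to the local admin

# Clear both reserved bytes from the full 32-bit range.
candidate = MASK32 & ~TAILSCALE_FIELD & ~SYSADMIN_LOW_BYTE
print(f"{candidate:#x}")
# Number of bits left for Calico; the default 0xffff0000 also has 16.
print(bin(candidate).count("1"))
```

The result equals the mask used above and keeps the same number of bits as Calico's default, while leaving Tailscale's bits and the low byte untouched.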

mayakacz · 2022-10-15T01:06:54Z

Is this still an issue?
It sounds like this shouldn't occur if Tailscale is running in a container in Kubernetes, but may occur if Tailscale is running on a node(?) in a Kubernetes cluster.
There were also changes to exit nodes in 1.20.

DentonGentry · 2022-10-15T01:45:17Z

I suspect it will still be an issue; uses of fwmark can stomp on each other. #3310 (comment) proposes a general fix which might solve this issue as well.

ncfavier · 2022-10-15T07:50:03Z

This seems orthogonal to #3310 (comment). To avoid the conflict, tailscale would have to use a different bit range.

SnoFox · 2024-02-05T10:06:32Z

I have spent days debugging this and trying to work around it by changing Tailscale's behavior, and only found the proper workaround by way of a complete outage. Since Tailscale offers --netfilter-mode=off, I had changed the fwmarks in my manually created iptables rules.

FR: Docs change to list known incompatibilities, such as Calico, and potential workarounds, like #591 (comment) (which was a simple change in K8s that cleanly rolled out and fixed everything).

danderson added L2 Few (Likelihood), P5 Halts deployment (Priority level), T8 Crash (Issue type), and kubernetes labels Aug 14, 2020

bradfitz added vpn-interop OS-linux labels Aug 2, 2021

This was referenced Oct 15, 2022:

- Alpine linux: rp_filter=strict prevents exit node from working; save and restore fwmark? #3310 (Open)
- NixOS machines with firewall management enabled can't use exit nodes #4432 (Closed)

dvcrn mentioned this issue Nov 1, 2022:

- Kubernetes cluster with Calico can't establish direct connections #6157 (Closed)

tkaepp mentioned this issue Mar 18, 2024:

- Allow IPTables mark mask to be configurable for canal rancher/rke2-charts#422 (Merged)
