Tailscale router fails on alpine 3.19.0, breaking MagicDNS #10540

Closed
isaacbowen opened this issue Dec 9, 2023 · 15 comments

Comments

@isaacbowen

What is the issue?

The main thing:

health("router"): error: setting up filter/ts-input: running [/sbin/iptables -t filter -N ts-input --wait]: exit status 4: iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument

The machine can connect to other machines in the tailnet by IP address, but not via MagicDNS.

Complete logs from a recent bootup:

2023/12/09 13:28:37 logtail started
2023/12/09 13:28:37 Program starting: v1.54.1-tb78b24570, Go 1.21.4: []string{"tailscaled", "--state=/var/lib/tailscale/tailscaled.state", "--socket=/var/run/tailscale/tailscaled.sock"}
2023/12/09 13:28:37 LogID: 4de9ac0d2e29fe4712c0a1e555c47b8e0263df745e9675d34e9a03a39dbdbf22
2023/12/09 13:28:37 logpolicy: using system state directory "/var/lib/tailscale"
logpolicy.ConfigFromFile /var/lib/tailscale/tailscaled.log.conf: open /var/lib/tailscale/tailscaled.log.conf: no such file or directory
logpolicy.Config.Validate for /var/lib/tailscale/tailscaled.log.conf: config is nil
2023/12/09 13:28:37 wgengine.NewUserspaceEngine(tun "tailscale0") ...
2023/12/09 13:28:37 router: default choosing iptables
2023/12/09 13:28:37 router: v6nat = false
2023/12/09 13:28:37 router: failed to determine ip command fwmask support: exit status 1
2023/12/09 13:28:37 dns: [rc=unknown ret=direct]
2023/12/09 13:28:37 dns: using "direct" mode
2023/12/09 13:28:37 dns: using *dns.directManager
2023/12/09 13:28:37 link state: interfaces.State{defaultRoute=eth0 ifs={eth0:[172.19.0.26/29 172.19.0.27/29 2604:1380:4500:b1e:0:f520:de0:1/127 fdaa:1:dac8:a7b:11a:f520:de0:2/112 llu6]} v4=true v6=true}
2023/12/09 13:28:37 magicsock: disco key = d:37c839d5cdab041c
2023/12/09 13:28:37 Creating WireGuard device...
2023/12/09 13:28:37 Bringing WireGuard device up...
2023/12/09 13:28:37 Bringing router up...
2023/12/09 13:28:37 Clearing router settings...
2023/12/09 13:28:37 Starting network monitor...
2023/12/09 13:28:37 Engine created.
2023/12/09 13:28:37 external route: up
2023/12/09 13:28:37 pm: migrating "_daemon" profile to new format
2023/12/09 13:28:37 envknob: PORT="8080"
2023/12/09 13:28:37 logpolicy: using system state directory "/var/lib/tailscale"
2023/12/09 13:28:37 got LocalBackend in 28ms
2023/12/09 13:28:37 Start
2023/12/09 13:28:37 Backend: logs: be:4de9ac0d2e29fe4712c0a1e555c47b8e0263df745e9675d34e9a03a39dbdbf22 fe:
2023/12/09 13:28:37 Switching ipn state NoState -> NeedsLogin (WantRunning=false, nm=false)
2023/12/09 13:28:37 blockEngineUpdates(true)
2023/12/09 13:28:37 wgengine: Reconfig: configuring userspace WireGuard config (with 0/0 peers)
2023/12/09 13:28:37 wgengine: Reconfig: configuring router
2023/12/09 13:28:37 wgengine: Reconfig: configuring DNS
2023/12/09 13:28:37 dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:0}
2023/12/09 13:28:37 dns: Resolvercfg: {Routes:{} Hosts:0 LocalDomains:[]}
2023/12/09 13:28:37 dns: OScfg: {}
2023/12/09 13:28:37 health("overall"): error: state=NeedsLogin, wantRunning=false
2023/12/09 13:28:37 Start
2023/12/09 13:28:37 generating new machine key
2023/12/09 13:28:37 machine key written to store
2023/12/09 13:28:37 control: client.Shutdown()
2023/12/09 13:28:37 control: client.Shutdown
2023/12/09 13:28:37 control: mapRoutine: exiting
2023/12/09 13:28:37 control: authRoutine: exiting
2023/12/09 13:28:37 control: updateRoutine: exiting
2023/12/09 13:28:37 control: Client.Shutdown done.
2023/12/09 13:28:37 Backend: logs: be:4de9ac0d2e29fe4712c0a1e555c47b8e0263df745e9675d34e9a03a39dbdbf22 fe:
2023/12/09 13:28:37 Switching ipn state NoState -> NeedsLogin (WantRunning=true, nm=false)
2023/12/09 13:28:37 blockEngineUpdates(true)
2023/12/09 13:28:37 Reconfig(down): no changes made to Engine config
2023/12/09 13:28:37 StartLoginInteractive: url=false
2023/12/09 13:28:37 control: client.Login(false, 2)
2023/12/09 13:28:37 control: LoginInteractive -> regen=true
2023/12/09 13:28:37 control: doLogin(regen=true, hasUrl=false)
2023/12/09 13:28:38 control: control server key from https://controlplane.tailscale.com: ts2021=[fSeS+], legacy=[nlFWp]
2023/12/09 13:28:38 control: Generating a new nodekey.
2023/12/09 13:28:38 control: RegisterReq: onode= node=[1vQtb] fup=false nks=false
2023/12/09 13:28:38 control: RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false
2023/12/09 13:28:38 blockEngineUpdates(false)
2023/12/09 13:28:39 control: netmap: got new dial plan from control
2023/12/09 13:28:39 active login: [redacted]
2023/12/09 13:28:39 monitor: gateway and self IP changed: gw=172.19.0.25 self=172.19.0.26
2023/12/09 13:28:39 Switching ipn state NeedsLogin -> Starting (WantRunning=true, nm=true)
2023/12/09 13:28:39 magicsock: SetPrivateKey called (init)
2023/12/09 13:28:39 wgengine: Reconfig: configuring userspace WireGuard config (with 0/21 peers)
2023/12/09 13:28:39 wgengine: Reconfig: configuring router
2023/12/09 13:28:39 health("router"): error: setting up filter/ts-input: running [/sbin/iptables -t filter -N ts-input --wait]: exit status 4: iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument
2023/12/09 13:28:39 peerapi: serving on http://[redacted]:35402
2023/12/09 13:28:39 peerapi: serving on http://[redacted]:34698
2023/12/09 13:28:39 magicsock: home is now derp-1 (nyc)
2023/12/09 13:28:39 magicsock: endpoints changed: 147.75.50.175:8440 (stun), 147.75.50.175:8080 (stun4localport), [2604:1380:4500:b1e:0:f520:de0:1]:8080 (stun), 172.19.0.26:8080 (local), 172.19.0.27:8080 (local)
2023/12/09 13:28:39 magicsock: adding connection to derp-1 for home-keep-alive
2023/12/09 13:28:39 magicsock: 1 active derp conns: derp-1=cr0s,wr0s
2023/12/09 13:28:39 Switching ipn state Starting -> Running (WantRunning=true, nm=true)
2023/12/09 13:28:39 control: NetInfo: NetInfo{varies=true hairpin=false ipv6=true ipv6os=true udp=true icmpv4=false derp=#1 portmap= link="" firewallmode="ipt-default"}
2023/12/09 13:28:39 derphttp.Client.Connect: connecting to derp-1 (nyc)

Steps to reproduce

tailscale up --authkey=${TAILSCALE_AUTHKEY} --timeout=60s

Are there any recent changes that introduced the issue?

The error shows up using the ruby:alpine docker image, which just received an update to alpine 3.19.

docker-library/ruby#433
https://www.alpinelinux.org/posts/Alpine-3.19.0-released.html

The new version of alpine bumps the version of iptables:

# /sbin/iptables --version
iptables v1.8.10 (nf_tables)

For comparison, this was the previous version used:

# /sbin/iptables --version
iptables v1.8.9 (legacy)
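
The two version strings above differ only in the parenthesized backend name. A minimal sketch for telling them apart in a script follows; the function name `iptables_backend` is hypothetical, and any version string without a recognized suffix falls through to `unknown`:

```shell
#!/bin/sh
# Hypothetical helper: classify the backend named in an `iptables --version` line.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo "nf_tables" ;;  # iptables-nft, as shipped by default on Alpine 3.19
        *"(legacy)"*)    echo "legacy" ;;     # classic xtables backend
        *)               echo "unknown" ;;    # no recognizable backend suffix
    esac
}

# Usage on a live system:
#   iptables_backend "$(/sbin/iptables --version)"
```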

OS

Linux

OS version

Alpine 3.19

Tailscale version

1.54.1

Other software

No response

Bug report

BUG-4de9ac0d2e29fe4712c0a1e555c47b8e0263df745e9675d34e9a03a39dbdbf22-20231209135110Z-ea37df7f8d02ead5

@irbekrm
Contributor

irbekrm commented Dec 10, 2023

Thank you for the detailed issue description.

As a temporary workaround are you able to use nftables on your system?
To force tailscale to use nftables, you can set TS_DEBUG_FIREWALL_MODE=nftables
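
One way to apply that setting is to put the variable in the environment of the tailscaled process. This is only a sketch: the wrapper name `with_nftables_mode` is hypothetical, and the tailscaled flags in the usage comment are copied from the startup log in this issue. It can only help if the host kernel actually supports nftables.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with Tailscale's firewall mode
# forced to nftables through the environment.
with_nftables_mode() {
    TS_DEBUG_FIREWALL_MODE=nftables "$@"
}

# Usage (flags taken from the log in this issue):
#   with_nftables_mode tailscaled \
#       --state=/var/lib/tailscale/tailscaled.state \
#       --socket=/var/run/tailscale/tailscaled.sock
```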

@irbekrm
Contributor

irbekrm commented Dec 10, 2023

I built a tailscale container from alpine:3.19 base and experimented with this a bit more.

I was able to reproduce the error that you saw in cases where the underlying host does not support nftables (GKE with Google COS nodes).

On hosts that do support nftables (GKE with Ubuntu nodes), it seemed like tailscale was able to run in an alpine:3.19 container even when using iptables to configure the firewall. (I tested with our Kubernetes ingress/egress proxies, which install the regular Tailscale firewall rules as well as some additional rules.)

My understanding of the alpine:3.19 change is that it links the iptables binary to iptables-nft.

If that's correct, you might want to check whether your host supports nftables. There are some instructions here; I have personally always tested by running nft commands directly, since I usually test on Kubernetes nodes where it is not always possible to run modprobe etc.
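
A rough sketch of such a direct probe, assuming the `nft` binary is installed and the shell has enough privileges to query the ruleset. The function name `nft_usable` and the `NFT_BIN` override are hypothetical. A failure here can also mean a missing binary or missing permissions, so treat it as a hint rather than proof:

```shell
#!/bin/sh
# Hypothetical probe: succeeds if the nft binary can list tables on this host.
# NFT_BIN is an illustrative override (defaults to "nft").
nft_usable() {
    "${NFT_BIN:-nft}" list tables >/dev/null 2>&1
}

# Usage:
#   if nft_usable; then echo "nftables looks usable"; else echo "nftables probe failed"; fi
```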

If my assumption about their change is correct, there is also nothing we can do from our side. I am not sure how we can upgrade our tailscale/tailscale base image without breaking a portion of GKE users. Perhaps we'll have to switch the base image or change their links.

Some logs from within an alpine:3.19 tailscale container that was able to run using iptables to configure the firewall:

/ # tailscale --version
1.55.157
  track: unstable (dev); frequent updates and bugs are likely
  tailscale commit: bac0df6-dirty
  go version: go1.21.5

/ # iptables --version
iptables v1.8.10 (nf_tables)

/ # readlink -f /sbin/iptables
/sbin/xtables-nft-multi

/ # iptables -t filter -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ts-input all -- anywhere anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source destination
ts-forward all -- anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain PREROUTING (0 references)
target prot opt source destination

Chain ts-forward (1 references)
target prot opt source destination
MARK all -- anywhere anywhere MARK xset 0x40000/0xff0000
ACCEPT all -- anywhere anywhere mark match 0x40000/0xff0000
DROP all -- 100.64.0.0/10 anywhere
ACCEPT all -- anywhere anywhere

Chain ts-input (1 references)
target prot opt source destination
ACCEPT all -- 100.64.86.94 anywhere
RETURN all -- 100.115.92.0/23 anywhere
DROP all -- 100.64.0.0/10 anywhere
ACCEPT all -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp dpt:45620
/ # exit

@isaacbowen
Author

I'm in a little bit over my head here 😅 but am working my way through this.

In the meantime, for context, I'm running these containers on Fly.io. I'll send this issue their way to see if they've got input.

@irbekrm
Contributor

irbekrm commented Dec 11, 2023

I'll send this issue their way to see if they've got input.

Keen to hear what they say. I see this thread from earlier in the year where someone asks about nftables support: https://community.fly.io/t/nftables-support/10259

Are you providing your own VM or using one of theirs (for context, I don't know much about Fly.io)?

@isaacbowen
Author

Are you providing your own VM or using one of theirs (for context, I don't know much about Fly.io)?

Both? :D (Again, slightly out of my depth here.) Fly uses Firecracker, so they're constructing VMs from our Docker images.

The crew at Fly got back to me, and had this to say:

It's indeed on our side - our guest kernels don't have nftables support:

56833260f77d18:/# zcat /proc/config.gz |grep CONFIG_NF_TABLES
# CONFIG_NF_TABLES is not set

iptables is fully supported though, so one thing you can do is swap iptables-nft (which is what you're using, probably due to a symlink) to iptables-legacy; that should work and keep tailscale happy.

I'm not sure how to do it in Alpine-based distros; on Debian-based ones you'd update-alternatives --set iptables /usr/sbin/iptables-legacy. Assuming tailscale just calls iptables, that should point it to the right thing.

I haven't been successful yet, but I'll keep on keeping on. :)
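
The kernel config check quoted above can be wrapped in a small test. This is a sketch: the function name `has_nf_tables` is hypothetical, and it assumes the kernel exposes its config at /proc/config.gz, which not every kernel does.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel config text on stdin and succeeds if
# nftables support is built in (=y) or available as a module (=m).
has_nf_tables() {
    grep -Eq '^CONFIG_NF_TABLES=(y|m)$'
}

# Usage on a guest that exposes /proc/config.gz:
#   if zcat /proc/config.gz | has_nf_tables; then
#       echo "nftables available"
#   else
#       echo "nftables missing"
#   fi
```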

@isaacbowen
Author

Yup, the approach Fly suggested is working! Our Dockerfiles now include this bit:

# tailscale dependencies (nb: fly doesn't support nftables, so we gotta use iptables-legacy)
RUN apk add --no-cache ca-certificates iptables iptables-legacy
RUN rm /sbin/iptables && ln -s /sbin/iptables-legacy /sbin/iptables
RUN rm /sbin/ip6tables && ln -s /sbin/ip6tables-legacy /sbin/ip6tables

I don't have any idea what this means for this issue. :) @irbekrm, back to you?

@irbekrm
Contributor

irbekrm commented Dec 12, 2023

Hey @isaacbowen thank you very much for letting us know the feedback from Fly.io folks! I am glad that you were able to get it working.

My understanding is that nftables is the future. I would imagine that Fly folks will eventually start supporting it too; perhaps there are some technical issues that are making it hard for them. Generally, I am guessing it is now expected that most cloud providers etc. would support nftables. It seems like that was Alpine's assumption when they linked iptables to nftables.

From our side I think we will have to update our documentation for Fly.io (https://tailscale.com/kb/1132/flydotio/#step-2-configure-your-dockerfile-to-install-tailscale). At the moment I am not sure what else we could do.

Yup, the approach Fly suggested is working! Our Dockerfiles now include this bit

Do you foresee any issues with this flow (it looks fine to me, I'm just curious)?

@isaacbowen
Author

I'm on Fly specifically for the philosophy in their approach; from what I understand, I would be very surprised if they never gained nftables support. :)

No, I don't see any issues with the current setup. (Thanks for asking!) I don't love having to choose a legacy implementation, obviously, but I don't mind it here. Feels transitional.

@irbekrm
Contributor

irbekrm commented Dec 16, 2023

@isaacbowen we've updated our docs for Fly.io, using bits of the Dockerfile config that you shared. Thank you for that!

(I also ran a test on Fly.io and could reproduce both the issue that you saw and also verify the fix).

I am guessing we can now close this issue.

@isaacbowen
Author

Fantastic! Yep, that sounds like all there is to be done. I’ll close it out.

Also, thanks for this exchange! I’m very grateful. :)

@isaacbowen
Author

Fly now supports nftables! Yay! :D

https://community.fly.io/t/kernel-nftables-support/17669

@irbekrm I'm not sure what the right process is for updating Tailscale docs; can you have a look?

@irbekrm
Contributor

irbekrm commented Jan 16, 2024

Hi @isaacbowen thank you very much for the heads up and thanks for raising this issue with the Fly folks!

I've tried it out and can confirm that it works for me too (including MagicDNS) after updating the machine as per their docs. I'm going to update our docs. It's awesome that they fixed this so quickly :)

@lane-ftw

lane-ftw commented Jan 24, 2024

Took me an hour of googling to find this.

# tailscale dependencies (nb: fly doesn't support nftables, so we gotta use iptables-legacy)
RUN apk add --no-cache ca-certificates iptables iptables-legacy
RUN rm /sbin/iptables && ln -s /sbin/iptables-legacy /sbin/iptables
RUN rm /sbin/ip6tables && ln -s /sbin/ip6tables-legacy /sbin/ip6tables

Works on a new LXC install of alpine 3.19 on proxmox. Not using docker. Thanks for the fix y'all.
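
For a non-Docker Alpine system like this, the same swap can be sketched as a plain shell function. The function name `use_legacy_iptables` and its optional directory argument are hypothetical; it assumes the `iptables-legacy` package is already installed and that it runs as root.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Dockerfile lines above outside of Docker.
# $1 is an illustrative override for the directory holding the binaries (default /sbin).
use_legacy_iptables() {
    sbin_dir="${1:-/sbin}"
    for name in iptables ip6tables; do
        # Refuse to repoint the link if the legacy binary is not present.
        if [ ! -e "$sbin_dir/$name-legacy" ]; then
            echo "missing $sbin_dir/$name-legacy" >&2
            return 1
        fi
        # Replace the existing file or symlink with a link to the legacy binary.
        ln -sf "$sbin_dir/$name-legacy" "$sbin_dir/$name"
    done
}

# Usage (as root), after: apk add --no-cache iptables iptables-legacy
#   use_legacy_iptables
```

Checking `iptables --version` afterwards should report the legacy backend if the swap took effect.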

@hugorn

hugorn commented Feb 1, 2024

# tailscale dependencies (nb: fly doesn't support nftables, so we gotta use iptables-legacy)
RUN apk add --no-cache ca-certificates iptables iptables-legacy
RUN rm /sbin/iptables && ln -s /sbin/iptables-legacy /sbin/iptables
RUN rm /sbin/ip6tables && ln -s /sbin/ip6tables-legacy /sbin/ip6tables

Can you share the download link for the proxmox lxc template for alpine version 3.19?

@lane-ftw

# tailscale dependencies (nb: fly doesn't support nftables, so we gotta use iptables-legacy)
RUN apk add --no-cache ca-certificates iptables iptables-legacy
RUN rm /sbin/iptables && ln -s /sbin/iptables-legacy /sbin/iptables
RUN rm /sbin/ip6tables && ln -s /sbin/ip6tables-legacy /sbin/ip6tables

Can you share the download link for the proxmox lxc template for alpine version 3.19?

I should have been more specific: the Proxmox LXC template is 3.18, which I upgraded to 3.19 via https://wiki.alpinelinux.org/wiki/Upgrading_Alpine
