
router_linux: starts off by assuming route table 52 is empty #3309

Closed · apenwarr opened this issue Nov 15, 2021 · 5 comments

@apenwarr (Member) commented Nov 15, 2021

What is the issue?

If tailscale aborts abnormally, it will leave routes in table 52 (ip route list table 52). On the next run, it will assume that table is empty, and then try to re-add all the same routes.

Example logs:

Nov 07 17:14:26 tailscaled[2293337]: router: route add failed: running "ip route add 192.168.0.0/24 dev tailscale0 table 52" failed: exit status 2
Nov 07 17:14:26 tailscaled[2293337]: RTNETLINK answers: File exists
Nov 07 17:14:26 tailscaled[2293337]: router: route add failed: running "ip route add fe80::/64 dev tailscale0 table 52" failed: exit status 2
Nov 07 17:14:26 tailscaled[2293337]: RTNETLINK answers: File exists
Nov 07 17:14:26 tailscaled[2293337]: router: route add failed: running "ip route add fe80::/64 dev tailscale0 table 52" failed: exit status 2
Nov 07 17:14:26 tailscaled[2293337]: RTNETLINK answers: File exists

I guess we should do one of two things:

  1. Read table 52 on startup; or
  2. Flush table 52 on startup.

The latter seems like a more direct and foolproof way to accomplish the goal.
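
For illustration, a minimal sketch of option 2, assuming we keep shelling out to the ip binary the same way the logs above show. The helper name flushRouteTable and the hard-coded table "52" are hypothetical, not names from the tree:

package main

import (
	"fmt"
	"os/exec"
)

// flushRouteTable empties the given routing table by shelling out to
// "ip route flush table <table>", so stale routes left behind by an
// abnormal exit can't trigger "File exists" errors on the next run.
func flushRouteTable(table string) error {
	out, err := exec.Command("ip", "route", "flush", "table", table).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ip route flush table %s: %v: %s", table, err, out)
	}
	return nil
}

func main() {
	// Table 52 is the table Tailscale uses in the logs above.
	if err := flushRouteTable("52"); err != nil {
		fmt.Println(err)
	}
}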

I didn't investigate far enough to see if this actually caused a problem for the customer who reported it. I suspect the errors are just silently ignored after printing, but I didn't read the code carefully enough to be sure. Either way, all the errors in the log are disconcerting and make diagnosing other problems harder. Also, leaving old obsolete routes in the table between runs could lead to surprising problems later.

cc: @danderson @maisem

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

No response

Tailscale version

1.16.2

Bug report

BUG-49b7cb276faa4102c241c14f001cecf50b20402bcb502990ad1b265d935ec982-20211110164123Z-073cb916921f8acb


@apenwarr (Member, Author)

Ah, and I guess we should do a similar table flush in the cleanup() function.

@danderson (Member)

In wgengine/router/router_linux.go:

func cleanup(logf logger.Logf, interfaceName string) {
	// TODO(dmytro): clean up iptables.
}
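
For illustration, the flush apenwarr suggests could slot into that stub roughly like this. flushRouteTable is the hypothetical helper sketched earlier in the thread, not code that exists in the tree; logf comes from the quoted stub's signature:

// Sketch only, assuming a flushRouteTable helper like the one above.
func cleanup(logf logger.Logf, interfaceName string) {
	// TODO(dmytro): clean up iptables.
	if err := flushRouteTable("52"); err != nil {
		logf("router: cleanup: %v", err)
	}
}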

@danderson (Member)

But also, this should not happen: deleting the tailscale0 interface deletes these routes, which is why we don't bother cleaning them up; the kernel does it on our behalf. This sounds more like a bug in our route synchronization logic (note that it's trying to add 192.168.0.0/24), not a bug in cleanup.

@apenwarr (Member, Author)

Hmm, yeah, I guess I would expect them to go away automatically.

For the customer in question, I now suspect they somehow managed to get two copies of tailscale running simultaneously. But I'd think they'd both try to reserve tailscale0, so one should exit sooner?

@bradfitz (Member)

This should be fixed in main as part of the work done on #391 so far (dc2fbf5 looks like the commit in the series which did routes). It now programs this with netlink and thus doesn't get confused by error codes from the "ip" command.
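
For context, a rough sketch of the idempotent style of route programming that a netlink API enables, using the vishvananda/netlink package as an assumption for illustration (the actual commit may use a different package and approach). RouteReplace either inserts the route or overwrites a stale copy, so a leftover entry from a previous run can't surface as "File exists":

package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink" // assumed package; illustrative only
)

const tailscaleTable = 52 // table number seen in the logs above

// addRouteIdempotent programs a route into the Tailscale table via netlink.
// RouteReplace adds the route or overwrites an existing one, so a stale
// entry left over from an earlier run does not cause an error.
func addRouteIdempotent(linkName, cidr string) error {
	link, err := netlink.LinkByName(linkName)
	if err != nil {
		return err
	}
	_, dst, err := net.ParseCIDR(cidr)
	if err != nil {
		return err
	}
	return netlink.RouteReplace(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       dst,
		Table:     tailscaleTable,
	})
}

func main() {
	if err := addRouteIdempotent("tailscale0", "192.168.0.0/24"); err != nil {
		log.Fatal(err)
	}
}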
