Tailscale loses control plane + DERP connectivity when a node loses IPv6 internet connectivity #1726

Closed
danderson opened this issue Apr 18, 2021 · 4 comments

Comments

@danderson
Member

Reported by a user on a dual-stacked ISP connection. The ISP had an IPv6 outage, so the ISP router stopped doing v6 router advertisements, and over about 1h the various machines on the LAN lost their global IPv6 address. IPv4 connectivity continued uninterrupted.

However, tailscaled remained stuck trying to speak IPv6 to control, log, DERP, everything, and so had no connectivity. The logs (graciously pulled out of journald by the user, since debug log collection was also borked) were full of bootstrap DNS attempts to connect to IPv6 addresses, all failing with "network unreachable", since... yeah, nothing on the machine had any v6 state. Tailscale was seemingly only trying to connect over IPv6, never IPv4.

Restarting tailscaled made it correctly connect to everything over IPv4, so this seems to be an issue with a runtime transition from working v6 to no v6.
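To illustrate the kind of runtime re-check this transition seems to call for, here is a minimal Go sketch. It is not tailscaled's actual code: `hasGlobalIPv6`, `localAddrs`, `orderCandidates`, and `mustAddrs` are hypothetical helpers, and the candidate addresses are documentation-range placeholders. The assumption is that the address-family preference is re-evaluated from the current interface state before each connection attempt, rather than decided once at startup.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// hasGlobalIPv6 reports whether any address is an IPv6 address that is
// neither link-local nor unique-local (ULA), i.e. plausibly internet-routable.
func hasGlobalIPv6(addrs []netip.Addr) bool {
	for _, a := range addrs {
		a = a.Unmap()
		if a.Is6() && a.IsGlobalUnicast() && !a.IsPrivate() {
			return true
		}
	}
	return false
}

// localAddrs lists the addresses currently assigned to local interfaces.
// Calling it again later reflects addresses that have since expired.
func localAddrs() ([]netip.Addr, error) {
	ifAddrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var out []netip.Addr
	for _, ia := range ifAddrs {
		ipNet, ok := ia.(*net.IPNet)
		if !ok {
			continue
		}
		if a, ok := netip.AddrFromSlice(ipNet.IP); ok {
			out = append(out, a.Unmap())
		}
	}
	return out, nil
}

// orderCandidates puts IPv6 candidates first when IPv6 looks usable and
// IPv4 candidates first otherwise. No candidate is dropped.
func orderCandidates(candidates []netip.Addr, v6Usable bool) []netip.Addr {
	var first, second []netip.Addr
	for _, c := range candidates {
		if c.Unmap().Is6() == v6Usable {
			first = append(first, c)
		} else {
			second = append(second, c)
		}
	}
	return append(first, second...)
}

// mustAddrs parses literal addresses; it panics on malformed input.
func mustAddrs(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}
	return out
}

func main() {
	local, err := localAddrs()
	if err != nil {
		fmt.Println("listing interface addresses:", err)
		return
	}
	// Placeholder candidates standing in for a resolved server name.
	candidates := mustAddrs("2001:db8::1", "192.0.2.1")
	fmt.Println(orderCandidates(candidates, hasGlobalIPv6(local)))
}
```

On a machine whose global IPv6 address has expired, `hasGlobalIPv6` would return false on the next check and the IPv4 candidate would move to the front, which matches the behavior seen after a restart.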

Machines affected, for debug log purposes: 100.67.182.67 was trying to ssh to 100.102.43.14 today at 18:56 Pacific. Tailscale on both machines was offline from this issue, so logs are going to be timestamped around 19:18 server time.

@DentonGentry
Contributor

Presumably this means tailscaled had resolved the hostnames to an IPv6 address, and never tried any of the IPv4 addresses. I imagine this is still the case now.
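The fallback that appears to be missing can be sketched in Go as follows. This is an illustration under assumptions, not how tailscaled dials: `dialFunc`, `dialAny`, `fakeDialV6Down`, and `tryFake` are hypothetical names, and the fake dialer only simulates a host with no IPv6 route. The point is that a failure on one resolved address moves on to the next candidate instead of ending the attempt.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/netip"
	"time"
)

// dialFunc has the same shape as (*net.Dialer).DialContext, so a real
// dialer or a test fake can be passed in.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// dialAny tries each candidate in order and returns the first connection
// that succeeds. Errors are collected and returned only if every
// candidate fails.
func dialAny(ctx context.Context, dial dialFunc, addrs []netip.Addr, port uint16, perAttempt time.Duration) (net.Conn, error) {
	var errs []error
	for _, a := range addrs {
		attemptCtx, cancel := context.WithTimeout(ctx, perAttempt)
		conn, err := dial(attemptCtx, "tcp", netip.AddrPortFrom(a, port).String())
		cancel()
		if err == nil {
			return conn, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", a, err))
	}
	if len(errs) == 0 {
		return nil, errors.New("no candidate addresses")
	}
	return nil, errors.Join(errs...)
}

// fakeDialV6Down simulates a host that has lost IPv6: every IPv6 dial
// fails, every IPv4 dial returns one end of an in-memory pipe.
func fakeDialV6Down(ctx context.Context, network, addr string) (net.Conn, error) {
	ap, err := netip.ParseAddrPort(addr)
	if err != nil {
		return nil, err
	}
	if ap.Addr().Is6() {
		return nil, errors.New("connect: network is unreachable")
	}
	client, server := net.Pipe()
	server.Close()
	return client, nil
}

// tryFake runs dialAny against the simulated host for the given literal
// addresses and reports whether any candidate connected.
func tryFake(addrs ...string) error {
	var candidates []netip.Addr
	for _, s := range addrs {
		candidates = append(candidates, netip.MustParseAddr(s))
	}
	conn, err := dialAny(context.Background(), fakeDialV6Down, candidates, 443, 2*time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if err := tryFake("2001:db8::1", "192.0.2.1"); err != nil {
		fmt.Println("all candidates failed:", err)
		return
	}
	fmt.Println("connected after falling back to IPv4")
}
```

Outside of a test, the `DialContext` method of a `net.Dialer` value can be passed as the `dial` argument, with the candidate list coming from the resolver.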

@dunmatt

dunmatt commented Mar 2, 2023

It is indeed still an issue; I'm running into it now. The strange thing is that we have dozens of supposedly identical machines and only one of them is having this issue.

@DentonGentry
Contributor

What would be most useful is a `tailscale bugreport` from one of the identical systems that is not experiencing the problem, and another from the system that is. We can then look at what differs in their telemetry.

@DentonGentry
Contributor

We would need a bugreport from a system to be able to diagnose this.

I believe the original issue is no longer actionable; we wouldn't be able to reconstruct what happened now. If a similar symptom is seen again, please open a new issue.

DentonGentry closed this as not planned on Jun 3, 2023