DNS resolution not working after turning exit node #3842

Closed
cepera-ang opened this issue Jan 30, 2022 · 17 comments
Labels
bug (Bug), dns, exit-node (Exit node related), L2 Few (Likelihood), P2 Aggravating (Priority level), T5 Usability (Issue type)

Comments

@cepera-ang

What is the issue?

After updating from Tailscale 1.18 to Tailscale 1.20.2, I can no longer use the exit node functionality. I have an Ubuntu cloud machine as the exit node (named vpn) and a Windows machine. After enabling the exit node on Windows, all DNS requests go to 100.100.100.100 and die with a timeout. The same requests on the older version work flawlessly.

For example: Windows, 1.20.2, exit node off:

λ nslookup github.com
Server:  one.one.one.one
Address:  1.1.1.1

Non-authoritative answer:
Name:    github.com
Address:  140.82.121.3

Windows, 1.20.2, exit node on:

λ nslookup github.com
Server:  UnKnown
Address:  100.100.100.100

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.

Linux, 1.18.2, exit node on or off (same result):

user@user-pi:~$ nslookup github.com 100.100.100.100
Server:         100.100.100.100
Address:        100.100.100.100#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.121.4

Linux, 1.20.2, exit node on:

user@user-pi:~$ tailscale version
1.20.2
  tailscale commit: 312750ddd288cf4073cfaef56a45102b9c1e8421
  other commit: 2c164d9c7443e2f3014fa54ea45e946b35152680
  go version: go1.17.6-tse44d304e54
user@user-pi:~$ nslookup github.com 100.100.100.100
;; connection timed out; no servers could be reached

Anyway, it seems 100.100.100.100 isn't working anywhere on 1.20.2 for me.

I see some changes related to DNS and exit nodes in the release notes. Is there some configuration I have to do to get this working again?

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

Updated Tailscale everywhere to the latest version.

OS

Linux, Windows

OS version

Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1027-oracle aarch64), Microsoft Windows [Version 10.0.19044.1466]

Tailscale version

1.20.2

Bug report

BUG-c2f835af9713719097081eaf7976601903d023065d119901ad8e2e1799922664-20220130093427Z-bd88e452804a0817

@Murgeye commented Jan 30, 2022

I had the same problem, and it turned out to be a firewall issue. After setting

ufw allow in on tailscale0 to any port 64707

on the exit node, DNS works again.
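
Note that the peerapi port differs per node (64707 here), so use whatever port shows up in your own tailscaled logs. As a quick reachability check from a client machine (a sketch, assuming netcat is installed; substitute your exit node's Tailscale IP and its peerapi port):

# Replace the IP and port with the ones seen in your own logs.
nc -vz 100.127.227.19 38798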

@cepera-ang
Author

Well, I have no firewall running on the exit node at all. There is also nothing in the exit node's logs indicating any kind of connection attempt.

OK, I looked at the logs on both sides, and this is what appears on the client side (Tailscale on Ubuntu on a Raspberry Pi) while trying to make a lookup:

Jan 30 20:54:59 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:54:59 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46312 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:04 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:04 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46314 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:04 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:04 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46316 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:09 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46318 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:09 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:10 user-pi tailscaled[884]: Accept: TCP{100.102.69.110:46320 > 100.127.227.19:38798} 60 ok out
Jan 30 20:55:15 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46320 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:15 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:20 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:20 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46322 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:20 user-pi tailscaled[884]: Accept: TCP{100.102.69.110:46324 > 100.127.227.19:38798} 60 ok out
Jan 30 20:55:25 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:25 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46324 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:30 user-pi tailscaled[884]: portmapper: saw UPnP type WANIPConnection1 at http://192.168.88.1:2828/gateway.xml; MikroTik Router (MikroTik)
Jan 30 20:55:30 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46328 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:30 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:31 user-pi tailscaled[884]: Accept: TCP{100.102.69.110:46330 > 100.127.227.19:38798} 60 ok out
Jan 30 20:55:36 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:36 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46330 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:41 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46332 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:41 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:41 user-pi tailscaled[884]: Accept: TCP{100.102.69.110:46334 > 100.127.227.19:38798} 60 ok out
Jan 30 20:55:46 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46334 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=10s
Jan 30 20:55:46 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:51 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host
Jan 30 20:55:51 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46336 => 100.127.227.19:38798) to node [Vdhdo]; online=yes, lastRecv=5s
Jan 30 20:55:52 user-pi tailscaled[884]: Accept: TCP{100.102.69.110:46338 > 100.127.227.19:38798} 60 ok out
Jan 30 20:55:52 user-pi tailscaled[884]: portmapper: saw UPnP type WANIPConnection1 at http://192.168.88.1:2828/gateway.xml; MikroTik Router (MikroTik)

Strangely, pinging that IP address works just fine:

user@user-pi:~$ ping 100.127.227.19
PING 100.127.227.19 (100.127.227.19) 56(84) bytes of data.
64 bytes from 100.127.227.19: icmp_seq=1 ttl=64 time=97.5 ms
64 bytes from 100.127.227.19: icmp_seq=2 ttl=64 time=97.6 ms
^C
--- 100.127.227.19 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 97.545/97.578/97.611/0.033 ms

Trace also works:

user@user-pi:~$ tracepath 100.127.227.19
 1?: [LOCALHOST]                      pmtu 1280
 1:  vpn.***.beta.tailscale.net          98.382ms !H
 1:  vpn.***.beta.tailscale.net          98.104ms !H
     Resume: pmtu 1280

However, telnet shows:

user@user-pi:~$ telnet  100.127.227.19  38798
Trying 100.127.227.19...
telnet: Unable to connect to remote host: No route to host

Honestly, I've never encountered anything like this: "no route to host" for telnet while ping works just fine. What?
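
(What later turned out to be happening here, see below, is the exit node's host firewall rejecting the peerapi TCP connection with ICMP "admin prohibited" errors, which the kernel surfaces as "no route to host" even though plain ICMP echo is allowed. A way to watch for those rejections from the client side, sketched on the assumption that tcpdump is available and the Tailscale interface is named tailscale0:)

# Watch for ICMP errors coming back from the exit node while retrying the lookup.
sudo tcpdump -ni tailscale0 icmp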

@cepera-ang (Author)

There is another interesting piece of information in the logs:

Jan 30 21:19:23 user-pi tailscaled[884]: dns: error: Post "http://100.127.227.19:38798/dns-query": dial tcp 100.127.227.19:38798: connect: no route to host

Jan 30 21:19:23 user-pi tailscaled[884]: open-conn-track: timeout opening (TCP 100.102.69.110:46430 => 100.127.227.19:38798); target node [Vdhdo] in netmap but unknown to wireguard

@bradfitz added the dns and exit-node (Exit node related) labels Jan 30, 2022
@bradfitz (Member)

"in netmap but unknown to wireguard" is definitely weird. Also in the logs:

2022-01-30 17:33:42.0712399 +0800 +0800: IPv4 packet with disallowed source address from [Vdhdo]
2022-01-30 17:33:53.0831781 +0800 +0800: IPv4 packet with disallowed source address from [Vdhdo]
2022-01-30 17:34:16.3743425 +0800 +0800: IPv4 packet with disallowed source address from [Vdhdo]

Hopefully I'll find time to investigate soon.

bradfitz added a commit that referenced this issue Jan 31, 2022
We're finding a bunch of host operating systems/firewalls interact poorly
with peerapi. We either get ICMP errors from the host or users need to run
commands to allow the peerapi port:

#3842 (comment)

... even though the peerapi should be an internal implementation detail.

Rather than fight the host OS & firewalls, this change handles the
server side of peerapi entirely in netstack (except on iOS), so it
never makes its way to the host OS where it might be messed with. Two
main downsides are:

1) netstack isn't as fast, but we don't really need speed for peerapi.
   And actually, with fewer trips to/from the kernel, we might
   actually make up for some of the netstack performance loss by
   staying in userspace.

2) tcpdump / Wireshark etc packet captures will no longer see the peerapi
   traffic. Oh well. Crawshaw's been wanting to add packet capture server
   support to tailscaled, so we'll probably do that sooner now.

A future change might also then use peerapi for the client-side
(except on iOS).

Updates #3842 (probably fixes, as well as many exit node issues I bet)

Change-Id: Ibc25edbb895dc083d1f07bd3cab614134705aa39
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@bradfitz (Member)

@JayWStapleton was investigating https://forum.tailscale.com/t/exit-node-on-oracle-oci/1662/7 and was able to reproduce the "connect: no route to host" error.

It turned out to be ICMP errors from the exit node:

16:01:26.301186 IP 100.79.194.93 > 100.81.38.34: ICMP host 100.79.194.93 unreachable - admin prohibited, length 68
16:01:43.916036 IP 100.79.194.93 > 100.81.38.34: ICMP host 100.79.194.93 unreachable - admin prohibited, length 68
16:01:44.411897 IP 100.79.194.93 > 100.81.38.34: ICMP host 100.79.194.93 unreachable - admin prohibited, length 68

We can just handle peerapi entirely in netstack, though: I sent #3851.
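
(For anyone hitting this before the fix lands: Oracle Cloud's stock Ubuntu images typically ship default iptables rules ending in a REJECT entry with reject-with icmp-host-prohibited, which produces exactly these ICMP errors even when ufw isn't running. A quick way to check on the exit node, assuming iptables is in use:)

# List the INPUT chain; look for a REJECT rule using icmp-host-prohibited.
sudo iptables -L INPUT -n --line-numbers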

bradfitz added a commit that referenced this issue Jan 31, 2022
(same commit message as above)
@cepera-ang (Author)

Great, I use Ubuntu on OCI too.

bradfitz added a commit that referenced this issue Jan 31, 2022
(same commit message as above)
@bradfitz (Member)

Okay, the fix is in the latest unstable build, in version 1.21.43 or later.

Can somebody try it out? @cepera-ang?

We're not sure whether we'll backport it to the 1.20.x branch yet; first we want to see how many people's problems it fixes.
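
(One way to try the unstable build on an Ubuntu/Debian exit node is to point apt at the unstable track; this sketch assumes the unstable channel at pkgs.tailscale.com mirrors the stable layout and that the repo was added at the standard path, so adjust to your setup:)

# Assumption: tailscale apt source lives at the standard path; switch it to the unstable track.
sudo sed -i 's|pkgs.tailscale.com/stable|pkgs.tailscale.com/unstable|' /etc/apt/sources.list.d/tailscale.list
sudo apt-get update && sudo apt-get install tailscale
tailscale version   # should report 1.21.43 or later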

@cepera-ang (Author)

Yep, quick testing shows that it's working now (I updated only the exit node).

@mrzv commented Jan 31, 2022

Same here. Updating to that unstable build on the exit node fixed the problem.

@bradfitz (Member)

@mrzv, which OS was your exit node before?

And @Murgeye, once you update to that build, you won't need your ufw firewall updates, as we no longer even give the host operating system a chance to see this traffic, so it can't be blocked.
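
(For reference, the rule added earlier can be removed once the exit node is on the fixed build; ufw supports delete-by-rule syntax, so something like the following should undo it, adjusting the port to whatever you allowed:)

sudo ufw delete allow in on tailscale0 to any port 64707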

@mrzv commented Jan 31, 2022

Linux (Arch Linux, to be precise)

@Murgeye commented Feb 2, 2022

Can confirm that the ufw rules are unnecessary in the current unstable build.

bradfitz added a commit that referenced this issue Feb 4, 2022
(same commit message as above; cherry picked from commit bd90781 + edits, and part of commit f3c0023)
bradfitz added a commit that referenced this issue Feb 4, 2022
(same commit message as above; cherry picked from commit bd90781 + edits, and part of commit f3c0023)
@DentonGentry added the L2 Few (Likelihood), P2 Aggravating (Priority level), and T5 Usability (Issue type) labels and removed the needs-triage label Feb 6, 2022
bradfitz added a commit that referenced this issue Feb 7, 2022
(same commit message as above; cherry picked from commit bd90781 + edits, and part of commit f3c0023)
@cepera-ang (Author)

I have another connectivity issue. I installed Pi-hole on my vpn server, and I'm unable to connect to its web interface or to its TCP DNS resolver. Interestingly, I can connect to SSH and to tailscaled's simple web server. I tried from both Windows and Linux machines, same story.

user@user-pi:~$ curl vpn
curl: (7) Failed to connect to vpn port 80: No route to host
user@user-pi:~$ curl vpn:38798
<html>
<meta name="viewport" content="width=device-width, initial-scale=1">
<body>
<h1>Hello, Sergey Mushinskiy (100.102.69.110)</h1>
This is my Tailscale device. Your device is pi.
<p>You are the owner of this node.
user@user-pi:~$

Logs on client BUG-eafc8fc920530480e584258f1a87593e4f3a50ef100996de93f63d2df5d6a318-20220213143122Z-e9516f2deb31fbe1:

Feb 13 22:27:48 user-pi tailscaled[2556]: open-conn-track: timeout opening (TCP 100.102.69.110:35120 => 100.127.227.19:80) to node [Vdhdo]; online=yes, lastRecv=5s

Log on the server BUG-497beb7817e8ae7ece3867e0ee1b898b86d2248109879b958ca2faeb2d1d1d82-20220213143146Z-a23a7c7bfafcd8ea

Feb 13 14:27:43 vpn tailscaled[1114]: Accept: TCP{100.102.69.110:35120 > 100.127.227.19:80} 60 tcp ok

@DentonGentry (Contributor)

@cepera-ang I moved that last comment into a new issue, tailscale/tailscale-www#975

@DentonGentry (Contributor)

Fixed in 1.22

@kim0 commented Sep 11, 2022

It seems I'm seeing this return. I'm on v1.30.0, macOS.

@DentonGentry (Contributor)

Please open a new bug with details; it is unlikely you are experiencing the same root cause.
