confusing dns behavior #192

Closed
rhs opened this Issue Jun 22, 2017 · 6 comments

Contributor

rhs commented Jun 22, 2017

Using telepresence to proxy my local shell into a cluster was awesome, but it took me a few minutes to figure out that it was actually working. This was because when I typed 'host users', the output was confusing: it reported the correct IP, but the fully qualified hostname reflected the laptop's local configuration and hence referenced something that doesn't actually exist:

[rhs@venture hello-kubernetes]$ telepresence -m vpn-tcp -n foo
Starting proxy...
[sudo] password for rhs: 
@gke|bash-4.3$ host users
users.wework.com has address 10.91.248.50
users.wework.com has address 10.91.248.50
@gke|bash-4.3$ 

Contributor

itamarst commented Jun 22, 2017

We might be able to return a "canonical DNS name" (a CNAME record) or something in the DNS results? I don't know enough about DNS to be sure.
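
Something like this, maybe - an untested sketch with twisted.names.dns, where the answer to the suffixed query carries a CNAME pointing at the bare name (the names and address are from the original report):

from twisted.names import dns

# Hypothetical answer section for a query for users.wework.com:
# a CNAME from the suffixed name to the bare name, then the A record.
answers = [
    dns.RRHeader(name=b"users.wework.com", type=dns.CNAME,
                 payload=dns.Record_CNAME(name=b"users")),
    dns.RRHeader(name=b"users", type=dns.A,
                 payload=dns.Record_A(address="10.91.248.50")),
]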

Contributor

itamarst commented Jul 21, 2017

Actually, if a DNS client library first tries users.wework.com and, only if that fails, tries users, then maybe just returning NXDOMAIN for users.wework.com (i.e. for lookups suffixed with the DHCP-induced search suffix) would fix this?

@plombardi89 added this to UX in Roadmap Feb 20, 2018

@rhs added this to Enhancement in Buckets Mar 8, 2018

Contributor

plombardi89 commented Mar 13, 2018

@exarkun do you have any insight into this?

Contributor

exarkun commented Mar 14, 2018

It sounds like, in the original report, there was a legitimate users name that is resolvable in the cluster context. It also sounds like the client machine was configured (resolv.conf, presumably) with a DNS search suffix of wework.com.

So the behavior was likely that:

  • host users issued a DNS query for users.wework.com, which was received by the Telepresence DNS proxy
  • the Telepresence DNS proxy stripped wework.com as a recognized search suffix and then made an upstream query for users
  • Kubernetes (or something?) responded with the 10.91.... addresses
  • the Telepresence DNS proxy responded to the original query with the 10.91.... addresses
    • At this point, I wonder about the specific contents of the response...
    • Ah great, I can reproduce this - not with users but with kubernetes [1]
  • The response is for the name kubernetes.wework.com [2]

I would guess that if we rewrite the name in the response to the suffix-stripped version of the name, host might display the suffix-stripped version. Offhand I'm not sure if this behavior would be a gross violation of the DNS spec. It's possible it might confuse some software. On the other hand, if gethostbyname and getaddrinfo can deal with it, it's probably fine for >99% of cases. It should be a pretty simple experiment to find out if it makes host happy - though I don't believe host uses these standard resolution functions, preferring instead to manage its own DNS interactions. Still, it should be trivial to extend the experiment to direct calls of gethostbyname and getaddrinfo to see what happens.
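
That experiment could be as small as this (assuming the proxy is running and users resolves in-cluster, as in the original report):

import socket

# How do the standard resolution functions present the name once the
# proxy rewrites its responses? "users" is the in-cluster service
# from the original report.
print(socket.gethostbyname_ex("users"))  # (canonical name, aliases, addresses)
print(socket.getaddrinfo("users", None))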

I expect the thing to do is to modify _handle_search_suffix in k8s-proxy/resolver.py so that it attaches a callback to the self.query Deferred. The callback should just rewrite the name in the resulting Message and pass it along.
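
Roughly this, untested - assuming the Deferred fires with the usual Twisted Names (answers, authority, additional) tuple; the helper name and call site are illustrative:

from twisted.names import dns

def _restore_short_name(result, stripped_name):
    # result is the (answers, authority, additional) tuple that
    # Twisted Names resolvers fire their Deferreds with.
    answers, authority, additional = result
    for rr in answers:
        # Make the answer records carry the suffix-stripped name
        # (e.g. b"kubernetes") so that tools like host display it.
        rr.name = dns.Name(stripped_name)
    return answers, authority, additional

# e.g. inside _handle_search_suffix:
#   d = self.query(query)
#   d.addCallback(_restore_short_name, stripped_name)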

[1]

(virtualenv) exarkun@baryon:~$ telepresence -m vpn-tcp -n foo
Starting proxy with method 'vpn-tcp', which has the following limitations: All processes are affected, only one telepresence can run per machine, and you can't use other VPNs. You may need to add cloud hosts with --also-proxy. For a full list of method limitations see https://telepresence.io/reference/methods.html
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.

Guessing that Services IP range is 10.3.240.0/20. Services started after this point will be inaccessible if are outside this range; restart telepresence if you can't access a new Service.

@gke_datawireio_us-central1-a_telepresence-testing|(virtualenv) exarkun@baryon:~$ grep search /etc/resolv.conf
search wework.com
@gke_datawireio_us-central1-a_telepresence-testing|(virtualenv) exarkun@baryon:~$ host kubernetes
kubernetes.wework.com has address 10.3.240.1
kubernetes.wework.com has address 10.3.240.1
@gke_datawireio_us-central1-a_telepresence-testing|(virtualenv) exarkun@baryon:~$ 

[2]

$ dig kubernetes.wework.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> kubernetes.wework.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33488
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.wework.com.		IN	A

;; ANSWER SECTION:
kubernetes.wework.com.	0	IN	A	10.3.240.1

;; Query time: 141 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Wed Mar 14 08:40:55 EDT 2018
;; MSG SIZE  rcvd: 55

Contributor

ark3 commented Mar 14, 2018

What do you think of

> if a DNS client library first tries users.wework.com and, only if that fails, tries users, then maybe just returning NXDOMAIN for users.wework.com (i.e. for lookups suffixed with the DHCP-induced search suffix) would fix this?

In other words, can we avoid all this suffix detection and stripping stuff entirely? Normal split-tunnel VPNs rely on the outside world returning NXDOMAIN to prompt a query into the VPN DNS; we might be able to get away with the reverse.
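
A rough sketch of that, assuming the proxy is a Twisted Names resolver - LOCAL_SEARCH_SUFFIX and _query_cluster are made-up names, and twisted.names.error.DomainError is what the DNS server machinery renders as NXDOMAIN:

from twisted.internet import defer
from twisted.names import error

LOCAL_SEARCH_SUFFIX = b".wework.com"  # read from the client's resolv.conf

def lookupAddress(self, name, timeout=None):
    # Refuse the search-suffixed form outright; the client's resolver
    # then retries with the bare name, which we answer from the cluster.
    if name.endswith(LOCAL_SEARCH_SUFFIX):
        return defer.fail(error.DomainError(name))
    return self._query_cluster(name, timeout)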

Contributor

exarkun commented Mar 14, 2018

Possibly. I don't have any first-hand experience with split-tunnel VPNs. I'm not sure whether there are other drawbacks that come with that (applications dealing with the NXDOMAIN poorly? But if getaddrinfo etc. deal with it, it's probably fine).

Seems like a similarly easy experiment we could run, at least.

An entirely different solution would be to make a new mount namespace for the Telepresence context and bind-mount a better resolv.conf into it - but I suppose this would be Linux only and so an incomplete solution.
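
For the record, that might look something like this (Linux-only, untested; /tmp/telepresence-resolv.conf is an illustrative path, and unshare from util-linux defaults to private mount propagation, so the bind mount stays inside the namespace):

import subprocess

# Run the user's shell in a private mount namespace where a
# cluster-appropriate resolv.conf shadows /etc/resolv.conf.
subprocess.run([
    "sudo", "unshare", "--mount", "sh", "-c",
    "mount --bind /tmp/telepresence-resolv.conf /etc/resolv.conf"
    " && exec bash",
])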
