Skip to content

Conversation

@ddtmachado
Copy link
Contributor

@ddtmachado ddtmachado commented Sep 29, 2020

What does this PR do?

Fix the service name resolution to IP addresses for TCP services referenced by hostname/DNS.

Fixes #7167 #7354

Motivation

Currently, for TCP routers the service name resolution is done during the router setup (Proxy object) and never updated again, in this case, any changes to the actual addresses behind that hostname will have no effect unless the Traefik instance is restarted.

Moving the resolution to the connection handling allows for a fast reaction upon address changes, every connection to the server will trigger a lookup of the hostname.

Pros of this approach:

  • Fast reaction to changes in addresses (per connection)
  • Not dependent on pooling (either by the provider config or independent check)
  • Should work for all providers (tested with File and Kubernetes)

Cons:

  • Adds an address lookup per connection

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

I didn't notice any meaningful impact in latency while testing (basic checks with ncat), compared with v2.2, but a more intensive performance check with thousands of short-lived connections could be done if needed.

@kopaygorodsky
Copy link

kopaygorodsky commented Oct 3, 2020

in #7167 traefik updates IP when config is reloaded, it just ignores some of k8s Endpoints events. It explained here. The problem will disappear once reloading of the config in k8s provider is fixed and no need to do address lookup per request.

@ddtmachado
Copy link
Contributor Author

@kopaygorodsky yes, but a config reload is not a solution when working with a DNS name external service, as kubernetes will never trigger a config update since the object never changes (service definition without endpoints). In this case you would have to rely on restarting Traefik or deleting and creating the router again to update the actual addresses.

@kopaygorodsky
Copy link

kopaygorodsky commented Oct 6, 2020

I thought we can watch for Endpoints resource, can't we? It has the same name as the external service. Update event is fired every time IP list is changed.

@ddtmachado
Copy link
Contributor Author

@kopaygorodsky an external name service generates no endpoints in kubernetes. You can create them manually but it would be ignored by Traefik in this case anyway.

For external name services Traefik resolves the externalName key to an actual TCP address that will be stored at the router service servers list. That was done only during the router setup and this PR changes it to the router's proxy server inside Traefik responsible for handling requests, thus resolving the name for each new request.

TCP based services will usually connect once and keep the connection until the work is done, like databases for example, and for that reason I guess it won't have a negative impact. Even for multiple distinct requests I did not see a noticeable increase in latency, but I would like validation from more community members or senior maintainers.

@kopaygorodsky
Copy link

oh, I got where I was wrong. Thanks a lot for answering :)

@ddtmachado ddtmachado changed the title Fix service name lookup on TCP routers Improve service name lookup on TCP routers Oct 13, 2020
@jspdown
Copy link
Contributor

jspdown commented Oct 28, 2020

@ddtmachado Thanks for your PR, It looks promising!

I wonder if it wouldn't be a more understandable behavior to not have a default value for addrLookupCache. If the user don't specify the addrLookupCache option, it would just resolve the address for every connection. If this option gets configured, the users chose its value and is aware of the side effects. As it's an optimization, it doesn't have to be enabled by default.

Also, what do you think about changing the addrLookupCache into a duration? If there's no default value it doesn't have to be a pointer to an int and we don't need to allow negative values.

@jspdown
Copy link
Contributor

jspdown commented Oct 30, 2020

@ddtmachado After discussing with the team, we think having a cache adds more complexity than it helps. Right now the current implementation has some concurrency issues around the p.target and p.resolvedAt. Fixing concurrency while still avoiding simultaneous resolves and connection bottle neck is doable but would make the code even more complex. The only cases where you want to use this resolution is when using the file provider or with ExternalName services on Kubernetes. As it is an edge case, not caching resolutions is less of a problem.

What do you think about doing the resolution only if the given address is a service name? The most common use case won't be impacted. If users want to avoid stressing too much there DNS server it is still possible to rely on a local DNS cache using Kubernetes nodelocaldns or more generally with dnsmasq.

@ddtmachado
Copy link
Contributor Author

@jspdown yes, it is not synced but it does not cause any race condition as well, in the worst case scenario it will trigger more lookups than necessary. Being so I agree that the complexity to add synchronicity is not worth in this case but I wouldn't mind keeping it like that.

What do you think about doing the resolution only if the given address is a service name?

I wouldn't mind doing this instead, but then I would prefer to do it outside the Proxy to properly resolve the name into multiple ip addresses and avoid relying on the DNS round robin to load balance, wdyt?

Douglas De Toni Machado and others added 5 commits November 12, 2020 10:05
This reverts commit 16924c56126e7c7ca7a960ee5c631ef5f85ded68.
This reverts commit 89cdf6f6fbcc3a07cd28204020617d6626acaa87.
Copy link
Contributor

@jspdown jspdown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Collaborator

@jbdoumenjou jbdoumenjou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Member

@kevinpollet kevinpollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kopaygorodsky
Copy link

woohoo! Thx for your effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants