
feat: Node-local DNS cache support #550

Merged

Conversation

@mumoshu (Contributor) commented Feb 18, 2019

TL;DR: this is the smallest change needed to enable the node-local DNS cache on eksctl-created nodes.

What

Add a new field named `clusterDNS` that accepts the IP address of the DNS server used for all internal/external DNS lookups, i.e. the `--cluster-dns` flag of kubelet.

```yaml
nodeGroups:
- name: nodegroup1
  clusterDNS: 169.254.20.10
  # snip
```

This, in combination with `k8s-dns-node-cache` deployed as a daemonset on your cluster, routes all DNS lookups from your pods to the node-local DNS server first, which adds reliability.

Notes

The configuration key `clusterDNS` is intentionally per-nodegroup, not per-cluster, so that you can adopt the node-local DNS selectively. Combined with the proper use of node labels/taints, this lets you test the node-local DNS on only a subset of your workload.
It would also be nice to add `clusterDNS` as a cluster-level config key later, but I believe that isn't a must-have for this change.

Usage

See [cluster/addons/dns/nodelocaldns](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/nodelocaldns) in the upstream repository for more details.

The concrete steps to enable node-local DNS look like this:

- Decide which IP address the node-local DNS should bind to. Typically this is `169.254.20.10`.
- Add `clusterDNS: 169.254.20.10` to your nodegroup in the cluster config.
- Deploy [nodelocaldns.yaml](https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml), replacing `__PILLAR__LOCAL__DNS__` with `169.254.169.254`, `__PILLAR__DNS__DOMAIN__` with `cluster.local`, and `__PILLAR__DNS__SERVER__` with `10.100.0.10` or `172.20.0.10` according to your VPC CIDR.

Resolves #542

@mumoshu mumoshu force-pushed the node-local-dns-cache-support branch 2 times, most recently from 5eeb11d to cd9341f Compare February 18, 2019 07:05
@mumoshu mumoshu changed the title feat: Node-local DNS cache support wip: feat: Node-local DNS cache support Feb 18, 2019
@mumoshu (Contributor, Author) commented Feb 18, 2019

For anyone interested, this is how I verified that this works.

1. Add `clusterDNS: 169.254.20.10` to your nodegroup:

   ```yaml
   nodeGroups:
     - name: nodegroup1
       clusterDNS: "169.254.20.10"
       instanceType: m4.large
       # and whatever you like
   ```

2. After bringing up your cluster, `kubectl run` a pod and try resolving any host name from it. I used an ubuntu container and an `apt-get update`, but whatever works.

   This should fail: kubelet points the cluster DNS at 169.254.20.10 as we configured in the cluster.yaml, but nothing binds that address yet.

3. Deploy node-local-dns. After the deployment, any DNS lookup against 169.254.20.10 should work, as node-local-dns now binds it!

   ```console
   $ curl -L https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml | sed 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' | sed 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g' | sed 's/__PILLAR__DNS__SERVER__/172.20.0.10/g' > node-local-dns.yaml

   $ k apply -f node-local-dns.yaml
   ```

4. Repeat step 2, but expect a successful DNS lookup this time:

   ```console
   $ kru
   If you don't see a command prompt, try pressing enter.
   root@xenial-1550475807:/#
   root@xenial-1550475807:/# apt-get update -y
   Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
   Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
   Get:3 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [785 kB]
   Get:4 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
   Get:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
   Get:6 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
   *snip*
   ```

@mumoshu mumoshu changed the title wip: feat: Node-local DNS cache support feat: Node-local DNS cache support Feb 18, 2019
@errordeveloper errordeveloper merged commit fe7d351 into weaveworks:master Feb 20, 2019
@errordeveloper (Contributor)

Thanks @mumoshu, also thanks for keeping this small.

```go
func clusterDNS(spec *api.ClusterConfig, ng *api.NodeGroup) string {
	if ng.ClusterDNS != "" {
		return ng.ClusterDNS
	}
	// Default service network is 10.100.0.0, but it gets set to 172.20.0.0 automatically when the pod network
	// is anywhere within 10.0.0.0/8
	if spec.VPC.CIDR != nil && spec.VPC.CIDR.IP[0] == 10 {
		// … (snippet truncated in the review view)
```
As a future improvement, we should probably move this into struct defaulting code path, and set the field to the default value. That is so the struct fully represents what's going on.

@rsyvarth

Just in case anyone else is implementing DNS caching using the instructions here: there is a typo in mumoshu's instructions. Replace `__PILLAR__LOCAL__DNS__` with `169.254.20.10`, not `169.254.169.254`.

@mumoshu (Contributor, Author) commented Feb 21, 2019

@rsyvarth Oh! Good catch! Seems like I was too used to typing 169.254.169.254.

@mumoshu mumoshu deleted the node-local-dns-cache-support branch February 27, 2019 13:10
@mumoshu (Contributor, Author) commented Feb 27, 2019

For anyone interested in this feature: an alternative way to use the node-local DNS is to specify `dnsPolicy: None` and `dnsConfig` in your pod spec.

For example, this:

```yaml
  dnsConfig:
    nameservers:
    - 169.254.20.10
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    - ap-northeast-1.compute.internal
    - us-west-2.compute.internal
    options:
    - name: attempts
      value: "3"
    - name: timeout
      value: "1"
    - name: rotate
  dnsPolicy: None
```

would result in pods having an `/etc/resolv.conf` whose content is:

```console
$ cat /etc/resolv.conf
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal us-west-2.compute.internal
options attempts:3 timeout:1 rotate
```

whereas the default one, used when I omit `dnsConfig` in my region (ap-northeast-1), is:

```console
$ cat /etc/resolv.conf
nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal us-west-2.compute.internal
options ndots:5
```

The benefit of `dnsConfig` over kubelet's `--cluster-dns` flag is that you can migrate to the node-local DNS cache gradually.

You can even add the default cluster DNS as the secondary DNS server:

```yaml
  dnsConfig:
    nameservers:
    - 169.254.20.10
    - 172.20.0.10
```

This way, even while the node-local DNS is down due to a rolling update or a transient failure, you are unlikely to notice the downtime, because your DNS client (e.g. glibc's) is likely to handle the failure with retries and/or parallel queries (I believe it depends on the DNS client you rely on).

@StevenACoffman (Contributor) commented Apr 24, 2019

@mumoshu

Is it possible to add the default cluster DNS as the secondary DNS server for the node this way:

```yaml
nodeGroups:
  - name: nodegroup1
    clusterDNS: "169.254.20.10,172.20.0.10"
    instanceType: m4.large
    # and whatever you like
```

since the kubelet option is:

`--cluster-dns` (stringSlice): Comma-separated list of DNS server IP addresses.

> This way, even while the node-local DNS is down due to a rolling update or a transient failure, you are unlikely to notice the downtime, because your DNS client (e.g. glibc's) is likely to handle the failure with retries and/or parallel queries.
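Because the flag is a string slice, the comma-separated value splits into multiple server IPs. A quick sketch of that parsing (`parseClusterDNS` is an illustrative helper, not kubelet's actual code):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseClusterDNS splits a comma-separated --cluster-dns value into
// individual server IPs, rejecting anything that is not a valid address.
func parseClusterDNS(flag string) ([]string, error) {
	var servers []string
	for _, s := range strings.Split(flag, ",") {
		s = strings.TrimSpace(s)
		if net.ParseIP(s) == nil {
			return nil, fmt.Errorf("invalid DNS server IP: %q", s)
		}
		servers = append(servers, s)
	}
	return servers, nil
}

func main() {
	servers, err := parseClusterDNS("169.254.20.10,172.20.0.10")
	if err != nil {
		panic(err)
	}
	fmt.Println(servers) // prints [169.254.20.10 172.20.0.10]
}
```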

@StevenACoffman (Contributor)

Seems like it works! Hurrah!

@StevenACoffman (Contributor) commented Apr 25, 2019

The KEP warns against this for some reason:

> Populating both the nodelocal cache ip address and kube-dns ip address in resolv.conf is not a reliable option. Depending on underlying implementation, this can result in kube-dns being queried only if cache ip does not respond, or both queried simultaneously.

If musl queries nodelocal cache via tcp and kube-dns via udp in parallel, and one responds quickly and the other times out, why is that a problem?

@mumoshu (Contributor, Author) commented May 17, 2019

@StevenACoffman I guess you'll be interested in this line in the updated KEP: https://github.com/kubernetes/enhancements/pull/1005/files#diff-a43ddcc01ee886cc9ca7c60a0900e436R166

The problem:

> so if we use both the kube-dns IP as well as the link-local IP used by NodeLocal DNSCache, we could make the DNS query explosion problem worse. More queries means more conntrack entries and more DNATs.

The updated KEP also notes:

> This workaround could be viable for client implementations that do round-robin.

But when I experimented, I saw that on recent distros the resolver queried in parallel.

So my best bet that would work today is to run two nodelocal-dns daemonsets, each listening on a different virtual IP. Every application pod that wants to leverage the H/A nodelocal-dns-cache must have its pod spec updated to include a `dnsConfig` section with the two virtual IPs in it.

@mumoshu (Contributor, Author) commented May 17, 2019

I have one more alternative: just try running two nodelocaldns pods, both listening on the same virtual IP address and port. Surveying GitHub issues and the coredns code, it turned out that coredns sets SO_REUSEPORT where available.

So two or more nodelocal-dns pods with the same IP:port should just work. This allows you to run H/A nodelocaldns behind a single IP address for your cluster DNS (set via a kubelet option), which is used automatically when `dnsConfig` is absent from your pod specs.

@StevenACoffman (Contributor)

Thank you! That last alternative is very interesting.
By the way, in order to successfully start a fresh cluster with `clusterDNS: "169.254.20.10"` initially:

1. `eksctl create cluster --config-file=$EKSCONFIGFILE --without-nodegroup`
2. Apply nodelocaldns and any aws-node fixes using kubectl
3. `eksctl create nodegroup --config-file=$EKSCONFIGFILE`

If the nodegroups are created before nodelocaldns is applied, they will not come up successfully.

torredil pushed a commit to torredil/eksctl that referenced this pull request May 20, 2022: "Clarify error message when unsupported volume capabilities are provided"