
Efficient Kubernetes API Updates using Informers #75

Conversation

rjackson90
Contributor

We noticed about a month ago that our AWS bill was higher than expected, and when we looked into it we saw that the config reloader was making a large number of requests to our Kubernetes API server. These requests were generating a lot of cross-AZ traffic, both on their own and from our API servers to our etcd nodes in other availability zones. Lowering the config reloader's update frequency from once every 5 seconds to once every 10 minutes addressed the immediate cost problem, at the price of introducing significant lag in our config updates.

I took a look at the config reloader's source code and saw an opportunity to significantly reduce the number of requests made to the Kubernetes API server by using Informers, a facility shipped with the Kubernetes Go client SDK for efficiently tracking updates to API resources. I've been running this updated implementation in our production Kubernetes cluster for several weeks and it has worked really well. There was an immediate reduction in API request volume from the config reloader as soon as we switched to our fork, and we were able to return the update interval to 5 seconds with no impact on the number of requests made to the API server. We don't have very many compute nodes, 20-30 at any given time, but I imagine the polling behavior would be a serious scalability issue for larger clusters, or for other people who, like us, run on AWS and have to worry about the cost of cross-AZ traffic between HA cluster components.

I've left more detailed explanations of the "why" and "how" in my commit messages and in comments in the source. This fork has been well behaved over the few weeks we've run it, but we only ever used the MultiMap datasource, so I can't really vouch for compatibility with the behavior of the default Kubernetes datasource. I don't think it would be difficult to make the new datasource compatible enough with both existing datasources that they could be removed entirely; indeed, one of my commits does precisely that, but I'm open to leaving them in if incompatibilities are discovered in the new datasource.
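For reviewers who haven't used Informers before, here is a minimal sketch of the client-go pattern the new datasource relies on. This is illustrative only, not code from this PR: it assumes in-cluster credentials and uses ConfigMaps as the watched resource. The factory performs one initial LIST, then keeps an in-memory cache current over a WATCH, so event handlers and cache reads don't generate further API requests.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster credentials; kubeconfig loading is omitted for brevity.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The factory performs one initial LIST per resource type and then
	// keeps its local cache current via a WATCH stream.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	cmInformer := factory.Core().V1().ConfigMaps().Informer()

	// Handlers fire from the watch stream, so changes show up almost
	// immediately without extra requests to the API server.
	cmInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			cm := obj.(*corev1.ConfigMap)
			fmt.Printf("configmap added: %s/%s\n", cm.Namespace, cm.Name)
		},
		UpdateFunc: func(_, newObj interface{}) {
			cm := newObj.(*corev1.ConfigMap)
			fmt.Printf("configmap updated: %s/%s\n", cm.Namespace, cm.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)

	// Block until the initial sync completes so reads see a full view.
	if !cache.WaitForCacheSync(stop, cmInformer.HasSynced) {
		panic("informer cache failed to sync")
	}

	// Reads are served from the in-memory cache, not the API server.
	cms, err := factory.Core().V1().ConfigMaps().Lister().List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d configmaps in cache\n", len(cms))

	select {} // keep the watch running
}
```

Note that the resync period passed to the factory only controls how often cached objects are re-delivered to handlers from the local store; it does not cause periodic re-LISTs against the API server.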

@vmwclabot
Member

@rjackson90, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding Signed-off-by: John Doe <john.doe@email.org> to the last line of each Git commit message. The e-mail address used to sign must match the e-mail address of the Git author. Click here to view the Developer Certificate of Origin agreement.

@rjackson90 force-pushed the feature/efficient-kubernetes-datasource-updates branch from 137c264 to 05b8db0 on September 30, 2019 16:40

Richard Jackson added 3 commits on September 30, 2019 11:43
Within the Go API client, the Kubernetes project has created tools for
keeping local copies of API resources up to date with the server
state. By creating an Informer, the Kubernetes API client can perform
an initial sync of the objects we are interested in and then watch for
updates to those objects instead of polling the server on a regular
interval.

Changing the access pattern of the config reloader is important for
scaling Kubernetes clusters. If the config reloader is set to poll for
updates on a short interval (such as 5 seconds), it can generate a
significant amount of traffic on the API server even in relatively
modest clusters of a couple dozen compute nodes. This extra request
volume provides no value to the config reloader, and on AWS it is a
potentially expensive side effect if some nodes are in a different
availability zone from the Kubernetes API server(s).

With this new Datasource, the config reloader can be configured with
an arbitrarily short update interval without drowning the Kubernetes
API server in a flood of pointless requests. The Informer datasource
maintains watches on the important API objects in a separate
goroutine, and makes changes available to config reloader updates very
quickly, usually on the update immediately following the change in the
upstream API.

The new Datasource has been designed to be able to replace both existing
Kubernetes Datasources, but has only been tested against the behavior of
the Kubernetes Multimap Datasource.

Signed-off-by: Richard Jackson <richard.jackson@55places.com>
Update the controller to use only the new Informer-based datasource,
replacing both existing Kubernetes datasource implementations.

Note: This change has been tested in multiple production-quality
clusters for several weeks, but only in a configuration that formerly
used the Multimap datasource. Only cursory testing has been done to
verify that the new datasource is compatible with the default Kubernetes
datasource.

Signed-off-by: Richard Jackson <richard.jackson@55places.com>
The new Informer-based datasource for Kubernetes has clear benefits in
terms of scalability, and should be compatible with the behavior of both
existing datasources.

Signed-off-by: Richard Jackson <richard.jackson@55places.com>
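To make the read path described in the commit messages above concrete, here is a hedged sketch of how an informer-backed datasource can serve reload ticks from the local cache. The type and method names below (informerDatasource, GetConfigMaps) are hypothetical illustrations, not identifiers from this PR; the lister is assumed to come from a shared informer factory like the one sketched earlier in this thread.

```go
package datasource

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	listerscorev1 "k8s.io/client-go/listers/core/v1"
)

// informerDatasource holds a lister that is populated once at startup by a
// shared informer factory and kept current by the informer's watch running
// in a background goroutine.
type informerDatasource struct {
	cmLister listerscorev1.ConfigMapLister
}

// GetConfigMaps would be called on every reload tick. It never contacts the
// API server: the lister reads from the informer's already-synced local
// cache, so an arbitrarily short update interval adds no API traffic.
func (d *informerDatasource) GetConfigMaps(namespace string) ([]*corev1.ConfigMap, error) {
	return d.cmLister.ConfigMaps(namespace).List(labels.Everything())
}
```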
@rjackson90 force-pushed the feature/efficient-kubernetes-datasource-updates branch from 05b8db0 to 7f9d9be on September 30, 2019 16:44
@rjackson90
Contributor Author

Hi @jvassev, do you have any questions about this proposed change that I can answer? Is there anything I can do to make review easier?

@jvassev
Contributor

jvassev commented Nov 10, 2019

Hi @rjackson90, this looks like a great improvement! Since you've tested mainly against the MultiMap datasource, wouldn't it be safer to change only how the MultiMap datasource is handled and leave the default datasource path untouched?

@jvassev
Contributor

jvassev commented Nov 15, 2019

On second thought, let's merge it as is, since it is definitely a huge improvement. Thanks for making it happen!
If there is a regression, reverting the default datasource path would be trivial.

@jvassev merged commit 8ade99e into vmware:master on Nov 15, 2019
@rjackson90
Contributor Author

I'm happy to help; if any problems come up, an immediate revert is wise. I was torn on whether to submit the PR with the default datasource modified, but I landed in roughly the same place you did: it seemed sub-optimal for users to see significantly different API call efficiency based on a relatively minor configuration choice.
