-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient Kubernetes API Updates using Informers #75
Efficient Kubernetes API Updates using Informers #75
Conversation
@rjackson90, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding
|
137c264
to
05b8db0
Compare
@rjackson90, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding
|
Within the Go API client, the Kubernetes project has created tools for keeping local references to API resources up to date with the server state. By creating an Informer, the kubernetes API client can perform an intial sync of objects we are interested in and then watch for updates to those objects instead of polling the server on a regular interval. Changing the access pattern of the config reloader is important for scaling kubernetes clusters. If the config reloader is set to poll for updates on a short interval (such as 5 seconds) it can generate quite a significant amount of traffic on the API server even in relatively modest clusters composed of a couple dozen compute nodes. This increase in request volume does not provide any value to the config reloader, and is a potentially expensive side-effect on AWS if some nodes are in a different availability zone from the kubernetes API server(s). By creating this new Datasource, the config reloader can be configured with an arbitrarily short update interval without having to worry about drowning the kubernetes API server in a flood of pointless requests. The Informer datasource maintains watches on important API objects in a separate goroutine, and will make changes available to configreloader updates very quickly, usually on the update immediately following the change in the upstream API. The new Datasource has been designed to be able to replace both existing Kubernetes Datasources, but has only been tested against the behavior of the Kubernetes Multimap Datasource. Signed-off-by: Richard Jackson <richard.jackson@55places.com>
Update the controller to only use the new Informer-based datasource to replace both existing kubernetes datasource implementations. Note: This change has been tested in multiple production-quality clusters for several weeks, but only in a configuration that formerly used the Multimap datasource. Only cursory testing has been done to verify that the new datasource is compatible with the default Kubernetes datasource. Signed-off-by: Richard Jackson <richard.jackson@55places.com>
The new Informer-based datasource for Kubernetes has clear benefits in terms of scalability, and should be compatible with the behavior of both existing datasources. Signed-off-by: Richard Jackson <richard.jackson@55places.com>
05b8db0
to
7f9d9be
Compare
Hi @jvassev do you have any questions about this proposed change that I can answer? Is there anything I can do to make review easier? |
Hi @rjackson90, this looks like a great improvement! Since you've tested mainly against the multimap datasource, don't you think it is safer to just change how the multimap is handled and leave the default datasource path untouched? |
On second thought, let's merge it as is because it is definitely a huge improvement. Thanks for making it happen! |
I'm happy to help, if any problems come up an immediate revert is wise. I was torn on whether or not to submit the PR with the default datasource modified, but landed in roughly the same place you did where it seemed sub-optimal for users to experience significantly different API call efficiency based on whether they selected a relatively minor configuration change. |
We noticed our AWS bill was higher than expected over a month ago, and when we looked into it we saw that the config reloader was making a lot of requests to our Kubernetes API server. These requests were causing a lot of cross-AZ traffic, both on their own and from our api servers to our etcd nodes in different AZes. Turning down the update frequency for the config reloader from 5s to 10m ameliorated the immediate monetary problem, at the cost of introducing significant lag in our config updates.
I took a look at the source code for the config reloader, and saw that there was an opportunity to significantly reduce the number of requests being made to the Kubernetes API server by making use of Informers, a facility shipped with the Kubernetes Go client SDK to efficiently track updates to API resources. I've been running this updated implementation in our production kubernetes cluster for several weeks and it's been working really well. There was an immediate reduction in API request volume from the config reloader as soon as I switched over to our fork. We were able to turn back up the update interval to 5 seconds with no impact on the number of requests being made of the kubernetes API server. We don't have very many compute nodes, 20-30 at any given time, but I imagine this would be a pretty serious scalability issue for larger clusters or other people like us who are running on AWS and have to worry about the costs of cross-AZ traffic between HA cluster components.
I've left more detailed explanations of the "why" and "how" in my commit messages and as comments in the source. I've been running this fork for a few weeks and it's been well behaved, however we only ever used the
MultiMap
datasource so I can't really vouch for compatibility with the behavior of the default kubernetes datasource. I don't think it would be difficult to make this new datasource compatible enough with both existing datasources that they could be removed entirely. Indeed, one of my commits does precisely that, but I'm open to leaving them in if there are incompatibilities discovered in the new datasource.