Introducing kubernetes ratelimit queue for configuration reloading #3147
Comments
Note that Traefik already does some form of throttling after a configuration has been rendered. Throttling even further at the provider level would move the rate limiting closer to the source of the event. I'm not entirely sure how much of a win that would be, though, given that the provider work happens all in-memory; I suppose it largely depends on the number of objects being managed by Traefik, as you mentioned as well.

To my understanding, there's another use case for the work queue, which is to control the rate at which we retry processing events that we could not apply. This would help replace the static resync period that we have right now with a more reasonable exponential backoff pattern.

WDYT?
@timoreimann Hmm, I'm a bit confused about the usage of exponential backoff here.
My experience with the work queue in client-go is still fairly limited, so take everything I say or claim with a grain of salt. :-)

From what I know, it is possible to put elements back into the queue and revisit them at a periodicity dictated by an exponential factor. The idea is that, if an element cannot be processed for some reason (say, we cannot set up an Ingress because we can't access the secret for RBAC reasons), we want to conserve resources by retrying at longer and longer intervals, hoping that the situation will eventually be remedied.

On second thought, making this work may require changing how the Ingress controller in Traefik operates, as we currently treat every event as a signal to process everything again. I suppose we'd have to carefully assess whether the change in design is really worth it.

I also assume that the backoff applied when putting an element back into the queue for failure reasons can differ from the backoff affecting new, incoming events. Ideally, the backoff should be indexable so that offending Ingresses can be backed off while others continue to be processed without any penalty whatsoever. Again, I'm lacking experience on what's possible in this field.

In summary, we may or may not have two separate feature requests hidden under the surface.

Anyway, I'm not opposed to the idea of throttling the rate of incoming events per se. I'd like to make sure, though, that there are actual users who have measured a negative impact from too-frequently changing configurations and would benefit from the proposed feature. If that user is you, @yue9944882, please let me know. :-)
@timoreimann Thanks. Maybe we could wait and see whether any user hits this issue. I'm not seeing any negative impact from this currently; I was just theoretically concerned that frequent changes might have an impact.
Sure, let's keep our eyes and ears open and come back to this issue when/if we need to. I'll close the issue, if you don't mind, until we can observe a real-world business need. Thanks!
Do you want to request a feature or report a bug?
Feature Request
What did you expect to see?
After #2848, we have unlocked some newer utilities of the Kubernetes client-go library. I think it's time to introduce
k8s.io/client-go/util/workqueue
to Traefik. This package provides a rate-limited queue, which we can use to avoid churn and overhead, e.g. when Ingresses or other resources change frequently. This would be helpful for large-scale deployments on Kubernetes.
We could also provide an option to configure the rate limit in the [kubernetes] section of the TOML file, defaulting it to roughly 10s (not sure about this number).
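A sketch of what the proposed setting could look like in the TOML file. The option name `ratelimit` and its default are hypothetical, taken from this feature request; this is not an existing Traefik configuration key.

```toml
# Hypothetical sketch: "ratelimit" is the option proposed in this
# issue, not an existing Traefik setting.
[kubernetes]
# Minimum interval between configuration reloads triggered by
# Kubernetes events (suggested default from this issue: ~10s).
ratelimit = "10s"
```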
What did you see instead?
Currently, every small change in Kubernetes triggers a configuration reload instantly.