Introducing kubernetes ratelimit queue for configuration reloading #3147

Closed
yue9944882 opened this issue Apr 7, 2018 · 5 comments

@yue9944882
Contributor

yue9944882 commented Apr 7, 2018

Do you want to request a feature or report a bug?

Feature Request

What did you expect to see?

After #2848, we have unlocked some newer utilities from the Kubernetes client-go library. I think it's time to introduce k8s.io/client-go/util/workqueue to Traefik.

This package provides a rate-limited queue, and we can use it to avoid unnecessary churn and overhead, e.g. when an Ingress or other resources change frequently. This would be helpful for large-scale deployments on Kubernetes.

We could also expose an option in the [kubernetes] section of the TOML file to configure the rate limit, and default it to roughly 10s (not sure about this number).
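For illustration only, here is a minimal sketch of what that could look like, assuming the provider enqueues object keys instead of reloading directly; the 10-second pacing, the key format, and the idea of a TOML option are assumptions, while the workqueue calls themselves come from client-go:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Hypothetical default from the proposal: let at most one reload through
	// roughly every 10s (both the value and the TOML option are placeholders).
	limiter := &workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Every(10*time.Second), 1)}
	queue := workqueue.NewRateLimitingQueue(limiter)

	// Instead of reloading the configuration on every watch event, the
	// Kubernetes provider would only enqueue a key for the changed object.
	queue.AddRateLimited("default/my-ingress")

	// A worker drains the queue and triggers the actual reload.
	key, _ := queue.Get()
	fmt.Printf("reloading configuration for %v\n", key)
	queue.Done(key)
	queue.ShutDown()
}
```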

What did you see instead?

Currently, every small change in Kubernetes triggers a configuration reload instantly.

@timoreimann
Contributor

Note that Traefik already does some form of throttling after a configuration has been rendered: the ProvidersThrottleDuration parameter controls how long to delay a configuration update so that we apply a new configuration at most once per configured interval. The default value is 2 seconds.
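To make the described behavior concrete, here is a rough illustrative sketch of such a throttle (not Traefik's actual implementation; the function and channel names are made up):

```go
package main

import (
	"fmt"
	"time"
)

// throttleConfigs forwards at most one configuration per interval and always
// keeps only the most recent one seen during that interval, so bursts of
// updates collapse into a single reload.
func throttleConfigs(in <-chan string, interval time.Duration) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		var pending string
		var hasPending bool
		for {
			select {
			case cfg, ok := <-in:
				if !ok {
					return
				}
				pending, hasPending = cfg, true // newer updates overwrite older pending ones
			case <-ticker.C:
				if hasPending {
					out <- pending
					hasPending = false
				}
			}
		}
	}()
	return out
}

func main() {
	in := make(chan string)
	out := throttleConfigs(in, 2*time.Second)

	// Three rapid updates: only the last one gets applied once the interval elapses.
	go func() {
		in <- "config-1"
		in <- "config-2"
		in <- "config-3"
	}()

	fmt.Println("applying", <-out)
}
```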

Throttling even further at the provider level would move the rate limiting closer to the source of the event. I'm not entirely sure how much of a win that would be though given that the provider work happens all in-memory; I suppose it largely depends on the number of objects being managed by Traefik as you mentioned as well.

To my understanding, there's another use case for the work queue, which is to control the rate at which we should try re-processing events that we could not apply. This would be helpful to replace the static resync period that we have right now with a more reasonable exponential backoff pattern.

WDYT?

@yue9944882
Contributor Author

yue9944882 commented Apr 10, 2018

This would be helpful to replace the static resync period that we have right now with a more reasonable exponential backoff pattern.

@timoreimann Hmm, I am somewhat confused about the use of exponential backoff here.
Sure, the backoff could prevent Traefik from doing useless re-processing; OTOH it would also delay the correct event we need when it arrives. So how could we identify the correct event?
Alternatively, to prevent noisy Kubernetes events from flooding the queue, we could set a max backoff delay for the event queue so that too many noisy events do not delay the correct event for too long. For that, maybe we could use ItemExponentialFailureRateLimiter from client-go/util/workqueue.
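For reference, that rate limiter takes both a base delay and a maximum delay, so the per-item backoff is capped. A minimal sketch with placeholder durations (the package and function names are made up):

```go
package kubernetesprovider

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// newCappedBackoffQueue builds a work queue whose retry delay grows
// exponentially per item (1s, 2s, 4s, ...) but never exceeds the 30s cap,
// so noisy or repeatedly failing items cannot push work out indefinitely.
// The concrete durations are placeholders, not a recommendation.
func newCappedBackoffQueue() workqueue.RateLimitingInterface {
	limiter := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 30*time.Second)
	return workqueue.NewRateLimitingQueue(limiter)
}
```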

@timoreimann
Contributor

My experience with the work queue in client-go is still fairly limited, so take everything I say or claim with a grain of salt. :-) From what I know, it is possible to put elements back into the queue and revisit them at a periodicity dictated by an exponential factor. The idea is that, if an element cannot be processed for some reason (say, we cannot set up an Ingress because we can't access the secret for RBAC reasons), we want to conserve resources by retrying at longer and longer intervals, hoping that the situation will eventually be remedied.
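As a sketch of that put-back pattern with client-go's work queue (the syncIngress helper and the key format are hypothetical; only the queue methods are from the library):

```go
package kubernetesprovider

import "k8s.io/client-go/util/workqueue"

// syncIngress stands in for a hypothetical reconcile function that rebuilds
// the configuration for the object identified by a "namespace/name" key.
func syncIngress(key string) error { return nil }

// processNextItem is the usual worker-loop pattern: on failure the key is
// re-queued with a growing per-item delay, and on success Forget resets that
// key's backoff so other keys are never penalized.
func processNextItem(queue workqueue.RateLimitingInterface) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)

	if err := syncIngress(key.(string)); err != nil {
		// Could not apply the event (e.g. a secret is unreadable for RBAC
		// reasons): retry this key later with exponential backoff.
		queue.AddRateLimited(key)
		return true
	}

	// Processed successfully: drop this key's failure counter.
	queue.Forget(key)
	return true
}
```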

On second thought, making this work may require changing how the Ingress controller in Traefik operates as we currently treat every event as a signal to process everything again. I suppose we'd have to carefully assess if the change in design is really worth it.

I also assumed that the backoff applied when putting an element back into the queue for failure reasons can be different from the backoff affecting new, incoming events. Ideally, the backoff should be indexable so that offending Ingresses can be backed off while others continue to be processed without any penalty whatsoever. Again, I'm lacking experience on what's possible in this field.

In summary, we may or may not have two separate feature requests hidden under the surface.

Anyway... I'm not opposed to the idea of throttling the rate of incoming events per se. I'd like to make sure, though, that there are actual users who have measured a negative impact of too frequently changing configurations and would benefit from the proposed feature. If that user is you, @yue9944882, please let me know. :-)

@yue9944882
Contributor Author

yue9944882 commented Apr 17, 2018

I'd like to make sure though that there are actual users who have measured a negative impact of too frequently changing configurations and would benefit from the proposed feature.

@timoreimann Thanks. Maybe we could wait and see whether any user hits this issue. I'm not seeing any negative impact from this currently; it's just a theoretical concern about whether frequent changes could have an impact.

@timoreimann
Contributor

Sure, let's keep our eyes and ears open and come back to this issue when/if we need to.

I'll close the issue if you don't mind until we can observe a real-world business need. Thanks!

traefik locked and limited conversation to collaborators Sep 1, 2019