feat: add health track to ratelimit middleware #12042
Open
What does this PR do?
In v3.4 a Redis backend was introduced for the ratelimit middleware.
This introduces a hard dependency in the critical path: if Redis is down or suffers a transient failure, the whole middleware goes down.
This is not a problem with the in-memory backend, where updating the bucket cannot fail, but it is with Redis: the backend is remote, and many things can glitch (the network, the backend itself, etc.).
This PR introduces three new configuration settings:
Rate limiting stops being enforced (requests go through unthrottled) for a period of backoffDuration once backoffThreshold requests have failed within the configured time window.
I considered adding a simple "denyOnError" flag, but in my experience that would be a poor choice: when the backend is down, performance would be severely hit, since every request would still try to connect to Redis until the timeout is reached.
When not configured, the current behaviour is preserved: a failure to update the bucket makes the request fail with a 500 error.
I had some additional ideas that were ultimately not implemented, in order to keep this change focused and as simple as possible.
I implemented this in the ratelimit middleware rather than inside the Redis limiter itself, because I believe that is where the responsibility belongs: other limiters added in the future could reuse it, and the purpose is to deal with failures in the limiter component.
Motivation
Ingress components are critical and should be resilient to failures when possible.
Additional Notes
Fixes #12043