New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache: implement the circuit breaker pattern for asynchronous set operations in the cache client #7010
Conversation
8d5d504
to
ec38a52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it is great work and it makes a lot of sense. Thanks.
We need to update docs.
pkg/cacheutil/memcached_client.go
Outdated
SetAsyncCircuitBreakerOpenDuration: 5 * time.Second, | ||
SetAsyncCircuitBreakerMinRequests: 50, | ||
SetAsyncCircuitBreakerConsecutiveFailures: 5, | ||
SetAsyncCircuitBreakerFailurePercent: 0.05, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to keep an eye on this to make sure those values sane.
So far, it looks good to me. Would be better to have another pair of eyes on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update docs with new fields + add a paragraph explaining how this works 😄
… in the cache client Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
44ca87c
to
c69653b
Compare
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2ad5763
to
7ae1164
Compare
7ae1164
to
808bdc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
We need to fix E2E test. @damnever https://github.com/thanos-io/thanos/actions/runs/7665969811/job/20908734057?pr=7010 |
1d56ca5
to
b00e6ed
Compare
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
bed1dea
to
f76994e
Compare
@damnever There is one more lint to fix before we can merge. Thanks |
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
@GiedriusS Wanna take another look before I merge this code change? |
pkg/cacheutil/redis_client.go
Outdated
MaxRequests: config.SetAsyncCircuitBreakerHalfOpenMaxRequests, | ||
Interval: 10 * time.Second, | ||
Timeout: config.SetAsyncCircuitBreakerOpenDuration, | ||
ReadyToTrip: func(counts gobreaker.Counts) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be factored out into another function? I think it's exactly the same in memcached_client
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have separated the entire configuration and logic, including this function. Please take another look.
docs/components/store.md
Outdated
@@ -340,6 +346,12 @@ While the remaining settings are **optional**: | |||
- `max_async_concurrency`: maximum number of concurrent asynchronous operations can occur. | |||
- `max_async_buffer_size`: maximum number of enqueued asynchronous operations allowed. | |||
- `max_get_multi_concurrency`: maximum number of concurrent connections when fetching keys. If set to `0`, the concurrency is unlimited. | |||
- `set_async_circuit_breaker_enabled`: `true` to enable circuite breaker for asynchronous operations. The circuit breaker consists of three states: closed, half-open, and open. It begins in the closed state. When the total requests exceed `set_async_circuit_breaker_min_requests`, and either consecutive failures occur or the failure percentage is excessively high according to the configured values, the circuit breaker transitions to the open state. This results in the rejection of all asynchronous operations. After `set_async_circuit_breaker_open_duration`, the circuit breaker transitions to the half-open state, where it allows `set_async_circuit_breaker_half_open_max_requests` asynchronous operations to be processed in order to test if the conditions have improved. If they have not, the state transitions back to open; if they have, it transitions to the closed state. Following each 10 seconds interval in the closed state, the circuit breaker resets its metrics and repeats this cycle. | |||
- `set_async_circuit_breaker_half_open_max_requests`: maximum number of requests allowed to pass through when the circuit breaker is half-open. If set to 0, the circuit breaker allows only 1 request. | |||
- `set_async_circuit_breaker_open_duration`: the period of the open state after which the state of the circuit breaker becomes half-open. If set to 0, the circuit breaker resets it to 60 seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If set to 0, the circuit breaker resets it to 60 seconds.
What does it mean? That the default value of this is 60 seconds
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, github.com/sony/gobreaker uses this default value.
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
E2E test failure
Doesn't seem like related. Ignoring for now |
Hey @GiedriusS, I will merge this pr now. If you find anything else that's worth improving, feel free to create a separate issue and we can address later. |
👍 I am also thinking about whether it makes sense to only protect SET operations through the circuit breaker pattern? I mean if the cache is down then GETs won't work either. |
I think what you said makes sense. @GiedriusS |
I agree; I think GET is more sensitive to latency and should be assigned a higher priority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe let's remove the set_
prefix from config and put gets under the circuit breaker too? Should be a small change. Ideally this would be done before the next release
Removing the |
…rations in the cache client (thanos-io#7010) * Implement the circuit breaker pattern for asynchronous set operations in the cache client Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> * Add feature flag for circuitbreaker Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> * Sync docs Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> * Skip configuration validation if the circuit breaker is disabled Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> * Make lint happy Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> * Abstract the logic of the circuit breaker Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> --------- Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
Ref #6975
Changes
Wrap SetAsync with a circuit breaker to dynamically limit the cache backfilling requests. Both the memcached and redis clients have added the same circuit breaker configuration.
Verification
Added some unit tests.