-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The minimal region cache #291
Conversation
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
… rwlocks Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
9b8ed23
to
c16c6dd
Compare
Signed-off-by: ekexium <ekexium@gmail.com>
d24e490
to
2668d9f
Compare
Signed-off-by: ekexium <ekexium@gmail.com>
59eb893
to
dbddc98
Compare
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Self::handle_region_error(pd_client.clone(), e, region_store).await?; | ||
// don't sleep if we have resolved the region error | ||
if !region_error_resolved { | ||
futures_timer::Delay::new(duration).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so related to the topic of the PR. As we assume the user has tokio runtime, the tokio timer may be preferred. (And we can remove the futures_timer
dependency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a wish that we can remove the dependency of tokio one day...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I refused to introduce the runtime too. But I must admit, with a runtime, it is both easy for us to design the API and code, and for the users to use...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client inevitably needs some background work to do. Rust does not provide an async runtime interface, and it is not so suitable to start a runtime by the client. Relying on a specific runtime looks like the only solution in the near future :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right. We can do it in another PR. #246
src/request/plan.rs
Outdated
Ok(false) | ||
} | ||
} | ||
// errors from PD requests, backoff and retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm, I don't see why there can be PD errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't considerably think whether it is possible. Just in case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name is handle_region_error
, so it is unreasonable that non-region errors occur here. Maybe the definition of HasRegionError::region_error
can be changed to just return an errorpb::Error)
.
But it indeed reminds me that the error handling for PD errors is coarse grained. Not a big problem for now as there is typically not much trouble PD can cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name is handle_region_error, so it is unreasonable that non-region errors occur here.
Right. I just made the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the definition of HasRegionError::region_error can be changed to just return an errorpb::Error)
Good idea
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the current code.
By the way, the backoff design is a bit from the client go. It is simpler because it always use the same backoff type (for different errors) and does not record the backoff history. These features are nice to have, but not an urgent task.
Yep, there is an issue there #310 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimal region cache LGTM.
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Please fix the clippy warnings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The PR adds a minimal region cache implementation. It's far away from maturity. It's intended to discuss how the structure should look.
Related issue: #299
Main changes:
PdRpcClient
.PdRpcClient
-> region cache -> retry client (the low level pd client)Credit to #116, #121 and #139. Since there have been some refactoring after the previous work, I didn't continue development based on those commits.
Naive manual benchmark
Environment: single machine, 1 PD, 3 TiKV, default scheduler + shuffle-region + shuffle-leader
Workload: 6 threads, 512 workers, 50% single-key-read, 50% single-key-write
master: TPS: 1943.9. Latency: mean: 41ms, P99: 1018ms, maximum: 1815ms
This PR: TPS 2108.5. Latency: mean: 12ms, P99: 108ms, maximum: 1061ms
master: TPS: 4279.4, mean: 70ms, P99: 1048ms, maximum: 2601ms
This PR: TPS: 21129.8, mean: 22ms, P99: 44ms, maximum: 286ms
Todo
read_through_cache
parameters, if we can always know the invalid cache entry