Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The minimal region cache #291

Merged
merged 24 commits into from
Jul 21, 2021
Merged

The minimal region cache #291

merged 24 commits into from
Jul 21, 2021

Conversation

ekexium
Copy link
Collaborator

@ekexium ekexium commented May 19, 2021

The PR adds a minimal region cache implementation. It's far away from maturity. It's intended to discuss how the structure should look.

Related issue: #299

Main changes:

  • Place the region cache as a component in PdRpcClient. PdRpcClient -> region cache -> retry client (the low level pd client)
  • Make single-region plan a special case of multi-region plan

Credit to #116, #121 and #139. Since there have been some refactoring after the previous work, I didn't continue development based on those commits.

Naive manual benchmark

Environment: single machine, 1 PD, 3 TiKV, default scheduler + shuffle-region + shuffle-leader
Workload: 6 threads, 512 workers, 50% single-key-read, 50% single-key-write

  • zipfian distribution, pessimistic txns

master: TPS: 1943.9. Latency: mean: 41ms, P99: 1018ms, maximum: 1815ms
This PR: TPS 2108.5. Latency: mean: 12ms, P99: 108ms, maximum: 1061ms

  • uniform distribution, optimistic txns

master: TPS: 4279.4, mean: 70ms, P99: 1048ms, maximum: 2601ms
This PR: TPS: 21129.8, mean: 22ms, P99: 44ms, maximum: 286ms

Todo

  • Remove read_through_cache parameters, if we can always know the invalid cache entry
  • Treat single-region as a special case of multi-region
  • Test
  • Save unnecessary PD requests if they are querying the same region
  • Finer grained error handling
  • Naming

Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
@ekexium ekexium requested a review from andylokandy May 19, 2021 14:13
… rwlocks

Signed-off-by: ekexium <ekexium@gmail.com>
@ekexium ekexium changed the title The minimal region cache [WIP] The minimal region cache May 19, 2021
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
@ekexium ekexium force-pushed the region-cache branch 3 times, most recently from 9b8ed23 to c16c6dd Compare June 21, 2021 09:00
Signed-off-by: ekexium <ekexium@gmail.com>
@ekexium ekexium force-pushed the region-cache branch 2 times, most recently from d24e490 to 2668d9f Compare June 21, 2021 12:52
Signed-off-by: ekexium <ekexium@gmail.com>
@ekexium ekexium force-pushed the region-cache branch 2 times, most recently from 59eb893 to dbddc98 Compare June 21, 2021 14:05
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
tests/common/mod.rs Outdated Show resolved Hide resolved
src/region_cache.rs Outdated Show resolved Hide resolved
src/request/plan.rs Outdated Show resolved Hide resolved
tikv-client-store/src/errors.rs Outdated Show resolved Hide resolved
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
src/request/plan.rs Outdated Show resolved Hide resolved
Self::handle_region_error(pd_client.clone(), e, region_store).await?;
// don't sleep if we have resolved the region error
if !region_error_resolved {
futures_timer::Delay::new(duration).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so related to the topic of the PR. As we assume the user has tokio runtime, the tokio timer may be preferred. (And we can remove the futures_timer dependency)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a wish that we can remove the dependency of tokio one day...

Copy link
Collaborator

@sticnarf sticnarf Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I refused to introduce the runtime too. But I must admit, with a runtime, it is both easy for us to design the API and code, and for the users to use...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client inevitably needs some background work to do. Rust does not provide an async runtime interface, and it is not so suitable to start a runtime by the client. Relying on a specific runtime looks like the only solution in the near future :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right. We can do it in another PR. #246

src/request/plan.rs Outdated Show resolved Hide resolved
src/request/plan.rs Outdated Show resolved Hide resolved
src/request/plan.rs Outdated Show resolved Hide resolved
Ok(false)
}
}
// errors from PD requests, backoff and retry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, I don't see why there can be PD errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't considerably think whether it is possible. Just in case :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name is handle_region_error, so it is unreasonable that non-region errors occur here. Maybe the definition of HasRegionError::region_error can be changed to just return an errorpb::Error).

But it indeed reminds me that the error handling for PD errors is coarse grained. Not a big problem for now as there is typically not much trouble PD can cause.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name is handle_region_error, so it is unreasonable that non-region errors occur here.

Right. I just made the change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the definition of HasRegionError::region_error can be changed to just return an errorpb::Error)

Good idea

Cargo.toml Outdated Show resolved Hide resolved
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Copy link
Collaborator

@sticnarf sticnarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the current code.

By the way, the backoff design is a bit from the client go. It is simpler because it always use the same backoff type (for different errors) and does not record the backoff history. These features are nice to have, but not an urgent task.

@ekexium
Copy link
Collaborator Author

ekexium commented Jul 1, 2021

By the way, the backoff design is a bit from the client go. It is simpler because it always use the same backoff type (for different errors) and does not record the backoff history. These features are nice to have, but not an urgent task.

Yep, there is an issue there #310

Copy link
Contributor

@youjiali1995 youjiali1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimal region cache LGTM.

src/request/plan.rs Show resolved Hide resolved
@ekexium ekexium changed the title [WIP] The minimal region cache The minimal region cache Jul 21, 2021
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
@sticnarf
Copy link
Collaborator

Please fix the clippy warnings.

@ekexium ekexium mentioned this pull request Jul 21, 2021
4 tasks
Copy link
Collaborator

@sticnarf sticnarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sticnarf sticnarf merged commit c14f23a into tikv:master Jul 21, 2021
@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants