server: support dynamic config max-grpc-send-msg-len and raft-msg-max-batch-size #12335
Conversation
Signed-off-by: glorv <glorvs@163.com>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
/test
Signed-off-by: glorv <glorvs@163.com>
@5kbpers @Connor1996 PTAL
CI failed
/test
All failures are caused by GitHub connection timeout. 😂
/build
src/server/raft_client.rs
Outdated
-    for entry in msg.get_message().get_entries() {
-        msg_size += entry.data.len();
+    let msg_size = Self::message_size(&msg);
+    // try refrech config before check
refrech -> refresh
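Aside from the typo flagged above, the hunk replaces the inline size accumulation with a `message_size` helper. A minimal sketch of what such a helper might look like (the accessors follow the visible lines; the exact upstream signature and item placement are assumptions):

```rust
use kvproto::raft_serverpb::RaftMessage; // assumed import; TiKV's raft message type

// Hedged sketch, not the actual TiKV code: sums the payload sizes of the
// raft entries carried by one message, mirroring the loop removed above.
fn message_size(msg: &RaftMessage) -> usize {
    let mut msg_size = 0;
    for entry in msg.get_message().get_entries() {
        msg_size += entry.data.len();
    }
    msg_size
}
```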
@@ -163,7 +163,7 @@ impl<T: RaftStoreRouter<E::Local> + Unpin, S: StoreAddrResolver + 'static, E: En
         let conn_builder = ConnectionBuilder::new(
             env.clone(),
-            Arc::new(cfg.value().clone()),
+            Arc::clone(cfg),
cfg.clone()
I'm not sure whether Arc::clone or .clone() is recommended in our code style, though both are OK to me. Both forms appear in the current codebase (searching tikv for Arc::clone returns 333 hits).
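For readers outside this thread: the two spellings are equivalent in behavior; `Arc::clone` only makes the pointer copy explicit. A small self-contained illustration (not TiKV code):

```rust
use std::sync::Arc;

fn main() {
    let cfg = Arc::new(String::from("server config"));

    // Both lines create a new handle to the same allocation; neither copies
    // the underlying String. Arc::clone merely states that intent explicitly.
    let a = Arc::clone(&cfg);
    let b = cfg.clone();

    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(Arc::strong_count(&cfg), 3);
}
```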
rest LGTM
…o grpc-send-msg-len
Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: glorv <glorvs@163.com>
@Connor1996 PTAL again, thanks.
LGTM
/merge
@5kbpers: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once.

If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: a3be3ff
…-batch-size (tikv#12335) close tikv#12334 Signed-off-by: glorv <glorvs@163.com>
@@ -461,6 +460,12 @@ impl Config {
             ));
         }

+        if self.raft_entry_max_size.0 == 0 || self.raft_entry_max_size.0 > ReadableSize::gb(3).0 {
+            return Err(box_err!(
+                "raft entry max size should be greater than 0 and less than or equal to 3GiB"
I don't think we support 3GiB. Not even 2.01GiB.
I set this to the same value as raft-max-size-per-msg; this is just an intuitive upper bound. Do you mean the hard limit should be 2GB? Sorry, I can't find the related code.
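As context for the bound being debated, here is a minimal, self-contained sketch of the validation rule from the hunk above (the ReadableSize stand-in is simplified; TiKV's real type lives in tikv_util):

```rust
// Simplified stand-in for tikv_util::config::ReadableSize, for illustration only.
struct ReadableSize(u64);

impl ReadableSize {
    const fn gb(n: u64) -> ReadableSize {
        ReadableSize(n * 1024 * 1024 * 1024)
    }
}

// Mirrors the check added in the diff: zero is rejected, as is anything
// above the (debated) 3 GiB ceiling.
fn validate_raft_entry_max_size(size: &ReadableSize) -> Result<(), String> {
    if size.0 == 0 || size.0 > ReadableSize::gb(3).0 {
        return Err(
            "raft entry max size should be greater than 0 and less than or equal to 3GiB"
                .to_string(),
        );
    }
    Ok(())
}

fn main() {
    assert!(validate_raft_entry_max_size(&ReadableSize::gb(1)).is_ok());
    assert!(validate_raft_entry_max_size(&ReadableSize(0)).is_err());
    assert!(validate_raft_entry_max_size(&ReadableSize::gb(4)).is_err());
}
```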
-        msg_size += entry.data.len();
+    let msg_size = Self::message_size(&msg);
+    // try refresh config before check
+    if let Some(new_cfg) = self.cfg_tracker.any_new() {
Have you benchmarked the performance impact?
In my benchmark, there is no notable impact after this change. In the most common case, it is only one extra atomic load.
Actually, the atomic load is way heavier than the original code, which just sums up several numbers. A better place to check for a new version is when a message is sent.
So do you mean we should move this check into the flush call, since flush is called much less frequently than push, especially when the load is heavy?
Yes.
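To make the agreed-upon change concrete, here is a hedged sketch under assumed types (this is not the TiKV implementation; it only illustrates moving the config refresh off the per-message hot path into flush):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Illustrative stand-in for the raft client's batch buffer. The shared
// atomic plays the role of the online-config tracker in the real code.
struct MessageBuffer {
    shared_max_batch_size: Arc<AtomicUsize>, // updated by the config system
    max_batch_size: usize,                   // locally cached copy
    batch: Vec<Vec<u8>>,
}

impl MessageBuffer {
    fn push(&mut self, msg: Vec<u8>) {
        // Hot path: only the cached limit is consulted; no atomic load here.
        self.batch.push(msg);
        if self.batch.len() >= self.max_batch_size {
            self.flush();
        }
    }

    fn flush(&mut self) {
        // Cold path: refresh the cached limit once per flush, as suggested above.
        self.max_batch_size = self.shared_max_batch_size.load(Ordering::Relaxed);
        // ... the real code would send `self.batch` over gRPC here ...
        self.batch.clear();
    }
}

fn main() {
    let shared = Arc::new(AtomicUsize::new(128));
    let mut buf = MessageBuffer {
        shared_max_batch_size: Arc::clone(&shared),
        max_batch_size: 128,
        batch: Vec::new(),
    };
    shared.store(16, Ordering::Relaxed); // dynamic config change
    buf.push(b"msg".to_vec());
    buf.flush(); // picks up the new limit of 16
    assert_eq!(buf.max_batch_size, 16);
}
```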
@@ -83,7 +83,6 @@ pub struct Config {
     #[online_config(skip)]
     pub status_thread_pool_size: usize,

-    #[online_config(skip)]
I don't see the point of changing this value online. It's only for controlling the size of a batch.
The use case comes from tidb-cloud, to minimize memory usage in a minimal test cluster. This kind of scenario is not common, but I think it is harmless to support it.
The batch is about 10MiB, and it only consumes about 10MiB * grpc_concurrency = 40MiB (i.e. with grpc_concurrency = 4).
Signed-off-by: glorv glorvs@163.com
What is changed and how it works?
Issue Number: Close #12334
What's Changed:
Related changes
pingcap/docs
pingcap/docs-cn
Check List
Tests
Side effects
Release note