raftstore: optimize AutoSplitController memory usage #16678
Conversation
Signed-off-by: Neil Shen <overvenus@gmail.com>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
@@ -641,10 +642,11 @@ where
    let (timer_tx, timer_rx) = mpsc::channel();
    self.timer = Some(timer_tx);

    let (read_stats_sender, read_stats_receiver) = mpsc::channel();
    let stats_limit = 128;
use const
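The "use const" suggestion is about replacing the magic number `128` with a named constant. A minimal sketch of the bounded-channel setup with such a constant (the constant name here is hypothetical, not the one TiKV ended up using):

```rust
use std::sync::mpsc;

// Hypothetical name for the review's "use const" suggestion; the actual
// identifier in TiKV may differ.
const STATS_CHANNEL_CAPACITY: usize = 128;

fn main() {
    // A bounded sync_channel holds at most STATS_CHANNEL_CAPACITY messages,
    // capping memory even if the consumer (AutoSplitController) runs slowly.
    let (tx, rx) = mpsc::sync_channel::<u64>(STATS_CHANNEL_CAPACITY);
    for i in 0..STATS_CHANNEL_CAPACITY {
        tx.try_send(i as u64).unwrap();
    }
    // The channel is now full: try_send fails instead of queuing unboundedly.
    assert!(tx.try_send(0).is_err());
    assert_eq!(rx.try_iter().count(), STATS_CHANNEL_CAPACITY);
}
```

With an unbounded `mpsc::channel()`, the same loop could queue messages without limit, which is exactly the memory buildup this PR addresses.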
@@ -775,7 +776,7 @@ where
    pub fn maybe_send_read_stats(&self, read_stats: ReadStats) {
        if let Some(sender) = &self.read_stats_sender {
            if sender.send(read_stats).is_err() {
-               warn!("send read_stats failed, are we shutting down?")
+               debug!("send read_stats failed, are we shutting down?")
better not change?
The channel is changed to a bounded sync_channel; a warn! here may overwhelm log files when there are lots of concurrent coprocessor requests.
Is there any side effect of missing read_stats when the channel is full? That was impossible with the previous unbounded channel.
The read_stats are flushed from both the coprocessor and storage per tick, and the default tick interval in YATP is 1 second. So there are only 2 * threads_number messages per second, which I think is okay for this limit. However, the problem might be that the ReadStats is large?
The problem is caused by CPU stats, which hold the key ranges of all coprocessor requests.
> is there any side-effect of missing read_stats when the channel is full, as this is impossible in the previous unbounded chan?
I think it's okay, because hotspot regions overwhelm the CPU stats channel.
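The trade-off discussed above can be sketched as follows. This is not the PR's exact code: the diff uses `sender.send(...)` with a `debug!` log, while this self-contained sketch uses `try_send` and `eprintln!` to make the drop-when-full path explicit, and `ReadStats` is a stand-in type:

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Stand-in for TiKV's real per-tick read statistics.
struct ReadStats;

fn maybe_send_read_stats(sender: Option<&SyncSender<ReadStats>>, stats: ReadStats) -> bool {
    if let Some(sender) = sender {
        match sender.try_send(stats) {
            Ok(()) => return true,
            // When the bounded channel is full, the stats are simply dropped.
            // This is acceptable because hotspot regions dominate the stream,
            // so later samples carry the same signal; logging at debug (not
            // warn) level avoids flooding log files under heavy load.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                eprintln!("send read_stats failed, are we shutting down?");
            }
        }
    }
    false
}

fn main() {
    let (tx, rx) = sync_channel(1);
    assert!(maybe_send_read_stats(Some(&tx), ReadStats));
    // Channel is full now: the second sample is dropped rather than blocking.
    assert!(!maybe_send_read_stats(Some(&tx), ReadStats));
    drop(rx);
}
```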
    &mut self,
    read_stats_receiver: &Receiver<ReadStats>,
) -> &mut Vec<ReadStats> {
    self.read_stats_vec.clear();
Do we need to check the capacity and shrink?
It GCs the vec every 30 seconds.
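A minimal sketch of this reuse-and-periodically-shrink pattern. The struct, method name, and element type are illustrative (u64 stands in for ReadStats, and the real function drains a channel receiver); only the clear/shrink idea is taken from the discussion above:

```rust
use std::time::{Duration, Instant};

// Interval taken from the reply above: the vec is GC'd every 30 seconds.
const VEC_GC_INTERVAL: Duration = Duration::from_secs(30);

// Hypothetical holder mirroring the `read_stats_vec` field in the diff.
struct StatsBuffer {
    read_stats_vec: Vec<u64>,
    last_gc: Instant,
}

impl StatsBuffer {
    // clear() empties the vec but keeps its allocation, so each tick reuses
    // the buffer instead of reallocating. Every VEC_GC_INTERVAL, the excess
    // capacity left over from a load spike is released via shrink_to_fit(),
    // so a transient spike does not pin memory forever.
    fn recv_read_stats(&mut self) -> &mut Vec<u64> {
        self.read_stats_vec.clear();
        if self.last_gc.elapsed() >= VEC_GC_INTERVAL {
            self.read_stats_vec.shrink_to_fit();
            self.last_gc = Instant::now();
        }
        &mut self.read_stats_vec
    }
}

fn main() {
    let mut buf = StatsBuffer { read_stats_vec: Vec::new(), last_gc: Instant::now() };
    buf.recv_read_stats().extend(0..1000);
    let cap = buf.read_stats_vec.capacity();
    // Before the GC interval elapses, the allocation is reused, not freed.
    assert!(buf.recv_read_stats().is_empty());
    assert!(buf.read_stats_vec.capacity() >= cap);
}
```

This answers the "check the capacity and shrink" concern: unbounded capacity growth is avoided, but the shrink is amortized so the hot path stays allocation-free.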
LGTM
/merge
@glorv: It seems you want to merge this PR; I will help you trigger all the tests: /run-all-tests. You only need to trigger
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: 08bdf93
@overvenus: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
ref tikv#16653

raftstore: optimize AutoSplitController memory usage

* Replaced unbounded channels with bounded channels to prevent unexpected memory buildup when AutoSplitController runs slowly.
* Implemented reusability of temporary vectors and maps during CPU stats handling to reduce memory allocation and deallocation overhead, saving about 10% CPU.

Signed-off-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
What is changed and how it works?
Issue Number: ref #16653
What's Changed:
Tests show that it saves about 10% CPU, and memory usage is much more stable.
OOM frequently
The memory test is conducted with PR #16662.
Check List
Tests
Release note