raftstore: optimize AutoSplitController memory usage #16678

Merged
merged 8 commits into from Mar 23, 2024

Conversation

overvenus
Member

@overvenus overvenus commented Mar 20, 2024

What is changed and how it works?

Issue Number: ref #16653

What's Changed:

raftstore: optimize AutoSplitController memory usage

* Replaced unbounded channels with bounded channels to prevent unexpected
  memory buildup when AutoSplitController runs slowly.
* Implemented reusability of temporary vectors and maps during CPU stats
  handling to reduce memory allocation and deallocation overhead, saving
  about 10% CPU.
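The first bullet can be illustrated with a minimal sketch using the standard library's `sync_channel`. This is not the PR's code: the `ReadStats` struct here is a hypothetical stand-in, the names `STATS_CHANNEL_LIMIT`, `new_stats_channel`, and `send_or_drop` are invented for illustration, and using the non-blocking `try_send` is an assumption. The capacity of 128 mirrors the `stats_limit` value that appears in the diff.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for TiKV's `ReadStats`; the real type carries far more data.
#[derive(Debug, Default, PartialEq)]
pub struct ReadStats {
    pub region_id: u64,
}

/// Assumed channel capacity, mirroring `stats_limit = 128` from the diff.
pub const STATS_CHANNEL_LIMIT: usize = 128;

/// Creates a bounded channel that buffers at most `STATS_CHANNEL_LIMIT` messages.
pub fn new_stats_channel() -> (SyncSender<ReadStats>, Receiver<ReadStats>) {
    sync_channel(STATS_CHANNEL_LIMIT)
}

/// Tries to enqueue `stats` without blocking the caller.
/// Returns `false` when the message is dropped because the channel is full
/// or the receiver has gone away.
pub fn send_or_drop(sender: &SyncSender<ReadStats>, stats: ReadStats) -> bool {
    match sender.try_send(stats) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With an unbounded `mpsc::channel()`, a slow consumer lets the queue grow without limit; with the bounded variant, memory held by queued messages is capped at the capacity, at the cost of dropping stats when the consumer falls behind.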

Tests show that it saves about 10% CPU, and memory usage is much more stable.

| | 2024-03-20 nightly | With this PR |
|---|---|---|
| CPU | image | image |
| Memory* | image (OOM frequently) | image |

* The memory test was conducted with PR #16662.

Check List

Tests

  • Unit test

Release note

None

Signed-off-by: Neil Shen <overvenus@gmail.com>
Contributor

ti-chi-bot bot commented Mar 20, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Connor1996
  • glorv


@@ -641,10 +642,11 @@ where
let (timer_tx, timer_rx) = mpsc::channel();
self.timer = Some(timer_tx);

let (read_stats_sender, read_stats_receiver) = mpsc::channel();
let stats_limit = 128;
Member

use const

@@ -775,7 +776,7 @@ where
 pub fn maybe_send_read_stats(&self, read_stats: ReadStats) {
     if let Some(sender) = &self.read_stats_sender {
         if sender.send(read_stats).is_err() {
-            warn!("send read_stats failed, are we shutting down?")
+            debug!("send read_stats failed, are we shutting down?")
Member

better not change?

Member Author

The channel is now a bounded sync_channel, so logging at warn level here may flood the log files when there are lots of concurrent coprocessor requests.

Contributor

is there any side-effect of missing read_stats when the channel is full, as this is impossible in the previous unbounded chan?

Contributor

@nolouch nolouch Mar 21, 2024

The read_stats are flushed from both the coprocessor and storage once per tick, and the default tick interval in YATP is 1 second. So there are only 2 * threads_number messages per second, which I think is okay for this limit. However, might the problem be that each ReadStats is large?
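The estimate in this comment can be worked through with a small sketch. It assumes, as the comment does, two pools (coprocessor and storage) with the same number of threads, one message per thread per tick, and the limit of 128 from the diff. The function names are hypothetical.

```rust
/// Channel capacity taken from `stats_limit = 128` in the diff.
pub const STATS_LIMIT: u64 = 128;

/// Messages produced per tick, assuming each thread in the two pools
/// (coprocessor and storage) flushes exactly one message per tick.
pub fn messages_per_tick(threads_per_pool: u64) -> u64 {
    2 * threads_per_pool
}

/// Number of whole ticks of backlog the channel can absorb if the
/// consumer stops draining it. Returns `None` when nothing is produced.
pub fn ticks_until_full(threads_per_pool: u64) -> Option<u64> {
    let per_tick = messages_per_tick(threads_per_pool);
    if per_tick == 0 {
        None
    } else {
        Some(STATS_LIMIT / per_tick)
    }
}
```

For example, with 16 threads per pool and the default 1-second tick, 32 messages arrive per second, so a stalled consumer has about 4 seconds before the channel is full.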

Member Author

The problem is caused by CPU stats. CPU stats holds key ranges of all coprocessor requests.

Member Author

> is there any side-effect of missing read_stats when the channel is full, as this is impossible in the previous unbounded chan?

I think it's okay, because hotspot regions overwhelm the CPU stats channel.

&mut self,
read_stats_receiver: &Receiver<ReadStats>,
) -> &mut Vec<ReadStats> {
self.read_stats_vec.clear();
Member

Do we need to check the capacity and shrink?

Member Author

It GCs the vec every 30 seconds.
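The pattern discussed in this thread (clear and reuse a buffer, then release its memory periodically) might look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `StatsCollector`, `collect`, and `maybe_gc` are hypothetical names, `u64` stands in for `ReadStats`, and dropping the whole allocation on GC is only one possible policy.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Hypothetical collector that reuses one buffer across rounds.
pub struct StatsCollector {
    buf: Vec<u64>,
    last_gc: Instant,
    gc_interval: Duration,
}

impl StatsCollector {
    pub fn new(gc_interval: Duration) -> Self {
        StatsCollector {
            buf: Vec::new(),
            last_gc: Instant::now(),
            gc_interval,
        }
    }

    /// Drains everything currently queued into the reused buffer.
    /// `clear` keeps the allocation, so steady-state rounds do not reallocate.
    pub fn collect(&mut self, rx: &Receiver<u64>) -> &mut Vec<u64> {
        self.buf.clear();
        self.buf.extend(rx.try_iter());
        &mut self.buf
    }

    /// Releases the buffer's memory if `gc_interval` has elapsed since the
    /// last GC, so one burst does not pin a large allocation forever.
    /// Returns `true` when a GC happened.
    pub fn maybe_gc(&mut self, now: Instant) -> bool {
        if now.duration_since(self.last_gc) < self.gc_interval {
            return false;
        }
        self.buf = Vec::new(); // assumption: drop the allocation entirely
        self.last_gc = now;
        true
    }

    /// Current capacity of the reused buffer, exposed for inspection.
    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}
```

Passing `now` into `maybe_gc` rather than reading the clock inside keeps the interval logic easy to exercise in tests.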

Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
Member

@Connor1996 Connor1996 left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Status: PR - There is already 1 approval label Mar 22, 2024
@ti-chi-bot ti-chi-bot bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Mar 22, 2024
@glorv
Contributor

glorv commented Mar 23, 2024

/merge

Contributor

ti-chi-bot bot commented Mar 23, 2024

@glorv: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Mar 23, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 08bdf93

@ti-chi-bot ti-chi-bot bot added the status/can-merge Status: Can merge to base branch label Mar 23, 2024
Contributor

ti-chi-bot bot commented Mar 23, 2024

@overvenus: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


@ti-chi-bot ti-chi-bot bot merged commit 101b8bc into tikv:master Mar 23, 2024
7 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Mar 23, 2024
hongyunyan pushed a commit to hongyunyan/tikv that referenced this pull request Mar 28, 2024
ref tikv#16653

raftstore: optimize AutoSplitController memory usage

* Replaced unbounded channels with bounded channels to prevent unexpected
  memory buildup when AutoSplitController runs slowly.
* Implemented reusability of temporary vectors and maps during CPU stats
  handling to reduce memory allocation and deallocation overhead, saving
  about 10% CPU.

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Labels
release-note-none size/XL status/can-merge Status: Can merge to base branch status/LGT2 Status: PR - There are already 2 approvals
4 participants