raftstore: optimize AutoSplitController memory usage #16678

overvenus · 2024-03-20T14:51:16Z

What is changed and how it works?

Issue Number: ref #16653

What's Changed:

raftstore: optimize AutoSplitController memory usage

* Replaced unbounded channels with bounded channels to prevent unexpected
  memory buildup when AutoSplitController runs slowly.
* Implemented reusability of temporary vectors and maps during CPU stats
  handling to reduce memory allocation and deallocation overhead, saving
  about 10% CPU.

Tests show that it saves about 10% CPU and that memory usage is much more stable.

Header	2024-03-20 nightly	With this PR
CPU
Memory*	OOM frequently

The memory test is conducted with PR #16662.
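The map reuse mentioned in the change list can be illustrated with a minimal sketch. It assumes a hypothetical `CpuStatsAggregator` that sums CPU time per region; the type, its fields, and the `(region_id, cpu_ms)` sample shape are illustrative and are not taken from the PR. The point is only that `HashMap::clear` keeps the allocated memory, so the next round does not pay for a fresh allocation.

```rust
use std::collections::HashMap;

/// Hypothetical aggregator, not the PR's actual type.
/// It keeps one scratch map alive across rounds instead of
/// allocating a new map for every batch of CPU samples.
pub struct CpuStatsAggregator {
    by_region: HashMap<u64, u64>,
}

impl CpuStatsAggregator {
    pub fn new() -> Self {
        CpuStatsAggregator {
            by_region: HashMap::new(),
        }
    }

    /// Sums CPU time per region for one batch of `(region_id, cpu_ms)`
    /// samples. `clear` drops the old entries but keeps the allocation.
    pub fn aggregate(&mut self, samples: &[(u64, u64)]) -> &HashMap<u64, u64> {
        self.by_region.clear();
        for &(region_id, cpu_ms) in samples {
            *self.by_region.entry(region_id).or_insert(0) += cpu_ms;
        }
        &self.by_region
    }

    /// Exposed only so the retained capacity can be observed.
    pub fn capacity(&self) -> usize {
        self.by_region.capacity()
    }
}
```

Calling `aggregate` repeatedly reuses the same table, which is the amortization the "split_controller: amortized HashMap alloc cost" commit title refers to, as far as this excerpt shows.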

Check List

Tests

Unit test

Release note

None

Signed-off-by: Neil Shen <overvenus@gmail.com>

ti-chi-bot · 2024-03-20T14:51:21Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

Connor1996
glorv

Connor1996 · 2024-03-21T07:13:18Z

components/raftstore/src/store/worker/pd.rs

@@ -641,10 +642,11 @@ where
        let (timer_tx, timer_rx) = mpsc::channel();
        self.timer = Some(timer_tx);

-        let (read_stats_sender, read_stats_receiver) = mpsc::channel();
+        let stats_limit = 128;


Connor1996 · 2024-03-21T07:14:04Z

components/raftstore/src/store/worker/pd.rs

@@ -775,7 +776,7 @@ where
    pub fn maybe_send_read_stats(&self, read_stats: ReadStats) {
        if let Some(sender) = &self.read_stats_sender {
            if sender.send(read_stats).is_err() {
-                warn!("send read_stats failed, are we shutting down?")
+                debug!("send read_stats failed, are we shutting down?")


Better not to change this?

The channel is now a bounded sync_channel, so this warning may flood the log files when there are many concurrent coprocessor requests.

Is there any side effect of missing read_stats when the channel is full? This was impossible with the previous unbounded channel.

The read_stats are flushed from both the coprocessor and storage on each tick, and the default tick interval in YATP is 1 second. So there are only 2 * threads_number messages per second, which I think is okay for this limit. However, the problem might be that the ReadStats is large?

The problem is caused by CPU stats. CPU stats holds key ranges of all coprocessor requests.

> Is there any side effect of missing read_stats when the channel is full? This was impossible with the previous unbounded channel.

I think it's okay, because hotspot regions overwhelm the CPU stats channel.
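The trade-off discussed in this thread (a bounded channel that drops stats when full, without flooding the log) can be sketched with the standard library. This is a minimal illustration, assuming `std::sync::mpsc::sync_channel` and a non-blocking `try_send`; the excerpt does not show which send call the PR actually uses. `StatsSender`, `stats_channel`, the drop counter, and the simplified `ReadStats` fields are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical, simplified stand-in for the real ReadStats payload.
#[derive(Debug, PartialEq)]
pub struct ReadStats {
    pub region_id: u64,
    pub read_keys: u64,
}

/// Hypothetical sender wrapper: it never blocks the caller and counts
/// dropped messages instead of logging each one.
pub struct StatsSender {
    tx: SyncSender<ReadStats>,
    dropped: AtomicU64,
}

impl StatsSender {
    /// Returns true if the stats were queued, false if they were dropped
    /// because the channel was full or the receiver was gone.
    pub fn maybe_send(&self, stats: ReadStats) -> bool {
        match self.tx.try_send(stats) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }

    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}

/// Creates a channel that buffers at most `limit` pending messages,
/// so a slow consumer cannot cause unbounded memory growth.
pub fn stats_channel(limit: usize) -> (StatsSender, Receiver<ReadStats>) {
    let (tx, rx) = sync_channel(limit);
    let sender = StatsSender {
        tx,
        dropped: AtomicU64::new(0),
    };
    (sender, rx)
}
```

With a limit such as the `stats_limit = 128` shown in the diff, memory held by queued stats is capped at 128 messages per channel, and overflow becomes a counter increment rather than a log line per failed send.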

Connor1996 · 2024-03-21T07:18:53Z

components/raftstore/src/store/worker/split_controller.rs

+        &mut self,
+        read_stats_receiver: &Receiver<ReadStats>,
+    ) -> &mut Vec<ReadStats> {
+        self.read_stats_vec.clear();


Do we need to check the capacity and shrink?

It GCs the vec every 30 seconds.
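The buffer reuse in the hunk above, together with the periodic cleanup mentioned in the reply, can be sketched as follows. This is an illustration under assumptions: `Collector`, `max_idle_capacity`, and the `gc` method are hypothetical names, and the excerpt does not show how the PR actually performs its 30-second GC. Only `read_stats_vec.clear()` and the `&Receiver<ReadStats>` parameter come from the diff.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical, simplified stand-in for the real ReadStats payload.
pub struct ReadStats {
    pub region_id: u64,
}

/// Hypothetical holder of the reusable buffer.
pub struct Collector {
    read_stats_vec: Vec<ReadStats>,
    /// Capacity the buffer is allowed to keep after a GC pass (assumed knob).
    max_idle_capacity: usize,
}

impl Collector {
    pub fn new(max_idle_capacity: usize) -> Self {
        Collector {
            read_stats_vec: Vec::new(),
            max_idle_capacity,
        }
    }

    /// Drains everything currently queued into the reused buffer.
    /// `clear` drops old items but keeps the allocation for this round.
    pub fn collect(&mut self, receiver: &Receiver<ReadStats>) -> &mut Vec<ReadStats> {
        self.read_stats_vec.clear();
        while let Ok(stats) = receiver.try_recv() {
            self.read_stats_vec.push(stats);
        }
        &mut self.read_stats_vec
    }

    /// Meant to be called on a timer (every 30 seconds per the thread) so
    /// that one large burst does not pin a large allocation forever.
    pub fn gc(&mut self) {
        self.read_stats_vec.clear();
        self.read_stats_vec.shrink_to(self.max_idle_capacity);
    }

    /// Exposed only so the retained capacity can be observed.
    pub fn capacity(&self) -> usize {
        self.read_stats_vec.capacity()
    }
}
```

Separating `collect` from `gc` keeps the hot path allocation-free in the steady state, while the timer-driven shrink bounds how much idle capacity survives a traffic spike.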

Connor1996

LGTM

glorv · 2024-03-23T02:34:51Z

/merge

ti-chi-bot · 2024-03-23T02:34:52Z

@glorv: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2024-03-23T02:34:54Z

This pull request has been accepted and is ready to merge.

Commit hash: 08bdf93

ti-chi-bot · 2024-03-23T02:39:54Z

@overvenus: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

ref tikv#16653

raftstore: optimize AutoSplitController memory usage

* Replaced unbounded channels with bounded channels to prevent unexpected
  memory buildup when AutoSplitController runs slowly.
* Implemented reusability of temporary vectors and maps during CPU stats
  handling to reduce memory allocation and deallocation overhead, saving
  about 10% CPU.

Signed-off-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

overvenus added 4 commits March 20, 2024 19:08

pd: prefer bounded channels for stats monitor

4454c91

Signed-off-by: Neil Shen <overvenus@gmail.com>

split_controller: amortized HashMap alloc cost

494bd21

Signed-off-by: Neil Shen <overvenus@gmail.com>

pd: set stats buffer limit

11707a2

Signed-off-by: Neil Shen <overvenus@gmail.com>

refactor params into ctx

9db321c

Signed-off-by: Neil Shen <overvenus@gmail.com>

ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none size/XL labels Mar 20, 2024

overvenus removed the do-not-merge/needs-triage-completed label Mar 21, 2024

overvenus requested review from JmPotato, Connor1996 and lhy1024 March 21, 2024 03:33

Connor1996 reviewed Mar 21, 2024

overvenus added 2 commits March 21, 2024 15:53

address comments

8a95878

Signed-off-by: Neil Shen <overvenus@gmail.com>

wording

08bdf93

Signed-off-by: Neil Shen <overvenus@gmail.com>

Connor1996 approved these changes Mar 22, 2024

ti-chi-bot bot added the status/LGT1 Status: PR - There is already 1 approval label Mar 22, 2024

glorv approved these changes Mar 22, 2024

ti-chi-bot bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Mar 22, 2024

nolouch approved these changes Mar 22, 2024

ti-chi-bot bot added the status/can-merge Status: Can merge to base branch label Mar 23, 2024

ti-chi-bot bot added 2 commits March 23, 2024 02:35

Merge branch 'master' into oom/cop/bounded-stats-channel-1

0ccd9dc

Merge branch 'master' into oom/cop/bounded-stats-channel-1

0a5c92a

ti-chi-bot bot merged commit 101b8bc into tikv:master Mar 23, 2024
7 checks passed

ti-chi-bot bot added this to the Pool milestone Mar 23, 2024