Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raftstore: make full compaction incremental, pause when load is high #15995

Merged

Conversation

afeinberg
Copy link
Contributor

@afeinberg afeinberg commented Nov 15, 2023

Issue Number: ref #15271

Makes full compaction incremental, by range. Currently regions' ranges are used as increments.

Run a predicate ("load-check") function before starting full compaction and between each incremental
range.  If the function evaluates to false, pause with exponential backoff (up to a maximum duration)
until it evaluates to  true. 

If periodic full compaction is enabled, poll process CPU stats every 30 seconds to determine usage for the
"load-check"  function. If usage exceeds a certain threshold before full compaction starts, compaction will
 not be started, and if started, full compaction will be paused. This cpu usage is also exported as
``tikv_storage_process_stat_cpu_usage`` gauge metric.

Testing

  • Manual tests.
  • Unit test in compact.rs

Release note

Adds experimental support for periodic full compaction. Periodic full compaction ensures that SSTables are 
compactedat all levels, including the bottommost (L6.) This ensures compaction filters that purge deleted data
are able to run.  Since full compaction can impact live serving, periodic full compaction should only run during
times at which the load  is likely to low. As an additional precaution, periodic full compaction will not start (or if?
started, will be paused) if CPU usage exceeds a specified threshold.

Example tikv.toml configuration:
[raftstore]
periodic-full-compact-start-max-cpu = 0.33 
periodic-full-compact-start-times = ["03:00", "23:00"] 

Copy link
Contributor

ti-chi-bot bot commented Nov 15, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • LykxSassinator
  • v01dstar

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. contribution This PR is from a community contributor. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 15, 2023
@afeinberg
Copy link
Contributor Author

/cc @tonyxuqqi @v01dstar

@afeinberg afeinberg changed the title rafstore: make full compaction incremental, pause when load is high raftstore: make full compaction incremental, pause when load is high Nov 15, 2023
@afeinberg
Copy link
Contributor Author

/test

@afeinberg
Copy link
Contributor Author

/cc @Connor1996

@ti-chi-bot ti-chi-bot bot requested a review from Connor1996 November 15, 2023 22:37
@afeinberg afeinberg force-pushed the afeinberg/full_compaction/incremental branch 2 times, most recently from 389eb4f to a8f6bbe Compare November 17, 2023 06:23
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 17, 2023
@afeinberg afeinberg force-pushed the afeinberg/full_compaction/incremental branch from a8f6bbe to 5516025 Compare November 17, 2023 07:05
components/raftstore/src/store/worker/compact.rs Outdated Show resolved Hide resolved
return false;
}

let cpu_usage = PROCESS_STAT_CPU_USAGE.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were trying to get cpu usage "over time" instead of "point in time"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after discussion with @tonyxuqqi we decided to decide on cpu usage at every 10 minute interval. so we are doing over time, but that time is a 10 minute interval. (we call it every 10 minutes, and it returns the cpu usage as defined by cpu time - cpu time last time process_stat.cpu_usage() is gotten / time - it was last called.)

Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
@afeinberg afeinberg force-pushed the afeinberg/full_compaction/incremental branch from 918d107 to 60b3fdf Compare November 21, 2023 20:14
@ti-chi-bot ti-chi-bot bot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 21, 2023
Signed-off-by: Alex Feinberg <alex@strlen.net>
Signed-off-by: Alex Feinberg <alex@strlen.net>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 21, 2023
Signed-off-by: Alex Feinberg <alex@strlen.net>
Copy link
Contributor

@v01dstar v01dstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 21, 2023
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 22, 2023
@afeinberg
Copy link
Contributor Author

/assign @tonyxuqqi

@tonyxuqqi
Copy link
Contributor

/merge

Copy link
Contributor

ti-chi-bot bot commented Nov 22, 2023

@tonyxuqqi: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Copy link
Contributor

ti-chi-bot bot commented Nov 22, 2023

This pull request has been accepted and is ready to merge.

Commit hash: f6e8ebd

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 22, 2023
@ti-chi-bot ti-chi-bot bot merged commit dce0e55 into tikv:master Nov 22, 2023
6 of 7 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Nov 22, 2023
overvenus added a commit to overvenus/tikv that referenced this pull request Nov 28, 2023
Signed-off-by: Neil Shen <overvenus@gmail.com>
@zhangjinpeng87 zhangjinpeng87 added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label May 20, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request May 20, 2024
ref tikv#15271

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #17034.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants