scheduler: cache history loads in hot region scheduler #6314

Merged
merged 14 commits into tikv:master on Apr 25, 2023

Conversation

bufferflies
Contributor

@bufferflies bufferflies commented Apr 12, 2023

What problem does this PR solve?

Issue Number: Close #6297, Ref #6328, close tikv/tikv#14458

What is changed and how does it work?

In the past, the store pick strategy considered only the current loads. That works poorly when loads are unstable: it produces many repeated operators that consume network bandwidth.
In this PR, the hot scheduler saves the history loads and the pick strategy takes them into account, which reduces the operator count when some store loads are unstable.
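
As a rough illustration of what "saving the history loads" can look like, here is a minimal sketch of a fixed-size ring buffer of load samples for one store. The type `HistoryLoads` and its methods are hypothetical names chosen for this example; they are an assumption and not PD's actual data structure.

```go
package main

// HistoryLoads is a hypothetical fixed-size ring buffer of load samples
// for a single store. It only illustrates the idea of keeping history.
type HistoryLoads struct {
	samples []float64
	next    int  // index the next sample is written to
	full    bool // true once the buffer has wrapped around
}

// NewHistoryLoads creates a buffer that retains the last size samples.
// Sizes below 1 are raised to 1 so that Add never divides by zero.
func NewHistoryLoads(size int) *HistoryLoads {
	if size < 1 {
		size = 1
	}
	return &HistoryLoads{samples: make([]float64, size)}
}

// Add records a new sample, overwriting the oldest one when full.
func (h *HistoryLoads) Add(load float64) {
	h.samples[h.next] = load
	h.next = (h.next + 1) % len(h.samples)
	if h.next == 0 {
		h.full = true
	}
}

// Get returns a copy of the retained samples, oldest first.
func (h *HistoryLoads) Get() []float64 {
	if !h.full {
		return append([]float64(nil), h.samples[:h.next]...)
	}
	out := append([]float64(nil), h.samples[h.next:]...)
	return append(out, h.samples[:h.next]...)
}
```

With a buffer of size 3, adding the samples 1, 2, 3, 4 leaves 2, 3, 4 in the history, so a single spike ages out after a few sampling rounds.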

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

image

Code changes

Side effects

Related changes

Release note

None.

@ti-chi-bot
Member

ti-chi-bot commented Apr 12, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • nolouch
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot
Member

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note-none do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 12, 2023
@bufferflies bufferflies force-pushed the cache_in_hot_region branch 3 times, most recently from 49231b9 to 71be32d Compare April 12, 2023 12:14
@dbsid

dbsid commented Apr 12, 2023

/release

@dbsid

dbsid commented Apr 12, 2023

/build

@bufferflies bufferflies marked this pull request as ready for review April 12, 2023 13:44
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2023
@bufferflies bufferflies force-pushed the cache_in_hot_region branch 2 times, most recently from 27a8371 to 675bc4b Compare April 13, 2023 02:33
@codecov

codecov bot commented Apr 13, 2023

Codecov Report

Patch coverage: 87.62% and project coverage change: +0.08 🎉

Comparison is base (08b919a) 74.98% compared to head (9da541d) 75.07%.

❗ Current head 9da541d differs from pull request most recent head 123b743. Consider uploading reports for the commit 123b743 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6314      +/-   ##
==========================================
+ Coverage   74.98%   75.07%   +0.08%     
==========================================
  Files         408      408              
  Lines       40621    40704      +83     
==========================================
+ Hits        30461    30559      +98     
+ Misses       7504     7492      -12     
+ Partials     2656     2653       -3     
Flag        Coverage Δ
unittests   75.07% <87.62%> (+0.08%) ⬆️

Impacted Files                            Coverage Δ
pkg/core/constant/kind.go                 47.27% <ø> (ø)
pkg/schedule/config/config.go             33.33% <ø> (ø)
pkg/statistics/kind.go                    37.86% <ø> (ø)
pkg/schedule/schedulers/hot_region.go     82.79% <73.33%> (+0.33%) ⬆️
pkg/schedule/schedulers/hot_region_v2.go  88.43% <100.00%> (-1.04%) ⬇️
pkg/statistics/store_hot_peers_infos.go   94.40% <100.00%> (+0.75%) ⬆️
pkg/statistics/store_load.go              98.40% <100.00%> (+0.42%) ⬆️
server/config/store_config.go             80.39% <100.00%> (+0.59%) ⬆️

... and 28 files with indirect coverage changes

@bufferflies
Contributor Author

/ping @lhy1024 @nolouch

Signed-off-by: bufferflies <1045931706@qq.com>
@bufferflies
Contributor Author

/ping @nolouch

Contributor

@nolouch nolouch left a comment

lgtm

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 24, 2023
@bufferflies
Contributor Author

/ping @rleungx

Contributor

@lhy1024 lhy1024 left a comment

Mostly LGTM if this is only used with multi-RocksDB.

pkg/schedule/schedulers/hot_region_test.go (outdated, resolved)
pkg/schedule/schedulers/hot_region.go (outdated, resolved)
pkg/schedule/schedulers/hot_region.go (resolved)
pkg/schedule/schedulers/hot_region.go (resolved)
pkg/statistics/store_load.go (outdated, resolved)
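// Expected history load per dimension i and history slot j:
// the sum over all stores divided by the number of stores.
for i := range allStoreHistoryLoadSum {
	expectHistoryLoads[i] = make([]float64, len(allStoreHistoryLoadSum[i]))
	for j := range allStoreHistoryLoadSum[i] {
		expectHistoryLoads[i][j] = allStoreHistoryLoadSum[i][j] / float64(allStoreCount)
	}
}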
Contributor

If I understand correctly, the policy now adds an additional set of checks: both values must be greater than, or both less than, the sampled historical mean before scheduling is allowed. Perhaps we can later adopt other moving averages and more lenient probabilities.
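
One possible reading of this check, sketched under stated assumptions: a store may act as a source only if its current load and every history sample are above the corresponding expectation, and as a target only if they are all below. The function names and the per-slot comparison are assumptions made for illustration, not the scheduler's actual code.

```go
package main

// historyAll reports whether every history sample satisfies cmp against
// the expectation for the same slot. Mismatched lengths never qualify.
func historyAll(history, expect []float64, cmp func(h, e float64) bool) bool {
	if len(history) != len(expect) {
		return false
	}
	for j := range history {
		if !cmp(history[j], expect[j]) {
			return false
		}
	}
	return true
}

// canBeSource is a hypothetical check: the store must be above the
// expectation now and in every recorded history slot.
func canBeSource(cur, expectCur float64, history, expectHistory []float64) bool {
	above := func(h, e float64) bool { return h > e }
	return cur > expectCur && historyAll(history, expectHistory, above)
}

// canBeTarget is the mirror image: below the expectation now and in
// every recorded history slot.
func canBeTarget(cur, expectCur float64, history, expectHistory []float64) bool {
	below := func(h, e float64) bool { return h < e }
	return cur < expectCur && historyAll(history, expectHistory, below)
}
```

Under this sketch, a store whose current load spikes to 20 while its history sits at 1 against an expectation of 3 is not picked as a source, which is the kind of repeated operator the extra check is meant to avoid.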

Contributor

The average has a disadvantage: it is easily skewed by extreme values, which can make the final result far too large or too small.

For example, with 1,1,1,1,1,20,1,1,1,1,1,1,1 we should really consider the store's load to be 1, but the average comes out at about 2.5, i.e. close to 3.

Suppose the other two of the three nodes report 3,3,3,3,3,3,3,3 and 3,3,3,3,3,3. We would expect one of them to schedule a load of 1 to the first node, but the current policy does not.

For filtering out extreme values the median is better; for catching the trend, would HMA be better?

The current result is definitely better than master in most scenarios, but I think we should add a TODO here.
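
To make the point about extreme values concrete, here is a small sketch that computes both statistics for the sample series in the example above. The helper names `mean` and `median` are illustrative and not taken from the PR; empty input returning 0 is an arbitrary choice for this sketch.

```go
package main

import "sort"

// mean returns the arithmetic mean, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of a sorted copy of xs (the average
// of the two middle values for even lengths), or 0 for an empty slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), xs...) // do not reorder the caller's data
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}
```

For the thirteen samples 1,1,1,1,1,20,1,1,1,1,1,1,1 the mean is 32/13, roughly 2.46, while the median stays at 1, so a median-based expectation would still treat this store as lightly loaded.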

Member

@rleungx rleungx left a comment

Mostly, LGTM

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 25, 2023
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
@bufferflies
Contributor Author

/merge

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 25, 2023

@bufferflies: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 25, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 123b743

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 25, 2023
@ti-chi-bot ti-chi-bot bot merged commit 4f87e9d into tikv:master Apr 25, 2023
19 checks passed
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #6375.

ti-chi-bot bot added a commit that referenced this pull request Apr 27, 2023
close #6297, ref #6314, ref #6328, ref tikv/tikv#14458

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: bufferflies <1045931706@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@BornChanger

/label needs-cherry-pick-release-6.5

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #6813.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Jul 14, 2023
close tikv#6297, ref tikv#6328, ref tikv/tikv#14458

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Labels
needs-cherry-pick-release-6.5 needs-cherry-pick-release-7.1 release-note-none status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Hot-region-scheduler should be more stable if the loads are not stable
7 participants