scheduler: revert low space threshold hard limit #2864

Merged 2 commits on Sep 1, 2020

Member

@rleungx rleungx commented Aug 31, 2020

What problem does this PR solve?

The previous PR introduced a hard limit for low space: if the capacity exceeds 500G, 100G is used as the low space threshold. That doesn't seem reasonable.
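
To make the concern concrete, here is a minimal Go sketch comparing the hard limit described above with a purely proportional threshold. It is not PD's actual code: the function names, the 20% free-space percentage, and the reading of "G" as GiB are assumptions made for illustration. Only the 500G cutoff and the 100G threshold come from the description.

```go
package main

import "fmt"

// gib treats "G" as GiB; the unit is an assumption.
const gib = uint64(1) << 30

// assumedFreePercent is a hypothetical proportional threshold
// (percent of capacity that must stay free). It is not stated in this PR.
const assumedFreePercent = 20

// ratioThreshold returns a low space threshold proportional to capacity.
func ratioThreshold(capacity uint64) uint64 {
	return capacity * assumedFreePercent / 100
}

// hardLimitThreshold models the behavior described above: once capacity
// exceeds 500G, the threshold is fixed at 100G regardless of disk size.
// Falling back to the proportional rule below the cutoff is an assumption.
func hardLimitThreshold(capacity uint64) uint64 {
	if capacity > 500*gib {
		return 100 * gib
	}
	return ratioThreshold(capacity)
}

func main() {
	for _, capGiB := range []uint64{200, 500, 2000, 8000} {
		c := capGiB * gib
		fmt.Printf("capacity=%dG hard=%dG ratio=%dG\n",
			capGiB, hardLimitThreshold(c)/gib, ratioThreshold(c)/gib)
	}
}
```

Under these assumptions, a 2000G disk would only be treated as low on space once free space falls to 100G (5% of capacity) with the hard limit, versus 400G with the proportional rule.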

What is changed and how it works?

This PR reverts to the original way of controlling low space.

Check List

Tests

  • Unit test

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Change back to the original way to control the low space threshold

Signed-off-by: Ryan Leung <rleungx@gmail.com>
@rleungx rleungx added component/scheduler Scheduler logic. needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. labels Aug 31, 2020
Member

@HunDunDM HunDunDM left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 31, 2020
@zz-jason
Member

How do MySQL and other databases determine a low space event?

Contributor

@lhy1024 lhy1024 left a comment

Don’t we make any other judgment for large disks?

@zz-jason
Member

Here is an example of MySQL: https://help.scalegrid.io/docs/mysql-alerts-rules

These are the default alert rules automated by ScaleGrid for all MySQL clusters:

  • CPU - Total (%) greater than 60%
  • Available disk space (%) less than 20%
  • Replication Delay (secs) greater than 300 secs
  • Connections (per second) greater than 1000 per second

@rleungx
Member Author

rleungx commented Aug 31, 2020

Here is an example of MySQL: https://help.scalegrid.io/docs/mysql-alerts-rules

These are the default alert rules automated by ScaleGrid for all MySQL clusters:

  • CPU - Total (%) greater than 60%
  • Available disk space (%) less than 20%
  • Replication Delay (secs) greater than 300 secs
  • Connections (per second) greater than 1000 per second

That seems to be the same as our original approach.

@zz-jason
Member

CockroachDB ("Node is running low on disk space"):

  • Rule: Send an alert when a node has less than 15% of free space remaining.
  • How to detect: Divide the capacity metric by the capacity_available metric in the node's _status/vars output.
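
A percentage-based rule like the ones quoted in this thread can be sketched in Go as follows. This is a minimal illustration, not PD's or any vendor's implementation: the helper name `isLowSpace`, its parameters, and the handling of a zero capacity are hypothetical, and the free share is assumed to be available bytes divided by total capacity.

```go
package main

import "fmt"

// isLowSpace reports whether the free share of a disk has dropped below
// minFreePercent. Hypothetical helper for illustration only.
func isLowSpace(capacity, available uint64, minFreePercent float64) bool {
	if capacity == 0 {
		// Assumption: with no capacity reported, do not raise an alert.
		return false
	}
	freePercent := float64(available) / float64(capacity) * 100
	return freePercent < minFreePercent
}

func main() {
	const gib = uint64(1) << 30
	capacity := 1000 * gib
	available := 160 * gib // 16% of capacity is free

	// The same disk can be "low" under a 20% rule and fine under a 15% rule.
	fmt.Println("20% rule:", isLowSpace(capacity, available, 20))
	fmt.Println("15% rule:", isLowSpace(capacity, available, 15))
}
```

Because the threshold scales with capacity, the absolute amount of free space that triggers the alert grows with disk size, which is the trade-off the large-disk question in this thread points at.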

@ZenoTan
Contributor

ZenoTan commented Aug 31, 2020

LGTM

@ti-srebot
Contributor

@ZenoTan, thanks for your review. However, LGTM is restricted to Reviewers or higher roles. See the corresponding SIG page for more information. Related SIGs: scheduling (slack).

@rleungx
Member Author

rleungx commented Aug 31, 2020

Don’t we make any other judgment for large disks?

We need to find a better way.

Member

@zz-jason zz-jason left a comment

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Sep 1, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 1, 2020
@rleungx
Member Author

rleungx commented Sep 1, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 1, 2020
@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@rleungx merge failed.

@rleungx
Member Author

rleungx commented Sep 1, 2020

/run-all-tests

@rleungx
Member Author

rleungx commented Sep 1, 2020

/merge

@ti-srebot
Contributor

Your auto merge job has been accepted, waiting for:

  • 2866

@ti-srebot
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit df2344f into tikv:master Sep 1, 2020
ti-srebot pushed a commit to ti-srebot/pd that referenced this pull request Sep 1, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor

cherry pick to release-4.0 in PR #2875

disksing pushed a commit that referenced this pull request Sep 1, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Ryan Leung <rleungx@gmail.com>
howardlau1999 pushed a commit to howardlau1999/pd that referenced this pull request Sep 2, 2020
Signed-off-by: Ryan Leung <rleungx@gmail.com>
Signed-off-by: Howard Lau <howardlau1999@hotmail.com>
Labels

  • component/scheduler: Scheduler logic.
  • needs-cherry-pick-release-4.0: The PR needs to cherry pick to release-4.0 branch.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.

6 participants