[DocDB] Add TServer with faulty drive to LB blacklist #12768

Closed
mikhpolitov opened this issue Jun 3, 2022 · 0 comments

Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)


mikhpolitov commented Jun 3, 2022

Jira Link: DB-652

Description

A TServer with a faulty drive has fewer tablets than other TServers, which can cause the Load Balancer to move tablets to it. Moving tablets to a TServer with a faulty drive, however, leads to tablet count imbalances on its remaining healthy drives. Also, the replaced drive starts with 0 tablets while the other drives keep their tablets.
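
The imbalance described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not YugabyteDB code: the `TServer` class, the `naive_balance` function, the drive names, and the tablet counts are all hypothetical, and the model assumes that the balancer compares per-TServer tablet counts and that new tablets can only land on healthy drives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TServer:
    # Hypothetical model of a tablet server; not a YugabyteDB type.
    name: str
    tablets_per_drive: Dict[str, int]
    faulty_drives: Set[str] = field(default_factory=set)

    def healthy_drives(self) -> List[str]:
        return [d for d in self.tablets_per_drive if d not in self.faulty_drives]

    def tablet_count(self) -> int:
        # Assumption: tablets on a faulty drive are not counted.
        return sum(self.tablets_per_drive[d] for d in self.healthy_drives())

    def add_tablet(self) -> None:
        # Assumption: a new tablet goes to the least loaded healthy drive.
        drive = min(self.healthy_drives(), key=lambda d: self.tablets_per_drive[d])
        self.tablets_per_drive[drive] += 1

    def remove_tablet(self) -> None:
        # Take a tablet from the most loaded healthy drive.
        drive = max(self.healthy_drives(), key=lambda d: self.tablets_per_drive[d])
        self.tablets_per_drive[drive] -= 1


def naive_balance(tservers: List[TServer]) -> None:
    """Move tablets from the most to the least loaded TServer
    until per-TServer counts differ by at most one."""
    while True:
        low = min(tservers, key=TServer.tablet_count)
        high = max(tservers, key=TServer.tablet_count)
        if high.tablet_count() - low.tablet_count() <= 1:
            return
        high.remove_tablet()
        low.add_tablet()


def run_example() -> Dict[str, Dict[str, int]]:
    # Three TServers with three drives each; drive d3 on ts-1 is faulty.
    ts1 = TServer("ts-1", {"d1": 10, "d2": 10, "d3": 10}, faulty_drives={"d3"})
    ts2 = TServer("ts-2", {"d1": 10, "d2": 10, "d3": 10})
    ts3 = TServer("ts-3", {"d1": 10, "d2": 10, "d3": 10})
    naive_balance([ts1, ts2, ts3])
    # Report tablets per healthy drive after balancing.
    return {
        ts.name: {d: ts.tablets_per_drive[d] for d in ts.healthy_drives()}
        for ts in (ts1, ts2, ts3)
    }
```

In this toy run, `run_example()` leaves ts-1 with 13 tablets on each of its two healthy drives, while ts-2 and ts-3 hold 9 tablets per drive. The per-TServer counts look balanced (26, 27, 27), but the healthy drives on ts-1 carry noticeably more tablets than any other drive.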

@mikhpolitov mikhpolitov added area/docdb YugabyteDB core features priority/medium Medium priority issue labels Jun 3, 2022
@mikhpolitov mikhpolitov self-assigned this Jun 3, 2022
@yugabyte-ci yugabyte-ci added the kind/bug This issue is a bug label Jun 3, 2022
@mikhpolitov mikhpolitov added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Jun 3, 2022
@mikhpolitov mikhpolitov changed the title [DocDB] Add to LB blacklist TServer with fault drive [DocDB] Add TServer with fault drive to LB blacklist Jun 14, 2022
mikhpolitov added a commit that referenced this issue Jun 17, 2022
Summary:
A TServer with a faulty drive has fewer tablets than other TServers, which can cause the Load Balancer to move tablets to it. Moving tablets to a TServer with a faulty drive, however, leads to tablet count imbalances on its remaining healthy drives. Also, the replaced drive starts with 0 tablets while the other drives keep their tablets.
With this change, the Load Balancer effectively blacklists the TServer. The corresponding platform change ensures that the user is alerted so that they can fix the underlying issue. Once the user fixes the underlying disk issue and brings the TServer back up, the TServer no longer reports faulty drives to the master, and as a result the blacklist is removed.

Test Plan: ybd --cxx-test load_balancer_mini_cluster-test

Reviewers: skedia, jhe, rthallam

Reviewed By: jhe

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D17680
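
The blacklist behavior described in the commit summary above can be sketched roughly as follows. This is a hypothetical model rather than the code from the referenced revision: the `LoadBalancerState` class, its method names, and the heartbeat fields are assumptions made for illustration only. It assumes that each TServer reports its faulty drives to the master, that a non-empty report excludes the TServer as a target, and that an empty report lifts the exclusion.

```python
from typing import Dict, List, Optional, Set


class LoadBalancerState:
    """Hypothetical master-side view used to choose a target TServer."""

    def __init__(self) -> None:
        self.tablet_counts: Dict[str, int] = {}
        # TServers excluded as targets because they reported a faulty drive.
        self.faulty_drive_blacklist: Set[str] = set()

    def process_heartbeat(
        self, tserver: str, tablet_count: int, faulty_drives: List[str]
    ) -> None:
        # Assumption: every heartbeat carries the current list of faulty drives.
        self.tablet_counts[tserver] = tablet_count
        if faulty_drives:
            self.faulty_drive_blacklist.add(tserver)
        else:
            # The disk issue is fixed, so the exclusion is lifted.
            self.faulty_drive_blacklist.discard(tserver)

    def pick_target(self) -> Optional[str]:
        # Choose the least loaded TServer that is not excluded.
        candidates = [
            ts for ts in self.tablet_counts
            if ts not in self.faulty_drive_blacklist
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda ts: self.tablet_counts[ts])


state = LoadBalancerState()
state.process_heartbeat("ts-1", 20, ["/mnt/d3"])  # hypothetical drive path
state.process_heartbeat("ts-2", 30, [])
state.process_heartbeat("ts-3", 31, [])
target_while_faulty = state.pick_target()

# After the drive is replaced, ts-1 reports no faulty drives.
state.process_heartbeat("ts-1", 20, [])
target_after_fix = state.pick_target()
```

Here `target_while_faulty` is `"ts-2"` even though ts-1 has the fewest tablets, and `target_after_fix` is `"ts-1"` once its heartbeat no longer lists a faulty drive.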
@mikhpolitov mikhpolitov changed the title [DocDB] Add TServer with fault drive to LB blacklist [DocDB] Add TServer with faulty drive to LB blacklist Jun 17, 2022
mikhpolitov added a commit that referenced this issue Jun 21, 2022
… blacklist

Summary:
A TServer with a faulty drive has fewer tablets than other TServers, which can cause the Load Balancer to move tablets to it. Moving tablets to a TServer with a faulty drive, however, leads to tablet count imbalances on its remaining healthy drives. Also, the replaced drive starts with 0 tablets while the other drives keep their tablets.
With this change, the Load Balancer effectively blacklists the TServer. The corresponding platform change ensures that the user is alerted so that they can fix the underlying issue. Once the user fixes the underlying disk issue and brings the TServer back up, the TServer no longer reports faulty drives to the master, and as a result the blacklist is removed.

Original commit: 2549499 / D17680
Partially changes from commit: df9bd67 / D17781

Test Plan: ybd --cxx-test load_balancer_mini_cluster-test

Reviewers: skedia, rthallam, sergei, jhe

Reviewed By: sergei, jhe

Subscribers: sergei, bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17775
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug and removed kind/enhancement This is an enhancement of an existing feature labels Jul 9, 2022