Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Fix load balancing of dead node #18838

Closed
1 task done
SrivastavaAnubhav opened this issue Aug 24, 2023 · 0 comments
Closed
1 task done

[DocDB] Fix load balancing of dead node #18838

SrivastavaAnubhav opened this issue Aug 24, 2023 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@SrivastavaAnubhav
Copy link
Contributor

SrivastavaAnubhav commented Aug 24, 2023

Jira Link: DB-7711

Description

We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A break was mistakenly removed in 01e36c7#diff-04b2214df1633fc7379d4dfc5c7fcb67bba4a9dd70834b84ac066a9c7ab0750dL498, which caused the load balancer to try to remove replicas even when there is a dead node.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@SrivastavaAnubhav SrivastavaAnubhav added kind/bug This issue is a bug area/docdb YugabyteDB core features priority/medium Medium priority issue labels Aug 24, 2023
@SrivastavaAnubhav SrivastavaAnubhav self-assigned this Aug 24, 2023
SrivastavaAnubhav added a commit that referenced this issue Aug 25, 2023
Summary: We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A `break` was mistakenly removed in D24663, which was caused the load balancer to call HandleRemoveReplicas even though there is a dead node. This diff adds the `break` back, and adds a DCHECK to HandleAddReplicas and HandleRemoveReplicas to catch this case.

Test Plan: `ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing"` now fail if the break is removed, because of the DCHECKs

Reviewers: skedia

Reviewed By: skedia

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28063
SrivastavaAnubhav added a commit that referenced this issue Aug 30, 2023
…d balancer.

Summary:
Original commit: 01e36c7 / D24663

Missing break bug fix:
Original commit: c3e1fbc / D28063

Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases.

Also backports a fix for a bug introduced in the original commit.

Jira: DB-6288, DB-7711

Test Plan:
Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose.

ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs

Reviewers: skedia, jhe

Reviewed By: jhe

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D28158
SrivastavaAnubhav added a commit that referenced this issue Aug 30, 2023
…d balancer.

Summary:
Original commit: 01e36c7 / D24663

Missing break bug fix:
Original commit: c3e1fbc / D28063

Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases.

Also backports a fix for a bug introduced in the original commit.

Jira: DB-6288, DB-7711

Test Plan:
Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose.

ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs

Reviewers: skedia, jhe

Reviewed By: jhe

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28062
SrivastavaAnubhav added a commit that referenced this issue Aug 30, 2023
…d balancer.

Summary:
Original commit: 01e36c7 / D24663

Missing break bug fix:
Original commit: c3e1fbc / D28063

Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases.

Also backports a fix for a bug introduced in the original commit.

Jira: DB-6288, DB-7711

Test Plan:
Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose.

ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs

Reviewers: skedia, jhe

Reviewed By: jhe

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D28053
vaibhav-yb pushed a commit to vaibhav-yb/yugabyte-db that referenced this issue Sep 7, 2023
…lets

Summary: We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A `break` was mistakenly removed in D24663, which was caused the load balancer to call HandleRemoveReplicas even though there is a dead node. This diff adds the `break` back, and adds a DCHECK to HandleAddReplicas and HandleRemoveReplicas to catch this case.

Test Plan: `ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing"` now fail if the break is removed, because of the DCHECKs

Reviewers: skedia

Reviewed By: skedia

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28063
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

1 participant