New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Fix load balancing of dead node #18838
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Comments
SrivastavaAnubhav
added
kind/bug
This issue is a bug
area/docdb
YugabyteDB core features
priority/medium
Medium priority issue
labels
Aug 24, 2023
SrivastavaAnubhav
added a commit
that referenced
this issue
Aug 25, 2023
Summary: We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A `break` was mistakenly removed in D24663, which was caused the load balancer to call HandleRemoveReplicas even though there is a dead node. This diff adds the `break` back, and adds a DCHECK to HandleAddReplicas and HandleRemoveReplicas to catch this case. Test Plan: `ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing"` now fail if the break is removed, because of the DCHECKs Reviewers: skedia Reviewed By: skedia Subscribers: ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D28063
SrivastavaAnubhav
added a commit
that referenced
this issue
Aug 30, 2023
…d balancer. Summary: Original commit: 01e36c7 / D24663 Missing break bug fix: Original commit: c3e1fbc / D28063 Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases. Also backports a fix for a bug introduced in the original commit. Jira: DB-6288, DB-7711 Test Plan: Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose. ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs Reviewers: skedia, jhe Reviewed By: jhe Subscribers: bogdan, ybase Differential Revision: https://phorge.dev.yugabyte.com/D28158
SrivastavaAnubhav
added a commit
that referenced
this issue
Aug 30, 2023
…d balancer. Summary: Original commit: 01e36c7 / D24663 Missing break bug fix: Original commit: c3e1fbc / D28063 Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases. Also backports a fix for a bug introduced in the original commit. Jira: DB-6288, DB-7711 Test Plan: Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose. ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs Reviewers: skedia, jhe Reviewed By: jhe Subscribers: ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D28062
SrivastavaAnubhav
added a commit
that referenced
this issue
Aug 30, 2023
…d balancer. Summary: Original commit: 01e36c7 / D24663 Missing break bug fix: Original commit: c3e1fbc / D28063 Sometimes the load balancer is not making progress, or is saying that load is balanced when it is not. We currently do not have visibility into where it is stuck / what decisions it is making. Adding verbose logging that can be turned on will help debug these cases. Also backports a fix for a bug introduced in the original commit. Jira: DB-6288, DB-7711 Test Plan: Run the load balancer tests with logging enabled at level 1/2/3 to see if the logs are helpful / not too verbose. ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing" now fail if the break is removed, because of the DCHECKs Reviewers: skedia, jhe Reviewed By: jhe Subscribers: bogdan, ybase Differential Revision: https://phorge.dev.yugabyte.com/D28053
vaibhav-yb
pushed a commit
to vaibhav-yb/yugabyte-db
that referenced
this issue
Sep 7, 2023
…lets Summary: We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A `break` was mistakenly removed in D24663, which was caused the load balancer to call HandleRemoveReplicas even though there is a dead node. This diff adds the `break` back, and adds a DCHECK to HandleAddReplicas and HandleRemoveReplicas to catch this case. Test Plan: `ybd --cxx-test load_balancer_multi_table-test --gtest_filter="LoadBalancerMultiTableTest.TestDeadNodesLeaderBalancing"` now fail if the break is removed, because of the DCHECKs Reviewers: skedia Reviewed By: skedia Subscribers: ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D28063
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Jira Link: DB-7711
Description
We expect to not load balance dead nodes, except for leader balancing if they are part of the leader blacklist. A
break
was mistakenly removed in 01e36c7#diff-04b2214df1633fc7379d4dfc5c7fcb67bba4a9dd70834b84ac066a9c7ab0750dL498, which caused the load balancer to try to remove replicas even when there is a dead node.Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: