[docdb ] Leader-only tserver blacklisting mode for rolling upgrades #1748

Closed
mbautin opened this issue Jul 9, 2019 · 4 comments
Labels
area/docdb YugabyteDB core features

@mbautin
Collaborator

mbautin commented Jul 9, 2019

Rolling upgrades should be done as follows:

  • Move all leaders away from node 1 and load-balance them between other nodes.
  • Upgrade node 1.
  • Move leaders so that they are load-balanced between all nodes except node 2, and node 2 has no leaders.
  • Upgrade node 2.
    etc.

This requires splitting the current node blacklisting logic into two parts: for leaders and for data.

The load balancer should be responsible for rebalancing the leader load between upgrades of individual nodes.
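The sequence above can be sketched as a small planning function. This is only an illustration, not YugabyteDB code: the function names are hypothetical, and it assumes the simplification that any leader can be placed on any node (that is, every node hosts a replica of every tablet).

```python
from typing import Dict, Iterator, List, Tuple


def target_leader_counts(nodes: List[str], draining: str,
                         total_leaders: int) -> Dict[str, int]:
    """Spread total_leaders as evenly as possible over every node except `draining`."""
    others = [n for n in nodes if n != draining]
    if not others:
        raise ValueError("need at least one node besides the one being upgraded")
    base, extra = divmod(total_leaders, len(others))
    counts = {draining: 0}  # the node about to be upgraded holds no leaders
    for i, node in enumerate(others):
        # The first `extra` nodes absorb the remainder, one leader each.
        counts[node] = base + (1 if i < extra else 0)
    return counts


def rolling_upgrade_plan(nodes: List[str],
                         total_leaders: int) -> Iterator[Tuple[str, Dict[str, int]]]:
    """Yield (node to upgrade, leader counts that should hold before upgrading it)."""
    for node in nodes:
        yield node, target_leader_counts(nodes, node, total_leaders)


if __name__ == "__main__":
    for node, counts in rolling_upgrade_plan(["node1", "node2", "node3"], 12):
        print(f"before upgrading {node}: {counts}")
```

Each step leaves the node being upgraded with zero leaders while the remaining nodes share the leader load evenly, which is the state the load balancer is expected to reach between upgrades.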

@bmatican
Contributor

Some pointers for this:

We have a generic mechanism for persisting cluster configuration:
src/yb/master/catalog_manager.cc::SetClusterConfig

This currently has blacklisting functionality that is used for draining data from the provided nodes. The blacklist needs to be persisted in case of master failover, as the drain could take a while.

This blacklist is then used in the load balancer (see: src/yb/master/cluster_balance.cc, src/yb/master/cluster_balance_util.h), to determine some of the following:

  • if tablets exist on a blacklisted node, they need to be moved
  • if tablets are to be added, they should never go to a blacklisted node
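The two rules above can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the logic in cluster_balance.cc: the function names, the dictionary-based placement map, and the "least loaded server wins" tie-break are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def replicas_to_move(placement: Dict[str, List[str]],
                     blacklist: Set[str]) -> Dict[str, List[str]]:
    """Rule 1: every replica that sits on a blacklisted tserver must be moved."""
    result: Dict[str, List[str]] = {}
    for tablet, servers in placement.items():
        on_blacklisted = [s for s in servers if s in blacklist]
        if on_blacklisted:
            result[tablet] = on_blacklisted
    return result


def pick_target(current_replicas: List[str], all_servers: List[str],
                blacklist: Set[str], load: Dict[str, int]) -> Optional[str]:
    """Rule 2: a new replica never goes to a blacklisted tserver."""
    candidates = [s for s in all_servers
                  if s not in blacklist and s not in current_replicas]
    if not candidates:
        return None  # nowhere legal to place the replica
    # Assumed policy: prefer the least-loaded candidate, break ties by name.
    return min(candidates, key=lambda s: (load.get(s, 0), s))


if __name__ == "__main__":
    placement = {"t1": ["ts1", "ts2", "ts3"], "t2": ["ts2", "ts3", "ts4"]}
    print(replicas_to_move(placement, {"ts1"}))
```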

There's also a yb-admin command to change the blacklist by adding/removing nodes from it: src/yb/tools/yb-admin_client.cc:ChangeBlacklist

@bmatican bmatican assigned bmatican and rajukumaryb and unassigned georgeklees and bmatican Aug 20, 2019
@bmatican bmatican moved this from To Do to In progress in YBase features Aug 27, 2019
@bmatican bmatican added this to the v2.1 milestone Sep 1, 2019
@bmatican bmatican changed the title A leader-only tserver blacklisting mode should be used in rolling upgrades [docdb ] Leader-only tserver blacklisting mode for rolling upgrades Sep 1, 2019
@rao-vasireddy
Contributor

@rajukumaryb and I were discussing this issue -

  1. If a new YW with this capability performs a gflag update or software upgrade on an old universe that does not have the new blacklist command, the operation should be a no-op and continue; we should not raise an error and abort.
  2. The blacklist is only needed for a short duration; once the tserver restarts, it no longer needs to be blacklisted.
  3. Can the current leader affinity feature be used or leveraged for this operation?

@rao-vasireddy
Contributor

Also, we need to move the leaders off the node quickly. Let's check the current rate-limiting mechanism and see how we can speed it up.

@rajukumaryb
Contributor

@rao-vasireddy -

  1. Might need some help from yugaware - cc: @Arnav15
  2. Can be achieved by "ADD" -> polling of "get_leader_blacklist_completion" -> "REMOVE"
  3. Leader affinity is placement_{cloud+region+zone} based, so it is coarser than this task, which is tserver-ID based.
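The "ADD" -> poll -> "REMOVE" flow from point 2 could be driven by a small helper like the one below. It is a sketch only: `drain_leaders_then`, `run_admin`, and `is_complete` are hypothetical names, and because the output format of get_leader_blacklist_completion is not described in this thread, parsing it is left to a caller-supplied function.

```python
import time
from typing import Callable, List

# Hypothetical hook: takes yb-admin arguments, returns the command's output.
RunAdmin = Callable[[List[str]], str]


def drain_leaders_then(run_admin: RunAdmin, tserver: str,
                       action: Callable[[], None],
                       is_complete: Callable[[str], bool],
                       poll_interval_s: float = 1.0, max_polls: int = 60,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Leader-blacklist `tserver`, wait for leaders to move, run `action`, un-blacklist."""
    run_admin(["change_leader_blacklist", "ADD", tserver])
    try:
        for _ in range(max_polls):
            output = run_admin(["get_leader_blacklist_completion"])
            if is_complete(output):  # output format is an assumption of the caller
                break
            sleep(poll_interval_s)
        else:
            raise TimeoutError(f"leaders did not move off {tserver} in time")
        action()  # e.g. restart or upgrade the tserver
    finally:
        # The blacklist is only needed briefly, so always remove the entry.
        run_admin(["change_leader_blacklist", "REMOVE", tserver])


if __name__ == "__main__":
    # Fake runner for demonstration; the "100" completion marker is made up.
    def fake_admin(args: List[str]) -> str:
        print("yb-admin", " ".join(args))
        return "100"

    drain_leaders_then(fake_admin, "127.0.0.1:9100",
                       action=lambda: print("upgrading tserver"),
                       is_complete=lambda out: out.strip() == "100",
                       sleep=lambda _: None)
```

In a real deployment `run_admin` would presumably shell out to yb-admin with the appropriate -master_addresses value; injecting it as a function keeps the sketch testable without a cluster.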

rajukumaryb added a commit that referenced this issue Sep 23, 2019
Summary:
The load balancer moves the leadership role for all tablet replicas on leader-blacklisted tservers to other tservers that host follower replicas. To achieve this, the existing leader load-balancing mechanism is extended to treat a leader-blacklisted tserver as hosting infinitely many leader replicas.

Usage:
  yb-admin -master_addresses ... change_leader_blacklist ADD 127.0.0.1:9100
  yb-admin -master_addresses ... change_leader_blacklist REMOVE 127.0.0.1:9100

  yb-admin -master_addresses ... get_leader_blacklist_completion

Caveats:
  - A leader-blacklisted tserver is not yet prevented from becoming the leader for a tablet. If that happens, the load balancer will again move leadership away from it.
  - If all replicas of a tablet are hosted on leader-blacklisted tservers, the load balancer cannot (yet) move the leadership role to a non-leader-blacklisted tserver.
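
The "infinite leader load" idea and the second caveat can be illustrated with a toy balancer. This is a simplified model written for this discussion, not the code from the commit: the function name, the data layout, and the "move when the load gap exceeds one" rule are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def balance_leaders(replicas: Dict[str, List[str]], leaders: Dict[str, str],
                    blacklist: Set[str]) -> List[Tuple[str, str, str]]:
    """Move leaders until balanced; returns (tablet, from, to) moves and updates `leaders`."""
    counts = Counter(leaders.values())  # real leader count per tserver
    moves: List[Tuple[str, str, str]] = []
    changed = True
    while changed:
        changed = False
        for tablet in sorted(replicas):
            src = leaders[tablet]
            # A leader-blacklisted tserver is treated as infinitely loaded.
            src_load = math.inf if src in blacklist else counts[src]
            followers = [s for s in replicas[tablet]
                         if s != src and s not in blacklist]
            if not followers:
                continue  # caveat: every other replica is blacklisted too
            dst = min(followers, key=lambda s: (counts[s], s))
            if src_load - counts[dst] > 1:  # always true for an infinite load
                counts[src] -= 1
                counts[dst] += 1
                leaders[tablet] = dst
                moves.append((tablet, src, dst))
                changed = True
    return moves


if __name__ == "__main__":
    replicas = {t: ["a", "b", "c"] for t in ("t1", "t2", "t3", "t4")}
    leaders = {"t1": "a", "t2": "a", "t3": "b", "t4": "c"}
    print(balance_leaders(replicas, leaders, blacklist={"a"}))
    print(leaders)
```

Because the blacklisted tserver always looks more loaded than any follower, the ordinary balancing rule drains it completely without a separate code path.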

Test Plan:
./build/debug-gcc-dynamic-ninja/tests-master/catalog_manager-test --gtest_filter=TestLoadBalancerCommunity.TestLoadBalancerAlgorithm
./yb_build.sh debug --scb --java-test org.yb.loadtester.TestClusterTserverRollingLeaderBlacklist#testClusterTserverRollingLeaderBlacklist

Reviewers: bogdan, rahuldesirazu

Reviewed By: rahuldesirazu

Subscribers: rao, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7145
YBase features automation moved this from In progress to Done Sep 23, 2019
@bmatican bmatican added this to To do in Improved rolling restarts via automation Mar 4, 2020
@bmatican bmatican moved this from To do to Done in Improved rolling restarts Feb 2, 2021