New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docdb] Master failover does not register dead TS still present in raft quorums #4691
Labels
Projects
Comments
bmatican
assigned bmatican, rahuldesirazu, hectorgcr and nspiegelberg and unassigned bmatican, hectorgcr and nspiegelberg
Jun 5, 2020
Asked Rahul to take a stab at this, with a simple proposed fix, of adding a fake TSDescriptor entry for any TS reported in heartbeats, but a backfilled time, so it is automatically marked as DEAD in the master. |
rahuldesirazu
added a commit
that referenced
this issue
Aug 19, 2020
… register themselves Summary: When we failover to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or dead tserver, this is a problem since the replica map will show that the tablet is under-replicated, even if it is still in Raft quorums. The fix is to register a tablet server through the Raft config, if the config references a tserver that has not already registered with master. Also mark replicas that are in BOOTSTRAPPING/NOT_STARTED state with its true consensus information instead of marking it as a NON_PARTICIPANT. Test Plan: ybd release --cxx-test master_failover-itest --gtest_filter MasterFailoverTest.TestFailoverAfterTsFailure Reviewers: hector, nicolas, bogdan, mikhail Reviewed By: bogdan, mikhail Subscribers: mikhail, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D8655
rahuldesirazu
added a commit
that referenced
this issue
Aug 21, 2020
… register themselves Summary: When we failover to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or dead tserver, this is a problem since the replica map will show that the tablet is under-replicated, even if it is still in Raft quorums. The fix is to register a tablet server through the Raft config, if the config references a tserver that has not already registered with master. Also mark replicas that are in BOOTSTRAPPING/NOT_STARTED state with its true consensus information instead of marking it as a NON_PARTICIPANT. Test Plan: Jenkins: rebase: 2.1 Reviewers: hector, nicolas, bogdan, mikhail Subscribers: ybase, mikhail Differential Revision: https://phabricator.dev.yugabyte.com/D9216
rahuldesirazu
added a commit
that referenced
this issue
Aug 21, 2020
… register themselves Summary: When we failover to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or dead tserver, this is a problem since the replica map will show that the tablet is under-replicated, even if it is still in Raft quorums. The fix is to register a tablet server through the Raft config, if the config references a tserver that has not already registered with master. Also mark replicas that are in BOOTSTRAPPING/NOT_STARTED state with its true consensus information instead of marking it as a NON_PARTICIPANT. Test Plan: Jenkins: rebase: 2.2 Reviewers: hector, nicolas, bogdan, mikhail Subscribers: ybase, mikhail Differential Revision: https://phabricator.dev.yugabyte.com/D9215
@rahuldesirazu should we close this and link to the task about enabling this by default? |
Fixed in #4691. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Noticed this after one of our itest node operation failures. Tracked it down after noticing this in the master leader logs:
A master leader failover, while a dead TS is still present in raft groups, makes the master ignore that TS completely.
First a TS goes down. Normally, after 60s, it would be marked as DEAD in the master leader and the LB would not do any more work until 15m pass and raft kicks out the peer.
However, before those 15m pass, we get a master leader failover!
On the new leader, this previously dead TS obviously never reports in! However it is still practically in all its raft quorums... The other TSs will be reporting this to the master, via heartbeats, however, since the dead TS never checks in, the master will not have a TSDescriptor for it and thus the TabletReplica map will not include this TS (the log message I referenced above).
The consequence of this is that, instead of the LB being blocked, until this dead peer gets auto-removed by raft, instead, the LB kicks in and thinks the quorum is under-replicated!!! This ends up adding another peer to the quorum, leaving it, in reality, with 3/4 alive peers! At this point, any other node failure cripples these respective raft groups!
The text was updated successfully, but these errors were encountered: