
[docdb] Master failover does not register dead TS still present in raft quorums #4691

Closed
bmatican opened this issue Jun 5, 2020 · 3 comments
Assignees
Labels
area/docdb YugabyteDB core features priority/high High Priority

Comments

@bmatican
Contributor

bmatican commented Jun 5, 2020

Noticed this after one of our itest node operation failures. Tracked it down after spotting the following in the master leader logs:

W0605 04:13:55.193491 4812 catalog_manager.cc:5754] T 00000000000000000000000000000000 P 5ab0a67716ea4057aa96e0b84fac4ec4: Tablet server has never reported in. Not including in replica locations map yet. Peer: permanent_uuid: "e34a66942b0f45abb320302acef44d98" member_type: VOTER last_known_private_addr { host: "127.0.0.3" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" }; Tablet: ea7e324524d44420a169a9c73553c42b (table transactions [id=995dc056ed86423ebb2d93d5c0ed3162])

A master leader failover while a dead TS is still present in Raft groups makes the new master ignore that TS completely.

First, a TS goes down. Normally, after 60s, it would be marked as DEAD on the master leader, and the LB would do no further work until 15 minutes pass and Raft kicks the peer out.
However, before those 15 minutes pass, we get a master leader failover!

On the new leader, this previously dead TS obviously never reports in! However, it is still effectively in all of its Raft quorums. The other TSs keep reporting this to the master via heartbeats, but since the dead TS never checks in, the master has no TSDescriptor for it, and thus the TabletReplica map does not include this TS (hence the log message referenced above).

The consequence is that, instead of the LB being blocked until this dead peer is auto-removed by Raft, the LB kicks in and thinks the quorum is under-replicated! It ends up adding another peer to the quorum, leaving it, in reality, with 3/4 peers alive. At this point, any further node failure cripples the affected Raft groups!

@bmatican bmatican added area/docdb YugabyteDB core features priority/high High Priority labels Jun 5, 2020
@bmatican
Contributor Author

bmatican commented Jun 6, 2020

Asked Rahul to take a stab at this, with a simple proposed fix: add a fake TSDescriptor entry for any TS reported in heartbeats, but with a backfilled heartbeat time, so it is automatically marked as DEAD in the master.

@bmatican bmatican added this to In progress in YBase features Jun 24, 2020
rahuldesirazu added a commit that referenced this issue Aug 19, 2020
… register themselves

Summary:
When we fail over to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or a dead tserver, this is a problem, since the replica map will show the tablet as under-replicated even though the tserver is still in its Raft quorums.

The fix is to register a tablet server through the Raft config if the config references a tserver that has not already registered with the master.

Also, mark replicas that are in the BOOTSTRAPPING/NOT_STARTED state with their true consensus information instead of marking them as NON_PARTICIPANT.

Test Plan: ybd release --cxx-test master_failover-itest --gtest_filter MasterFailoverTest.TestFailoverAfterTsFailure

Reviewers: hector, nicolas, bogdan, mikhail

Reviewed By: bogdan, mikhail

Subscribers: mikhail, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D8655
rahuldesirazu added a commit that referenced this issue Aug 21, 2020
… register themselves

Summary:
When we fail over to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or a dead tserver, this is a problem, since the replica map will show the tablet as under-replicated even though the tserver is still in its Raft quorums.

The fix is to register a tablet server through the Raft config if the config references a tserver that has not already registered with the master.

Also, mark replicas that are in the BOOTSTRAPPING/NOT_STARTED state with their true consensus information instead of marking them as NON_PARTICIPANT.

Test Plan: Jenkins: rebase: 2.1

Reviewers: hector, nicolas, bogdan, mikhail

Subscribers: ybase, mikhail

Differential Revision: https://phabricator.dev.yugabyte.com/D9216
rahuldesirazu added a commit that referenced this issue Aug 21, 2020
… register themselves

Summary:
When we fail over to a new master leader, we only update the replica map when we get a heartbeat from a tserver. In the case of a network partition or a dead tserver, this is a problem, since the replica map will show the tablet as under-replicated even though the tserver is still in its Raft quorums.

The fix is to register a tablet server through the Raft config if the config references a tserver that has not already registered with the master.

Also, mark replicas that are in the BOOTSTRAPPING/NOT_STARTED state with their true consensus information instead of marking them as NON_PARTICIPANT.

Test Plan: Jenkins: rebase: 2.2

Reviewers: hector, nicolas, bogdan, mikhail

Subscribers: ybase, mikhail

Differential Revision: https://phabricator.dev.yugabyte.com/D9215
@bmatican
Contributor Author

@rahuldesirazu should we close this and link to the task about enabling this by default?

@bmatican bmatican added this to In progress in Master components Nov 13, 2020
@rahuldesirazu
Contributor

Fixed in #4691.

YBase features automation moved this from In progress to Done Feb 24, 2021
Master components automation moved this from In progress to Done Feb 24, 2021