[DocDB] Improve the leaderless tablet endpoint #17570

Huqicheng · 2023-05-26T14:59:09Z

Jira Link: DB-6717

Description

Issue #15746 added leader lease info to the tablet metrics in the tserver-master heartbeat.
Based on that, the master can tell whether a tablet leader has a valid leader lease. To reduce false positives, a tablet is
treated as leaderless only when its leader reports no valid lease for N consecutive heartbeats.
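
The "N consecutive heartbeats" rule can be sketched as a per-tablet counter. This is a minimal illustration, not the master's actual code: the class name, method names, and the threshold value of 3 are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for the "N" in the description; the real value is not stated.
LEADERLESS_HEARTBEAT_THRESHOLD = 3


class LeaseTracker:
    """Counts consecutive heartbeats in which a tablet leader had no valid lease."""

    def __init__(self, threshold=LEADERLESS_HEARTBEAT_THRESHOLD):
        self.threshold = threshold
        self.missed = defaultdict(int)  # tablet_id -> consecutive lease-less heartbeats

    def on_heartbeat(self, tablet_id, has_valid_lease):
        # A single heartbeat with a valid lease resets the streak.
        if has_valid_lease:
            self.missed[tablet_id] = 0
        else:
            self.missed[tablet_id] += 1

    def is_leaderless(self, tablet_id):
        return self.missed[tablet_id] >= self.threshold
```

Because the counter only advances when a heartbeat arrives, a tablet whose leader has stopped heartbeating never reaches the threshold, which is the gap described next.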

However, there is a gap between what the leaderless tablet endpoint shows and the tablets that are actually leaderless.
Assume we have nodes 1, 2, and 3, and the tablet leader is on node1. From the master's view, the tablet has a valid leader, node1, with a valid lease.
Then node1 and node2 crash. Node1 can no longer send heartbeats to the master, so the master cannot rely on the N-consecutive-heartbeats method to detect the tablet as leaderless.

To capture such leaderless tablets, a potential solution is:
Since the tserver sends the leader's hybrid time (ht) lease to the master, the master can check whether the ht lease is too old.
One case can produce a false positive: with nodes 1, 2, and 3 and node1 as the leader, node1 cannot talk to the master for some reason but can still replicate to its followers. In this case, the lease known to the master could be too old even though the raft group is healthy. So we can add a special mark (e.g. a red '*') to a tablet whose leader lease is too old, to indicate that the entry may be a false positive.
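
The stale-lease check and the false-positive mark could look roughly like the following. This is a sketch under stated assumptions: the lease is reduced to a physical timestamp in microseconds, and `MAX_LEASE_AGE_US`, `LeaderLeaseReport`, and `endpoint_row` are hypothetical names that do not come from the codebase.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold for "too old"; the issue does not specify a value.
MAX_LEASE_AGE_US = 60 * 1_000_000


@dataclass
class LeaderLeaseReport:
    tablet_id: str
    ht_lease_exp_us: int  # assumed: physical microseconds of the last reported ht lease


def lease_too_old(report, master_now_us, max_age_us=MAX_LEASE_AGE_US):
    """True when the last lease the master saw is older than the allowed age."""
    return master_now_us - report.ht_lease_exp_us > max_age_us


def endpoint_row(report, master_now_us) -> Optional[str]:
    """Returns the row text for a stale-lease tablet, or None if the lease is fresh.

    The trailing '*' marks the entry as a possible false positive, because the
    leader may simply be unable to reach the master.
    """
    if lease_too_old(report, master_now_us):
        return f"{report.tablet_id} *"
    return None
```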



bmatican · 2023-05-26T15:01:27Z

Note: this endpoint should also have a JSON representation, so whatever marker we use should also be exposed as a new field in the JSON.
cc @druzac @rthallamko3
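
As an illustration of the JSON point, the marker could be carried as a boolean field on each entry. The field names `leaderless_tablets`, `tablet_uuid`, and `possible_false_positive` are hypothetical and chosen only for this sketch; the actual endpoint schema is not described in this thread.

```python
import json


def leaderless_tablets_json(entries):
    """Serializes (tablet_id, possible_false_positive) pairs.

    The boolean mirrors whatever marker (e.g. '*') the HTML page shows.
    """
    return json.dumps({
        "leaderless_tablets": [
            {"tablet_uuid": tablet_id, "possible_false_positive": flag}
            for tablet_id, flag in entries
        ]
    })
```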

Summary:
Issue #15746 added leader lease info to the tserver-master heartbeat tablet metrics. Based on that, we can tell whether the tablet leader has a valid leader lease. To reduce false positives, only leaders that have no valid lease for `consecutive N heartbeats` are treated as leaderless.

But there is a gap between what the leaderless tablet endpoint shows and the actual leaderless tablets. Assume that we have nodes 1, 2, and 3 and the tablet leader is on node1. From the master's view, the tablet has a valid leader, node1, with a valid lease. Then node1 and node2 crash. Node1 cannot send any new heartbeats to the master, so the master cannot rely on the `consecutive N heartbeats` method to capture the tablet as leaderless.

To capture such leaderless tablets, the solution is: since the tserver sends the leader hybrid time lease to the master, the master can check whether the ht lease is too old. One case can produce a false positive: with nodes 1, 2, and 3 and node1 as the leader, node1 cannot talk to the master for some reason but can still replicate to followers. In this case, the lease on the master could be too old even though the raft group is healthy.

Also, we cannot rely on the leader lease in leader-only mode, since the leader lease is not replicated. If only the leader is found but the last heartbeat from it is too old, the tablet is also reported as leaderless because the leader has possibly crashed or been network partitioned.

To help explain why a tablet is treated as leaderless, a new column (reason) is added to the leaderless tablet endpoint.

Jira: DB-6717

Test Plan: MasterPathHandlersLeaderlessITest.TestLeaderlessTabletEndpoint

Manual test:
1. Start an rf-3 universe and create a table with 100 tablets. Stop 2 tservers; after a while, all tablets should be captured by the leaderless tablets endpoint. Some tablets should be reported as leaderless because consecutive N heartbeats did not have a lease, and the others because the leader lease has been expired for a long time.
2. Start an rf-1 universe and create a table with 100 tablets. Stop the tserver; after a while, all tablets should be reported as leaderless because the master has not received a heartbeat from it for a long time.

Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: zdrudi
Subscribers: ybase, qhu, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D26206
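
The three detection paths in the commit message above, together with the new reason column, can be summarized in one function. This is only a sketch of the described logic: the function name, parameters, thresholds, and reason strings are assumptions for illustration and are not taken from the commit.

```python
from typing import Optional


def leaderless_reason(
    replica_count: int,
    missed_lease_heartbeats: int,
    lease_age_s: float,
    heartbeat_age_s: float,
    *,
    n: int = 3,                         # hypothetical "N"
    max_lease_age_s: float = 60.0,      # hypothetical threshold
    max_heartbeat_age_s: float = 60.0,  # hypothetical threshold
) -> Optional[str]:
    """Returns a reason string if the tablet should be listed, else None."""
    if replica_count == 1:
        # Leader-only mode: the lease is not replicated, so only the age of
        # the last heartbeat from the leader is considered.
        if heartbeat_age_s > max_heartbeat_age_s:
            return "no recent heartbeat from the only replica"
        return None
    if missed_lease_heartbeats >= n:
        return "no valid lease for N consecutive heartbeats"
    if lease_age_s > max_lease_age_s:
        return "leader lease expired long ago"
    return None
```

The two manual tests map onto these branches: stopping two tservers in an rf-3 universe should produce a mix of the second and third reasons, while stopping the tserver in an rf-1 universe should produce the first.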

Summary: Original commit: b370219 / D26206. The summary and test plan are the same as in the original commit above.
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: bogdan, qhu, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D27638

Summary: Original commit: b370219 / D26206. The summary and test plan are the same as in the original commit above.
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: ybase, qhu, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D27636

Summary: Original commit: b370219 / D26206. The summary and test plan are the same as in the original commit above.
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: bogdan, qhu, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D27635

Huqicheng added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on May 26, 2023

Huqicheng self-assigned this on May 26, 2023

yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on May 26, 2023

yugabyte-ci added the kind/enhancement (This is an enhancement of an existing feature) label and removed the status/awaiting-triage and kind/bug labels on May 30, 2023

Huqicheng added the 2.14 Backport Required, 2.16 Backport Required, and 2.18 Backport Required labels on Aug 8, 2023

Huqicheng closed this as completed Aug 16, 2023
