[DocDB] Improve the leaderless tablet endpoint #17570
Labels
2.14 Backport Required
2.16 Backport Required
2.18 Backport Required
area/docdb
YugabyteDB core features
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Comments
Huqicheng added the area/docdb and status/awaiting-triage labels on May 26, 2023
yugabyte-ci added the kind/bug and priority/medium labels on May 26, 2023
Note: this endpoint should also have a JSON representation, so whatever marker we use should have a corresponding new field in the JSON as well.
yugabyte-ci added the kind/enhancement label and removed the status/awaiting-triage and kind/bug labels on May 30, 2023
Huqicheng added a commit that referenced this issue on Aug 1, 2023
Summary: Issue #15746 added the leader lease info to the tserver-master heartbeat tablet metrics. Based on that, we can tell whether the tablet leader has a valid leader lease. Also, to reduce false positives, only leaders that have had no valid lease for `consecutive N heartbeats` are treated as leaderless.

But there's a gap between what the leaderless tablet endpoint shows and the tablets that are actually leaderless. Assume that we have nodes 1, 2, 3 and the tablet leader is on node1. From the master's view, the tablet has a valid leader with a valid lease, node1. Then node1 and node2 crash. Node1 cannot send any new heartbeats to the master, so the master can't rely on the `consecutive N heartbeats` method to capture the tablet as leaderless.

To capture such leaderless tablets, the solution is: since the tserver sends the leader hybrid time lease to the master, the master can check whether the ht lease is too old.

There's a case that can produce a false positive: nodes 1, 2, 3, with node1 as the leader. node1 cannot talk to the master for some reason, but it can still replicate to followers. In this case, the lease on the master could be too old even though the raft group is actually healthy.

Also, we cannot rely on the leader lease in leader-only mode, since the leader lease is not replicated. If only the leader is found but the last heartbeat from it is too old, also report the tablet as leaderless, because the leader has possibly crashed or been network partitioned.

To help explain why a tablet is treated as leaderless, also add a new column (reason) to the leaderless tablet endpoint.

Jira: DB-6717

Test Plan: MasterPathHandlersLeaderlessITest.TestLeaderlessTabletEndpoint

Manual test:
1. Start an rf-3 universe and create a table with 100 tablets. Stop 2 tservers; all tablets should be captured by the leaderless tablets endpoint after a while. Some tablets should be reported as leaderless because consecutive N heartbeats don't have a lease, and the others because the leader lease has been expired for a long time.
2. Start an rf-1 universe and create a table with 100 tablets. Stop the tserver; all tablets should be reported as leaderless after a while because the master hasn't received a heartbeat from it for a long time.

Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: zdrudi
Subscribers: ybase, qhu, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D26206
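The three detection rules described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the actual master code: the `TabletView` structure, the threshold constants, their values, and the reason strings are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the real values and flag names are not given in this issue.
NO_LEASE_HEARTBEAT_THRESHOLD = 3   # the "N" in "consecutive N heartbeats"
MAX_LEASE_AGE_SEC = 120.0          # how stale the ht lease may get
MAX_HEARTBEAT_AGE_SEC = 120.0      # how stale the last leader heartbeat may get

# Hypothetical values for the new "reason" column.
REASON_NO_LEASE = "no valid lease for N consecutive heartbeats"
REASON_LEASE_EXPIRED = "leader lease expired for a long time"
REASON_NO_HEARTBEAT = "no heartbeat from the only replica for a long time"


@dataclass
class TabletView:
    """What the master last heard about one tablet (illustrative only)."""
    replica_count: int
    consecutive_heartbeats_without_lease: int
    lease_expiry_sec: Optional[float]  # None if no lease was ever reported
    last_leader_heartbeat_sec: float


def leaderless_reason(t: TabletView, now_sec: float) -> Optional[str]:
    """Return why the tablet looks leaderless, or None if it looks healthy."""
    # Rule 1: the leader keeps heartbeating but without a valid lease.
    if t.consecutive_heartbeats_without_lease >= NO_LEASE_HEARTBEAT_THRESHOLD:
        return REASON_NO_LEASE
    # Rule 3: leader-only mode. The lease is not replicated, so fall back
    # to the age of the last heartbeat from the single replica.
    if t.replica_count == 1:
        if now_sec - t.last_leader_heartbeat_sec > MAX_HEARTBEAT_AGE_SEC:
            return REASON_NO_HEARTBEAT
        return None
    # Rule 2: the leader stopped heartbeating, so the lease the master
    # remembers just keeps aging.
    if t.lease_expiry_sec is not None and now_sec - t.lease_expiry_sec > MAX_LEASE_AGE_SEC:
        return REASON_LEASE_EXPIRED
    return None
```

Rule 2 is the one that can yield a false positive: a leader that is cut off from the master but still replicating to its followers would be flagged here as well.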
Huqicheng added a commit that referenced this issue on Aug 16, 2023
Summary: Original commit: b370219 / D26206
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: bogdan, qhu, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D27638
Huqicheng added a commit that referenced this issue on Aug 16, 2023
Summary: Original commit: b370219 / D26206
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: ybase, qhu, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D27636
Huqicheng added a commit that referenced this issue on Aug 16, 2023
Summary: Original commit: b370219 / D26206
Jira: DB-6717
Reviewers: asrivastava, rahuldesirazu, zdrudi
Reviewed By: asrivastava
Subscribers: bogdan, qhu, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D27635
Jira Link: DB-6717
Description
Issue #15746 added the leader lease info to the tserver-master heartbeat tablet metrics.
Based on that, we can tell whether the tablet leader has a valid leader lease. Also, to reduce false positives, only leaders that have had no valid lease for `consecutive N heartbeats` are treated as leaderless.

But there's a gap between what the leaderless tablet endpoint shows and the tablets that are actually leaderless.
Assume that we have nodes 1, 2, 3 and the tablet leader is on node1. From the master's view, the tablet has a valid leader with a valid lease, node1.
Then node1 and node2 crash. Node1 cannot send any new heartbeats to the master, so the master can't rely on the `consecutive N heartbeats` method to capture the tablet as leaderless.

To capture such leaderless tablets, the potential solution is:
Since the tserver sends the leader hybrid time lease to the master, the master can check whether the ht lease is too old.
There's a case that can produce a false positive: nodes 1, 2, 3, with node1 as the leader. node1 cannot talk to the master for some reason, but it can still replicate to followers. In this case, the lease on the master could be too old even though the raft group is actually healthy. So we can add a special mark (e.g. a red '*') to a tablet leader whose lease is too old, to indicate that the entry may be a false positive.
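As a rough illustration of the proposed marker, the sketch below flags tablets whose last known lease is too old and carries the same information in both a text rendering (the '*' mark) and a JSON rendering, as requested in the comments. The threshold, the function names, and the `lease_too_old` JSON field are hypothetical and chosen only for this example.

```python
import json
from typing import Dict, List

MAX_LEASE_AGE_SEC = 120.0  # hypothetical threshold for "too old"


def stale_lease_entries(lease_expiry_by_tablet: Dict[str, float],
                        now_sec: float) -> List[dict]:
    """Collect tablets whose last reported lease expired too long ago."""
    entries = []
    for tablet_id, expiry_sec in sorted(lease_expiry_by_tablet.items()):
        if now_sec - expiry_sec > MAX_LEASE_AGE_SEC:
            entries.append({
                "tablet_id": tablet_id,
                # Hypothetical field: the JSON counterpart of the '*' mark,
                # signalling that this entry may be a false positive.
                "lease_too_old": True,
            })
    return entries


def render_text(entries: List[dict]) -> List[str]:
    """Text view: append '*' to entries that may be false positives."""
    return [f"{e['tablet_id']} *" if e["lease_too_old"] else e["tablet_id"]
            for e in entries]


def render_json(entries: List[dict]) -> str:
    """JSON view: the same flag travels as an explicit field."""
    return json.dumps(entries)
```

Keeping the flag as an explicit field rather than embedding the '*' in the tablet id lets JSON consumers filter on it without parsing display text.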