[DocDB] Improve the leaderless tablet endpoint #17570

Closed
1 task done
Huqicheng opened this issue May 26, 2023 · 1 comment
Assignees
Labels
2.14 Backport Required 2.16 Backport Required 2.18 Backport Required area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@Huqicheng
Contributor

Huqicheng commented May 26, 2023

Jira Link: DB-6717

Description

Issue #15746 added leader lease info to the tablet metrics in the tserver-master heartbeat.
Based on that, the master can tell whether a tablet leader has a valid leader lease. To reduce false positives, only
leaders that have had no valid lease for N consecutive heartbeats are treated as leaderless.
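
A minimal sketch of the N-consecutive-heartbeats rule, not the actual master code. The `LeaseTracker` class, its method names, and the threshold value are hypothetical; the issue only refers to the threshold as "N".

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the issue only calls it "N".
CONSECUTIVE_HEARTBEATS_WITHOUT_LEASE = 3


@dataclass
class LeaseTracker:
    """Counts consecutive heartbeats in which a tablet's leader reported no valid lease."""

    threshold: int = CONSECUTIVE_HEARTBEATS_WITHOUT_LEASE
    misses: dict = field(default_factory=dict)  # tablet_id -> consecutive misses

    def on_heartbeat(self, tablet_id: str, has_valid_lease: bool) -> bool:
        """Record one heartbeat; return True if the tablet now counts as leaderless."""
        if has_valid_lease:
            # Any heartbeat carrying a valid lease resets the streak.
            self.misses[tablet_id] = 0
            return False
        self.misses[tablet_id] = self.misses.get(tablet_id, 0) + 1
        return self.misses[tablet_id] >= self.threshold
```

Note that the counter only advances when a heartbeat arrives, which is why this rule alone cannot flag a tablet whose leader has stopped heartbeating entirely.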

But there is a gap between what the leaderless tablet endpoint shows and the tablets that are actually leaderless.
Assume nodes 1, 2, and 3, with the tablet leader on node1. From the master's view, the tablet has a valid leader, node1, with a valid lease.
Then node1 and node2 crash. Node1 can no longer send heartbeats to the master, so the master cannot rely on the N-consecutive-heartbeats method to capture the tablet as leaderless.

To capture such leaderless tablets, a potential solution is:
Since the tserver sends the leader's hybrid time (ht) lease to the master, the master can check whether the ht lease is too old.
This can produce a false positive: with nodes 1, 2, and 3 and node1 as the leader, node1 may be unable to talk to the master for some reason while still replicating to its followers. In that case the lease recorded on the master can be too old even though the Raft group is healthy. So we can add a special mark (e.g. a red '*') to a tablet leader whose lease is too old, to indicate that it may be a false positive.
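
The stale-lease check and the proposed marker could look roughly like the sketch below. This is an illustration only: the function names and the cutoff are hypothetical (the issue does not say how old is "too old"), and timestamps are plain milliseconds rather than real hybrid time values.

```python
# Hypothetical cutoff; the issue does not define how old is "too old".
MAX_LEASE_AGE_MS = 120_000


def lease_too_old(lease_expiry_ms: int, now_ms: int,
                  max_age_ms: int = MAX_LEASE_AGE_MS) -> bool:
    """True when the last lease expiry known to the master is older than the cutoff."""
    return now_ms - lease_expiry_ms > max_age_ms


def render_leader(leader_id: str, lease_expiry_ms: int, now_ms: int) -> str:
    """Append '*' to a leader whose lease looks stale.

    The marker signals a possible false positive: the master's view may
    simply be out of date while the Raft group itself is healthy.
    """
    marker = "*" if lease_too_old(lease_expiry_ms, now_ms) else ""
    return leader_id + marker
```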

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@Huqicheng Huqicheng added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels May 26, 2023
@Huqicheng Huqicheng self-assigned this May 26, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 26, 2023
@bmatican
Contributor

Note: this endpoint should also have a JSON representation, so whatever marker we use should also get a new field in the JSON that represents the same thing.
cc @druzac @rthallamko3
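
One way to read this suggestion, sketched under assumptions: the JSON row carries an explicit boolean for whatever the visual marker conveys. The function name and all field names below are hypothetical and are not taken from the actual endpoint.

```python
import json


def leaderless_row_json(tablet_id: str, leader_id: str, lease_maybe_stale: bool) -> str:
    """Serialize one endpoint row; the boolean mirrors the visual marker."""
    row = {
        "tablet_id": tablet_id,                        # hypothetical field name
        "leader": leader_id,                           # hypothetical field name
        "possible_false_positive": lease_maybe_stale,  # hypothetical field name
    }
    return json.dumps(row, sort_keys=True)
```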

@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed status/awaiting-triage Issue awaiting triage kind/bug This issue is a bug labels May 30, 2023
Huqicheng added a commit that referenced this issue Aug 1, 2023
Summary:
Issue #15746 added leader lease info to the tablet metrics in the tserver-master heartbeat.
Based on that, the master can tell whether a tablet leader has a valid leader lease. To reduce false positives, only
leaders that have had no valid lease for `consecutive N heartbeats` are treated as leaderless.

But there is a gap between what the leaderless tablet endpoint shows and the tablets that are actually leaderless.
Assume nodes 1, 2, and 3, with the tablet leader on node1. From the master's view, the tablet has a valid leader, node1, with a valid lease.
Then node1 and node2 crash. Node1 can no longer send heartbeats to the master, so the master cannot rely on the `consecutive N heartbeats` method to capture the tablet as leaderless.

To capture such leaderless tablets, the solution is:
Since the tserver sends the leader's hybrid time (ht) lease to the master, the master can check whether the ht lease is too old.
This can produce a false positive: with nodes 1, 2, and 3 and node1 as the leader, node1 may be unable to talk to the master for some reason while still replicating to its followers. In that case the lease recorded on the master can be too old even though the Raft group is healthy.

Also, we cannot rely on the leader lease in leader-only mode, since the leader lease is not replicated.
If a leader-only tablet's leader is found but its last heartbeat is too old, also report the tablet as leaderless, because the leader has possibly crashed or been network partitioned.

To make it easier to understand why a tablet is treated as leaderless, also add a new column (reason) to the leaderless tablet endpoint.
Jira: DB-6717
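
The three checks described in this summary, together with the reason column, can be sketched as a single classifier. This is an illustration under assumptions, not the committed code: the type, the field names, the thresholds, and the reason strings are all hypothetical, and times are plain milliseconds rather than hybrid time.

```python
from dataclasses import dataclass
from typing import Optional

# All thresholds are hypothetical placeholders.
MISSED_LEASE_HEARTBEATS = 3
MAX_LEASE_AGE_MS = 120_000
MAX_HEARTBEAT_AGE_MS = 120_000


@dataclass
class LeaderView:
    """What the master last heard about a tablet's leader (times in plain ms)."""

    heartbeats_without_lease: int  # current streak of heartbeats with no valid lease
    lease_expiry_ms: int           # last lease expiry reported to the master
    last_heartbeat_ms: int         # when the leader's tserver last heartbeated
    leader_only: bool              # tablet runs in leader-only mode


def leaderless_reason(view: LeaderView, now_ms: int) -> Optional[str]:
    """Return a reason string if the tablet looks leaderless, else None."""
    if view.leader_only:
        # The lease is not replicated in leader-only mode, so use heartbeat age instead.
        if now_ms - view.last_heartbeat_ms > MAX_HEARTBEAT_AGE_MS:
            return "no recent heartbeat from leader"
        return None
    if view.heartbeats_without_lease >= MISSED_LEASE_HEARTBEATS:
        return "no valid lease for N consecutive heartbeats"
    if now_ms - view.lease_expiry_ms > MAX_LEASE_AGE_MS:
        return "leader lease expired long ago"
    return None
```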

Test Plan:
MasterPathHandlersLeaderlessITest.TestLeaderlessTabletEndpoint

Manual test:

1. Start an rf-3 universe and create a table with 100 tablets. Stop 2 tservers; after a while, all tablets should be captured by the leaderless tablets endpoint. Some tablets should be reported as leaderless because `consecutive N heartbeats` carried no valid lease, and the others because the leader lease has been expired for a long time.
2. Start an rf-1 universe and create a table with 100 tablets. Stop the tserver; after a while, all tablets should be reported as leaderless because the master has not received a heartbeat from it for a long time.

Reviewers: asrivastava, rahuldesirazu, zdrudi

Reviewed By: zdrudi

Subscribers: ybase, qhu, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26206
Huqicheng added a commit that referenced this issue Aug 16, 2023
Summary:
Original commit: b370219 / D26206

Reviewers: asrivastava, rahuldesirazu, zdrudi

Reviewed By: asrivastava

Subscribers: bogdan, qhu, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D27638
Huqicheng added a commit that referenced this issue Aug 16, 2023
Summary:
Original commit: b370219 / D26206

Reviewers: asrivastava, rahuldesirazu, zdrudi

Reviewed By: asrivastava

Subscribers: ybase, qhu, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D27636
Huqicheng added a commit that referenced this issue Aug 16, 2023
Summary:
Original commit: b370219 / D26206

Reviewers: asrivastava, rahuldesirazu, zdrudi

Reviewed By: asrivastava

Subscribers: bogdan, qhu, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D27635