[DocDB] Do not add WAL entries larger than the amount that can be successfully replicated to tablet peers. #13610

Closed
rthallamko3 opened this issue Aug 12, 2022 · 1 comment
Labels: 2.14 Backport Required, area/docdb, kind/bug, priority/critical

Comments

rthallamko3 commented Aug 12, 2022

Jira Link: DB-3183

Description

We have seen cases in which large WAL entries are persisted on the tablet leader but cannot be replicated to followers, leaving tablets in an unhealthy or unusable state. Example customer issue: https://yugabyte.zendesk.com/agent/tickets/3861

The fix in https://phabricator.dev.yugabyte.com/D16842 does not appear to handle the case in which a single WAL entry is larger than the limit.

rthallamko3 added the area/docdb and status/awaiting-triage labels Aug 12, 2022
yugabyte-ci added the kind/bug and priority/medium labels Aug 12, 2022
yugabyte-ci removed the status/awaiting-triage label Aug 12, 2022
samiahmedsiddiqui pushed a commit to samiahmedsiddiqui/yugabyte-db that referenced this issue Aug 19, 2022
…rpc message limit, fail the batch instead

Summary:
This diff addresses cases where followers can end up in an inconsistent state: a large WAL entry is persisted on the leader but cannot be replicated to followers because it exceeds the RPC message size limit. This leaves tablets in an unhealthy or unusable state.

The fix is to stop replicating any write batch whose size exceeds 0.95 * max_rpc_message_size. Such a batch is rejected before it is persisted or replicated; the client receives the error and aborts the transaction. The error looks like `Operation replicate msg size ($0) exceeds limit of leader side single op size ($1)`. A customer who hits this error can increase the `max_rpc_message_size` gflag to accept larger WAL entries.

This diff also introduces a new gflag, `estimated_replicate_msg_size_percentage` (default 0.95). We currently reserve 5% for the overhead of the other fields in LogEntryPB, but that might not be enough in the future; with the new gflag, the ratio can be adjusted without a code change.
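
The check described above can be sketched roughly as follows. This is an illustration only, not the code from the diff: the `ReplicationLimits` struct and the functions `MaxSingleOpSize` and `CheckReplicateMsgSize` are hypothetical names, and the two fields merely stand in for the gflags named in the summary. Only the threshold formula and the error text are taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the two gflags named in the summary.
struct ReplicationLimits {
  uint64_t max_rpc_message_size;                   // bytes
  double estimated_replicate_msg_size_percentage;  // fraction, e.g. 0.95
};

// Largest single operation the leader accepts: the RPC limit scaled by the
// configured ratio, leaving headroom for the other LogEntryPB fields.
inline uint64_t MaxSingleOpSize(const ReplicationLimits& limits) {
  assert(limits.estimated_replicate_msg_size_percentage > 0.0 &&
         limits.estimated_replicate_msg_size_percentage <= 1.0);
  return static_cast<uint64_t>(
      static_cast<double>(limits.max_rpc_message_size) *
      limits.estimated_replicate_msg_size_percentage);
}

// Returns an empty string if the operation may be persisted and replicated,
// or an error message if it must be rejected up front.
inline std::string CheckReplicateMsgSize(uint64_t replicate_msg_size,
                                         const ReplicationLimits& limits) {
  const uint64_t limit = MaxSingleOpSize(limits);
  if (replicate_msg_size > limit) {
    return "Operation replicate msg size (" +
           std::to_string(replicate_msg_size) +
           ") exceeds limit of leader side single op size (" +
           std::to_string(limit) + ")";
  }
  return std::string();
}
```

The point of the sketch is the ordering: the size test runs before anything is written to the leader's WAL, so an oversized operation fails back to the client instead of becoming an entry that followers can never receive.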

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: timur, amitanand, rthallam

Reviewed By: amitanand, rthallam

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D18950
Huqicheng added a commit that referenced this issue Sep 1, 2022
…xceeds rpc message limit, fail the batch instead

Summary:
Original commit: 985f09a / D18950

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: amitanand, rthallam, timur

Reviewed By: timur

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19112
Huqicheng added a commit that referenced this issue Sep 7, 2022
…xceeds rpc message limit, fail the batch instead

Summary:
Original commit: 985f09a / D18950

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: amitanand, rthallam, timur

Reviewed By: timur

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19113
Huqicheng added a commit that referenced this issue Sep 7, 2022
…ceeds rpc message limit, fail the batch instead

Summary:
Original commit: 985f09a / D18950

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: amitanand, rthallam, timur

Reviewed By: timur

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D19258
Huqicheng added a commit that referenced this issue Sep 7, 2022
…ceeds rpc message limit, fail the batch instead

Summary:
Original commit: 985f09a / D18950

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: amitanand, rthallam, timur

Reviewed By: timur

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19116
@Huqicheng

Backport to 2.6, 2.8, 2.12, 2.14

yugabyte-ci added the priority/critical label and removed the priority/medium label Feb 23, 2023