[DocDB] Do not add WAL entries larger than the amount that can be successfully replicated to tablet peers. #13610

rthallamko3 · 2022-08-12T20:14:15Z

Jira Link: DB-3183

Description

We have seen cases in which large WAL entries are persisted on the tablet leader but cannot be replicated to followers, leaving tablets in an unhealthy or unusable state. Example customer issue: https://yugabyte.zendesk.com/agent/tickets/3861

The fix in https://phabricator.dev.yugabyte.com/D16842 does not seem to handle the case where a single WAL entry is larger than the limit.

…rpc message limit, fail the batch instead

Summary: This diff addresses cases where followers can end up in an inconsistent state because a large WAL entry is persisted on the leader but cannot be replicated to followers, since it exceeds the RPC message size limit. This leaves tablets in an unhealthy or unusable state.

The fix is that a write batch larger than 0.95 * max_rpc_message_size is no longer replicated. Instead, it is rejected before being persisted and replicated. The client receives the error and aborts the transaction. The error looks like `Operation replicate msg size ($0) exceeds limit of leader side single op size ($1)`. A customer who hits this error can increase the gflag max_rpc_message_size to accept larger WAL entries.

This diff also introduces a new gflag, `estimated_replicate_msg_size_percentage` (default 0.95). We currently reserve 5% for the overhead of the other fields in LogEntryPB, but that might not be enough in the future. With this gflag, the ratio can be changed without a code change.

Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit

Reviewers: timur, amitanand, rthallam
Reviewed By: amitanand, rthallam
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D18950
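The size check described in the commit message can be sketched roughly as follows. This is a minimal Python illustration, not the actual C++ change: `ReplicationLimits`, `OpTooLargeError`, and `check_replicate_msg_size` are hypothetical names, and the byte value used for `max_rpc_message_size` is a placeholder rather than a YugabyteDB default. Only the two gflag names, the 0.95 ratio, and the error text come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLimits:
    # Mirrors the two gflags named in the commit message. The byte value is
    # an illustrative placeholder, not a YugabyteDB default.
    max_rpc_message_size: int = 1_000_000
    estimated_replicate_msg_size_percentage: float = 0.95

    def single_op_limit(self) -> int:
        # The remainder (5% with the default ratio) is reserved for the
        # other fields in LogEntryPB.
        return int(self.max_rpc_message_size
                   * self.estimated_replicate_msg_size_percentage)


class OpTooLargeError(Exception):
    """Hypothetical error type surfaced to the client."""


def check_replicate_msg_size(msg_size: int, limits: ReplicationLimits) -> None:
    """Reject an oversized op before it is persisted or replicated."""
    limit = limits.single_op_limit()
    if msg_size > limit:
        raise OpTooLargeError(
            f"Operation replicate msg size ({msg_size}) exceeds limit of "
            f"leader side single op size ({limit})"
        )
```

Under these assumptions, a 960,000-byte op is rejected with the 1,000,000-byte placeholder limit, and accepted once `max_rpc_message_size` is raised to 2,000,000, which matches the remediation the commit message suggests.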

…xceeds rpc message limit, fail the batch instead

Summary: Original commit: 985f09a / D18950. The summary and test plan are the same as in the original commit above.

Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19112

…xceeds rpc message limit, fail the batch instead

Summary: Original commit: 985f09a / D18950. The summary and test plan are the same as in the original commit above.

Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19113

…ceeds rpc message limit, fail the batch instead

Summary: Original commit: 985f09a / D18950. The summary and test plan are the same as in the original commit above.

Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D19258

…ceeds rpc message limit, fail the batch instead

Summary: Original commit: 985f09a / D18950. The summary and test plan are the same as in the original commit above.

Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19116

Huqicheng · 2022-09-07T19:00:51Z

Backport to 2.6, 2.8, 2.12, 2.14

rthallamko3 added the area/docdb and status/awaiting-triage labels on Aug 12, 2022

yugabyte-ci added the kind/bug and priority/medium labels on Aug 12, 2022

rthallamko3 assigned Huqicheng on Aug 12, 2022

yugabyte-ci removed the status/awaiting-triage label on Aug 12, 2022

rthallamko3 added the 2.8 Backport Required and 2.14 Backport Required labels on Aug 31, 2022

Huqicheng closed this as completed on Sep 7, 2022

yugabyte-ci added the priority/critical label and removed the priority/medium label on Feb 23, 2023
