[DocDB] Do not add WAL entries larger than the amount that can be successfully replicated to tablet peers. #13610
Labels
2.14 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/critical
Critical issue
Comments
samiahmedsiddiqui pushed a commit to samiahmedsiddiqui/yugabyte-db that referenced this issue on Aug 19, 2022
…rpc message limit, fail the batch instead
Summary: This diff addresses cases where followers could end up in an inconsistent state because a large WAL entry is persisted on the leader but cannot be replicated to followers, since it exceeds the RPC message size limit. This leaves tablets in an unhealthy or unusable state.
The fix is to not replicate a write batch whose size exceeds 0.95 * max_rpc_message_size; instead, the batch is rejected before it is persisted and replicated. The client receives the error and aborts the transaction. The error looks like `Operation replicate msg size ($0) exceeds limit of leader side single op size ($1)`. A customer who hits this error can increase the gflag max_rpc_message_size to accept larger WAL entries.
This diff also introduces a new gflag, `estimated_replicate_msg_size_percentage` (default 0.95). We currently reserve 5% for the overhead of the other fields in LogEntryPB, but that might not be enough in the future; with this new gflag, the ratio can be configured without a code change.
Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit
Reviewers: timur, amitanand, rthallam
Reviewed By: amitanand, rthallam
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D18950
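The size check described in this commit summary can be sketched roughly as follows. This is a minimal illustration in Python, not the actual YugabyteDB C++ code: the names `ReplicationLimits`, `single_op_limit`, `BatchTooLargeError`, and `check_replicate_msg_size` are hypothetical, and the 1,000,000-byte limit in the usage note is an arbitrary illustrative value. Only the two gflag names, the 0.95 default, and the error wording come from the summary.

```python
from dataclasses import dataclass

# Error wording taken from the commit summary; $0/$1 become size/limit here.
ERROR_TEMPLATE = (
    "Operation replicate msg size ({size}) exceeds limit of "
    "leader side single op size ({limit})"
)


class BatchTooLargeError(Exception):
    """Hypothetical error returned to the client, which then aborts."""


@dataclass(frozen=True)
class ReplicationLimits:
    """Hypothetical holder for the two gflags named in the summary."""

    max_rpc_message_size: int  # bytes
    estimated_replicate_msg_size_percentage: float = 0.95

    def single_op_limit(self) -> int:
        # The remaining share (5% by default) is reserved for the overhead
        # of the other fields in LogEntryPB.
        return int(
            self.max_rpc_message_size
            * self.estimated_replicate_msg_size_percentage
        )


def check_replicate_msg_size(msg_size: int, limits: ReplicationLimits) -> None:
    """Reject an oversized write batch before it is persisted or replicated."""
    limit = limits.single_op_limit()
    if msg_size > limit:
        raise BatchTooLargeError(
            ERROR_TEMPLATE.format(size=msg_size, limit=limit)
        )
```

With `ReplicationLimits(max_rpc_message_size=1_000_000)`, a 900,000-byte batch passes the check, while a 960,000-byte batch raises `BatchTooLargeError` before anything is written to the leader's WAL. Raising `max_rpc_message_size` raises the single-op limit proportionally.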
Huqicheng added a commit that referenced this issue on Sep 1, 2022
…xceeds rpc message limit, fail the batch instead
Summary: Original commit: 985f09a / D18950 (summary unchanged from the original commit)
Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit
Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19112
Huqicheng added a commit that referenced this issue on Sep 7, 2022
…xceeds rpc message limit, fail the batch instead
Summary: Original commit: 985f09a / D18950 (summary unchanged from the original commit)
Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit
Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19113
Huqicheng added a commit that referenced this issue on Sep 7, 2022
…ceeds rpc message limit, fail the batch instead
Summary: Original commit: 985f09a / D18950 (summary unchanged from the original commit)
Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit
Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D19258
Huqicheng added a commit that referenced this issue on Sep 7, 2022
…ceeds rpc message limit, fail the batch instead
Summary: Original commit: 985f09a / D18950 (summary unchanged from the original commit)
Test Plan: ./yb_build.sh --cxx-test tablet_peer-test --gtest_filter TabletPeerTest.SingleOpExceedsRpcMsgLimit
Reviewers: amitanand, rthallam, timur
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D19116
Backport to 2.6, 2.8, 2.12, 2.14
Jira Link: DB-3183
Description
We have seen cases in which large WAL entries are persisted on the tablet leader but cannot be successfully replicated to followers, leaving tablets in an unhealthy or unusable state. Example customer issue: https://yugabyte.zendesk.com/agent/tickets/3861
The fix in https://phabricator.dev.yugabyte.com/D16842 does not seem to handle the case in which a single WAL entry is larger than the limit.