New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] flaky test: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection #15006
Labels
2.16 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
kind/failing-test
Tests and testing infra
priority/high
High Priority
Comments
bmatican
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Nov 15, 2022
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Nov 15, 2022
bmatican
added
kind/failing-test
Tests and testing infra
priority/high
High Priority
and removed
priority/medium
Medium priority issue
labels
Nov 15, 2022
Based on slack conversation seems like caused due to 5a09155 |
Huqicheng
added a commit
that referenced
this issue
Nov 19, 2022
…n retrying a batcher Summary: When retrying a batcher for tablet split error, we reuse the same request id as the failed batcher. So we should make sure all ops having same request id should be retried with the same WriteRpc. But currently, it's possible that an operation failed with TabletLookup and retry with a seperate WriteRpc, this will lead to unexpected `Duplicate request` error. The test YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection is flaky due to this error. With this diff, if TabletLookup errors happen for an operation is going to be retried (have a valid request_id), we also set the error to operations having the same request_id. They will be retried in one shot with the next retry batcher. Test Plan: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection -n 100 Reviewers: sergei, rthallam, timur Reviewed By: timur Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D21137
Look at the detective trend: https://detective.dev.yugabyte.com/stability/test?analyze_trends=true&branch=master&class=YbAdminSnapshotScheduleTest&name=CacheRefreshOnNewConnection The test is passing all builds after the fix is in. Also need to back port this to 2.16 2.17.0 |
Huqicheng
added a commit
that referenced
this issue
Nov 21, 2022
… request id when retrying a batcher Summary: Original commit: 47dec92 / D21137 When retrying a batcher for tablet split error, we reuse the same request id as the failed batcher. So we should make sure all ops having same request id should be retried with the same WriteRpc. But currently, it's possible that an operation failed with TabletLookup and retry with a seperate WriteRpc, this will lead to unexpected `Duplicate request` error. The test YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection is flaky due to this error. With this diff, if TabletLookup errors happen for an operation is going to be retried (have a valid request_id), we also set the error to operations having the same request_id. They will be retried in one shot with the next retry batcher. Test Plan: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection -n 100 Reviewers: sergei, rthallam, timur, jmeehan Reviewed By: jmeehan Subscribers: jmeehan, bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D21228
Huqicheng
added a commit
that referenced
this issue
Nov 21, 2022
…me request id when retrying a batcher Summary: Original commit: 47dec92 / D21137 When retrying a batcher for tablet split error, we reuse the same request id as the failed batcher. So we should make sure all ops having same request id should be retried with the same WriteRpc. But currently, it's possible that an operation failed with TabletLookup and retry with a seperate WriteRpc, this will lead to unexpected `Duplicate request` error. The test YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection is flaky due to this error. With this diff, if TabletLookup errors happen for an operation is going to be retried (have a valid request_id), we also set the error to operations having the same request_id. They will be retried in one shot with the next retry batcher. Test Plan: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection -n 100 Reviewers: sergei, rthallam, timur, jmeehan Reviewed By: jmeehan Subscribers: jmeehan, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D21229
jayant07-yb
pushed a commit
to jayant07-yb/yugabyte-db
that referenced
this issue
Dec 7, 2022
…t id when retrying a batcher Summary: When retrying a batcher for tablet split error, we reuse the same request id as the failed batcher. So we should make sure all ops having same request id should be retried with the same WriteRpc. But currently, it's possible that an operation failed with TabletLookup and retry with a seperate WriteRpc, this will lead to unexpected `Duplicate request` error. The test YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection is flaky due to this error. With this diff, if TabletLookup errors happen for an operation is going to be retried (have a valid request_id), we also set the error to operations having the same request_id. They will be retried in one shot with the next retry batcher. Test Plan: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection -n 100 Reviewers: sergei, rthallam, timur Reviewed By: timur Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D21137
Huqicheng
added a commit
that referenced
this issue
Dec 22, 2022
… request id when retrying a batcher Summary: Original commit: 6bd085b / D21137 When retrying a batcher for tablet split error, we reuse the same request id as the failed batcher. So we should make sure all ops having same request id should be retried with the same WriteRpc. But currently, it's possible that an operation failed with TabletLookup and retry with a seperate WriteRpc, this will lead to unexpected `Duplicate request` error. The test YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection is flaky due to this error. With this diff, if TabletLookup errors happen for an operation is going to be retried (have a valid request_id), we also set the error to operations having the same request_id. They will be retried in one shot with the next retry batcher. Test Plan: YbAdminSnapshotScheduleTest.CacheRefreshOnNewConnection -n 100 Reviewers: sergei, rthallam, timur, jmeehan Reviewed By: jmeehan Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D21815
This was referenced Mar 9, 2023
Closed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2.16 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
kind/failing-test
Tests and testing infra
priority/high
High Priority
Jira Link: DB-4249
Description
https://detective-gcp.dev.yugabyte.com/stability/test?analyze_trends=true&branch=master&build_type=all&class=YbAdminSnapshotScheduleTest&fail_tag=all&name=CacheRefreshOnNewConnection&platform=linux
Bunch of failures of this form, so maybe just overloaded server in the test
The text was updated successfully, but these errors were encountered: