[DocDB] Optionally retry SST write after corruption is detected #19730
Labels: 2.14 Backport Required, 2.16 Backport Required, 2.18 Backport Required, 2.20 Backport Required, area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/high (High Priority)
ttyusupov added the kind/enhancement (This is an enhancement of an existing feature), area/docdb (YugabyteDB core features), and status/awaiting-triage (Issue awaiting triage) labels on Oct 30, 2023
yugabyte-ci added the priority/medium (Medium priority issue) label and removed the status/awaiting-triage (Issue awaiting triage) label on Oct 30, 2023
ttyusupov added a commit that referenced this issue on Nov 2, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Jira: DB-8560
Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: arybochkin
Reviewed By: arybochkin
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D29784
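The retry behaviour described in the summary can be illustrated with a minimal C++ sketch. This is not the YugabyteDB implementation: `WriteResult`, `RetryOutcome`, and `WriteSstWithRetries` are hypothetical names introduced here, and the sketch assumes that a flag value of N allows one initial attempt plus up to N retries, with retries triggered only by detected corruption.

```cpp
#include <functional>

// Hypothetical outcome of a single "write the SST file, then check it" attempt.
enum class WriteResult { kOk, kCorruption, kOtherError };

// Hypothetical summary of the whole retry sequence.
struct RetryOutcome {
  WriteResult result;  // result of the last attempt
  int attempts;        // total attempts made, including the first one
};

// Assumption: max_retries mirrors the value of `rocksdb_max_sst_write_retries`.
// The callback is invoked at most 1 + max_retries times. Only a detected
// corruption triggers another attempt; success or any other error stops the
// loop immediately. With max_retries == 0 there is exactly one attempt.
RetryOutcome WriteSstWithRetries(
    int max_retries, const std::function<WriteResult()>& write_and_verify) {
  int attempts = 0;
  WriteResult result = WriteResult::kCorruption;
  do {
    ++attempts;
    result = write_and_verify();
  } while (result == WriteResult::kCorruption && attempts <= max_retries);
  return {result, attempts};
}
```

A caller would pass a callback that performs the flush or compaction output write and then runs the corruption check. If the last attempt still reports corruption, the caller has to surface that error rather than install the file.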
arybochkin pushed a commit that referenced this issue on Nov 6, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Original commit: f58c809 / D29784
Jira: DB-8560
Test Plan: Jenkins: urgent
DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: bogdan, timur, rthallam, sergei
Reviewed By: bogdan, sergei
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D29953
arybochkin pushed a commit that referenced this issue on Nov 7, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Original commit: f58c809 / D29784
Backported from 2.14 commit: daf620d / D29953
Jira: DB-8560
Test Plan: Jenkins: urgent
DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: bogdan, timur, rthallam
Reviewed By: bogdan
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D30001
ttyusupov added a commit that referenced this issue on Nov 14, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Jira: DB-8560
Original commit: f58c809 / D29784
Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: arybochkin
Reviewed By: arybochkin
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D30157
ttyusupov added a commit that referenced this issue on Nov 14, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Jira: DB-8560
Original commit: f58c809 / D29784
Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: arybochkin
Reviewed By: arybochkin
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D30158
yugabyte-ci added the priority/high (High Priority) label and removed the priority/medium (Medium priority issue) label on Nov 15, 2023
ttyusupov added a commit that referenced this issue on Nov 16, 2023
Summary: Added support for the `rocksdb_max_sst_write_retries` flag: the maximum allowed number of attempts to write an SST file when corruption is detected after the write (default 0, which means no retries). Implemented for both flushes and compactions. For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction has multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
Jira: DB-8560
Original commit: f58c809 / D29784
Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
Reviewers: arybochkin
Reviewed By: arybochkin
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D30159
Jira Link: DB-8560
Description
Follow-up task for #19691: add the ability to enable SST write retries when corruption is detected after writing an output SST file (as a result of either a flush or a compaction).
Issue Type
kind/bug