[DocDB] Optionally retry SST write after corruption is detected #19730

Closed
1 task done
ttyusupov opened this issue Oct 30, 2023 · 0 comments
ttyusupov commented Oct 30, 2023

Jira Link: DB-8560

Description

Follow-up to #19691: add the ability to retry an SST write when corruption is detected after the output SST file has been written (as a result of either a flush or a compaction).

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
ttyusupov added the kind/enhancement, area/docdb, and status/awaiting-triage labels Oct 30, 2023
ttyusupov self-assigned this Oct 30, 2023
ttyusupov added this to Backlog in YBase features via automation Oct 30, 2023
yugabyte-ci added the priority/medium label and removed the status/awaiting-triage label Oct 30, 2023
ttyusupov moved this from Backlog to In progress in YBase features Nov 1, 2023
ttyusupov added a commit that referenced this issue Nov 2, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when writing an SST file after corruption is detected in the written file (the default is 0, which means no retries). Implemented for both flushes and compactions.
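
The retry behavior described above can be sketched roughly as follows. This is an illustration in Python, not the actual implementation: `write_sst_with_retries`, `write_once`, `verify`, and `SstCorruptionError` are hypothetical names, and the sketch assumes the flag counts retries on top of the first attempt (so 0 means one attempt in total).

```python
from typing import Callable


class SstCorruptionError(Exception):
    """Hypothetical error: a just-written SST file failed verification."""


def write_sst_with_retries(
    write_once: Callable[[], str],
    verify: Callable[[str], bool],
    max_sst_write_retries: int = 0,
) -> str:
    """Write an SST file, rewriting it if post-write verification fails.

    write_once() writes the file and returns its path; verify(path) returns
    True when the file looks intact. Both callbacks are hypothetical.
    """
    # Assumption: the flag value is the number of retries after the
    # first attempt, so the default of 0 allows exactly one attempt.
    attempts_allowed = 1 + max(0, max_sst_write_retries)
    for _ in range(attempts_allowed):
        path = write_once()
        if verify(path):
            return path
    raise SstCorruptionError(
        f"SST file still corrupt after {attempts_allowed} attempt(s)"
    )
```

With the default of 0 the first detected corruption is reported immediately, which matches the "no retries" default described in the summary.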

For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.
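
A minimal sketch of that guard, under stated assumptions: `may_retry_subcompaction` and its parameters are hypothetical, "after the first output file" is read here as "in any output file other than the first", and `logger.critical` stands in for DFATAL (which in glog-style logging is fatal in debug builds and an error in release builds).

```python
import logging

logger = logging.getLogger("sst_retry_sketch")


def may_retry_subcompaction(output_file_index: int, retries_left: int) -> bool:
    """Decide whether a corrupt sub-compaction output file may be rewritten.

    output_file_index is the zero-based position of the corrupt file among
    the output files of one sub-compaction (hypothetical parameter).
    """
    if retries_left <= 0:
        # Retries are disabled or exhausted; nothing to decide.
        return False
    if output_file_index > 0:
        # Unsupported case: the sub-compaction already produced earlier
        # output files. Log loudly (stand-in for DFATAL) and do not retry.
        logger.critical(
            "Corruption detected in sub-compaction output file #%d; "
            "retries are only supported for a single output file",
            output_file_index,
        )
        return False
    return True
```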

Jira: DB-8560

Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries
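
The test names suggest that the corruption check in question looks for zero bytes at the tail of a just-written SST file. A rough sketch of such a check, as an assumption inferred from the names rather than the actual detection code (`tail_is_all_zeros` and `tail_size` are hypothetical):

```python
import os


def tail_is_all_zeros(path: str, tail_size: int) -> bool:
    """Return True if the last tail_size bytes of the file are all zero.

    If the file is shorter than tail_size, the whole file is checked.
    An empty file or a non-positive tail_size yields False.
    """
    file_size = os.path.getsize(path)
    to_read = min(tail_size, file_size)
    if to_read <= 0:
        return False
    with open(path, "rb") as f:
        f.seek(file_size - to_read)
        tail = f.read(to_read)
    # any() is True as soon as one byte is non-zero.
    return not any(tail)
```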

Reviewers: arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29784
yugabyte-ci assigned arybochkin and unassigned ttyusupov Nov 4, 2023
arybochkin pushed a commit that referenced this issue Nov 6, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when
writing an SST file after corruption is detected in the written file (the default is 0, which
means no retries). Implemented for both flushes and compactions.

For now, retries are supported only when a sub-compaction produces a single output file (which is
the case for yugabyte-db as of today). If, in the future, corruption is detected after the first
output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL
will be logged and no retry will be done.

Original commit: f58c809 / D29784

Jira: DB-8560

Test Plan:
Jenkins: urgent

DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries

Reviewers: bogdan, timur, rthallam, sergei

Reviewed By: bogdan, sergei

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29953
arybochkin pushed a commit that referenced this issue Nov 7, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when
writing an SST file after corruption is detected in the written file (the default is 0, which
means no retries). Implemented for both flushes and compactions.

For now, retries are supported only when a sub-compaction produces a single output file (which is
the case for yugabyte-db as of today). If, in the future, corruption is detected after the first
output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL
will be logged and no retry will be done.

Original commit: f58c809 / D29784
Backported from 2.14 commit: daf620d / D29953

Jira: DB-8560

Test Plan:
Jenkins: urgent

DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries

Reviewers: bogdan, timur, rthallam

Reviewed By: bogdan

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30001
ttyusupov moved this from In progress to Done in YBase features Nov 7, 2023
ttyusupov added a commit that referenced this issue Nov 14, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when writing an SST file after corruption is detected in the written file (the default is 0, which means no retries). Implemented for both flushes and compactions.

For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.

Jira: DB-8560

Original commit: f58c809 / D29784

Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries

Reviewers: arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30157
ttyusupov added a commit that referenced this issue Nov 14, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when writing an SST file after corruption is detected in the written file (the default is 0, which means no retries). Implemented for both flushes and compactions.

For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.

Jira: DB-8560

Original commit: f58c809 / D29784

Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries

Reviewers: arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30158
yugabyte-ci added the priority/high label and removed the priority/medium label Nov 15, 2023
rthallamko3 assigned ttyusupov and unassigned arybochkin Nov 15, 2023
ttyusupov added a commit that referenced this issue Nov 16, 2023
Summary:
Added support for the `rocksdb_max_sst_write_retries` flag: the maximum number of retries when writing an SST file after corruption is detected in the written file (the default is 0, which means no retries). Implemented for both flushes and compactions.

For now, retries are supported only when a sub-compaction produces a single output file (which is the case for yugabyte-db as of today). If, in the future, corruption is detected after the first output file when a sub-compaction produces multiple output files and retries are enabled, a DFATAL will be logged and no retry will be done.

Jira: DB-8560

Original commit: f58c809 / D29784

Test Plan: DBTest.SstTailZerosCheckFlushRetries and DBTest.SstTailZerosCheckCompactionRetries

Reviewers: arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30159

4 participants