Automatically mark checksum mismatched blocks for no-compact and no-downsample #5960

Open
RohitKochhar opened this issue Dec 12, 2022 · 0 comments

@RohitKochhar
Contributor

Is your proposal related to a problem?

Currently, when Compact encounters a checksum mismatched block, it throws an error and halts (unless the halt-on-error flag is disabled). This is frustrating, especially because nothing can be done to repair a checksum mismatched block.

If Compact halts on error, a backlog accumulates: many valid blocks go uncompacted, leaving storage unoptimized, which can accrue additional costs. If Compact does not halt on error, the mismatched blocks are ignored and the rest of the backlog is cleared. Eventually, however, Compact is left with a backlog of only mismatched blocks and throws errors that cannot be acted on.

Describe the solution you'd like

I would like to add a flag to Compact that, when enabled, makes it log a warning on encountering a checksum mismatched block and then mark the block as no-compact and no-downsample. This ensures that the block is not lost entirely and that Compact does not fail without a clear course of action.

Alternatively, instead of marking the block as no-compact and no-downsample, we could mark it for deletion. In that case, rather than halting and failing on a checksum mismatched block, Compact would mark the block for deletion and continue.
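
To make the two options concrete, here is a rough, language-neutral sketch of the control flow I have in mind. It is written in Python for brevity and is not Thanos code: `MismatchPolicy`, `Block`, `compact_block`, `run_compaction`, and the marker strings are all hypothetical names invented for illustration, and the real flag name and marker handling would be decided in the PR.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum

logger = logging.getLogger("compact-sketch")


class ChecksumMismatchError(Exception):
    """Hypothetical error for a block whose checksum does not match."""


class MismatchPolicy(Enum):
    """Hypothetical values for the proposed flag."""
    HALT = "halt"                # today's behaviour: surface the error
    MARK_SKIP = "mark-skip"      # proposed: no-compact and no-downsample
    MARK_DELETE = "mark-delete"  # alternative: mark for deletion


@dataclass
class Block:
    id: str
    corrupt: bool = False
    markers: set = field(default_factory=set)  # illustrative marker labels


def compact_block(block):
    """Stand-in for the real compaction work; fails on corrupt blocks."""
    if block.corrupt:
        raise ChecksumMismatchError(f"checksum mismatch in block {block.id}")


def run_compaction(blocks, policy):
    """Compact every eligible block and return the ids that succeeded."""
    compacted = []
    for block in blocks:
        # Blocks already marked are skipped, so they never error again.
        if block.markers & {"no-compact", "deletion"}:
            continue
        try:
            compact_block(block)
        except ChecksumMismatchError as err:
            if policy is MismatchPolicy.HALT:
                raise
            # Warn instead of failing, mark the block, and keep going.
            logger.warning("%s; marking block and continuing", err)
            if policy is MismatchPolicy.MARK_SKIP:
                block.markers.update({"no-compact", "no-downsample"})
            else:
                block.markers.add("deletion")
            continue
        compacted.append(block.id)
    return compacted
```

With either marking policy, one bad block no longer stops the valid blocks behind it from being compacted, and on the next run the marked block is skipped rather than raising the same error again.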

Describe alternatives you've considered

Without a feature like this, a developer has to manually mark each block as no-compact, no-downsample, or for deletion. This is incredibly tedious, and because the failures are often silent (they do not break the entire system, so you would have to be watching Compact's metrics or logs to notice), it can take quite some time before the problem is noticed and fixed, only for it to happen again shortly after.

Additional context

I have opened an earlier issue (#5944) and PR (#5945) to add a no-downsample feature that is equivalent to the no-compact feature, except that it marks a block as exempt from downsampling. This proposal would extend that feature to mark blocks as no-downsample automatically.
