Skip to content

2.27.0.0-b125

@ttyusupov ttyusupov tagged this 18 May 22:21
Summary:
Added `check_data_blocks` to sst_dump tool. It will check all data blocks consistency using SST file index and per-block checksums.
`--skip_uncompress` option specifies that `check_data_blocks` command should skip uncompressing data blocks (false by default).

If any corrupt data blocks are found, the tool will generate `dd` commands which could be used to save specific blocks (corrupted block and one block before/after) for analysis.

Example usage:
```
./sst_dump --command=check_data_blocks --file=000046.sst 2>&1 | tee sst_dump.out
cat sst_dump.out | grep '^dd '
```
Jira: DB-16720

Test Plan:
Tested manually on artificially corrupt SST file:
```
~/code/yugabyte5/build/latest/bin/sst_dump --command=check_data_blocks --file=000046.sst 2>&1 | tee sst_dump.out
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0517 07:21:12.046017  3481 sst_dump_tool.cc:433] Checking data block #0 handle: BlockHandle { offset: 0 size: 3927 }
W0517 07:21:12.051733  3481 file_reader_writer.cc:101] Read attempt #1 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
W0517 07:21:12.051852  3481 file_reader_writer.cc:101] Read attempt #2 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
E0517 07:21:12.051862  3481 format.cc:466] ReadBlockContents: Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
    @     0x7fd68e2eafde  rocksdb::SstFileReader::CheckDataBlocks()
    @     0x7fd68e2ec8ca  rocksdb::SSTDumpTool::Run()
    @     0x555855a53f78  main
    @     0x7fd688e45ca3  __libc_start_main
    @     0x555855a53eae  _start
W0517 07:21:12.056023  3481 sst_dump_tool.cc:441] Failed to read block with handle: BlockHandle { offset: 996452 size: 4024 }. Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
from [] to []
Process 000046.sst
Sst file format: block-based
dd if="000046.sst.sblock.0" bs=1 skip=992433 count=4014 of="000046.sst.sblock.0.offset_992433.size_4014.part"
dd if="000046.sst.sblock.0" bs=1 skip=996452 count=4024 of="000046.sst.sblock.0.offset_996452.size_4024.part"
dd if="000046.sst.sblock.0" bs=1 skip=1000481 count=4019 of="000046.sst.sblock.0.offset_1000481.size_4019.part"
W0517 07:21:12.634256  3481 file_reader_writer.cc:101] Read attempt #1 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
W0517 07:21:12.634332  3481 file_reader_writer.cc:101] Read attempt #2 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
E0517 07:21:12.634339  3481 format.cc:466] ReadBlockContents: Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
    @     0x7fd68e2eafde  rocksdb::SstFileReader::CheckDataBlocks()
    @     0x7fd68e2ec8ca  rocksdb::SSTDumpTool::Run()
    @     0x555855a53f78  main
    @     0x7fd688e45ca3  __libc_start_main
    @     0x555855a53eae  _start
W0517 07:21:12.638135  3481 sst_dump_tool.cc:441] Failed to read block with handle: BlockHandle { offset: 119754675 size: 11766 }. Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
dd if="000046.sst.sblock.0" bs=1 skip=119742901 count=11769 of="000046.sst.sblock.0.offset_119742901.size_11769.part"
dd if="000046.sst.sblock.0" bs=1 skip=119754675 count=11766 of="000046.sst.sblock.0.offset_119754675.size_11766.part"
dd if="000046.sst.sblock.0" bs=1 skip=119766446 count=1384 of="000046.sst.sblock.0.offset_119766446.size_1384.part"
```

```
cat sst_dump.out | grep '^dd '
dd if="000046.sst.sblock.0" bs=1 skip=992433 count=4014 of="000046.sst.sblock.0.offset_992433.size_4014.part"
dd if="000046.sst.sblock.0" bs=1 skip=996452 count=4024 of="000046.sst.sblock.0.offset_996452.size_4024.part"
dd if="000046.sst.sblock.0" bs=1 skip=1000481 count=4019 of="000046.sst.sblock.0.offset_1000481.size_4019.part"
dd if="000046.sst.sblock.0" bs=1 skip=119742901 count=11769 of="000046.sst.sblock.0.offset_119742901.size_11769.part"
dd if="000046.sst.sblock.0" bs=1 skip=119754675 count=11766 of="000046.sst.sblock.0.offset_119754675.size_11766.part"
dd if="000046.sst.sblock.0" bs=1 skip=119766446 count=1384 of="000046.sst.sblock.0.offset_119766446.size_1384.part"
```

Reviewers: hsunder, yyan

Reviewed By: yyan

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D44035
Assets 2
Loading