TiKV panic 'error: Corruption: L6 has overlapping ranges' #8243
Comments
@MyonKeminta does the instance still exist? @Little-Wallace is this the issue you mentioned?
@yiwu-arbug The TiKV node was stopped, and the data and logs still exist.
The above error is caused by RocksDB detecting unordered SST files after an L5->L6 compaction. The two unordered SSTs were generated by the same compaction. After investigating the remaining DB, we found that one of the compaction input SSTs is unordered internally. The problematic SST was generated as the result of multiple compactions, and the intermediate results are gone, so we are not able to investigate further this time. Ingested SSTs also participated in the sequence of compactions that led to the problematic SST, so we cannot rule out that one of the ingested files was unordered. The follow-up will be to add logic that fails a compaction as soon as it generates an unordered result, so that the next time the problem is reproduced we can examine the data and see how it happened.
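To make the proposed follow-up concrete, here is a minimal sketch of the kind of check described above: reject a compaction result if any output file is unordered internally, or if adjacent output files overlap. It is not RocksDB code. `SstMeta`, `first_unordered_key`, and `check_compaction_output` are hypothetical names, and keys are assumed to compare as plain byte strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SstMeta:
    """Hypothetical stand-in for one SST file (not a RocksDB type)."""
    name: str
    keys: List[bytes]  # keys in the order they are stored in the file


def first_unordered_key(sst: SstMeta) -> Optional[int]:
    """Return the index of the first key that is not strictly greater
    than its predecessor, or None if the file is ordered."""
    for i in range(1, len(sst.keys)):
        if sst.keys[i - 1] >= sst.keys[i]:
            return i
    return None


def check_compaction_output(outputs: List[SstMeta]) -> None:
    """Raise ValueError if an output file is unordered internally or if
    two adjacent output files have overlapping key ranges.

    Assumes every file holds at least one key and that `outputs` is
    listed in the order the compaction produced the files.
    """
    for sst in outputs:
        bad = first_unordered_key(sst)
        if bad is not None:
            raise ValueError(f"{sst.name}: key at index {bad} is out of order")
    for left, right in zip(outputs, outputs[1:]):
        # Each file is known to be ordered here, so keys[-1] is its
        # largest key and keys[0] is its smallest.
        if left.keys[-1] >= right.keys[0]:
            raise ValueError(f"{left.name} and {right.name} have overlapping ranges")
```

Failing at the point where the output is produced, rather than later when the level is validated, keeps the compaction inputs available for inspection.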
@yiwu-arbug Thank you :)
Got another reproduction from another user's POC test. Following up.
The issue, at least in the last reproduction, is due to a RocksDB block cache key conflict, which causes compaction to read the wrong block content from a file. facebook/rocksdb#7405 (comment)
The cache key conflict is more likely to happen after this kernel patch, which changes the inode generation number from sequential to random. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=232530680290ba94ca37852ab10d9556ea28badf
Adding compaction and read path consistency check (by @Connor1996): tikv/rocksdb#195
Known affected operating systems:
Known unaffected operating systems:
The test:
tikv/rocksdb#205 Will fix it by generating the unique ID based on the DB instance and SST file number instead of the inode number.
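The difference between the two key schemes can be illustrated with a small sketch. This is not the actual RocksDB key layout: `FileIdentity`, `inode_based_key`, `file_number_based_key`, and the field names are hypothetical. It assumes a deleted SST's inode is reused by a new SST and that the new file happens to get the same generation number.

```python
from typing import NamedTuple, Tuple


class FileIdentity(NamedTuple):
    """Hypothetical description of an SST file, for illustration only."""
    db_instance_id: str    # assumed to be unique per DB instance
    file_number: int       # SST file number, not reused within one DB
    inode: int             # may be reused by the file system after deletion
    inode_generation: int  # assigned by the file system


def inode_based_key(f: FileIdentity, block_offset: int) -> Tuple[int, int, int]:
    """Cache key derived from file system identifiers."""
    return (f.inode, f.inode_generation, block_offset)


def file_number_based_key(f: FileIdentity, block_offset: int) -> Tuple[str, int, int]:
    """Cache key derived from the DB instance and the SST file number."""
    return (f.db_instance_id, f.file_number, block_offset)


# An old SST is deleted after compaction; a new SST reuses its inode and
# (by assumption) ends up with the same generation number.
old_sst = FileIdentity("db-1", 100, inode=4242, inode_generation=7)
new_sst = FileIdentity("db-1", 101, inode=4242, inode_generation=7)

# The block cache still holds a block that was read from the old file.
cache = {inode_based_key(old_sst, 0): b"block from old SST"}

# A read of the new file at the same offset hits the stale entry.
stale_hit = cache.get(inode_based_key(new_sst, 0))

# With keys based on DB instance and file number, the two files never share a key.
keys_differ = file_number_based_key(old_sst, 0) != file_number_based_key(new_sst, 0)
```

In this sketch `stale_hit` is the old file's block, which matches the symptom of compaction reading the wrong block content, while the file-number-based keys stay distinct no matter what the file system assigns.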
@Connor1996 Can you also help cherry-pick the change to disable force-consistency-checks and the change to check key ordering?
Okay
The same thing happened on kernel 4.18.20-2.el7.x86_64, with the error: L5 have overlapping ranges
What's your Linux distribution and its version, and what file system do you use? Just want to keep a record. Also, can you run the test program in this comment on the server hosting TiKV and report the result? #8243 (comment)
Bug Report
What version of TiKV are you using?
v4.0.1
Steps to reproduce
Unknown
What did you expect?
TiKV runs properly
What happened?
TiKV panics:
The key in the panic log cannot be found in TiKV's logs about ingesting sst. Neither can it be found in RocksDB's logs and manifests.
After deleting the panic mark file and restarting the TiKV node, it will soon panic again, printing the same key in the log.
cc @Little-Wallace @yiwu-arbug @zhangjinpeng1987