bugfix: clear spilled hybrid update state after emit batches #1147
Merged
When ManyAggregatedData is destroyed, its 'rocks' member (vector of shared_ptr<RocksDB>) is destroyed before 'variants'. This causes RocksDB::shutdown() to free the underlying rocksdb::DB. The subsequent destruction of variants triggers RocksDBColumnFamilyHandler::destroy(), which accesses db->DefaultColumnFamily() through a dangling raw pointer. Fix by locking the weak_ptr<RocksDB> before any raw db pointer access. If the RocksDB is already gone or shut down, return early since RocksDB::shutdown() already cleaned up all column family handles.
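The lock-before-access guard described above can be illustrated with a minimal, self-contained sketch. `FakeDb`, `FakeColumnFamilyHandler`, and the two helper functions are hypothetical stand-ins invented for this illustration; they are not the project's `RocksDB` or `RocksDBColumnFamilyHandler` classes, and no real RocksDB API is called.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the project's RocksDB wrapper.
struct FakeDb {
    bool is_shut_down = false;
    int live_handles = 0;

    // Assumed to mirror the described behaviour: shutdown releases
    // every column family handle in one go.
    void shutdown() {
        live_handles = 0;
        is_shut_down = true;
    }
};

// Hypothetical stand-in for RocksDBColumnFamilyHandler.
class FakeColumnFamilyHandler {
public:
    explicit FakeColumnFamilyHandler(const std::shared_ptr<FakeDb>& db) : db_(db) {
        assert(db != nullptr);
        ++db->live_handles;
    }

    // Returns true if this call released the handle, false if it
    // returned early because the database was gone or shut down.
    bool destroy() {
        // Take a strong reference first; never touch a raw pointer
        // that may already have been freed.
        std::shared_ptr<FakeDb> db = db_.lock();
        if (!db || db->is_shut_down) {
            return false;  // shutdown() already cleaned up the handles
        }
        --db->live_handles;
        return true;
    }

private:
    std::weak_ptr<FakeDb> db_;  // non-owning, so expiry is detectable
};

// Reproduces the problematic order: the owner goes away first,
// the handler is destroyed second.
inline bool destroy_after_owner_is_gone() {
    auto db = std::make_shared<FakeDb>();
    FakeColumnFamilyHandler handler(db);
    db->shutdown();
    db.reset();                // last strong reference dropped
    return handler.destroy();  // early return, no dangling access
}

// The normal order: the database is still alive when the handler is destroyed.
inline bool destroy_while_owner_is_alive() {
    auto db = std::make_shared<FakeDb>();
    FakeColumnFamilyHandler handler(db);
    const bool released = handler.destroy();
    assert(db->live_handles == 0);
    return released;
}
```

In this sketch the early return replaces what would otherwise be a use-after-free, and the normal path is unchanged when the database is still alive.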
Summary
- `HybridHashTable::clear()` and `HybridKeyList::clear()` both had the same stale-persistent-data bug: when `cleanup_on_disk_data=true`, the `cf_handler` shared_ptr was reset without calling `destroyPersistentPart()`, leaving stale keys in the shared RocksDB column family that would reappear on the next `initRocks()` call (e.g. after checkpoint recovery).
- Fixed the `clear()` methods by adding an `else { destroyPersistentPart(); }` branch so the column family is dropped before the handler is released.
- Added `{}` braces to both `if`/`else` branches in both methods.
- Added a `TEST(HybridKeyList, ClearRemovesSharedPersistentData)` regression test mirroring the `HybridHashTable` one.
- … 99248) to exercise the pause/resume checkpoint-recovery path that most reliably triggers the re-emission of unchanged groups.
Root cause
After `clear()`, `cf_handler.reset()` dropped the shared_ptr, but the RocksDB column family entry remained live in `RocksDB::cf_handles`. The next `getOrCreateColumnFamilyHandler()` call found the existing column family with all of its old data, so `forEachKey()`/`forEach()` iterated stale spilled keys alongside current ones and re-emitted unchanged aggregation groups.
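The root cause can be reproduced in miniature without RocksDB. The sketch below is an assumption-laden model, not the project's code: `FakeSharedDb`, `FakeSpillList`, the `drop_column_family` flag, and the `"agg_state"` name are all hypothetical. The flag only switches between the pre-fix behaviour (release the handler) and the post-fix behaviour (drop the column family first).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared RocksDB instance: column families
// are stored by name and outlive any individual handler.
struct FakeSharedDb {
    std::map<std::string, std::set<std::string>> column_families;
};

// Hypothetical stand-in for a spilling container such as HybridKeyList.
class FakeSpillList {
public:
    FakeSpillList(std::shared_ptr<FakeSharedDb> db, std::string cf_name)
        : db_(std::move(db)), cf_name_(std::move(cf_name)) {
        assert(db_ != nullptr);
    }

    // Get-or-create semantics: attaches to an existing column family
    // of the same name if one is still present in the shared database.
    void initRocks() { cf_ = &db_->column_families[cf_name_]; }

    void spill(const std::string& key) {
        assert(cf_ != nullptr);
        cf_->insert(key);
    }

    std::size_t spilledKeyCount() const { return cf_ ? cf_->size() : 0; }

    // drop_column_family == false models the bug: only the handler is released.
    // drop_column_family == true models the fix: persistent data is destroyed first.
    void clear(bool drop_column_family) {
        if (drop_column_family) {
            db_->column_families.erase(cf_name_);
        }
        cf_ = nullptr;  // analogous to resetting the handler
    }

private:
    std::shared_ptr<FakeSharedDb> db_;
    std::string cf_name_;
    std::set<std::string>* cf_ = nullptr;
};

// Spills two keys, clears, re-initialises (as after checkpoint recovery),
// and reports how many spilled keys are visible again.
inline std::size_t keys_visible_after_clear(bool drop_column_family) {
    auto db = std::make_shared<FakeSharedDb>();
    FakeSpillList list(db, "agg_state");
    list.initRocks();
    list.spill("group-1");
    list.spill("group-2");
    list.clear(drop_column_family);
    list.initRocks();
    return list.spilledKeyCount();
}
```

With `drop_column_family` set to `false`, both keys are visible again after re-initialisation, which corresponds to the re-emitted groups. With `true`, the list starts empty.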