CauliBase is a tiny key-value database prototype built for research purposes. It uses a simple LSM-tree-style design:
- writes first go to a WAL (Write-Ahead Log)
- recent data lives in an in-memory memtable
- flushed data is stored in immutable SSTable files
- compaction merges SSTables and removes deleted records
The project is intentionally compact and educational, with unit tests for the core storage components.
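The design above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `MiniLsm` and all of its members are hypothetical names, the WAL and SSTables are kept in memory rather than on disk, and none of it is taken from the CauliBase sources.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the design; not CauliBase's actual classes.
class MiniLsm {
public:
    // std::nullopt marks a tombstone (a logically deleted key).
    using Value = std::optional<std::string>;
    using Table = std::map<std::string, Value>;

    void put(const std::string& key, const std::string& value) { write(key, value); }
    void del(const std::string& key) { write(key, std::nullopt); }

    // Newest data wins: memtable first, then tables from newest to oldest.
    std::optional<std::string> get(const std::string& key) const {
        if (auto it = memtable_.find(key); it != memtable_.end()) return it->second;
        for (auto t = sstables_.rbegin(); t != sstables_.rend(); ++t) {
            if (auto it = t->find(key); it != t->end()) return it->second;
        }
        return std::nullopt;
    }

    // Freeze the memtable into a new immutable table and truncate the WAL.
    void flush() {
        assert(wal_.size() >= memtable_.size());  // every memtable entry was logged
        if (memtable_.empty()) return;
        sstables_.push_back(std::move(memtable_));
        memtable_.clear();
        wal_.clear();
    }

    // Merge all tables (newer entries override older ones), then drop tombstones.
    void compact() {
        Table merged;
        for (const Table& t : sstables_) {
            for (const auto& [k, v] : t) merged[k] = v;  // later tables are newer
        }
        for (auto it = merged.begin(); it != merged.end();) {
            it = it->second.has_value() ? std::next(it) : merged.erase(it);
        }
        sstables_.clear();
        if (!merged.empty()) sstables_.push_back(std::move(merged));
    }

    // Rebuild the memtable from the WAL, as a restart after a crash would.
    void replay_wal() {
        memtable_.clear();
        for (const auto& [k, v] : wal_) memtable_[k] = v;
    }

    std::size_t sstable_count() const { return sstables_.size(); }

private:
    void write(const std::string& key, Value value) {
        wal_.emplace_back(key, value);      // 1. append to the log first
        memtable_[key] = std::move(value);  // 2. then apply in memory
    }

    std::vector<std::pair<std::string, Value>> wal_;
    Table memtable_;
    std::vector<Table> sstables_;
};
```

In this model a deleted key stays hidden even when an older table still holds its value, because the tombstone in the newer table is found first. The tombstone is only dropped once compaction has merged it with the older entry.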
Supported commands:
- put <key> <value>: insert or overwrite a key
- get <key>: read a key
- del <key>: logically delete a key with a tombstone
- flush: flush the current memtable to an SSTable
- compact: merge SSTables and discard tombstones
- debug: print current memtable and SSTable state

Additional features:
- WAL replay on startup for crash recovery of unflushed writes
- Key normalization with a short 64-bit hash-based internal key
- Optional Feistel-based pseudo-random permutation and 1000-slot block-level key shuffling
- SSTables include a Bloom filter and key-offset index for faster point lookups
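As a rough illustration of key normalization and slot shuffling, the sketch below hashes a key to 64 bits and permutes slot indices in the range [0, 1000) with a small Feistel network. Everything here is an assumption made for illustration: FNV-1a stands in for whatever hash the project uses, and the round function, the round count, the cycle-walking step, and the names `fnv1a64`, `feistel1024`, and `shuffle_slot` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a, 64-bit: a stand-in for key normalization (assumption; the hash
// actually used by the project is not specified in this README).
inline std::uint64_t fnv1a64(std::string_view key) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 0x100000001b3ULL;  // FNV prime
    }
    return h;
}

constexpr std::uint32_t kSlots = 1000;  // slots per block, from the feature list
constexpr std::uint32_t kHalfBits = 5;  // two 5-bit halves cover 1024 >= kSlots
constexpr std::uint32_t kHalfMask = (1u << kHalfBits) - 1;

// Hypothetical round function: any deterministic function of
// (half, seed, round) keeps the Feistel network invertible.
inline std::uint32_t round_fn(std::uint32_t half, std::uint64_t seed, std::uint32_t round) {
    std::uint64_t x = seed ^ (static_cast<std::uint64_t>(half) << 8) ^ round;
    x *= 0x9E3779B97F4A7C15ULL;
    x ^= x >> 29;
    return static_cast<std::uint32_t>(x) & kHalfMask;
}

// Four-round balanced Feistel network: a bijection on [0, 1024).
inline std::uint32_t feistel1024(std::uint32_t v, std::uint64_t seed) {
    std::uint32_t left = (v >> kHalfBits) & kHalfMask;
    std::uint32_t right = v & kHalfMask;
    for (std::uint32_t r = 0; r < 4; ++r) {
        std::uint32_t next = left ^ round_fn(right, seed, r);
        left = right;
        right = next;
    }
    return (left << kHalfBits) | right;
}

// Cycle-walking restricts the permutation to [0, kSlots): re-apply the
// network until the result lands back inside the slot range.
inline std::uint32_t shuffle_slot(std::uint32_t slot, std::uint64_t seed) {
    assert(slot < kSlots);
    std::uint32_t v = feistel1024(slot, seed);
    while (v >= kSlots) v = feistel1024(v, seed);
    return v;
}
```

Because a Feistel network is invertible for any round function, every slot maps to a distinct slot, so the shuffle needs no lookup table and stays reproducible from the seed alone.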

Toolchain and targets:
- C++ standard: C++17
- Build system: CMake
- Test framework: doctest v2.4.12
- Main executable target: cauli_base
- Unit test target: cauli_unit_tests
- Benchmark target: cauli_bench

To build the project:
cmake -S . -B build
cmake --build build

By default, the CLI stores database files in:
data/
  wal.log
  000001.sst
  000002.sst
  ...

To run the unit tests:
ctest --test-dir build --output-on-failure

You can also run the doctest binary directly:
./build/test/cauli_unit_tests

The benchmark program lives in bench/ and measures the main database operations:
- put
- get_memtable
- get_sstable
- del
- compact

SSTable point lookups use a Bloom filter to reject absent keys, then binary-search a key-offset index and seek directly to the matching record. Older SSTable files without metadata still fall back to sequential scanning.
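The lookup path described above can be sketched as follows. This is a minimal model, assuming 64-bit internal keys and an index held in memory as sorted (key, offset) pairs. `BloomFilter`, `SstIndex`, `build_index`, and `find_offset` are hypothetical names, and the double-hashing scheme is one common choice rather than the project's confirmed implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical Bloom filter over 64-bit keys, using double hashing.
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits, std::uint64_t hashes = 3)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0);
    }
    void add(std::uint64_t key) {
        for (std::uint64_t i = 0; i < hashes_; ++i) bits_[index(key, i)] = true;
    }
    // false means "definitely absent"; true means "possibly present".
    bool may_contain(std::uint64_t key) const {
        for (std::uint64_t i = 0; i < hashes_; ++i) {
            if (!bits_[index(key, i)]) return false;
        }
        return true;
    }

private:
    static std::uint64_t mix(std::uint64_t x) {  // splitmix64-style finalizer
        x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
        x ^= x >> 27; x *= 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }
    std::size_t index(std::uint64_t key, std::uint64_t i) const {
        std::uint64_t h1 = mix(key);
        std::uint64_t h2 = mix(key ^ 0x9E3779B97F4A7C15ULL) | 1;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }
    std::vector<bool> bits_;
    std::uint64_t hashes_;
};

struct SstIndex {
    BloomFilter bloom;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> entries;  // (key, file offset), sorted by key
};

// Sort the entries by key and register every key in the Bloom filter.
inline SstIndex build_index(std::vector<std::pair<std::uint64_t, std::uint64_t>> entries) {
    std::sort(entries.begin(), entries.end());
    SstIndex idx{BloomFilter(entries.size() * 10 + 64), std::move(entries)};
    for (const auto& e : idx.entries) idx.bloom.add(e.first);
    return idx;
}

// Returns the record's file offset, or std::nullopt if the key is absent.
inline std::optional<std::uint64_t> find_offset(const SstIndex& idx, std::uint64_t key) {
    if (!idx.bloom.may_contain(key)) return std::nullopt;  // cheap rejection
    auto it = std::lower_bound(
        idx.entries.begin(), idx.entries.end(), key,
        [](const auto& e, std::uint64_t k) { return e.first < k; });
    if (it == idx.entries.end() || it->first != key) return std::nullopt;
    return it->second;  // the caller would seek to this offset in the file
}
```

A Bloom filter can return false positives but never false negatives, so the binary search is still needed to confirm a hit. The filter only saves work for keys that are absent.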
Default benchmark settings:
operations=10000
compact_operations=2000
value_size=64
You can override the settings:
./build/bench/cauli_bench [operations] [compact_operations] [value_size] [both|shuffle-on|shuffle-off] [prepare-keys|no-prepare] [repeats]