Fix LSM net additions count and add WAL threshold #1855

Merged
6 commits merged into master from bugfix/lsm-net-additions on Mar 11, 2022

Conversation

parkerduckworth
Member

Changes

Count Memtable Net Additions

Prior to these changes, the byte length of every key+value written was added to the approximate size of the memtable, including writes that only updated existing keys. This led to the bug described in this PR's related issue: the parent bucket overestimated the memtable's size, so the memtable flushed too early.

This PR introduces counting of net additions: when an existing key is updated, the memtable's size is incremented only by the difference between the new value's byte length and the original value's byte length.
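
A minimal sketch of this accounting, assuming a toy map-backed memtable (illustrative only, not Weaviate's actual implementation):

```go
package main

import "fmt"

// memtable is a toy stand-in for the LSM memtable; only the size
// accounting matters here.
type memtable struct {
	data map[string][]byte
	size int // approximate net size in bytes
}

func newMemtable() *memtable {
	return &memtable{data: map[string][]byte{}}
}

// put counts the full key+value length for new keys, but only the change
// in value length when an existing key is updated (a net addition).
func (m *memtable) put(key string, value []byte) {
	if prev, ok := m.data[key]; ok {
		m.size += len(value) - len(prev)
	} else {
		m.size += len(key) + len(value)
	}
	m.data[key] = value
}

func main() {
	m := newMemtable()
	m.put("a", []byte("hello"))
	m.put("a", []byte("world")) // update with a same-length value
	fmt.Println(m.size)         // 6, not 12: the update adds nothing net
}
```

With the previous gross counting, the second put would have added another 6 bytes, which is exactly the overestimation described above.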

WAL Threshold

This PR also adds a WAL threshold that is monitored in addition to the existing memtable threshold. The reason is that, with net additions to the memtable counted instead of gross additions, the WAL can grow without bound in certain situations.

For example, if the majority of writes to the memtable are updates to existing keys with values of similar size, the bucket's active memtable will grow extremely slowly, if at all. The memtable threshold is therefore never reached, and the WAL grows in size without bound.

To mitigate this problem, the current WAL is stat'ed and its size is checked against the new threshold. If the threshold is exceeded, then, just as with the memtable threshold, the log is flushed to a segment and switched.
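
A hypothetical sketch of such a check; the type, field, and method names below are illustrative assumptions, not Weaviate's actual API:

```go
package lsmkv

import "os"

// bucket is an illustrative stand-in; all field names are assumptions.
type bucket struct {
	memtableNetSize   int64  // net additions, counted as above
	memtableThreshold int64  // existing memtable threshold
	walPath           string // path of the active write-ahead log
	walThreshold      int64  // new WAL threshold
}

// shouldFlushAndSwitch reports whether either the memtable's net size or
// the WAL's on-disk size has crossed its threshold. The WAL size comes
// from a plain stat of the log file.
func (b *bucket) shouldFlushAndSwitch() (bool, error) {
	if b.memtableNetSize >= b.memtableThreshold {
		return true, nil
	}
	fi, err := os.Stat(b.walPath)
	if err != nil {
		return false, err
	}
	return fi.Size() >= b.walThreshold, nil
}
```

Either condition leads to the same outcome: the memtable is flushed to a segment and the log is switched, so the update-heavy workload described above can no longer let the WAL grow indefinitely.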


Closes #1830

@codecov

codecov bot commented Mar 11, 2022

Codecov Report

Merging #1855 (fb9b142) into master (01a8262) will increase coverage by 0.04%.
The diff coverage is 83.33%.


@@            Coverage Diff             @@
##           master    #1855      +/-   ##
==========================================
+ Coverage   66.06%   66.11%   +0.04%     
==========================================
  Files         411      411              
  Lines       30915    30931      +16     
==========================================
+ Hits        20425    20450      +25     
+ Misses       8691     8685       -6     
+ Partials     1799     1796       -3     
Flag          Coverage Δ
integration   67.37% <83.33%> (+0.09%) ⬆️
unittests     66.11% <83.33%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
adapters/repos/db/lsmkv/bucket.go 62.81% <58.33%> (+0.12%) ⬆️
adapters/repos/db/lsmkv/binary_search_tree.go 100.00% <100.00%> (ø)
adapters/repos/db/lsmkv/bucket_options.go 70.96% <100.00%> (+4.30%) ⬆️
adapters/repos/db/lsmkv/memtable.go 75.00% <100.00%> (-0.17%) ⬇️
adapters/repos/db/vector/hnsw/insert.go 74.02% <0.00%> (-1.65%) ⬇️
...dapters/repos/db/lsmkv/segment_group_compaction.go 67.71% <0.00%> (+0.78%) ⬆️
usecases/sharding/state.go 66.99% <0.00%> (+0.97%) ⬆️
usecases/traverser/grouper/merge_group.go 81.76% <0.00%> (+1.17%) ⬆️
adapters/repos/db/shard_write_batch_objects.go 77.27% <0.00%> (+1.51%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 877688d...fb9b142.

@etiennedi
Member

Looks great! Thank you for the detailed description in the PR. This will come in really handy if we ever wonder why we did things in a certain way.

@parkerduckworth parkerduckworth merged commit d766134 into master Mar 11, 2022
@parkerduckworth parkerduckworth deleted the bugfix/lsm-net-additions branch March 11, 2022 16:52
Linked issue closed by this pull request: Bug: LSMKV Memtable flushes too early if it's full of updates