Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-1852 sort memtable KV pairs on read #1853

Merged
merged 3 commits into from Mar 11, 2022
Merged

gh-1852 sort memtable KV pairs on read #1853

merged 3 commits into from Mar 11, 2022

Conversation

etiennedi
Copy link
Member

The memtable for Map is a binary tree so it's always sorted. However,
since this is type 'Map' each "row key" holds a map. This map was
unsorted in the past. In #1832 we introduced a change that made sure
this change would always be sorted ON DISK, i.e. in the segments. It was
very natural to also keep it sorted in the memtable, as we did not have
to do any sorting when flushing. However, the performance tests on
imports that make heavy use of the inverted index had a large
performance degradation after #1832. In a test I did locally the import
time went up by over 30%.

This fix goes back to keeping the KV pairs unsorted and making each
change an append only operation. This means it now needs to be sorted in
just two places (as opposed to on every single insertion):

  1. On a read query. Those should be rare on memtable, since memtables
    are mostly meant for writing. The added overhead here (minimal) is
    not a problem since it was also there before LSMKV: Store Map in always sorted manner #1832
  2. When flushing. Flushing is an async operation and the small overhead
    of sorting each row's Map KVs is neglible.

This new implementation has the same import speed as prior to #1832
while keeping all the runtime benefits of having the KV pairs sorted on
disk.

closes #1852

The memtable for Map is a binary tree so it's always sorted. However,
since this is type 'Map' each "row key" holds a map. This map was
unsorted in the past. In #1832 we introduced a change that made sure
this change would always be sorted ON DISK, i.e. in the segments. It was
very natural to also keep it sorted in the memtable, as we did not have
to do any sorting when flushing. However, the performance tests on
imports that make heavy use of the inverted index had a large
performance degradation after #1832. In a test I did locally the import
time went up by over 30%.

This fix goes back to keeping the KV pairs unsorted and making each
change an append only operation. This means it now needs to be sorted in
just two places (as opposed to on every single insertion):

1. On a read query. Those should be rare on memtable, since memtables
   are mostly meant for writing. The added overhead here (minimal) is
   not a problem since it was also there before #1832
2. When flushing. Flushing is an async operation and the small overhead
   of sorting each row's Map KVs is neglible.

This new implementation has the same import speed as prior to #1832
while keeping all the runtime benefits of having the KV pairs sorted on
disk.

closes #1852
@codecov
Copy link

codecov bot commented Mar 10, 2022

Codecov Report

Merging #1853 (5b132f4) into master (01a8262) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1853      +/-   ##
==========================================
+ Coverage   66.06%   66.09%   +0.02%     
==========================================
  Files         411      411              
  Lines       30915    30914       -1     
==========================================
+ Hits        20425    20432       +7     
+ Misses       8691     8682       -9     
- Partials     1799     1800       +1     
Flag Coverage Δ
integration 67.34% <100.00%> (+0.06%) ⬆️
unittests 66.09% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
adapters/repos/db/lsmkv/binary_search_tree_map.go 100.00% <100.00%> (+2.70%) ⬆️
adapters/repos/db/vector/hnsw/insert.go 74.02% <0.00%> (-1.65%) ⬇️
adapters/repos/db/shard_write_batch_objects.go 74.24% <0.00%> (-1.52%) ⬇️
adapters/repos/db/lsmkv/bucket.go 64.17% <0.00%> (+1.49%) ⬆️
adapters/repos/db/vector/hnsw/maintainance.go 86.20% <0.00%> (+20.68%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 877688d...5b132f4. Read the comment docs.

@etiennedi etiennedi merged commit b940e62 into master Mar 11, 2022
@etiennedi etiennedi deleted the gh-1852 branch March 11, 2022 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance: Sorted Map KVs in memtable degrade import performance
3 participants