Performance: Sorted Map KVs in memtable degrade import performance #1852

etiennedi · 2022-03-10T16:10:52Z

On datasets that make heavy use of the inverted index, the changes introduced in #1832 degrade import performance.

This most likely (but not yet investigated) comes down to this method.

The memtable for Map is a binary tree, so it's always sorted. However, since this is type 'Map', each "row key" holds a map. This map was unsorted in the past. In #1832 we introduced a change that made sure this map would always be sorted ON DISK, i.e. in the segments. It was very natural to also keep it sorted in the memtable, as we then did not have to do any sorting when flushing.

However, the performance tests on imports that make heavy use of the inverted index showed a large performance degradation after #1832. In a test I did locally the import time went up by over 30%.

This fix goes back to keeping the KV pairs unsorted and making each change an append-only operation. This means the pairs now need to be sorted in just two places (as opposed to on every single insertion):

1. On a read query. Those should be rare on the memtable, since memtables are mostly meant for writing. The added overhead here (minimal) is not a problem since it was also there before #1832.
2. When flushing. Flushing is an async operation and the small overhead of sorting each row's Map KVs is negligible.

This new implementation has the same import speed as prior to #1832 while keeping all the runtime benefits of having the KV pairs sorted on disk.

closes #1852
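The trade-off described in this commit message can be illustrated with a small, language-agnostic sketch in Python. It is not the actual implementation: the class name `AppendOnlyMapMemtable`, its method names, and the rule that a later append overrides an earlier one for the same map key are all assumptions made for illustration, and deletes (tombstones) are not modelled.

```python
from collections import defaultdict


class AppendOnlyMapMemtable:
    """Hypothetical memtable sketch: row key -> appended (map_key, value) pairs."""

    def __init__(self):
        # A real memtable keeps row keys in a sorted tree; a dict is enough
        # here because only the per-row KV pairs matter for this sketch.
        self._rows = defaultdict(list)

    def put(self, row_key, map_key, value):
        # Write path: a plain append, with no sorting and no lookup of an
        # existing map key.
        self._rows[row_key].append((map_key, value))

    def _sorted_pairs(self, row_key):
        # Assumption: a later append wins over an earlier one for the same
        # map key. Collapse duplicates first, then sort by map key.
        latest = {}
        for map_key, value in self._rows.get(row_key, []):
            latest[map_key] = value
        return sorted(latest.items())

    def get(self, row_key):
        # Read path: sort on demand (reads on a memtable are expected to be rare).
        return self._sorted_pairs(row_key)

    def flush(self):
        # Flush path: emit rows in row-key order, each with its pairs sorted,
        # which is the order the on-disk segment is expected to have.
        return [(row_key, self._sorted_pairs(row_key)) for row_key in sorted(self._rows)]
```

With this shape, every `put` stays cheap regardless of how many pairs a row already holds, and the sorting cost is paid only in `get` and `flush`, the two places the commit message names.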

gh-1852 sort memtable KV pairs on read

etiennedi added the performance label Mar 10, 2022

etiennedi self-assigned this Mar 10, 2022

etiennedi mentioned this issue Mar 10, 2022

gh-1852 sort memtable KV pairs on read #1853

Merged

etiennedi added a commit that referenced this issue Mar 10, 2022

gh-1852 clean up commented line

303f700

etiennedi added a commit that referenced this issue Mar 11, 2022

gh-1852 fix typos

5b132f4

etiennedi closed this as completed in #1853 Mar 11, 2022

etiennedi added a commit that referenced this issue Mar 11, 2022

Merge pull request #1853 from semi-technologies/gh-1852

b940e62

gh-1852 sort memtable KV pairs on read
