gh-1852 sort memtable KV pairs on read
The memtable for Map is a binary tree, so it is always sorted. However, since this is type 'Map', each "row key" holds a map, and this map was unsorted in the past. In #1832 we introduced a change that made sure this map would always be sorted ON DISK, i.e. in the segments. It was very natural to also keep it sorted in the memtable, as we then did not have to do any sorting when flushing.

But the import performance tests that make heavy use of the inverted index showed a large performance degradation after #1832. In a test I did locally, the import time went up by over 30%.

This fix goes back to keeping the KV pairs unsorted and making each change an append-only operation. This means the pairs now need to be sorted in just two places (as opposed to on every single insertion):

1. On a read query. These should be rare on a memtable, since memtables are mostly meant for writing. The added (minimal) overhead is not a problem, since it was also there before #1832.
2. When flushing. Flushing is an async operation, and the small overhead of sorting each row's Map KVs is negligible.

This new implementation has the same import speed as before #1832 while keeping all the runtime benefits of having the KV pairs sorted on disk.

closes #1852
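A minimal Go sketch of the append-on-write, sort-on-read/flush approach described above. It is a simplification under stated assumptions: `MapPair`, `mapMemtable`, `Append`, `Get` and `Flush` are hypothetical names, not the real implementation's API, and a Go map stands in for the binary tree of row keys (so row keys are sorted explicitly at flush time). Writes only append; reads sort a copy of the row, and flush sorts each row once.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// MapPair is a hypothetical key/value entry stored inside one row of a
// Map-type memtable.
type MapPair struct {
	Key   []byte
	Value []byte
}

// mapMemtable is a simplified, hypothetical stand-in for the Map memtable.
// A Go map replaces the binary tree; row keys are sorted explicitly on flush.
type mapMemtable struct {
	mu   sync.RWMutex
	rows map[string][]MapPair
}

func newMapMemtable() *mapMemtable {
	return &mapMemtable{rows: map[string][]MapPair{}}
}

// Append adds a pair without sorting: amortized O(1) per insertion.
func (m *mapMemtable) Append(rowKey []byte, pair MapPair) {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := string(rowKey)
	m.rows[k] = append(m.rows[k], pair)
}

// sortedCopy returns a copy of pairs ordered by key. The stable sort keeps
// insertion order among duplicate keys; the stored slice is left untouched.
func sortedCopy(pairs []MapPair) []MapPair {
	out := make([]MapPair, len(pairs))
	copy(out, pairs)
	sort.SliceStable(out, func(i, j int) bool {
		return bytes.Compare(out[i].Key, out[j].Key) < 0
	})
	return out
}

// Get sorts on read. It works on a copy so the append-only row is not mutated.
func (m *mapMemtable) Get(rowKey []byte) []MapPair {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return sortedCopy(m.rows[string(rowKey)])
}

// Flush visits rows in row-key order and hands each row's pairs to write
// already sorted, i.e. in the order a segment would be written.
func (m *mapMemtable) Flush(write func(rowKey string, pairs []MapPair)) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	keys := make([]string, 0, len(m.rows))
	for k := range m.rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		write(k, sortedCopy(m.rows[k]))
	}
}

func main() {
	m := newMapMemtable()
	m.Append([]byte("row"), MapPair{Key: []byte("c"), Value: []byte("3")})
	m.Append([]byte("row"), MapPair{Key: []byte("a"), Value: []byte("1")})
	m.Append([]byte("row"), MapPair{Key: []byte("b"), Value: []byte("2")})
	// prints a=1, b=2, c=3 on separate lines
	for _, p := range m.Get([]byte("row")) {
		fmt.Printf("%s=%s\n", p.Key, p.Value)
	}
}
```

Keeping each insertion a plain append avoids the per-write cost of maintaining order (a sorted insert into a slice can shift up to n elements), so the sorting cost is paid only on the comparatively rare reads and once per row at flush. Deduplicating repeated map keys and handling deletions are deliberately left out of this sketch.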