LSMKV: Object Count #1811

Closed

etiennedi opened this issue Jan 31, 2022 · 1 comment
Labels: autoclosed (Closed by the bot. We still want this, but it didn't quite make the latest prioritization round)

etiennedi (Member) commented Jan 31, 2022

Background

This is a requirement for #1798. Currently, the only way to count objects in an lsmkv.Bucket is to iterate over all of them. Such an iteration is currently implemented for { Aggregate { meta { count } } }, which is orders of magnitude too slow to be part of a BM25 query. Iteration is required because a simple count of keys does not contain enough information to determine the true count: a key could carry a tombstone, and we don't know whether a previous segment contained an object for that key. Similarly, a key could represent either a create or an update.

Proposed Algorithms

I could not find any information on whether there is a standard algorithm for an efficient count in LSM-based indices, so I came up with the following, which should be fairly efficient without adding too much overhead:

  • Most parts of this algorithm run just once at start-up; only the memtable part needs to run at runtime.
  • The algorithm assumes it runs over the segments in the order they were added (oldest to newest).
  • Starting at the root segment, each segment can reliably determine how many objects it adds itself.
  • On the root segment we can ignore tombstones as there is nothing they could remove.
  • For any non-root segment, we can encounter 4 situations (see the first sketch following this list):
    • Previously unseen key: This is guaranteed to increase the count by 1
    • Previously seen key: In all three modes (replace, set, map), a previously seen key means that the count is not changed
    • Previously seen key with tombstone: This is guaranteed to reduce the count by 1
    • Previously unseen key with tombstone: This does not alter the count.
  • To determine whether a key is seen or unseen, the algorithm can exploit the fact that it runs when segments are added: it can use the high-level API functions of all previously included segments to reliably determine whether a key already exists. This also handles situations such as add->delete->re-add correctly, since the high-level APIs already produce correct results for those.
  • Each segment therefore produces a net change number, which can be cached along with the other structures created at start-up time, such as bloom filters. (Most likely the count can be computed in the same loop as the bloom-filter construction to save additional disk lookups.)
  • As new segments get written to disk they can run through the same process when mounted
  • The topmost segment is the memtable. It could be extended with a simple thread-safe counter which uses the same technique as above: when a new object comes in, determine which of the 4 states applies.
    • Alternative: The method above requires a read-before-write pattern, which would most likely add a large overhead to every single write for a relatively small benefit. Alternatively, this information can be aggregated when .Count() is called (see the second sketch following this list). If such a runtime aggregation is still too slow, the result could be cached in the memtable, with a new write acting as the cache invalidation event.
  • Assumption: When performing a compaction, the net count modifier of the compacted segment is the sum of the net changes of the two input segments.
    • If that assumption is incorrect, we can also rerun the same "segment initialization" logic from above. However, we need to be careful to limit the lookups to the segments that preceded the compacted one, so we can't use the highest-level APIs here (to the outside, those would include future segments as well, which would produce an incorrect count).
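To make the four cases concrete, here is a minimal sketch in Go of the per-segment net-change computation. All names and types (node, segment, existsInPrevious, netCountChange) are simplified stand-ins invented for illustration, not the actual lsmkv API; a real segment stores sorted, serialized nodes and would consult bloom filters before searching, and keys within one segment are assumed unique here.

```go
package main

import "fmt"

// node is a simplified stand-in for one key in a segment (assumption;
// the real lsmkv segment layout differs).
type node struct {
	key       string
	tombstone bool
}

// segment is a simplified stand-in for an on-disk lsmkv segment.
type segment struct {
	nodes []node
}

// get reports whether this segment has a live (non-tombstoned) entry for
// key, and whether it knows the key at all.
func (s *segment) get(key string) (exists, found bool) {
	for _, n := range s.nodes {
		if n.key == key {
			return !n.tombstone, true
		}
	}
	return false, false
}

// existsInPrevious checks all previous segments, newest first, mimicking
// the high-level lookup described above: the newest segment that knows the
// key wins, so add->delete->re-add chains resolve correctly.
func existsInPrevious(previous []*segment, key string) bool {
	for i := len(previous) - 1; i >= 0; i-- {
		if exists, found := previous[i].get(key); found {
			return exists
		}
	}
	return false
}

// netCountChange computes how many objects this segment adds or removes,
// following the four cases from the list above.
func netCountChange(s *segment, previous []*segment) int {
	net := 0
	for _, n := range s.nodes {
		seen := existsInPrevious(previous, n.key)
		switch {
		case !seen && !n.tombstone:
			net++ // previously unseen key: one new object
		case seen && n.tombstone:
			net-- // previously seen key with tombstone: one object gone
			// seen && !tombstone: update, count unchanged
			// !seen && tombstone: tombstone for unknown key, count unchanged
		}
	}
	return net
}

func main() {
	root := &segment{nodes: []node{{key: "a"}, {key: "b"}}}
	next := &segment{nodes: []node{
		{key: "a"},                  // update: no change
		{key: "b", tombstone: true}, // delete: -1
		{key: "c"},                  // create: +1
	}}
	fmt.Println(netCountChange(root, nil))              // 2
	fmt.Println(netCountChange(next, []*segment{root})) // 0
}
```

Under the compaction assumption above, combining two segments would then just sum their cached net values (2 + 0 = 2 live objects here) with no re-scan needed.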
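And a sketch of the runtime-aggregation alternative for the memtable, under the same caveats: memtable, entries, and existsOnDisk are hypothetical simplifications, and a real implementation would look keys up through the bucket's existing segment APIs. The point is that the write path stays untouched; the classification cost is paid only when Count() is called.

```go
package main

import (
	"fmt"
	"sync"
)

// memtable is a simplified stand-in for the lsmkv memtable (assumption).
type memtable struct {
	sync.RWMutex
	entries map[string]bool // key -> tombstone flag (true = delete)
}

// Count classifies the memtable's keys against the disk segments on
// demand. diskCount is the cached sum of the disk segments' net changes;
// existsOnDisk is a hypothetical lookup against those segments. As noted
// above, the result could additionally be cached, with the next write
// acting as the invalidation event.
func (m *memtable) Count(diskCount int, existsOnDisk func(key string) bool) int {
	m.RLock()
	defer m.RUnlock()

	net := 0
	for key, tombstone := range m.entries {
		seen := existsOnDisk(key)
		switch {
		case !seen && !tombstone:
			net++ // new key: +1
		case seen && tombstone:
			net-- // deleted existing key: -1
		}
	}
	return diskCount + net
}

func main() {
	mt := &memtable{entries: map[string]bool{
		"a": false, // update of an existing key: no change
		"b": true,  // delete of an existing key: -1
		"d": false, // new key: +1
	}}
	onDisk := map[string]bool{"a": true, "b": true, "c": true}
	fmt.Println(mt.Count(3, func(key string) bool { return onDisk[key] })) // 3
}
```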

Notes

Once in place, this new algorithm can also be used to considerably speed up { Aggregate { meta { count } } }.

etiennedi mentioned this issue Jan 31, 2022
etiennedi added a commit that referenced this issue Feb 1, 2022
This primitive implementation does not handle updates and deletes yet
and must be adapted in a future commit
etiennedi added a commit that referenced this issue Feb 1, 2022
currently only works for unique additions, does not yet handle updates
or deletes
etiennedi added a commit that referenced this issue Feb 1, 2022
This does not yet check for disk segments, so this only works if all
disk segments consist of unique additions, but does work if the memtable
contains updates or deletes.
etiennedi added a commit that referenced this issue Feb 3, 2022
currently this process is skipped during compactions, so the values are
not reliable for compacted segments.

Has not yet been evaluated for performance.
etiennedi added a commit that referenced this issue Feb 4, 2022
stale bot commented Apr 2, 2022

Thank you for your contribution to Weaviate. This issue has not received any activity in a while and has therefore been marked as stale. Stale issues will eventually be autoclosed. This does not mean we are ruling out working on this issue; it most likely has not been prioritized highly enough in recent months.
If you believe that this issue should remain open, please leave a short reply. This lets us know that the issue is not abandoned and acts as a reminder for our team to consider prioritizing it again.
Please also consider whether you can contribute to a solution for this issue. If you are willing to contribute but don't know where to start, please leave a quick message and we'll try to help you.
Thank you, The Weaviate Team

stale bot added the autoclosed label Apr 2, 2022
stale bot closed this as completed Apr 16, 2022