Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure for finding values that belongs to result set #44

Open
ak0rz opened this issue Apr 2, 2019 · 0 comments
Open

Structure for finding values that belongs to result set #44

ak0rz opened this issue Apr 2, 2019 · 0 comments
Assignees

Comments

@ak0rz
Copy link
Contributor

ak0rz commented Apr 2, 2019

This issue is a part of #42

As we already have BitSet indexes it seems to be pretty straightforward to implement algorithms that check that certain value belongs to given result BitSet.

Unfortunately, it's performance would be dramatically lower on databases with a large number of documents, and even worse when the number of values in the index is equal or almost large as the number of documents, up to O(n * n / 64) operations to find all of the distinct values for this result set.

The simplest idea is to store something like a bloom filter consists of all longs in a bitset bitwise-or'ed to a single (or a few) long values. That way we may calculate the same hash on the result set and then use indexed hash to exclude values where is definitely no documents matching.

It would work smoothly on a large diverged indexes where it's needed and will cause almost no interfering on indexes with few values where in most cases we may have a document for each value and thus must check for each value by hand (which is a few)

@ak0rz ak0rz self-assigned this Apr 2, 2019
@ak0rz ak0rz changed the title Bloom filter for finding values that belongs to result set Structure for finding values that belongs to result set Apr 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant