Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set bloom_fpr automatically #3969

Open
kostja opened this issue Feb 8, 2019 · 0 comments
Open

Set bloom_fpr automatically #3969

kostja opened this issue Feb 8, 2019 · 0 comments
Labels
Milestone

Comments

@kostja
Copy link
Contributor

kostja commented Feb 8, 2019

While mainstream designs along the top curve in Figure 1 set the same false positive rate to Bloom
filters across all levels of LSM-tree, Monkey, the current
state of the art, sets exponentially lower false positive rates to
Bloom filters at smaller levels [22]. This is shown to minimize
the sum of false positive rates across all filters and to thereby
minimize I/O for point lookups

See N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal Navigable KeyValue Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79–94, 2017

See also the dostoevsky paper referencing the paper above.

@kyukhin kyukhin added feature A new functionality vinyl performance labels Feb 13, 2019
@kyukhin kyukhin added this to the 2.2.0 milestone Feb 13, 2019
@kyukhin kyukhin modified the milestones: 2.2.0, 2.3.0 Mar 27, 2019
@kyukhin kyukhin modified the milestones: 2.3.0, 2.4.0 Jul 22, 2019
@kyukhin kyukhin modified the milestones: 2.4.1, 2.5.1 Jan 21, 2020
@kyukhin kyukhin modified the milestones: 2.5.1, 2.6.1 Apr 10, 2020
@kyukhin kyukhin modified the milestones: 2.6.1, wishlist Oct 23, 2020
EmirVildanov added a commit to EmirVildanov/tarantool that referenced this issue Aug 30, 2022
The optimal bloom_fpr is calculated
using transformed optimization from Monkey:
https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf
article: the calculation based not on a run's level
in LSM tree, but on it's size.

See also tarantool#3969
NO_DOC=internal
NO_TEST=internal
EmirVildanov added a commit to EmirVildanov/tarantool that referenced this issue Sep 19, 2022
The optimal bloom_fpr is calculated
using transformed optimization from Monkey:
https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf
article: the calculation based not on a run's level
in LSM tree, but on it's size.

See also tarantool#3969
NO_DOC=internal
NO_TEST=internal
EmirVildanov added a commit to EmirVildanov/tarantool that referenced this issue Sep 23, 2022
@TarantoolBot document
Title: optimized bloom_fpr calculation

The optimal bloom_fpr is calculated
using transformed optimization from Monkey:
https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf
article: the calculation based not on a run's level
in LSM tree, but on it's size.

vy_scheduler.c added functions:
vy_task_find_range_bloom_fpr(lsm, range,
new_run_stmts_count, W, total_stmts_count, skip_n)
Calculate optimal bloom_fpr for given range.
Works for runs created during compaction and as
a subfunction during dump run creation.

Parameters:
* lsm — LSM-tree we are working with
* range — concrete range we are working with
* new_run_stmts_count — statements count in
the newly created after dump/compaction run
* W — total number of runs in the range after
operation (dump/compaction)
* total_stmts_count — total number of statements
in the range after operation (dump/compaction)
* skip_n — number of runs to skip during the range
traverse as they are compacted into one new run

Returns:
optimally calculated bloom_fpr

vy_task_find_lsm_bloom_fpr(lsm, new_run_stmts_count)
Calculate optimal bloom_fpr for given lsm.
Using vy_task_find_range_bloom_fpr inside.

Parameters:
* lsm — LSM-tree we are working with
* new_run_stmts_count — statements count in
the newly created after dump run

Returns:
optimally calculated bloom_fpr for dump

See also tarantool#3969
NO_TEST=internal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants