New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set bloom_fpr automatically #3969
Labels
Milestone
Comments
EmirVildanov
added a commit
to EmirVildanov/tarantool
that referenced
this issue
Aug 30, 2022
The optimal bloom_fpr is calculated using transformed optimization from Monkey: https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf article: the calculation based not on a run's level in LSM tree, but on it's size. See also tarantool#3969 NO_DOC=internal NO_TEST=internal
EmirVildanov
added a commit
to EmirVildanov/tarantool
that referenced
this issue
Sep 19, 2022
The optimal bloom_fpr is calculated using transformed optimization from Monkey: https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf article: the calculation based not on a run's level in LSM tree, but on it's size. See also tarantool#3969 NO_DOC=internal NO_TEST=internal
EmirVildanov
added a commit
to EmirVildanov/tarantool
that referenced
this issue
Sep 23, 2022
@TarantoolBot document Title: optimized bloom_fpr calculation The optimal bloom_fpr is calculated using transformed optimization from Monkey: https://stratos.seas.harvard.edu/files/stratos/files/monkeykeyvaluestore.pdf article: the calculation based not on a run's level in LSM tree, but on it's size. vy_scheduler.c added functions: vy_task_find_range_bloom_fpr(lsm, range, new_run_stmts_count, W, total_stmts_count, skip_n) Calculate optimal bloom_fpr for given range. Works for runs created during compaction and as a subfunction during dump run creation. Parameters: * lsm — LSM-tree we are working with * range — concrete range we are working with * new_run_stmts_count — statements count in the newly created after dump/compaction run * W — total number of runs in the range after operation (dump/compaction) * total_stmts_count — total number of statements in the range after operation (dump/compaction) * skip_n — number of runs to skip during the range traverse as they are compacted into one new run Returns: optimally calculated bloom_fpr vy_task_find_lsm_bloom_fpr(lsm, new_run_stmts_count) Calculate optimal bloom_fpr for given lsm. Using vy_task_find_range_bloom_fpr inside. Parameters: * lsm — LSM-tree we are working with * new_run_stmts_count — statements count in the newly created after dump run Returns: optimally calculated bloom_fpr for dump See also tarantool#3969 NO_TEST=internal
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
While mainstream designs along the top curve in Figure 1 set the same false positive rate to Bloom
filters across all levels of LSM-tree, Monkey, the current
state of the art, sets exponentially lower false positive rates to
Bloom filters at smaller levels [22]. This is shown to minimize
the sum of false positive rates across all filters and to thereby
minimize I/O for point lookups
See N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal Navigable KeyValue Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79–94, 2017
See also the dostoevsky paper referencing the paper above.
The text was updated successfully, but these errors were encountered: