Exploit synergies when evaluating many queries at the same time #2117

Merged
merged 50 commits into from
Apr 8, 2022

Conversation


@tobim tobim commented Feb 25, 2022

This set of changes introduces a new query scheduling algorithm that is designed to exploit synergies between the candidate sets of different queries: when many queries are active simultaneously, their sets of candidate partitions can be expected to overlap partially. To make use of this overlap, we build a map from partition_id to the list of query_ids interested in that partition. The partitions to materialize are selected based on the number of interested queries, and all interested queries are then sent to the selected partition.
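The idea described above can be illustrated with a minimal Python sketch. This is not the code from this branch: the function names (`build_interest_map`, `schedule`), the string ids, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def build_interest_map(candidates):
    """Invert {query_id: [partition_id, ...]} into {partition_id: [query_id, ...]}."""
    interested = defaultdict(list)
    for query_id, partition_ids in candidates.items():
        for partition_id in partition_ids:
            interested[partition_id].append(query_id)
    return dict(interested)


def schedule(candidates):
    """Yield (partition_id, query_ids), most-requested partition first."""
    interested = build_interest_map(candidates)
    while interested:
        # Pick the partition with the most interested queries.
        # Ties are broken by partition id (an assumption) to keep the order stable.
        partition_id = min(interested, key=lambda p: (-len(interested[p]), p))
        # Every interested query is sent to the partition in one go.
        yield partition_id, interested.pop(partition_id)


# Hypothetical candidate sets for three concurrent queries.
candidates = {
    "q1": ["p1", "p2"],
    "q2": ["p2", "p3"],
    "q3": ["p2"],
}
plan = list(schedule(candidates))
```

In this toy input, `p2` is a candidate for all three queries, so it is selected first and serves all of them with a single materialization. The five query/partition pairs are covered by three materializations.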

Completion time for running 500 queries against a local database of 220 GB:
master: 155 seconds
this branch: 114.5 seconds

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

@tobim tobim added the performance, enhancement, and blocked labels Feb 25, 2022
@dominiklohmann dominiklohmann removed the blocked label Mar 3, 2022
@tobim tobim force-pushed the topic/index-query-scheduling branch from 9f87b16 to f5bd408 Compare March 6, 2022 13:04
@tobim tobim force-pushed the topic/index-query-scheduling branch from f5bd408 to 469fdef Compare March 14, 2022 17:33
tobim added 11 commits March 18, 2022 14:15
Remaining items for the POC:
* Use an ordered structure to keep track of pending partitions
* Consider partition residency and query priorities in the partition
  selection algorithm
* Handle failing partition spin-ups

Remaining items for the final implementation:
* Fix the max-queries option
* Bring back partition transforms
* Remove dead code

This brings back priorities and incorporates them into the scheduling
algorithm by accumulating the priorities of all active queries instead
of using the simple count.
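
As a rough illustration of the difference between counting queries and accumulating their priorities, consider the following Python sketch. The function name `select_partition`, the integer priority values, and the tie-breaking rule are hypothetical and not taken from this branch.

```python
from collections import defaultdict


def select_partition(candidates, priorities):
    """Return the partition whose interested queries have the highest summed priority."""
    scores = defaultdict(int)
    for query_id, partition_ids in candidates.items():
        for partition_id in partition_ids:
            # Accumulate the priority of each interested query
            # instead of adding 1 per query.
            scores[partition_id] += priorities[query_id]
    # Highest score wins; ties are broken by partition id (an assumption).
    return min(scores, key=lambda p: (-scores[p], p))


# Hypothetical input: one high-priority query on p1, two low-priority ones on p2.
candidates = {"q1": ["p1"], "q2": ["p2"], "q3": ["p2"]}
priorities = {"q1": 10, "q2": 1, "q3": 1}
chosen = select_partition(candidates, priorities)

# With equal priorities the score degenerates to the simple count.
by_count = select_partition(candidates, {q: 1 for q in candidates})
```

Here `chosen` is `p1` because its single query outweighs the two low-priority queries on `p2`, whereas `by_count` is `p2`.
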
The new overload takes an additional count parameter for the
approximate number of events that the client is interested in.

The client can request an exhaustive evaluation by passing `0`. This
is currently the only supported value.
@tobim tobim force-pushed the topic/index-query-scheduling branch from 469fdef to 17e3140 Compare March 18, 2022 13:16
tobim added 12 commits March 21, 2022 17:50
`query.partition.materializations` counts how many partitions are
read from disk.
`query.partition.lookups` counts the number of lookups issued against
materialized partitions.
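
How the two counters relate can be sketched in Python as follows. This is only an illustration under the assumption that every scheduled partition is read from disk exactly once and that each interested query issues one lookup; the `record_metrics` helper and the plan layout are hypothetical.

```python
from collections import Counter


def record_metrics(plan):
    """Tally both counters for a list of (partition_id, query_ids) pairs."""
    metrics = Counter()
    for _partition_id, query_ids in plan:
        # One read from disk per scheduled partition (assumption).
        metrics["query.partition.materializations"] += 1
        # One lookup per query that is sent to the materialized partition.
        metrics["query.partition.lookups"] += len(query_ids)
    return metrics


# Hypothetical schedule: p2 serves three queries, p1 and p3 one each.
plan = [("p2", ["q1", "q2", "q3"]), ("p1", ["q1"]), ("p3", ["q2"])]
metrics = record_metrics(plan)
```

A lookup count that is well above the materialization count suggests that partitions are being shared between queries.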

TODO:
- split out active and unpersisted partitions

In case the atomic replace operation failed, we reset the error with
the subsequent file deletion.

This didn't add any value because the sink would be notified
about shutdown from the corresponding exporter anyway.

The previous default of 1'000'000 led to about 400 MB of additional
memory consumption. This is acceptable for the server but not for
the often short-running client commands.
@tobim tobim force-pushed the topic/index-query-scheduling branch from 135e5c3 to 10d39f5 Compare April 5, 2022 13:31
@tobim tobim force-pushed the topic/index-query-scheduling branch from 48297b4 to 8beee92 Compare April 5, 2022 15:26

@dominiklohmann dominiklohmann left a comment

Approving modulo docs update. Thanks so much for making this change, this is a big leap forward! 🚀

@tobim tobim enabled auto-merge April 8, 2022 12:24
@tobim tobim merged commit 874b9eb into master Apr 8, 2022
@tobim tobim deleted the topic/index-query-scheduling branch April 8, 2022 13:10
Labels
performance Improvements or regressions of performance
3 participants