Exploit synergies when evaluating many queries at the same time #2117
Merged
9f87b16 to f5bd408
f5bd408 to 469fdef
Remaining items for the POC:
* Use an ordered structure to keep track of pending partitions
* Consider partition residency and query priorities in the partition selection algorithm
* Handle failing partition spin-ups

Remaining items for the final implementation:
* Fix the max-queries option
* Bring back partition transforms
* Remove dead code
This brings back priorities and incorporates them into the scheduling algorithm by accumulating the priorities of all active queries instead of using the simple count.
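A minimal sketch of the difference between the two scoring rules, written in Python rather than the project's own language. The function names, partition ids, and priority values are hypothetical and only illustrate the idea of summing priorities instead of counting queries:

```python
from typing import Dict, List


def pick_by_count(interested: Dict[str, List[str]]) -> str:
    """Return the partition with the most interested queries."""
    return max(interested, key=lambda p: len(interested[p]))


def pick_by_priority(
    interested: Dict[str, List[str]], priority: Dict[str, int]
) -> str:
    """Return the partition whose interested queries have the highest summed priority."""
    return max(interested, key=lambda p: sum(priority[q] for q in interested[p]))


# Hypothetical state: three low-priority queries want partition-a,
# one high-priority query wants partition-b.
interested = {
    "partition-a": ["q1", "q2", "q3"],
    "partition-b": ["q4"],
}
priority = {"q1": 1, "q2": 1, "q3": 1, "q4": 10}

by_count = pick_by_count(interested)
by_priority = pick_by_priority(interested, priority)
```

With these made-up numbers the count-based rule prefers `partition-a`, while the priority-based rule prefers `partition-b` because the single high-priority query outweighs the three low-priority ones.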
The new overload takes an additional count parameter: the approximate number of events the client is interested in. The client can request an exhaustive evaluation by passing `0`, which is currently the only supported value.
469fdef to 17e3140
`query.partition.materializations` counts how many partitions are read from disk.
`query.partition.lookups` counts the number of lookups issued against materialized partitions.

TODO:
- split out active and unpersisted partitions
When the atomic replace operation failed, the subsequent file deletion reset the error.
This didn't add any value because the sink would be notified about shutdown by the corresponding exporter anyway.
The previous default of 1'000'000 led to about 400 MB of additional memory consumption. This is acceptable for the server but not for the often short-running client commands.
135e5c3 to 10d39f5
Also rename from `pending_queue` to `query_queue` and document the exposed functions.
... and document the index counters.
The `remove` function now always removes a query from the queue, even when it is currently being evaluated at one or more partitions. We don't send a `done` message for those any more because the client is clearly not interested.
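The removal semantics described above can be sketched as follows. This is a toy Python model under stated assumptions, not the project's `query_queue`: the class `QueryQueue`, its methods, and the `done_messages` list are hypothetical stand-ins for the real actor messaging.

```python
from typing import Dict, Iterable, List, Set


class QueryQueue:
    """Toy stand-in for the query queue (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # query_id -> partitions that still have to be visited
        self.queries: Dict[str, Set[str]] = {}
        # query_id -> partitions currently evaluating the query
        self.in_flight: Dict[str, Set[str]] = {}
        # Records which queries would have received a `done` message.
        self.done_messages: List[str] = []

    def insert(self, query_id: str, partitions: Iterable[str]) -> None:
        self.queries[query_id] = set(partitions)
        self.in_flight[query_id] = set()

    def start(self, query_id: str, partition_id: str) -> None:
        """Mark the query as being evaluated at the given partition."""
        self.queries[query_id].discard(partition_id)
        self.in_flight[query_id].add(partition_id)

    def remove(self, query_id: str) -> bool:
        """Always drop the query, even while evaluations are in flight."""
        self.in_flight.pop(query_id, None)
        return self.queries.pop(query_id, None) is not None

    def handle_completion(self, query_id: str, partition_id: str) -> None:
        if query_id not in self.queries:
            # The query was removed: the client is no longer interested,
            # so no `done` message is recorded.
            return
        self.in_flight[query_id].discard(partition_id)
        if not self.queries[query_id] and not self.in_flight[query_id]:
            del self.queries[query_id]
            del self.in_flight[query_id]
            self.done_messages.append(query_id)


queue = QueryQueue()

# q1 is removed while partition p1 is still evaluating it.
queue.insert("q1", ["p1"])
queue.start("q1", "p1")
removed = queue.remove("q1")
queue.handle_completion("q1", "p1")

# q2 runs to completion and is reported as done.
queue.insert("q2", ["p1"])
queue.start("q2", "p1")
queue.handle_completion("q2", "p1")
```

In this model only `q2` ends up in `done_messages`; the late completion for the removed `q1` is silently dropped.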
This allows us to get rid of the mutable accessor to the queries.
48297b4 to 8beee92
dominiklohmann
approved these changes
Apr 8, 2022
Approving modulo docs update. Thanks so much for making this change, this is a big leap forward! 🚀
This set of changes introduces a new query scheduling algorithm that is designed to exploit synergies between the candidate sets of queries: When many queries are active simultaneously, their sets of candidate partitions can be expected to overlap partially. To make use of this overlap we build a map from partition_id to the list of query_ids interested in the partition. The partitions that ought to be materialized are selected based on the number of interested queries, and all interested queries get sent to the selected partition.
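A minimal sketch of the idea in Python, not the project's implementation: the per-query candidate sets are inverted into a map from partition id to interested query ids, and partitions are then visited in order of decreasing interest. The function names, the tie-breaking by partition id, and the example ids are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_interest_map(candidates: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert query_id -> candidate partitions into partition_id -> query_ids."""
    interest: Dict[str, List[str]] = defaultdict(list)
    for query_id in sorted(candidates):
        for partition_id in sorted(candidates[query_id]):
            interest[partition_id].append(query_id)
    return dict(interest)


def schedule(candidates: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Return (partition_id, query_ids) pairs, most requested partition first.

    Each partition appears exactly once, so it is materialized once and all
    interested queries are sent to it together. Ties are broken by partition
    id to keep the result deterministic (an assumption of this sketch).
    """
    interest = build_interest_map(candidates)
    return sorted(interest.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical candidate sets of three concurrent queries.
candidates = {
    "q1": {"p1", "p2"},
    "q2": {"p2", "p3"},
    "q3": {"p2"},
}

plan = schedule(candidates)
shared_materializations = len(plan)
unshared_materializations = sum(len(parts) for parts in candidates.values())
```

In this example `p2` is scheduled first because all three queries are interested in it, and the three partitions are materialized three times in total instead of five times when every query loads its candidates independently.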
A measurement of the completion time of running 500 queries on a local database of 220 GB:
master: 155 seconds
this branch: 114.5 seconds
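For reference, the two reported timings correspond to roughly 26% less wall-clock time. The short Python calculation below only restates the measured numbers; the variable names are illustrative.

```python
master_seconds = 155.0   # reported completion time on master
branch_seconds = 114.5   # reported completion time on this branch

saved_seconds = master_seconds - branch_seconds
relative_reduction = saved_seconds / master_seconds
speedup = master_seconds / branch_seconds

print(
    f"saved {saved_seconds:.1f} s, "
    f"{relative_reduction:.1%} less time, "
    f"{speedup:.2f}x speedup"
)
```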