-
Notifications
You must be signed in to change notification settings - Fork 462
Limit to 3 pipelines per node per source #5792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: impr-shard-collocation
Are you sure you want to change the base?
Conversation
why 3? |
It's a ratio proposed by Paul here. The rational is that if you you start saturating the systems, merges being 3x faster than indexing, with this ratio merges would be able to keep up. |
There are 2 issues with the logic so far:
EDIT: 2) is solved in #5808 |
c065f94
to
1b57434
Compare
Actually this is wrong for now, I thought shards in the simplified problem were indexing pipelines in the physical plan, but it's not true. Shards in the simplified problem are physical shards, and they are mapped to pipelines in quickwit/quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs Line 204 in 5853b73
EDIT: solved now |
2c62834
to
0648c42
Compare
e6c4ebd
to
761073e
Compare
// To ensure that merges can keep up, we try not to assign more than 3 | ||
// pipelines per indexer for a source (except if there aren't enough nodes). | ||
let target_limit_num_shards_per_indexer_per_source = | ||
3 * MAX_LOAD_PER_PIPELINE.cpu_millis() / source.load_per_shard.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is creating some undesired coupling with the rest of the code:
- we rely on convert_scheduling_solution_to_physical_plan to use the same MAX_LOAD_PER_PIPELINE to create the right amount of pipelines
- we rely on the default load_per_pipeline for non-ingest sources (e.g Kafka) to also use MAX_LOAD_PER_PIPELINE.
b62e363
to
2aa62ec
Compare
d1003c7
to
38cd299
Compare
1a870a8
to
9513e00
Compare
Description
Closes #4470
Closes #5747
Closes #4630
This addresses both the limitation that only 1 merge pipelines can run per indexer at any given time and the fact that nodes systematically end up with all pipelines of a source when using Kafka, even if the number of pipelines for that source is rather large.
How was this PR tested?
Added unit test, should add more