Limit to 3 pipelines per node per source #5792

rdettai · 2025-06-10T13:40:32Z

Description

Closes #4470
Closes #5747
Closes #4630

This addresses both the limitation that only 1 merge pipeline can run per indexer at any given time and the fact that, when using Kafka, nodes systematically end up with all pipelines of a source, even if the number of pipelines for that source is rather large.

How was this PR tested?

Added unit test, should add more

fmassot · 2025-06-10T13:42:52Z

why 3?

rdettai · 2025-06-10T13:46:55Z

> why 3?

It's a ratio proposed by Paul here. The rationale is that merges are about 3x faster than indexing, so if you start saturating the system, merges would still be able to keep up with this ratio.

rdettai · 2025-06-10T13:56:13Z

There are 2 issues with the logic so far:

1) when you have 4 pipelines, they will be split into 3 and 1, which is not ideally balanced
2) upon new iterations, the extra pipeline might end up on other nodes (e.g. 3 on one node and 1 on each of the other nodes)

EDIT: 2) is solved in #5808
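
To illustrate issue 1), here is a minimal sketch contrasting a greedy fill up to the cap with an even split over the same number of nodes. The functions `greedy_split` and `balanced_split` are hypothetical names made up for this example; they are not the scheduler's actual logic, only an illustration of why 4 pipelines with a cap of 3 end up as 3 + 1 rather than 2 + 2.

```rust
/// Hypothetical illustration: fill nodes one after the other, up to `cap`
/// pipelines each. Any leftover (not enough nodes) is spread round-robin.
fn greedy_split(num_pipelines: u32, cap: u32, num_nodes: usize) -> Vec<u32> {
    assert!(num_nodes > 0 && cap > 0);
    let mut remaining = num_pipelines;
    let mut assignment = vec![0u32; num_nodes];
    for slot in assignment.iter_mut() {
        let taken = remaining.min(cap);
        *slot = taken;
        remaining -= taken;
    }
    // Not enough nodes to respect the cap: exceed it as evenly as possible.
    let mut i = 0;
    while remaining > 0 {
        assignment[i % num_nodes] += 1;
        remaining -= 1;
        i += 1;
    }
    assignment
}

/// Hypothetical illustration: use the minimum number of nodes that respects
/// `cap` (when possible), then spread the pipelines evenly over those nodes.
fn balanced_split(num_pipelines: u32, cap: u32, num_nodes: usize) -> Vec<u32> {
    assert!(num_nodes > 0 && cap > 0);
    // Ceiling division: nodes needed to stay at or under the cap.
    let needed = (((num_pipelines + cap - 1) / cap).max(1)) as usize;
    let used = needed.min(num_nodes);
    let base = num_pipelines / used as u32;
    let extra = (num_pipelines % used as u32) as usize;
    (0..num_nodes)
        .map(|i| if i < used { base + u32::from(i < extra) } else { 0 })
        .collect()
}
```

With 4 pipelines, a cap of 3 and 3 nodes, `greedy_split(4, 3, 3)` yields `[3, 1, 0]` while `balanced_split(4, 3, 3)` yields `[2, 2, 0]`.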

rdettai · 2025-06-11T10:01:57Z

Actually, this is wrong for now. I thought shards in the simplified problem were indexing pipelines in the physical plan, but that's not true. Shards in the simplified problem are physical shards, and they are mapped to pipelines in

quickwit/quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs

Line 204 in 5853b73

fn convert_scheduling_solution_to_physical_plan_single_node_single_source(

EDIT: solved now

rdettai · 2025-06-12T12:21:55Z

quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/scheduling_logic.rs

+    // To ensure that merges can keep up, we try not to assign more than 3
+    // pipelines per indexer for a source (except if there aren't enough nodes).
+    let target_limit_num_shards_per_indexer_per_source =
+        3 * MAX_LOAD_PER_PIPELINE.cpu_millis() / source.load_per_shard.get();


This creates some undesired coupling with the rest of the code:

we rely on convert_scheduling_solution_to_physical_plan to use the same MAX_LOAD_PER_PIPELINE to create the right number of pipelines

we rely on the default load_per_pipeline for non-ingest sources (e.g. Kafka) to also use MAX_LOAD_PER_PIPELINE.
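
To make the arithmetic of the snippet above concrete, here is a minimal standalone sketch of the same formula. The constant `ASSUMED_MAX_LOAD_PER_PIPELINE_MILLIS` is a hypothetical stand-in for `MAX_LOAD_PER_PIPELINE.cpu_millis()`, and its value is purely illustrative. The point is that the shard limit only translates to "3 pipelines" if the code that later groups shards into pipelines uses the same per-pipeline load.

```rust
use std::num::NonZeroU32;

// Assumption: illustrative value only, standing in for
// MAX_LOAD_PER_PIPELINE.cpu_millis() from the PR.
const ASSUMED_MAX_LOAD_PER_PIPELINE_MILLIS: u32 = 3_200;

// The ratio discussed in the PR: at most 3 pipelines per indexer per source.
const TARGET_PIPELINES_PER_INDEXER: u32 = 3;

/// Number of shards of a source that fit in 3 pipelines' worth of load,
/// mirroring the expression in the diff (integer division rounds down).
fn target_limit_num_shards_per_indexer_per_source(load_per_shard: NonZeroU32) -> u32 {
    TARGET_PIPELINES_PER_INDEXER * ASSUMED_MAX_LOAD_PER_PIPELINE_MILLIS / load_per_shard.get()
}
```

Under these assumed numbers, a shard load of 400 millis gives a limit of 24 shards, while a shard load equal to the per-pipeline maximum gives a limit of 3 shards, i.e. one shard per pipeline.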

rdettai requested a review from fulmicoton-dd June 10, 2025 13:50

rdettai marked this pull request as draft June 11, 2025 08:33

rdettai force-pushed the explicit-scheduling-rescaling branch from c065f94 to 1b57434 Compare June 11, 2025 09:20

Base automatically changed from explicit-scheduling-rescaling to main June 11, 2025 12:47

rdettai force-pushed the limit-3-pipelines-per-node branch from 2c62834 to 0648c42 Compare June 11, 2025 12:47

rdettai changed the base branch from main to test-solution-stability June 11, 2025 12:50

rdettai force-pushed the limit-3-pipelines-per-node branch 2 times, most recently from e6c4ebd to 761073e Compare June 12, 2025 12:16

rdettai force-pushed the test-solution-stability branch from b62e363 to 2aa62ec Compare June 19, 2025 07:53

rdettai force-pushed the limit-3-pipelines-per-node branch from d1003c7 to 38cd299 Compare June 19, 2025 09:05

rdettai changed the base branch from test-solution-stability to impr-shard-collocation June 19, 2025 09:06

rdettai marked this pull request as ready for review June 19, 2025 09:14

rdettai added 2 commits June 19, 2025 11:51

Add limit of three pipelines per node

630c781

Small code simplification

9513e00

rdettai force-pushed the limit-3-pipelines-per-node branch from 1a870a8 to 9513e00 Compare June 19, 2025 09:51
