Motivation
In some cases, multiple sub-tasks are grouped in a batch and must work together. A typical use case is multi-node inference: each inference request requires multiple nodes to handle it cooperatively (e.g., tensor parallelism, pipeline parallelism in LLM inference).
Function Specification
- Batch allocation: The scheduler must allocate resources in multiples of the batch size: n, 2n, 3n, ..., where n is the batch size
- Session configuration: Support batch configuration as a session-level attribute
- Dedicated task indexing: Each executor in a batch should fetch only the task at its own index within the batch (executor 0 gets task 0, executor 1 gets task 1, etc.)
- Backward compatibility: Default batch size of 1 maintains current single-task-per-request behavior
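
To make the requirements above concrete, here is a minimal Python sketch of the allocation and indexing rules. It is only an illustration under stated assumptions, not a proposed implementation: the names `Assignment`, `allocatable_executors`, and `assign` are hypothetical and do not refer to any existing API, and rounding the allocation down to a multiple of the batch size is one possible reading of the "n, 2n, 3n, ..." rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of which task an executor should fetch."""
    batch_id: int    # which batch the executor belongs to
    task_index: int  # index of the task within that batch


def allocatable_executors(available: int, batch_size: int = 1) -> int:
    """Round the available executor count down to a multiple of batch_size.

    Assumption: partial batches are never allocated, so leftover
    executors stay idle until a full batch can be formed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return (available // batch_size) * batch_size


def assign(executor_ids: list[int], batch_size: int = 1) -> dict[int, Assignment]:
    """Map each allocated executor to its dedicated task index in its batch."""
    usable = allocatable_executors(len(executor_ids), batch_size)
    return {
        executor_id: Assignment(position // batch_size, position % batch_size)
        for position, executor_id in enumerate(executor_ids[:usable])
    }


if __name__ == "__main__":
    # Five executors with a batch size of 2: two full batches, one executor left idle.
    for executor_id, assignment in assign([10, 11, 12, 13, 14], batch_size=2).items():
        print(executor_id, assignment)
```

With the default `batch_size=1`, every executor forms its own batch and always receives task index 0, which matches the current single-task-per-request behavior.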
Solutions
N/A
Additional context
N/A