
Batch in Session #400

@k82cn

Description

Motivation

In some cases, multiple sub-tasks are grouped into a batch and must run together. A typical use case is multi-node inference: each inference request has to be handled cooperatively by several nodes (e.g., with tensor parallelism or pipeline parallelism in LLM inference).

Function Specification

  1. Batch allocation: The scheduler must allocate resources in increments of the batch size, i.e. n, 2n, 3n, ..., where n is the batch size.
  2. Session configuration: The batch size is configured as a session-level attribute.
  3. Dedicated task indexing: Each executor in a batch fetches only the task at its own index in that batch (executor 0 gets task 0, executor 1 gets task 1, and so on).
  4. Backward compatibility: A default batch size of 1 preserves the current behavior of one task per request.
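The four requirements above can be sketched as a small model. This is a minimal, hypothetical Python illustration, not the project's scheduler or API: `SessionSpec`, `allocatable_executors`, `group_into_batches` and `task_for_executor` are made-up names, and holding back an incomplete trailing batch until it fills up is an assumption the issue does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SessionSpec:
    """Hypothetical session attributes; batch_size defaults to 1 (today's behavior)."""
    name: str
    batch_size: int = 1

    def __post_init__(self) -> None:
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")


def allocatable_executors(spec: SessionSpec, free_slots: int) -> int:
    """Largest multiple of batch_size (0, n, 2n, 3n, ...) that fits in free_slots."""
    n = spec.batch_size
    return (free_slots // n) * n


def group_into_batches(spec: SessionSpec, tasks: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Split pending tasks into full batches.

    Assumption: a partial trailing batch is not dispatched; it keeps waiting.
    """
    n = spec.batch_size
    full = len(tasks) - len(tasks) % n
    batches = [tasks[i:i + n] for i in range(0, full, n)]
    return batches, tasks[full:]


def task_for_executor(batch: List[str], executor_index: int) -> str:
    """Executor i of a batch fetches only task i of that batch."""
    if not 0 <= executor_index < len(batch):
        raise IndexError(f"executor index {executor_index} outside batch of {len(batch)}")
    return batch[executor_index]


if __name__ == "__main__":
    llm = SessionSpec(name="llm-inference", batch_size=4)
    print(allocatable_executors(llm, 10))  # 8: two full batches of 4
    batches, waiting = group_into_batches(llm, [f"task-{i}" for i in range(9)])
    print(len(batches), waiting)  # 2 ['task-8']
    print(task_for_executor(batches[1], 2))  # task-6
```

In this sketch, keeping `batch_size` on the session rather than on individual tasks means every request in the session shares one batch shape, so allocation only ever reasons in whole multiples of n; with the default of 1 every function degenerates to the existing one-task-per-executor case.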

Solutions

N/A

Additional context

N/A

Metadata

Labels

kind/feature (New feature or request), priority/p1 (High priority), rfe (request for enhancement)