[RFC]: Async scheduler and Multi-step in v1 #20727

### 🚀 The feature, motivation and pitch

We propose two improvements that reduce the framework's overhead with minimal code changes: an asynchronous scheduler and a multi-step approach.

1. Asynchronous Scheduler

For the async scheduler, we suggest the following design:

<img width="1532" height="223" alt="Image" src="https://github.com/user-attachments/assets/99ee3ac3-4ee0-4ef7-8dd6-740ee47259dd" />

This design requires changes only in `EngineCore`; no other modules need to be modified. By adding an `update_schedule` module, the framework can also support speculative decoding without further changes.
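The design in the figure is not spelled out in text, so the following is only a minimal sketch of the general idea, under stated assumptions: the scheduler plans step N+1 while step N is still running, counting tokens that are scheduled but not yet produced, and an `update_schedule` step then reconciles its state with the actual outputs. `Request`, `schedule`, `execute_model`, and `run_engine` are hypothetical names, the signature of `update_schedule` is assumed, and a single-worker thread pool stands in for the device.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Request:
    """Hypothetical request state tracked by the toy scheduler."""

    def __init__(self, req_id, max_tokens):
        self.req_id = req_id
        self.max_tokens = max_tokens
        self.num_scheduled = 0   # tokens scheduled so far (optimistic count)
        self.output_tokens = []  # tokens actually produced by the model


def schedule(requests):
    """Pick every request that still has tokens left to schedule.

    Counts tokens that are scheduled but not yet produced, so it does not
    need the previous step's output to decide.
    """
    batch = [r for r in requests if r.num_scheduled < r.max_tokens]
    for r in batch:
        r.num_scheduled += 1
    return batch


def execute_model(batch, step):
    time.sleep(0.005)  # stand-in for the forward pass on the device
    return [(r.req_id, step) for r in batch]  # fake "sampled token" per request


def update_schedule(requests, model_output):
    """Assumed signature: reconcile scheduler state with the produced tokens."""
    by_id = {r.req_id: r for r in requests}
    for req_id, token in model_output:
        by_id[req_id].output_tokens.append(token)


def run_engine(requests):
    with ThreadPoolExecutor(max_workers=1) as device:
        step = 0
        batch = schedule(requests)
        in_flight = device.submit(execute_model, batch, step) if batch else None
        while in_flight is not None:
            step += 1
            # Overlap: schedule step N+1 on the CPU while step N runs.
            next_batch = schedule(requests)
            # Only now wait for step N and fold its output into the state.
            update_schedule(requests, in_flight.result())
            in_flight = (
                device.submit(execute_model, next_batch, step) if next_batch else None
            )
    return {r.req_id: r.output_tokens for r in requests}


if __name__ == "__main__":
    print(run_engine([Request(0, 2), Request(1, 3)]))  # {0: [0, 1], 1: [0, 1, 2]}
```

This toy only stops requests on `max_tokens`. A real engine would also have to handle requests that stop early (e.g. on an EOS token), in which case a token scheduled optimistically for the next step would have to be discarded during `update_schedule`.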

2. Multi-Step Approach

Although v1's preprocessing and postprocessing are lighter than v0's, we still observe notable inefficiencies on some platforms (e.g., ARM + XPU). In particular, there is a significant gap between preparing the inputs and launching the model's forward pass, and the device-to-host (D2H) copy of every step's model output adds further latency.

To address these issues, we introduce a multi-step strategy that differs from v0 in several key ways:
- We propose a `simple_prepare_input` function that removes unnecessary CPU work between steps.
- We defer the D2H copies of the model outputs, so the stream does not have to be synchronized after every step.
- We integrate multi-step execution with the asynchronous scheduler, which reduces the scheduler's load when it processes the outputs of multiple steps.

<img width="1099" height="248" alt="Image" src="https://github.com/user-attachments/assets/3d75b878-e5ee-4a3d-83c1-48eb6fcc0afc" />
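A minimal sketch of how the first two changes could fit together, assuming a decode-only batch. `simple_prepare_input` is the name used in this RFC, but its body here (feed back the sampled tokens and advance each position by one) is our assumption; `ToyDevice`, `full_prepare_input`, `fake_forward`, and `run_multi_step` are hypothetical, and plain Python lists stand in for device tensors. The point is the sync count: with `defer_d2h=True`, `num_steps` decode steps cost one D2H copy instead of one per step.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Hypothetical stand-in for an accelerator stream; counts D2H syncs."""
    num_syncs: int = 0

    def to_host(self, values):
        self.num_syncs += 1  # each D2H copy forces a stream synchronization
        return list(values)


@dataclass
class StepInputs:
    input_ids: list  # last token of each request (decode only)
    positions: list  # position of that token in its sequence


def full_prepare_input(token_ids_per_req):
    """Build inputs from scratch on the CPU (the per-step cost to avoid)."""
    return StepInputs(
        input_ids=[toks[-1] for toks in token_ids_per_req],
        positions=[len(toks) - 1 for toks in token_ids_per_req],
    )


def simple_prepare_input(prev, sampled):
    """Assumed behavior: reuse the previous step's inputs incrementally."""
    return StepInputs(input_ids=list(sampled), positions=[p + 1 for p in prev.positions])


def fake_forward(inputs):
    # Toy "model": next token = current token + position (stays "on device").
    return [t + p for t, p in zip(inputs.input_ids, inputs.positions)]


def run_multi_step(token_ids_per_req, num_steps, device, defer_d2h=True):
    inputs = full_prepare_input(token_ids_per_req)  # full preparation only once
    device_outputs = []  # sampled tokens kept "on device" until the end
    host_outputs = []
    for _ in range(num_steps):
        sampled = fake_forward(inputs)
        if defer_d2h:
            device_outputs.append(sampled)
        else:
            host_outputs.append(device.to_host(sampled))  # sync on every step
        inputs = simple_prepare_input(inputs, sampled)
    if defer_d2h:
        # A single D2H copy covers all steps; split it back per step on the host.
        flat = device.to_host([tok for step in device_outputs for tok in step])
        n = len(token_ids_per_req)
        host_outputs = [flat[i * n:(i + 1) * n] for i in range(num_steps)]
    return host_outputs
```

For example, `run_multi_step([[5], [1, 2]], 3, ToyDevice())` returns `[[5, 3], [6, 5], [8, 8]]` after one sync, while `defer_d2h=False` produces the same tokens after three syncs. When combined with the asynchronous scheduler, those `num_steps` outputs could then be handed to the scheduler together instead of one step at a time.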

---

This design keeps code changes minimal while reducing CPU-side scheduling and synchronization overhead, which we expect to improve end-to-end performance noticeably.

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
