[RFC]: Async scheduler and Multi-step in v1 #20727

@chengda-wu

🚀 The feature, motivation and pitch

We propose reducing the framework’s overhead in v1 through two improvements, an asynchronous scheduler and a multi-step approach, both of which require only minimal code modifications.

  1. Asynchronous Scheduler

For the async scheduler, we suggest the following design:

[Image: asynchronous scheduler design]

This design requires changes only in EngineCore; no other module needs to be modified. Adding an update_schedule module also allows the framework to support speculative decoding.
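To make the overlap concrete, here is a minimal toy sketch of the idea, not the proposed EngineCore change. It assumes the scheduler can plan step N+1 optimistically (one new token per running request) while step N is still executing, and then applies the outputs afterwards. `ToyScheduler`, `execute_model`, and `run_async` are hypothetical names for illustration; only `update_schedule` comes from the text above, and its behavior here is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def execute_model(batch):
    """Stand-in for the forward pass: one fake token per scheduled request."""
    time.sleep(0.01)  # pretend the device is busy
    return {req_id: f"tok{step}" for req_id, step in batch.items()}


class ToyScheduler:
    """Hypothetical scheduler that plans ahead of the model outputs."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)               # req_id -> tokens to generate
        self.scheduled = {r: 0 for r in budgets}   # steps planned so far
        self.outputs = {r: [] for r in budgets}    # tokens received so far

    def schedule(self):
        # Optimistic planning: assume each in-flight step yields one token,
        # so no in-flight output is needed to build the next batch.
        batch = {}
        for req_id, budget in self.budgets.items():
            if self.scheduled[req_id] < budget:
                batch[req_id] = self.scheduled[req_id]
                self.scheduled[req_id] += 1
        return batch

    def update_schedule(self, model_output):
        # Assumed role of update_schedule: reconcile state with real outputs.
        for req_id, token in model_output.items():
            self.outputs[req_id].append(token)


def run_async(scheduler):
    with ThreadPoolExecutor(max_workers=1) as pool:
        batch = scheduler.schedule()
        while batch:
            future = pool.submit(execute_model, batch)  # step N runs in background
            next_batch = scheduler.schedule()           # plan step N+1 meanwhile
            scheduler.update_schedule(future.result())  # apply step N outputs
            batch = next_batch
    return scheduler.outputs
```

In this sketch the scheduling cost of step N+1 is hidden behind the execution of step N. A real implementation would also need `update_schedule` to correct the optimistic plan, for example when a request stops early or when speculative decoding accepts a different number of tokens.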

  2. Multi-Step Approach

Although preprocessing and postprocessing in v1 are lighter than in v0, we still observe notable inefficiencies on some platforms (e.g., ARM + XPU). In particular, there is a significant gap between input preparation and the launch of the model forward pass, and the device-to-host (D2H) copy of each model output adds further latency.

To address these issues, we introduce a multi-step strategy that differs from v0 in several key ways:
• We propose a simple_prepare_input function to reduce unnecessary CPU operations.
• We defer the D2H communication to avoid excessive stream synchronizations.
• We integrate the multi-step process with the asynchronous scheduler, thereby alleviating the scheduler’s load when handling multiple outputs.
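The first two points can be illustrated with a small toy simulation. It is a sketch under stated assumptions, not the proposed implementation: `ToyDevice` and `run_multi_step` are hypothetical names, the "device" is plain Python that only counts D2H copies, and feeding each step's output directly into the next step is merely one guess at what a simplified input preparation could look like.

```python
class ToyDevice:
    """Hypothetical device that counts device-to-host (D2H) copies."""

    def __init__(self):
        self.d2h_copies = 0

    def forward(self, last_tokens):
        # Stand-in for forward pass + sampling: next token = last token + 1.
        return [t + 1 for t in last_tokens]

    def to_host(self, device_buffers):
        # Each call models one synchronizing D2H transfer.
        self.d2h_copies += 1
        return [list(buf) for buf in device_buffers]


def run_multi_step(device, first_tokens, num_steps):
    tokens = list(first_tokens)
    pending = []  # per-step outputs that stay "on device"
    for _ in range(num_steps):
        # Simplified input prep (assumption): reuse the previous step's
        # output as the next input without a round trip through the host.
        tokens = device.forward(tokens)
        pending.append(tokens)
    # Deferred D2H: one transfer for all steps instead of one per step.
    return device.to_host(pending)
```

Running `run_multi_step(ToyDevice(), [10, 20], 3)` performs three forward steps but only one `to_host` call, whereas a per-step copy would synchronize three times.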

[Image: multi-step design]

Together, these changes keep code modifications small while aiming to reduce framework overhead and improve end-to-end performance.
