[Feature] Code implementation of Async Scheduler #924

cychiuak · 2025-10-23T09:29:08Z

feat(scheduler): Implement asynchronous scheduler to reduce CPU wait overhead

Problem:
The existing synchronous scheduler blocks the CPU while waiting for TPU
results. This wait time creates significant overhead, especially for
small models or decode-heavy scenarios where the scheduler is invoked
frequently.

Solution:
This commit introduces an asynchronous scheduler that decouples the CPU
scheduler from the TPU execution. The CPU no longer blocks and can
continue processing, effectively eliminating the wait-time overhead.
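
The decoupling described above can be illustrated with a minimal sketch. It is not the code in this PR: `schedule`, `execute`, `run_sync`, `run_async`, and the sleep durations are hypothetical stand-ins, and the sketch assumes that scheduling step N+1 does not need the output of step N. A single worker thread plays the role of the TPU.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CPU_SCHEDULE_S = 0.01  # assumed CPU time to build one batch (illustrative)
DEVICE_STEP_S = 0.01   # assumed device time to run one batch (illustrative)


def schedule(step: int) -> str:
    """Hypothetical stand-in for the CPU-side scheduler."""
    time.sleep(CPU_SCHEDULE_S)
    return f"batch-{step}"


def execute(batch: str) -> str:
    """Hypothetical stand-in for device execution of one batch."""
    time.sleep(DEVICE_STEP_S)
    return f"out({batch})"


def run_sync(num_steps: int) -> list[str]:
    """Schedule, then block on the device, once per step."""
    outputs = []
    for step in range(num_steps):
        batch = schedule(step)
        outputs.append(execute(batch))  # CPU idles while the device runs
    return outputs


def run_async(num_steps: int) -> list[str]:
    """Schedule step N+1 on the CPU while step N is still on the device."""
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as device:
        pending = None
        for step in range(num_steps):
            batch = schedule(step)  # overlaps with the in-flight device step
            if pending is not None:
                outputs.append(pending.result())  # collect the previous step
            pending = device.submit(execute, batch)
        if pending is not None:
            outputs.append(pending.result())  # drain the final step
    return outputs
```

Both loops return the same outputs in the same order. In this toy model the synchronous loop takes roughly `num_steps * (CPU_SCHEDULE_S + DEVICE_STEP_S)`, while the overlapped loop approaches `num_steps * max(CPU_SCHEDULE_S, DEVICE_STEP_S)`, which is why the gain is largest when the scheduler's share of each step is large.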

Impact:
Observed a ~33% throughput increase on Llama3.2-1B. This change is
most significant for workloads with high scheduler call frequency.

Benchmark (Llama3.2-1B):

Before: 6.38 req/s (Sync @ cychiuak@8c7e7bb)
After: 8.89 req/s (Async)
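
As a quick check, the relative change implied by these two figures can be computed directly. The variable names are illustrative; only the two req/s values come from the benchmark above.

```python
before_rps = 6.38  # Sync, from the benchmark above
after_rps = 8.89   # Async, from the benchmark above

speedup = after_rps / before_rps
increase_pct = (speedup - 1.0) * 100.0                   # throughput gain
time_saved_pct = (1.0 - before_rps / after_rps) * 100.0  # drop in time per completed request

print(f"speedup: {speedup:.2f}x")
print(f"throughput increase: {increase_pct:.1f}%")
print(f"time per request reduced by: {time_saved_pct:.1f}%")
```

These two numbers alone give a throughput increase of about 39% and a reduction of about 28% in time per completed request. The ~33% quoted under Impact sits between the two; the description does not say which run or metric it was taken from.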

How to Reproduce:

HF_TOKEN=<YOUR_HF_TOKEN> VLLM_SERVER_DEV_MODE=1 VLLM_USE_V1=1 \
VLLM_TORCH_PROFILER_DIR=/dev/shm/vllm-prof \
python -m vllm.entrypoints.cli.main serve meta-llama/Llama-3.2-1B \
--max-model-len 800 --max-num-seqs 256 --max-num-batched-tokens 512 \
--no-enable-prefix-caching --tensor-parallel-size 1 --async-scheduling

python -m vllm.entrypoints.cli.main bench serve --backend vllm \
--model meta-llama/Llama-3.2-1B --dataset-name random --random-input-len 230 \
--random-output-len 120 --num-prompts 200 --max-concurrency 3 \
--percentile-metrics "ttft,tpot,itl,e2el" --ignore-eos

py4 · 2025-10-23T18:08:35Z

Thanks for the PR. Please add an e2e CI test for this feature to prevent it from breaking.

tpu_inference/runner/tpu_jax_runner.py

Accidental

Signed-off-by: cychiuak <andersonchiu@google.com>

bvrockwell · 2025-10-29T03:08:01Z

Great contribution @yuyanpeng-google !! Excited to see it in action.

Signed-off-by: cychiuak <andersonchiu@google.com>

vipannalla

LGTM, @yuyanpeng-google can follow up on adding the tests to CI after PR is merged.

cychiuak force-pushed the async-scheduler branch from 04609e6 to 887c779 on October 23, 2025 09:55

cychiuak changed the title ~~Code implementation of Async Scheduler~~ [Feature] Code implementation of Async Scheduler Oct 23, 2025

py4 reviewed Oct 23, 2025

tpu_inference/runner/tpu_jax_runner.py (resolved)

jrplatin previously approved these changes Oct 27, 2025

tpu_inference/runner/tpu_jax_runner.py (outdated, resolved)

Code implementation of Async Scheduler

49f11fe

Signed-off-by: cychiuak <andersonchiu@google.com>

cychiuak force-pushed the async-scheduler branch from 887c779 to db4a874 on October 29, 2025 11:32

pytests, async code optimisation

816d319

Signed-off-by: cychiuak <andersonchiu@google.com>

cychiuak force-pushed the async-scheduler branch from 1a21402 to 816d319 on October 29, 2025 12:02

vipannalla approved these changes Oct 30, 2025

py4 merged commit ae06584 into vllm-project:main Oct 30, 2025
3 checks passed

6 participants