Conversation

@cychiuak opened this pull request:

feat(scheduler): Implement asynchronous scheduler to reduce CPU wait overhead

Problem:
The existing synchronous scheduler blocks the CPU while waiting for TPU
results. This wait time creates significant overhead, especially for
small models or decode-heavy scenarios where the scheduler is invoked
frequently.

Solution:
This commit introduces an asynchronous scheduler that decouples the CPU
scheduler from TPU execution. Instead of blocking on device results, the
CPU schedules the next step while the TPU executes the current one,
effectively hiding the wait-time overhead.
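
For intuition, here is a minimal, self-contained Python sketch of the pipelining idea: the CPU enqueues the next batch before consuming the previous result, so scheduling overlaps with device execution. This is an illustration only, not the vLLM implementation, and every name in it is hypothetical.

```python
import queue
import threading


def device_worker(in_q, out_q):
    """Stand-in for the TPU executor: consumes batches, emits results."""
    while True:
        batch = in_q.get()
        if batch is None:  # shutdown signal
            break
        out_q.put(f"result for {batch}")  # placeholder for model output


def run_async_scheduler(num_steps):
    in_q, out_q = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=device_worker, args=(in_q, out_q))
    worker.start()

    in_q.put("batch 0")  # prime the pipeline with the first batch
    for step in range(1, num_steps):
        # Key idea: enqueue step N before consuming step N-1's result,
        # so the CPU never sits idle waiting on the device.
        in_q.put(f"batch {step}")
        print(out_q.get())
    print(out_q.get())  # drain the final in-flight result
    in_q.put(None)  # tell the worker to shut down
    worker.join()


if __name__ == "__main__":
    run_async_scheduler(4)
```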

Impact:
Observed a ~39% throughput increase on Llama-3.2-1B (6.38 -> 8.89 req/s).
This change is most significant for workloads with high scheduler call
frequency.

Benchmark (Llama-3.2-1B):

Before: 6.38 req/s (Sync @ cychiuak@8c7e7bb)
After:  8.89 req/s (Async)

How to Reproduce:

```bash
# Start the server with async scheduling enabled
HF_TOKEN=<YOUR_HF_TOKEN> VLLM_SERVER_DEV_MODE=1 VLLM_USE_V1=1 \
VLLM_TORCH_PROFILER_DIR=/dev/shm/vllm-prof \
python -m vllm.entrypoints.cli.main serve meta-llama/Llama-3.2-1B \
  --max-model-len 800 --max-num-seqs 256 --max-num-batched-tokens 512 \
  --no-enable-prefix-caching --tensor-parallel-size 1 --async-scheduling

# Run the benchmark against the server
python -m vllm.entrypoints.cli.main bench serve --backend vllm \
  --model meta-llama/Llama-3.2-1B --dataset-name random --random-input-len 230 \
  --random-output-len 120 --num-prompts 200 --max-concurrency 3 \
  --percentile-metrics "ttft,tpot,itl,e2el" --ignore-eos
```
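
Run the two commands in separate terminals and wait for the server to come up before starting the benchmark. The synchronous baseline above was measured at the pre-change commit (cychiuak@8c7e7bb); once this change is in, omitting --async-scheduling should exercise the synchronous path.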

@cychiuak changed the title from "Code implementation of Async Scheduler" to "[Feature] Code implementation of Async Scheduler" on Oct 23, 2025
@py4 commented on Oct 23, 2025:

Thanks for the PR. Please add an e2e CI test for this feature to prevent the feature from breaking.

jrplatin previously approved these changes on Oct 27, 2025.
@jrplatin dismissed their stale review on Oct 27, 2025 at 03:21:

Accidental

@bvrockwell commented:

Great contribution @yuyanpeng-google!! Excited to see it in action.

@vipannalla left a comment:

LGTM, @yuyanpeng-google can follow up on adding the tests to CI after the PR is merged.

@py4 merged commit ae06584 into vllm-project:main on Oct 30, 2025
3 checks passed