[Models] Prevent CUDA sync in Qwen2.5-VL #24741
Merged
This is a follow-up to #24443 from @david6666666.
When I profiled Qwen2.5-VL, it looked like an implicit CUDA sync was still happening during the indexing:
vllm/vllm/model_executor/models/qwen2_5_vl.py
Line 826 in 59d5d2c
This is because `reverse_indices` is now computed on the CPU and needs to be copied to the GPU in a blocking way before the indexing operation can happen. This PR copies `reverse_indices` to the GPU with a non-blocking copy, removing this sync.

An alternative would be to call `invert_permutation` on the `window_index` GPU tensor, which would run the computation on the GPU. Since it's a simple indexing operation, it's probably not worth doing on the GPU.

I ran end-to-end benchmarks on the
`lmarena-ai/VisionArena-Chat` dataset but didn't see any change in performance, which I don't quite understand. However, the profile clearly shows that the CUDA sync is gone with this change, which is a good thing in general and might be relevant with big multimodal inputs.

Related to #23884, so tagging @ywang96 for review.
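For context, the fix boils down to the standard PyTorch pattern for avoiding a host-to-device sync: pin the CPU tensor and pass `non_blocking=True` to the copy. The sketch below is illustrative only, not the actual vLLM code; the variable names mirror the PR, and the fallback to CPU is just so the snippet runs anywhere.

```python
import torch

# Use CUDA when available; the sync being discussed only exists on GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# reverse_indices is computed on the CPU, as in the scenario the PR fixes.
reverse_indices = torch.tensor([1, 3, 0, 2])

# Blocking copy: a plain .to(device) from pageable host memory makes the
# host wait for the transfer, which shows up as an implicit CUDA sync.
blocking = reverse_indices.to(device)

# Non-blocking copy: pinning the host memory allows the H2D transfer to be
# issued asynchronously, so the subsequent indexing op no longer forces a
# sync on the stream. (pin_memory() is only meaningful/available with CUDA.)
if torch.cuda.is_available():
    reverse_indices = reverse_indices.pin_memory()
nonblocking = reverse_indices.to(device, non_blocking=True)

# The indexing operation can now be enqueued without stalling the host.
x = torch.arange(4, device=device)
out = x[nonblocking]
```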
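As a reference for the alternative mentioned above, inverting a permutation is a simple scatter: if `out = x[window_index]`, then the inverse satisfies `inverse[window_index[i]] == i`, and indexing with it undoes the shuffle. The plain-Python sketch below is a stand-in for the real tensor helper, just to show the operation's shape:

```python
def invert_permutation(perm):
    """Return the inverse of a permutation given as a list of indices.

    If permuted = [x[i] for i in perm], then indexing permuted with the
    inverse restores x, because inverse[perm[i]] == i for every i.
    """
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse

# Example: a window_index permutation and its reverse_indices.
window_index = [2, 0, 3, 1]
reverse_indices = invert_permutation(window_index)  # [1, 3, 0, 2]

# Round trip: permuting and then un-permuting restores the original order.
x = ["a", "b", "c", "d"]
permuted = [x[i] for i in window_index]
restored = [permuted[i] for i in reverse_indices]
assert restored == x
```

Because this is a single cheap pass over a small index tensor, doing it on the CPU and copying the result asynchronously (as this PR does) is a reasonable trade-off versus launching a kernel for it.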