
Conversation


@lgeiger lgeiger commented Nov 11, 2025

Purpose

rot_pos_emb currently runs on the CPU at the beginning of the model forward pass, which means no GPU computation happens in parallel during that time. This PR caches the index computation to help with this. I also moved the index computation to NumPy, which gave a ~25-30% improvement to uncached computations as well.
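The idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name, signature, and the simplified index math are assumptions standing in for the real Qwen3-VL implementation, but it shows the two techniques the PR combines, computing the grid indices in NumPy and memoizing them by grid shape with `functools.lru_cache`.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=1024)
def rot_pos_ids(t: int, h: int, w: int, merge_size: int) -> np.ndarray:
    """Hypothetical sketch: compute rotary position (row, col) indices for one
    t x h x w patch grid on the CPU with NumPy. Because the result depends only
    on the grid shape, lru_cache lets repeated images of the same size skip the
    computation entirely."""
    # Row and column index for every patch in the h x w grid.
    hpos = np.repeat(np.arange(h), w).reshape(h, w)
    wpos = np.tile(np.arange(w), h).reshape(h, w)
    # Rearrange into (merge_size x merge_size) blocks, mirroring how spatial
    # patch merging groups neighbouring patches.
    blocks = (h // merge_size, merge_size, w // merge_size, merge_size)
    hpos = hpos.reshape(blocks).transpose(0, 2, 1, 3).reshape(-1)
    wpos = wpos.reshape(blocks).transpose(0, 2, 1, 3).reshape(-1)
    ids = np.stack([hpos, wpos], axis=-1)  # shape (h * w, 2)
    return np.tile(ids, (t, 1))            # repeat over temporal frames
```

On a cache hit this returns the stored array immediately, so the forward pass only converts it to a device tensor instead of recomputing indices on the CPU every step.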

Test Plan

VLLM_TORCH_PROFILER_DIR=./vllm_profile vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 --limit-mm-per-prompt.video 0 --max-model-len 10000

Test Result

Before:

Screenshot 2025-11-11 at 16 06 12

After:

Screenshot 2025-11-11 at 16 05 54

@lgeiger lgeiger requested a review from sighingnow as a code owner November 11, 2025 16:22
    def device(self) -> torch.device:
        return self.patch_embed.proj.weight.device

    @staticmethod
@lgeiger (Contributor, Author) commented on this line:

Using a static method to make this work nicely with lru_cache
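The reason a static method plays nicely with `lru_cache` can be shown with a small self-contained example (the class and method names here are hypothetical, not the PR's code): decorating an instance method would include `self` in every cache key, pinning the model object in the cache and splitting entries across instances, whereas a static method caches purely on its hashable value arguments.

```python
from functools import lru_cache


class VisionEncoder:
    # Hypothetical illustration: lru_cache on a regular method would key each
    # entry on `self`, keeping the encoder alive for the cache's lifetime and
    # giving every instance its own entries. As a staticmethod the cache is
    # keyed only on (h, w) and shared across all instances.
    @staticmethod
    @lru_cache(maxsize=128)
    def rot_pos_ids(h: int, w: int) -> tuple:
        # Placeholder computation standing in for the real index math.
        return tuple((i, j) for i in range(h) for j in range(w))


a, b = VisionEncoder(), VisionEncoder()
first = a.rot_pos_ids(2, 2)
second = b.rot_pos_ids(2, 2)  # cache hit: same args, no `self` in the key
```

Note the decorator order: `@staticmethod` must sit on top so the cached function is what gets stored on the class (calling it through an instance this way requires Python 3.10+).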

@mergify mergify bot added the qwen Related to Qwen models label Nov 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a performance optimization for the Qwen3VL model by caching the computation of positional embedding indices. The rot_pos_emb method, which previously ran on the CPU for every forward pass, is refactored to use a new cached static method rot_pos_ids. This new method leverages functools.lru_cache to store results and avoid redundant computations for the same image dimensions. Additionally, the computation logic has been migrated from PyTorch to NumPy, which, as the author notes, provides a slight performance improvement even for uncached calls. The changes are well-implemented, correct, and effectively address the performance bottleneck. The code is clearer and more efficient. I have no further comments.

@heheda12345 heheda12345 requested a review from ywang96 November 14, 2025 07:49

@ywang96 ywang96 left a comment


LGTM! Left a very small nit

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 14, 2025
@DarkLight1337
Member

Please fix the merge conflicts

lgeiger and others added 3 commits November 15, 2025 15:38
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
@lgeiger lgeiger force-pushed the qwen3-cached-rot-pos branch from 9431e54 to 1b083ab Compare November 15, 2025 15:41
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 15, 2025 16:53
@DarkLight1337 DarkLight1337 merged commit 07cadab into vllm-project:main Nov 15, 2025
51 checks passed
@lgeiger lgeiger deleted the qwen3-cached-rot-pos branch November 15, 2025 22:13
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Bram Wasti <bwasti@meta.com>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
