[Models][Qwen3VL] Speedup fast_pos_embed_interpolate
#26647
Conversation
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
- weights = weights.to(
-     dtype=self.dtype, device=self.device, non_blocking=True
- )
+ weights = weights.to(dtype=self.dtype)
weights will already be on self.device so no need to copy it again.
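A minimal sketch of the pattern in question, with made-up names and shapes rather than the actual Qwen3VL code: because the interpolation weights are built from tensors that were created on `self.device`, only the dtype cast is still required.

```python
import torch

# Illustrative sketch only; names and shapes are not the vLLM implementation.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16

# The weights are derived from index tensors created directly on `device`,
# so they already live on the right device.
frac = torch.rand(2, 16, device=device)
weights = torch.stack([1 - frac, frac], dim=0)

# A device copy is therefore redundant; only the dtype cast remains.
weights = weights.to(dtype=dtype)
assert weights.device.type == device.type
```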
Code Review
This pull request introduces several well-reasoned optimizations to the fast_pos_embed_interpolate function in Qwen3VL. The changes, including refactoring weight calculations, vectorizing index computations, and using more efficient PyTorch operations like .sum() instead of unbind() followed by manual addition, are mathematically sound and contribute to the reported 11% performance improvement. The code is now more concise and idiomatic. I've reviewed the changes and found them to be correct and beneficial for performance without sacrificing readability. This is a solid improvement.
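For illustration, a hedged sketch of the `unbind()`-plus-manual-addition versus `.sum()` equivalence mentioned above; the shapes are invented and do not mirror Qwen3VL exactly.

```python
import torch

# Illustrative shapes only: 4 bilinear corners x tokens x hidden size.
embeds = torch.randn(4, 1024, 1152)

# Before: unbind into four tensors and add them manually (several small ops).
e00, e01, e10, e11 = embeds.unbind(dim=0)
out_manual = e00 + e01 + e10 + e11

# After: a single reduction over the corner dimension.
out_sum = embeds.sum(dim=0)

torch.testing.assert_close(out_manual, out_sum)
```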
Thanks for the contribution!
Re: caching t, h, w: I had the same idea when I first cleaned up the code here, that we could build a small cache on CPU for this, but I also wonder whether the host-to-device (h2d) transfer cost is worth the effort.
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Head branch was pushed to by a user without write access
Thanks for the fast review. I also updated the tiling logic in 24b6717, which slightly improves things further. I updated the benchmarks above.
I doubt that this would actually be a performance win, since the GPU-to-CPU transfer into such a cache would need to be synchronous to be safe, which might hurt performance more. However, I could add a simple cache that keeps the GPU tensors only for the duration of this function, which would at least help requests with multiple images of the same size by preventing re-computation within the loop. @ywang96 let me know what you think, I'm happy to make a PR.
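For what it's worth, a rough sketch of what such a per-call cache could look like, assuming a hypothetical helper `compute_pos_embed(t, h, w)` that returns the interpolated embedding on the GPU (this is not the existing vLLM API):

```python
import torch
from typing import Callable

# Hypothetical sketch: the cache lives only for a single call, so GPU tensors
# are reused across images of the same size but never held after the call.
def interpolate_all(
    grid_thw: list[tuple[int, int, int]],
    compute_pos_embed: Callable[[int, int, int], torch.Tensor],
) -> list[torch.Tensor]:
    cache: dict[tuple[int, int, int], torch.Tensor] = {}
    outputs: list[torch.Tensor] = []
    for thw in grid_thw:
        if thw not in cache:
            cache[thw] = compute_pos_embed(*thw)
        outputs.append(cache[thw])
    return outputs
```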
Purpose
Follow-up to #25337 and #25347.
fast_pos_embed_interpolate launches many small CUDA ops, so this PR slightly simplifies and optimises the implementation.
/cc @ywang96 @Isotr0py
Test Plan & Results
I verified that the new implementation doesn't change the computation.
A quick micro benchmark on an H100 shows a 15% speedup of fast_pos_embed_interpolate, and I don't think it reduces the readability of the code.
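For reference, a rough outline of how such a micro benchmark could be set up with torch.utils.benchmark; the dummy function below is only a stand-in for the real fast_pos_embed_interpolate call, which depends on the loaded Qwen3VL model.

```python
import torch
import torch.utils.benchmark as benchmark

device = "cuda" if torch.cuda.is_available() else "cpu"

def dummy_interpolate() -> torch.Tensor:
    # Stand-in workload; replace with the actual fast_pos_embed_interpolate call.
    x = torch.randn(1024, 1152, device=device)
    return x * 2

timer = benchmark.Timer(
    stmt="fn()",
    globals={"fn": dummy_interpolate},
    label="fast_pos_embed_interpolate (stand-in)",
)
# torch.utils.benchmark synchronises CUDA around the timed region, so the
# measurement reflects kernel execution time rather than just launch overhead.
print(timer.timeit(100))
```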