
Conversation

@wwl2755 (Contributor) commented Sep 25, 2025

Fix #14438

LAST EDIT: 09/26

  1. Basic idea: make the profiling multimodal data more configurable.
main:
limit_mm_per_prompt={"image": 1, "video": 1}

PR:
limit_mm_per_prompt={"image": {"count": 1, "width": 512, "height": 512}}
  2. Backward compatibility is provided; the mixed format is also supported:
limit_mm_per_prompt={"image": {"count": 1, "width": 512, "height": 512}, "video": 0}
  3. For image, we can specify count, width, height.
    For video, we can specify count, width, height, num_frames.
    For audio, we can specify count, length.
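To illustrate how the legacy int form and the new dict form can coexist, here is a minimal sketch of a normalization step. This is hypothetical: `normalize_mm_limits` and its behavior are illustrative, not vLLM's actual API.

```python
def normalize_mm_limits(limits: dict) -> dict:
    """Normalize limit_mm_per_prompt values into the dict form.

    Accepts both the legacy int form ({"image": 1}) and the new
    dict form ({"image": {"count": 1, "width": 512, "height": 512}}).
    """
    normalized = {}
    for modality, value in limits.items():
        if isinstance(value, int):
            # Legacy form: the int is the per-prompt count.
            normalized[modality] = {"count": value}
        elif isinstance(value, dict):
            # New form: "count" is required; extra keys (width,
            # height, num_frames, length) pass through unchanged.
            if "count" not in value:
                raise ValueError(f"{modality}: 'count' is required")
            normalized[modality] = dict(value)
        else:
            raise TypeError(f"Unsupported limit for {modality}: {value!r}")
    return normalized
```

With this shape, downstream profiling code only ever sees the dict form, so mixed inputs like `{"image": {...}, "video": 0}` need no special-casing.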

Known issues/TODOs:

  • Update related docs and tests
  • Only tested with images so far; will test videos and audio soon
  • The width and height are used directly, without resize/reshape (not sure if that is desirable)
  • Currently the change is only applied to qwen2_vl; it needs to be adapted to other models

Test

Tested on A100-40GB

#main
vllm serve Qwen/Qwen2.5-VL-3B-Instruct
Available KV cache memory: 5.26 GiB

#PR
vllm serve Qwen/Qwen2.5-VL-3B-Instruct
Available KV cache memory: 5.26 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 1}'
Available KV cache memory: 5.26 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 1, "video": 0}'
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt.image 1 --limit-mm-per-prompt.video 0
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 0, "video": 0}'
Available KV cache memory: 29.08 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 512, "height": 512}, "video": 0}'
Available KV cache memory: 27.82 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 5120, "height": 5120}, "video": 0}'
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 5120, "height": 5120}, "video": {"count": 1, "num_frames": 5, "width": 5120, "height": 5120}}'
Available KV cache memory: 18.45 GiB

#audio
vllm serve Qwen/Qwen2-Audio-7B-Instruct
Available KV cache memory: 19.30 GiB

vllm serve Qwen/Qwen2-Audio-7B-Instruct \
--limit-mm-per-prompt '{"image": 0, "video": 0, "audio": {"count": 1, "length": 1000}}'
Available KV cache memory: 19.31 GiB
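The KV cache numbers above track the size of the profiling inputs. As a rough back-of-envelope estimate (not vLLM's actual profiler): Qwen2.5-VL's vision encoder uses 14x14 pixel patches merged 2x2, so each vision token covers roughly a 28x28 pixel block.

```python
def approx_qwen2vl_image_tokens(width: int, height: int) -> int:
    """Rough estimate of vision tokens for one profiling image.

    Qwen2.5-VL: 14px patches with 2x2 spatial merge, i.e. about one
    token per 28x28 pixel block. Ignores the processor's pixel-budget
    clamping, so treat this as an order-of-magnitude figure only.
    """
    merged_patch = 28  # 14px patch * 2x2 spatial merge
    return (width // merged_patch) * (height // merged_patch)
```

Under this estimate a 512x512 profiling image contributes a few hundred vision tokens while a 5120x5120 one contributes tens of thousands, which is consistent with the smaller image leaving more KV cache free above (27.82 GiB vs 25.05 GiB). The real image processor may also clamp very large sizes to a pixel budget, which would cap the effect.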


mergify bot commented Sep 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 25, 2025
@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a valuable feature for making multimodal profiling more configurable, allowing for more precise memory allocation. The changes, which include new dataclasses for modality options and updated configuration parsing, are well-structured and maintain backward compatibility. However, I've identified a critical issue with the use of overly broad exception handling in the profiling logic. This could mask underlying bugs and lead to incorrect memory profiling, potentially causing runtime errors. My review includes specific suggestions to address this by narrowing the exception scope and improving debuggability.

@DarkLight1337 (Member) left a comment

Thanks for working on this, can you fix the merge conflicts?


mergify bot commented Sep 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 26, 2025
@mergify mergify bot added the qwen Related to Qwen models label Sep 26, 2025
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@DarkLight1337 (Member) commented:

Feel free to update each model now

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot added documentation Improvements or additions to documentation deepseek Related to DeepSeek models llama Related to Llama models labels Oct 2, 2025

mergify bot commented Oct 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot removed the needs-rebase label Oct 2, 2025
@DarkLight1337 (Member) left a comment:

/gemini review

@DarkLight1337 (Member) commented:

Thanks, I'm waiting for upstream CI to be fixed before adding ready label

@DarkLight1337 (Member) commented:

Can you merge from main again to get the CI running?

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 2, 2025 17:52
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
auto-merge was automatically disabled October 2, 2025 20:34

Head branch was pushed to by a user without write access

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@DarkLight1337 (Member) left a comment:

Tests pass so LGTM, thanks!

@vllm-bot vllm-bot merged commit 79aa244 into vllm-project:main Oct 3, 2025
53 of 55 checks passed
rahul-tuli pushed a commit to neuralmagic/vllm that referenced this pull request Oct 3, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
@wwl2755 wwl2755 deleted the mm-profiling branch October 3, 2025 14:57
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
deepseek Related to DeepSeek models documentation Improvements or additions to documentation llama Related to Llama models multi-modality Related to multi-modality (#4194) qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[RFC]: Configurable multi-modal data for profiling
5 participants