
Conversation

@wwl2755 (Contributor) commented Sep 25, 2025

Fix #14438

LAST EDIT: 09/26

  1. Basic idea: make the profiling multimodal data more configurable.
main:
limit_mm_per_prompt={"image": 1, "video": 1}

PR:
limit_mm_per_prompt={"image": {"count": 1, "width": 512, "height": 512}}
  2. Backward compatibility is provided; the mixed format is also supported:
limit_mm_per_prompt={"image": {"count": 1, "width": 512, "height": 512}, "video": 0}
  3. For image, we can specify count, width, height.
    For video, we can specify count, width, height, num_frames.
    For audio, we can specify count, length.
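To illustrate how the legacy int form and the new dict form can coexist, here is a minimal sketch of a normalization step. This is hypothetical: `normalize_mm_limits` and its behavior are illustrative, not vLLM's actual API.

```python
def normalize_mm_limits(limits: dict) -> dict:
    """Normalize limit_mm_per_prompt values into the dict form.

    Accepts both the legacy int form ({"image": 1}) and the new
    dict form ({"image": {"count": 1, "width": 512, "height": 512}}).
    """
    normalized = {}
    for modality, value in limits.items():
        if isinstance(value, int):
            # Legacy form: the int is the per-prompt count.
            normalized[modality] = {"count": value}
        elif isinstance(value, dict):
            # New form: "count" is required; extra keys (width,
            # height, num_frames, length) pass through unchanged.
            if "count" not in value:
                raise ValueError(f"{modality}: 'count' is required")
            normalized[modality] = dict(value)
        else:
            raise TypeError(f"Unsupported limit for {modality}: {value!r}")
    return normalized
```

With this shape, downstream profiling code only ever sees the dict form, so mixed inputs like `{"image": {...}, "video": 0}` need no special-casing.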

Known issues/TODOs:

  • Update related docs and tests
  • Only tested with images so far; will test videos and audio soon
  • The width and height are used directly, without resize/reshape (not sure if that is desirable)
  • Currently the change is only applied to qwen2_vl; it needs to be adapted to other models

Test

Tested on A100-40GB

#main
vllm serve Qwen/Qwen2.5-VL-3B-Instruct
Available KV cache memory: 5.26 GiB

#PR
vllm serve Qwen/Qwen2.5-VL-3B-Instruct
Available KV cache memory: 5.26 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 1}'
Available KV cache memory: 5.26 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 1, "video": 0}'
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt.image 1 --limit-mm-per-prompt.video 0
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": 0, "video": 0}'
Available KV cache memory: 29.08 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 512, "height": 512}, "video": 0}'
Available KV cache memory: 27.82 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 5120, "height": 5120}, "video": 0}'
Available KV cache memory: 25.05 GiB

vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
--limit-mm-per-prompt '{"image": {"count": 1, "width": 5120, "height": 5120}, "video": {"count": 1, "num_frames": 5, "width": 5120, "height": 5120}}'
Available KV cache memory: 18.45 GiB

#audio
vllm serve Qwen/Qwen2-Audio-7B-Instruct
Available KV cache memory: 19.30 GiB

vllm serve Qwen/Qwen2-Audio-7B-Instruct \
--limit-mm-per-prompt '{"image": 0, "video": 0, "audio": {"count": 1, "length": 1000}}'
Available KV cache memory: 19.31 GiB
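The KV cache numbers above track the size of the profiling inputs. As a rough back-of-envelope estimate (not vLLM's actual profiler): Qwen2.5-VL's vision encoder uses 14x14 pixel patches merged 2x2, so each vision token covers roughly a 28x28 pixel block.

```python
def approx_qwen2vl_image_tokens(width: int, height: int) -> int:
    """Rough estimate of vision tokens for one profiling image.

    Qwen2.5-VL: 14px patches with 2x2 spatial merge, i.e. about one
    token per 28x28 pixel block. Ignores the processor's pixel-budget
    clamping, so treat this as an order-of-magnitude figure only.
    """
    merged_patch = 28  # 14px patch * 2x2 spatial merge
    return (width // merged_patch) * (height // merged_patch)
```

Under this estimate a 512x512 profiling image contributes a few hundred vision tokens while a 5120x5120 one contributes tens of thousands, which is consistent with the smaller image leaving more KV cache free above (27.82 GiB vs 25.05 GiB). The real image processor may also clamp very large sizes to a pixel budget, which would cap the effect.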


mergify bot commented Sep 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 25, 2025
@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a valuable feature for making multimodal profiling more configurable, allowing for more precise memory allocation. The changes, which include new dataclasses for modality options and updated configuration parsing, are well-structured and maintain backward compatibility. However, I've identified a critical issue with the use of overly broad exception handling in the profiling logic. This could mask underlying bugs and lead to incorrect memory profiling, potentially causing runtime errors. My review includes specific suggestions to address this by narrowing the exception scope and improving debuggability.

@DarkLight1337 (Member) left a comment

Thanks for working on this, can you fix the merge conflicts?


mergify bot commented Sep 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 26, 2025
@mergify mergify bot added the qwen Related to Qwen models label Sep 26, 2025
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@DarkLight1337 (Member) commented:

Feel free to update each model now

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot added documentation Improvements or additions to documentation deepseek Related to DeepSeek models llama Related to Llama models labels Oct 2, 2025

mergify bot commented Oct 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot removed the needs-rebase label Oct 2, 2025
@DarkLight1337 (Member) left a comment:

/gemini review

@DarkLight1337 (Member) commented:

Thanks, I'm waiting for upstream CI to be fixed before adding ready label

@DarkLight1337 (Member) commented:

Can you merge from main again to get the CI running?

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 2, 2025 17:52
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
auto-merge was automatically disabled October 2, 2025 20:34

Head branch was pushed to by a user without write access

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@DarkLight1337 (Member) left a comment:

Tests pass so LGTM, thanks!

@vllm-bot vllm-bot merged commit 79aa244 into vllm-project:main Oct 3, 2025
53 of 55 checks passed
rahul-tuli pushed a commit to neuralmagic/vllm that referenced this pull request Oct 3, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
@wwl2755 wwl2755 deleted the mm-profiling branch October 3, 2025 14:57
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
deepseek Related to DeepSeek models documentation Improvements or additions to documentation llama Related to Llama models multi-modality Related to multi-modality (#4194) qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[RFC]: Configurable multi-modal data for profiling
5 participants