[Benchmarks] Add MMVU video dataset support and clean up deprecated datasets #24719

Isotr0py · 2025-09-12T04:52:32Z

Purpose

Add support for the MMVU video dataset (https://huggingface.co/datasets/yale-nlp/MMVU) for out-of-the-box video benchmarking.
Remove benchmarks/benchmark_dataset.py, since benchmarks/benchmark_serving.py and related scripts have been deprecated in favor of vllm bench.

Test Plan

Test Result

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

gemini-code-assist

Code Review

This pull request adds support for the MMVU video dataset for benchmarking and removes the deprecated benchmark_dataset.py file. The changes are well-structured, but I've identified an issue in the new MMVUDataset implementation where the sample method is missing the no_oversample parameter. This would cause the --no-oversample CLI flag to be ignored for this dataset. I've provided a code suggestion to fix this inconsistency.
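
To illustrate the issue raised in the review, here is a minimal sketch of how a `sample` method might honor a `no_oversample` flag. The `ToyRequest` and `ToyVideoDataset` names are hypothetical stand-ins written for this example; they are not vLLM's actual classes, and the real `MMVUDataset` code may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyRequest:
    """Hypothetical request record; not vLLM's real request type."""
    prompt: str


class ToyVideoDataset:
    """Hypothetical stand-in for a benchmark dataset class."""

    def __init__(self, prompts, seed=0):
        self.prompts = list(prompts)
        self.rng = random.Random(seed)

    def sample(self, num_requests, no_oversample=False):
        # Take at most num_requests items from the underlying data.
        requests = [ToyRequest(p) for p in self.prompts[:num_requests]]
        # Pad by resampling only when oversampling is allowed.
        # If sample() had no no_oversample parameter, callers could not
        # turn this padding off, which is the problem the review describes.
        if not no_oversample and requests and len(requests) < num_requests:
            extra = self.rng.choices(requests, k=num_requests - len(requests))
            requests.extend(extra)
        return requests


ds = ToyVideoDataset(["q1", "q2", "q3"])
padded = ds.sample(5)                      # padded up to 5 requests
strict = ds.sample(5, no_oversample=True)  # stays at the 3 available
print(len(padded), len(strict))
```

With the flag threaded through, a caller asking for more requests than the dataset holds gets either a padded list or only the available items, depending on the flag.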

vllm/benchmarks/datasets.py

DarkLight1337 · 2025-09-12T14:14:14Z

Can you post the result of benchmarking some models on this dataset?

tjtanaa · 2025-09-12T14:17:51Z

@Isotr0py amazing feature. Thank you for this PR. Need this.

david6666666 · 2025-09-15T02:57:17Z

Nice one, we really need a video benchmark dataset.

mergify · 2025-09-16T13:06:53Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Isotr0py · 2025-09-16T14:42:33Z

> Can you post the result of benchmarking some models on this dataset?

Test setup: 2 x T4 GPUs

vllm serve Qwen/Qwen2-VL-2B-Instruct -tp 2 --max-model-len 32768 --enforce-eager --max-num-seqs 5

Benchmark commands:

vllm bench serve \
  --backend openai-chat \
  --endpoint-type openai-chat \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path yale-nlp/MMVU \
  --hf-split validation \
  --num-prompts 10

Results:

Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: None
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [02:10<00:00, 13.00s/it]
============ Serving Benchmark Result ============
Successful requests:                     10        
Benchmark duration (s):                  130.04    
Total input tokens:                      578       
Total generated tokens:                  605       
Request throughput (req/s):              0.08      
Output token throughput (tok/s):         4.65      
Total Token throughput (tok/s):          9.10      
---------------Time to First Token----------------
Mean TTFT (ms):                          64757.39  
Median TTFT (ms):                        58903.87  
P99 TTFT (ms):                           112885.96 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          837.82    
Median TPOT (ms):                        771.18    
P99 TPOT (ms):                           1754.11   
---------------Inter-token Latency----------------
Mean ITL (ms):                           632.08    
Median ITL (ms):                         316.43    
P99 ITL (ms):                            3830.31   
==================================================
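
As a sanity check on the table above, the throughput rows can be reproduced from the request count, token counts, and duration. This sketch assumes each throughput is simply the corresponding count divided by the benchmark duration; that assumption matches the reported figures but is not taken from the vllm bench source.

```python
# Values copied from the benchmark result above.
num_requests = 10
duration_s = 130.04
input_tokens = 578
output_tokens = 605

# Assumed definitions: count divided by wall-clock benchmark duration.
request_throughput = num_requests / duration_s
output_throughput = output_tokens / duration_s
total_throughput = (input_tokens + output_tokens) / duration_s

print(f"Request throughput (req/s):      {request_throughput:.2f}")
print(f"Output token throughput (tok/s): {output_throughput:.2f}")
print(f"Total token throughput (tok/s):  {total_throughput:.2f}")
```

Rounded to two decimals, these come out to 0.08, 4.65, and 9.10, matching the reported rows.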

Isotr0py and others added 5 commits September 10, 2025 15:51

init mmvu dataset

fc11101

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update readme

49b6ef4

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

cf233c1

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge branch 'vllm-project:main' into mmvu-benchmark

421b969

remove benchmark_dataset.py

c67afa5

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested a review from ywang96 September 12, 2025 04:52

mergify bot added the performance (Performance-related issues) label Sep 12, 2025

gemini-code-assist bot reviewed Sep 12, 2025

vllm/benchmarks/datasets.py (resolved)

fix missing no_oversample

ba3a3eb

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the needs-rebase label Sep 16, 2025

Merge branch 'main' into mmvu-benchmark

9cfdb42

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update doc

afc9d28

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested a review from hmellor as a code owner September 16, 2025 14:45

mergify bot added the documentation (Improvements or additions to documentation) label and removed the needs-rebase label Sep 16, 2025

DarkLight1337 approved these changes Sep 16, 2025

DarkLight1337 enabled auto-merge (squash) September 16, 2025 15:20

github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 16, 2025

DarkLight1337 merged commit 5a411ef into vllm-project:main Sep 17, 2025
46 of 54 checks passed

Isotr0py deleted the mmvu-benchmark branch September 17, 2025 03:39

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Benchmarks] Add MMVU video dataset support and clean up deprecated d…

be287d9

…atasets (vllm-project#24719) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025

[Benchmarks] Add MMVU video dataset support and clean up deprecated d…

fedb5b1

…atasets (vllm-project#24719) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: charlifu <charlifu@amd.com>
