SVI-Bench

Project page: https://svi-bench.github.io/ · Dataset: MVP-Group/SVI-Bench · Arxiv: http://arxiv.org/abs/2605.31529

SVI-Bench is a multi-task benchmark for evaluating vision-language models on sports video understanding. It spans four pillars — Perception, Reasoning, Simulation, and Agency — across nine tasks, three sports (basketball, hockey, soccer), and multiple seasons.

Tasks

Task	Pillar	Description
T1	Perception	Structured play-by-play description
T2	Perception	Fine-grained action QA
T3	Perception	Compositional video retrieval (R@K ranking)
T4	Reasoning	Strategic reasoning QA (free-text, LLM-as-judge)
T5	Reasoning	Outcome forecasting (multiple-choice)
T6	Reasoning	Long-form narrative synthesis (report generation)
T7	Simulation	Motion-conditioned video generation (LoRA fine-tune)
T8	Simulation	Goal-conditioned action generation (LoRA fine-tune)
T9	Agency	Cross-corpus agentic reasoning (search + QA)

Each task has its own directory under svi_bench/tasks/ with a dedicated README.md covering setup, data format, and evaluation instructions. Per-task quickstarts (clone → install → download → infer → eval one-shot recipes) live in those READMEs.

Data

Datasets are hosted on HuggingFace at MVP-Group/SVI-Bench. Access is gated — agree to the terms on the dataset page once, then your HF token unlocks all data. Large data are shipped as .tar bundles. See each task's README for download and setup instructions.

Repository layout

SVI-Bench/
├── pyproject.toml
├── README.md
├── LICENSE
├── scripts/                  # benchmark-wide helpers (e.g. extract_tars.py)
└── svi_bench/
    ├── core/                 # shared utilities
    └── tasks/                # one self-contained subpackage per task
        ├── t1_structured_play_description/
        ├── t2_fine_grained_action_qa/
        ├── t3_compositional_video_retrieval/
        ├── t4_strategic_reasoning_qa/
        ├── t5_outcome_forecasting/
        ├── t6_long_form_narrative_synthesis/
        ├── t7_motion_conditioned_generation/
        ├── t8_goal_conditioned_action_generation/
        └── t9_cross_corpus_agentic_reasoning/

Each task dir holds its own train.sh / inference/ / eval/ / scripts/ / configs/ / figures/ as needed.

License

Code is MIT (see LICENSE). Data is governed by the gated-access agreement on the HF dataset page.

Citation

If you use SVI-Bench, please cite the benchmark:

@misc{pan2026svibenchdynamicmicroworldstrategic,
      title={SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence}, 
      author={Yulu Pan and Han Yi and Seongsu Ha and Md Mohaiminul Islam and Benjamin Zhang and Lorenzo Torresani and Gedas Bertasius},
      year={2026},
      eprint={2605.31529},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.31529}, 
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SVI-Bench

Tasks

Data

Repository layout

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
scripts		scripts
svi_bench		svi_bench
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

SVI-Bench

Tasks

Data

Repository layout

License

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages