Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
benchmark
research
video-summarization
dataset
video-captioning
video-story
vision-language
video-question-answering
video-language
large-language-models
video-language-pretraining
video-story-generation
Updated Sep 25, 2024 - Python