zwusy/InstructVideo

InstructVideo

InstructVideo: A reasoning-centric video object segmentation dataset with QA annotations for Multi-modal Large Language Models.

**📌 Important Notice:** This repository is currently under preparation. The dataset and all associated code will be publicly released no later than October 2026.

Dataset Overview

InstructVideo is a reasoning-centric video object segmentation dataset designed to evaluate and facilitate research on multi-modal large language models (MLLMs) for complex video understanding tasks.

Key Statistics

| Statistic | Value |
| --- | --- |
| Videos | 1,788 |
| QA Pairs | 6,112 |
| Objects | 3,603 |
| Average instances per multiple-object sample | 3.77 |
| Max instances in a single sample | 16 |

Key Features

  • Reasoning-centric queries requiring world knowledge and temporal understanding
  • Both single-object and multiple-object segmentation tasks
  • Logical textual responses beyond simple mask prediction
  • High-quality mask annotations for referred targets

Dataset Structure

InstructVideo/
├── train/
│   ├── videos/         # Training video clips
│   ├── masks/          # Segmentation mask annotations
│   └── annotations/    # QA pairs and textual responses
├── test/
│   ├── videos/         # Test video clips
│   ├── masks/          # Segmentation mask annotations
│   └── annotations/    # QA pairs and textual responses
└── README.md
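
Since the dataset is not yet released, the file formats inside each folder are unknown. The following is only a minimal sketch, assuming the directory layout shown above, of how one might check a local copy once it is available. The function name `summarize_layout` and the root path `InstructVideo` are hypothetical and are not part of any released code.

```python
from pathlib import Path

# Split and subdirectory names are taken from the layout above.
SPLITS = ("train", "test")
SUBDIRS = ("videos", "masks", "annotations")


def summarize_layout(root):
    """Count the entries in each expected subdirectory.

    Hypothetical helper: returns a dict mapping "split/subdir" to the
    number of entries found, or None when the folder does not exist.
    No assumption is made about file names or formats.
    """
    root = Path(root)
    summary = {}
    for split in SPLITS:
        for sub in SUBDIRS:
            folder = root / split / sub
            if folder.is_dir():
                summary[f"{split}/{sub}"] = sum(1 for _ in folder.iterdir())
            else:
                summary[f"{split}/{sub}"] = None
    return summary


if __name__ == "__main__":
    # "InstructVideo" is an assumed local path to the dataset root.
    for key, count in summarize_layout("InstructVideo").items():
        print(f"{key}: {'missing' if count is None else count}")
```

Running the script against an incomplete download reports `missing` for any folder that is absent, which makes it a quick sanity check before writing a data loader.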

