LVChat

This is the official implementation of our paper LVChat: Facilitating Long Video Comprehension. Our code base is built on the repo Ask-Anything.

Environment Preparation

conda create --name lvchat python=3.11
conda activate lvchat
pip install -r requirements.txt

Datasets

We train on instruction data. Specifically, we use the following subsets (the link here includes all the JSON files needed for training):

conversation_videochat1
conversation_videochat2
conversation_videochatgpt
caption_videochat
reasoning_clevrer_qa
reasoning_clevrer_mc
reasoning_next_qa
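
Each subset name starts with a task prefix (conversation, caption, reasoning) that matches the subfolders of data/anno/video in the layout described in this section. As a minimal sketch, the names can be grouped by that prefix. The helper name group_by_task is hypothetical and not part of this repo, and the assumption that the prefix is how the annotation files are organized on disk is ours.

```python
from collections import defaultdict

# Subset names copied from the list above.
SUBSETS = [
    "conversation_videochat1",
    "conversation_videochat2",
    "conversation_videochatgpt",
    "caption_videochat",
    "reasoning_clevrer_qa",
    "reasoning_clevrer_mc",
    "reasoning_next_qa",
]


def group_by_task(names):
    """Group subset names by the task prefix before the first underscore.

    Hypothetical helper for illustration only; not part of the LVChat code.
    """
    groups = defaultdict(list)
    for name in names:
        # "reasoning_clevrer_qa" -> task "reasoning", source "clevrer_qa"
        task, _, source = name.partition("_")
        groups[task].append(source)
    return dict(groups)


if __name__ == "__main__":
    for task, sources in group_by_task(SUBSETS).items():
        print(f"{task}: {', '.join(sources)}")
```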

To replicate our training for Frame Scalable Encoding (FSE), download the Clevrer, NExT-QA, VideoChatGPT, and WebVid-10M datasets (note that WebVid-10M is no longer available), as well as the JSON files from VideoChat2-IT. Then arrange the datasets in the following structure:

- data
    - ANet
        - activitynet_train_videos_video_chatgpt
    - anno
        - video
            - caption
            - conversation
            - reasoning
    - clevrer
    - internvid-10s (This is the instruction dataset collected by VideoChat2. The videos are from InternVid (https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVid). Because the data is very large, you can download the videos yourself. For example, in "LLU5X98aozs_648.258.mp4", "LLU5X98aozs" is the YouTube ID, "648.258" is the start time, and the clip duration is 10 s. Thanks to Kunchang Li, the author of VideoChat2, for providing the link and instructions.)
    - nextqa
    - WebVid10M (All the videos of VideoChat v1 data are from here)
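
The internvid-10s naming convention described above (YouTube ID, underscore, start time, fixed 10 s duration) can be decoded as in the sketch below. This is only an illustration under the assumption that every clip follows the "<YouTube ID>_<start time>.mp4" pattern; the names Clip, CLIP_SECONDS, and parse_clip_name are hypothetical and do not come from this repo.

```python
from pathlib import Path
from typing import NamedTuple

# Clip duration in seconds, as stated for the internvid-10s clips.
CLIP_SECONDS = 10.0


class Clip(NamedTuple):
    youtube_id: str
    start: float  # seconds into the source video
    end: float    # start + CLIP_SECONDS


def parse_clip_name(filename: str) -> Clip:
    """Split "<YouTube ID>_<start time>.mp4" into its parts.

    Hypothetical helper for illustration only.
    """
    stem = Path(filename).stem  # drops only the final ".mp4"
    # YouTube IDs may themselves contain "_", so split on the last one.
    youtube_id, sep, start_text = stem.rpartition("_")
    if not sep or not youtube_id:
        raise ValueError(f"unexpected clip name: {filename!r}")
    start = float(start_text)
    return Clip(youtube_id, start, start + CLIP_SECONDS)


if __name__ == "__main__":
    clip = parse_clip_name("LLU5X98aozs_648.258.mp4")
    print(clip.youtube_id, clip.start, clip.end)
```

The resulting ID and time range can then be passed to whatever download tool you prefer to fetch and trim the source video.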

Base model preparation

Download the VideoBLIP model.

wget -P video_models https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/videochat2/umt_l16_qformer.pth

Follow the instructions here to prepare vicuna-7b-v0 and place it under video_models.

Training with Frame Scalable Encoding (FSE)

Download the model videochat2_7b_stage3.pth from here, then put it under the folder video_models. The folder video_models should now have the following structure:

- video_models
    - vicuna-7b-v0
    l16_25m.pth
    umt_l16_qformer.pth
    videochat2_7b_stage3.pth
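
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the helper name missing_entries is hypothetical and not part of this repo, and it only checks that each entry exists, not that the checkpoints are valid.

```python
from pathlib import Path

# Entries expected under video_models, copied from the structure above.
EXPECTED = [
    "vicuna-7b-v0",
    "l16_25m.pth",
    "umt_l16_qformer.pth",
    "videochat2_7b_stage3.pth",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected names that do not exist under root.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("video_models")
    if missing:
        print("Missing under video_models:", ", ".join(missing))
    else:
        print("video_models contains all expected entries.")
```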

For validation, follow the Download MVBench section below to download MVBench and place the dataset under the folder ./MVBench.

Then run the following script (remember to set the number of GPUs, NUM_GPUS, in the script first).

sh run_7b_stage4.sh

Evaluation

Download MVBench

Download from Hugging Face and place it under ./MVBench. The file structure under MVBench is:

- assert
- json
- video
.gitattributes
README.md

Prepare street-scene data (required if you want to use the extended MVBench data)

bash download_street_scnene.sh

Prepare LV-Chat Model

Please download the model from LV-Chat. Put the pth file 7b_stage4.pth under the folder video_models.

Evaluate LV-Chat on MVBench

Run the script to test our model; the results will be written to logs:

bash run_mvbench.sh

You can also run the baseline (VideoChat2) using:

bash run_mvbench.sh --config ./configs/config_videochat2.json

Evaluate LV-Chat on Real-world datasets

TACoS

Download the TACoS dataset from here and place the videos folder under ./TACoS.
Download the GPT-4-generated summary:

wget -P ./TACoS https://huggingface.co/datasets/Kevin99z/tacos_summary/resolve/main/summary.json

Evaluate TACoS:

bash run_tacos.sh # add --config ./configs/config_videochat2.json to test the baseline

EgoSchema

Download EgoSchema from here and place it under ./EgoSchema.
Evaluate EgoSchema:

bash run_egoschema.sh # add --config ./configs/config_videochat2.json to test the baseline

If you find our paper or code useful, please consider citing our paper.
