GitHub - yuezih/BLIP4video: Championship solution for the Video-to-Text Description (VTT) task at TRECVID 2022

BLIP4video: BLIP Captioner's Video Solution

This is the PyTorch code for BLIP4video, a modified version of BLIP for the Video-to-Text Description (VTT) task at TRECVID 2022. Our submission ranks 1st on all official evaluation metrics, including BLEU, METEOR, CIDEr, SPICE, and STS, and achieves the best CIDEr score of 60.2, 67.2% higher than last year’s best result.

Catalog:

BLIP captioner's video solution
Self-critical reinforcement learning for video captioning (VinVL implementation)
Text-video retrieval and matching for caption candidates scoring and re-ranking
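
As a rough illustration of the self-critical reinforcement learning item above, the sketch below computes the standard self-critical loss in plain Python: the reward of a sampled caption is baselined by the reward of the greedy-decoded caption for the same video, and the difference weights the sampled caption's log-probability. This is not the code in this repository; the function name `scst_loss` and its arguments are hypothetical, and the rewards are assumed to be precomputed by some caption metric such as CIDEr.

```python
from typing import Sequence


def scst_loss(
    sample_logprobs: Sequence[Sequence[float]],
    sample_rewards: Sequence[float],
    greedy_rewards: Sequence[float],
) -> float:
    """Batch-averaged self-critical loss (hypothetical helper, not the repo's API).

    sample_logprobs: for each video, the token log-probabilities of the sampled caption.
    sample_rewards:  reward (e.g. CIDEr, assumed precomputed) of each sampled caption.
    greedy_rewards:  reward of the greedy-decoded caption, used as the baseline.
    """
    n = len(sample_logprobs)
    if n == 0 or n != len(sample_rewards) or n != len(greedy_rewards):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for logps, r_sample, r_greedy in zip(sample_logprobs, sample_rewards, greedy_rewards):
        # Positive advantage: the sampled caption beat the greedy baseline.
        advantage = r_sample - r_greedy
        # Minimizing this term raises the likelihood of captions with positive advantage
        # and lowers it for captions with negative advantage.
        total += -advantage * sum(logps)
    return total / n


if __name__ == "__main__":
    # One video, two sampled tokens; the sampled caption scores higher than greedy.
    print(scst_loss([[-1.0, -0.5]], [0.8], [0.5]))
```

In a real training loop the log-probabilities would be differentiable tensors and the rewards would be treated as constants, so gradients flow only through the sampled caption's log-likelihood.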

Video-Text Captioning:

Set data root in configs/*.yaml accordingly.
To fine-tune the BLIP4video model for the video captioning task, run:

bash scripts/train_video_caption.sh

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{yue2022blip4video,
  author = {Yue, Zihao and Liu, Yuqi and Zhang, Liang and Yao, Linli and Jin, Qin},
  title = {RUCAIM3-Tencent at TRECVID 2022: Video to Text Description},
  year = {2022},
  booktitle = {Proceedings of TRECVID 2022},
  organization = {NIST, USA},
  url = {https://www-nlpir.nist.gov/projects/tvpubs/tv22.papers/rucaim3-tencent.pdf}
}

Acknowledgement

The implementation of BLIP4video relies on resources from BLIP and Oscar. We thank the original authors for open-sourcing their work.
