Skip to content
forked from salesforce/BLIP

Championship solution for the Video-to-Text Description (VTT) task at TRECVID 2022

License

Notifications You must be signed in to change notification settings

yuezih/BLIP4video

 
 

Repository files navigation

BLIP4video: BLIP Captioner's Video Solution

Issues Forks Stars PRs-Welcome

This is the PyTorch code of BLIP4video, a modified version of BLIP for the Video-to-Text Description (VTT) task at TRECVID 2022. Our submission ranks 1st in all official evaluation metrics including BLEU, METEOR, CIDER, SPICE, and STS, and achieves the best submission score of 60.2 on CIDEr, 67.2% higher than last year’s best result.

Catalog:

  • BLIP captioner's video solution
  • Self-critical reinforcement learning for video captioning (VinVL implementation)
  • Text-video retrieval and matching for caption candidates scoring and re-ranking

Video-Text Captioning:

  1. Set data root in configs/*.yaml accordingly.
  2. To train the finetuned BLIP4video model for the video captioning task, run:
bash scripts/train_video_caption.sh

Citation

If you find this code to be useful for your research, please consider citing.

@inproceedings{yue2022blip4video,
  author = {Yue, Zihao and Liu, Yuqi and Zhang, Liang and Yao, Linli and Jin, Qin},
  title = {RUCAIM3-Tencent at TRECVID 2022: Video to Text Description},
  year = {2022},
  booktitle = {Proceedings of TRECVID 2022},
  organization = {NIST, USA},
  url={https://www-nlpir.nist.gov/projects/tvpubs/tv22.papers/rucaim3-tencent.pdf}
}

Acknowledgement

The implementation of BLIP relies on resources from BLIP and Oscar. We thank the original authors for their open-sourcing.

About

Championship solution for the Video-to-Text Description (VTT) task at TRECVID 2022

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 56.6%
  • Jupyter Notebook 43.1%
  • Shell 0.3%