Let's Thing Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
code coming soon!
If you use this dataset for your research, please cite using the citation below:
@inproceedings{
anonymous2023lets,
title={Let's Think Frame by Frame with {VIP}: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought},
author={Anonymous},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=y6Ej5BZkrR}
}