An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.
[ACM MM 2017 & IEEE TMM 2020] This is the Theano code for the paper "Video Description with Spatial Temporal Attention"
Source code for "Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy"
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
Python implementation for extracting several visual feature representations from videos
Source code of the paper titled *Attentive Visual Semantic Specialized Network for Video Captioning*
[Pattern Recognition 2021] This is the Theano code for our paper "Enhancing the Alignment between Target Words and Corresponding Frames for Video Captioning".