A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Official code for Tell Me What You See: A Zero-Shot Action Recognition Method Based on Natural Language Descriptions (Multimedia Tools and Applications 2024)
[ICCV 2023] Accurate and Fast Compressed Video Captioning
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
A PyTorch implementation of EmpiricalMVM
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSR-VTT datasets.
A PyTorch implementation of the paper Thinking Hallucination for Video Captioning.
Automatic transcription tool based on Whisper
Official code for Global Semantic Descriptors for Zero-Shot Action Recognition (IEEE Signal Processing Letters 2022)
Video captioning using SCN-LSTM models with S2VT baseline
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Implementation of an encoder-decoder model for video captioning in TensorFlow
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Second-place solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2022 workshop)
Master's thesis on multimodal video captioning, done at Huawei's research center in Amsterdam.