A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Official code for Tell Me What You See: A Zero-Shot Action Recognition Method Based on Natural Language Descriptions (Multimedia Tools and Applications 2024)
[ICCV 2023] Accurate and Fast Compressed Video Captioning
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
A PyTorch implementation of EmpiricalMVM
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSR-VTT datasets.
A PyTorch implementation of the paper Thinking Hallucination for Video Captioning.
Automatic transcription tool based on Whisper
Official code for Global Semantic Descriptors for Zero-Shot Action Recognition (IEEE Signal Processing Letters 2022)
Video captioning using SCN-LSTM models with S2VT baseline
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Implementation of an encoder-decoder model for video captioning in TensorFlow
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Second-place solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2022 workshop)
Master's thesis on multimodal video captioning, done at Huawei's research center in Amsterdam.