# video-question-answering

Here are 43 public repositories matching this topic...

Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-understanding video-retrieval video-question-answering long-video-understanding

Updated Jul 18, 2024
Python

bcmi / Causal-VidQA

[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR2022.

commonsense-reasoning video-question-answering evidence-reason visual-understanding video-question-answering-dataset

Updated Jul 11, 2024
Python

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering masked-autoencoder temporal-action-localization contrastive-learning spatio-temporal-action-localization zero-shot-retrieval video-clip vision-transformer zero-shot-classification foundation-models instruction-tuning

Updated Jul 10, 2024
Python

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

chat video gradio big-model video-understanding captioning-videos video-question-answering foundation-models large-model large-language-models chatgpt langchain stablelm

Updated Jul 5, 2024
Python

doc-doc / NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

videoqa video-grounding video-question-answering video-language-understanding trustworthy-vqa visual-evidence-grounding

Updated Jul 1, 2024
Python

declare-lab / Sealing

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality video-understanding video-question-answering visual-language-models naacl2024

Updated Jun 27, 2024
Python

mmazab / LifeQA

Data and PyTorch code for the LifeQA LREC 2020 paper.

nlp machine-learning natural-language-processing youtube research computer-vision deep-learning pytorch dataset videos question-answering real-life videoqa video-question-answering lrec2020 lrec lifeqa

Updated Jun 21, 2024
Python

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Jun 17, 2024
Python

whwu95 / FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

chatbot video-understanding zero-shot-video-captioning video-question-answering chatgpt vision-language-model llava training-free multimodal-large-language-models

Updated Jun 9, 2024
Python

mlvlab / Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

multi-modal visual-question-answering video-question-answering large-language-models emnlp2023

Updated Apr 23, 2024
Python

mlvlab / OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

multi-modal visual-question-answering video-question-answering iccv2023

Updated Apr 23, 2024
Python

mlvlab / MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

multi-modal video-captioning meta-learning video-retrieval video-question-answering cvpr2023

Updated Apr 23, 2024
Python

jpthu17 / HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

cvpr video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

video-captioning neurips video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

doc-doc / CoVGT

Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)

videoqa video-question-answering contrastive-learning dynamic-visual-graph video-language-understanding

Updated Mar 9, 2024
Python

zchoi / PKOL

[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”

pytorch pytorch-implementation video-retrieval vision-language video-question-answering

Updated Jan 27, 2024
Python

X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark video dataset chinese youku multimodal video-retrieval video-question-answering multimodal-pretraining mllm multimodal-large-language-models

Updated Jan 8, 2024
Python

tsujuifu / pytorch_empirical-mvm

A PyTorch implementation of EmpiricalMVM

pytorch video-captioning vision-and-language pre-training video-retrieval video-question-answering cvpr2023

Updated Dec 18, 2023
Python

tsujuifu / pytorch_violet

A PyTorch implementation of VIOLET

pytorch vision-and-language pre-training video-retrieval video-question-answering

Updated Dec 17, 2023
Python

doc-doc / NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

video-understanding videoqa vision-language video-question-answering multi-object-interaction causal-temporal-action-reasoning

Updated Oct 25, 2023
Python
