A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar and VinVL
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Visual Question Answering in PyTorch
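Entries like the 2017 VQA Challenge winner above are scored with the standard VQA accuracy metric, which gives partial credit based on agreement with the ten human annotators. A simplified sketch of that metric (the official evaluation additionally averages over annotator subsets and normalizes answer strings; the helper name here is illustrative, not from any of the listed repos):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA Challenge accuracy.

    An answer counts as fully correct if at least 3 of the (typically 10)
    human annotators gave it; fewer matches earn proportional credit.
    """
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching all ten annotators scores 1.0, while one matching only two of them scores 2/3.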
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ Hugging Face models, and 20+ benchmarks
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
A lightweight, scalable, and general framework for visual question answering research
Strong baseline for visual question answering
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
OmniFusion, a multimodal model for communicating with text and images
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)