Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
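For context, the core of what such an evaluation toolkit automates can be pictured as a loop that feeds each benchmark sample to a model and scores the reply. The sketch below is purely illustrative Python; `query_model` and the exact-match scoring are hypothetical placeholders, not the toolkit's actual API.

```python
# Purely illustrative benchmark-evaluation loop; query_model is a hypothetical
# placeholder standing in for any supported vision-language model.
from typing import Callable, Iterable, Tuple

def evaluate(samples: Iterable[Tuple[str, str, str]],
             query_model: Callable[[str, str], str]) -> float:
    """samples yields (image_path, question, gold_answer); returns exact-match accuracy."""
    correct = total = 0
    for image_path, question, gold in samples:
        prediction = query_model(image_path, question)
        correct += int(prediction.strip().lower() == gold.strip().lower())
        total += 1
    return correct / max(total, 1)
```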
Official Implementation of WACV 2024 Paper "HIDRO-VQA : High Dynamic Range Oracle for Video Quality Assessment"
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
An extension of the Planner-Actor-Reporter framework applied to autonomous vehicles in Highway-Env and CARLA.
OmniFusion — a multimodal model to communicate using text and images
Multimodal Instruction Tuning for Llama 3
Visual question answering prompting recipes for large vision-language models
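To give a flavor of what a prompting recipe can look like, here is a hypothetical zero-shot VQA prompt builder; the wording, fields, and function name are illustrative assumptions rather than the recipes from that repository.

```python
# Hypothetical zero-shot VQA prompt builder; wording and fields are assumptions.
from typing import List, Optional

def build_vqa_prompt(question: str, options: Optional[List[str]] = None) -> str:
    prompt = ("Look at the image carefully and answer the question.\n"
              f"Question: {question}\n")
    if options:  # multiple-choice variant: constrain the answer space
        prompt += "Options: " + ", ".join(options) + "\nAnswer with exactly one of the options."
    else:
        prompt += "Answer with a single word or short phrase."
    return prompt

print(build_vqa_prompt("What color is the bus?", ["red", "blue", "yellow"]))
```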
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
This package is a flexible Python implementation of the Quantum Approximate Optimization Algorithm / Quantum Alternating Operator Ansatz (QAOA), aimed at researchers who want to readily test the performance of a new ansatz, new classical optimizers, etc.
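To make the idea concrete, below is a minimal single-layer QAOA expectation for a one-edge MaxCut instance, computed directly with NumPy/SciPy rather than the package's own API (an assumption made purely for illustration); the coarse grid search merely stands in for a classical optimizer.

```python
# Minimal p=1 QAOA sketch for 2-qubit MaxCut (one edge), plain NumPy/SciPy.
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

# Cost Hamiltonian for the single edge (0, 1): C = (I - Z0 Z1) / 2, diagonal here.
C = 0.5 * (np.eye(4) - np.kron(Z, Z))
cost_diag = np.real(np.diag(C))            # cut values 0, 1, 1, 0

# Transverse-field mixer B = X0 + X1.
B = np.kron(X, I2) + np.kron(I2, X)

def qaoa_expectation(gamma: float, beta: float) -> float:
    """<C> for the p = 1 state exp(-i*beta*B) exp(-i*gamma*C) |++>."""
    state = np.full(4, 0.5, dtype=complex)          # |++> on two qubits
    state = np.exp(-1j * gamma * cost_diag) * state  # cost layer (diagonal)
    state = expm(-1j * beta * B) @ state             # mixer layer
    return float(np.real(np.vdot(state, cost_diag * state)))

# Coarse grid search standing in for a classical optimizer.
grid = np.linspace(0, np.pi, 50)
best = max(((qaoa_expectation(g, b), g, b) for g in grid for b in grid))
print("best <C> = %.3f at gamma=%.2f, beta=%.2f" % best)
```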
LLaVA inference with multiple images at once for cross-image analysis.
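A rough sketch of multi-image inference through the Hugging Face transformers LLaVA classes is shown below; the checkpoint name, prompt format, and the model's ability to reason across several `<image>` tokens are assumptions to verify against that repository, and a CUDA GPU is assumed.

```python
# Hedged sketch: two images passed to a LLaVA-style model via transformers.
# Checkpoint name and multi-<image> prompt handling are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16).to("cuda")

images = [Image.open("left.jpg"), Image.open("right.jpg")]
prompt = "USER: <image>\n<image>\nWhat differs between the two images? ASSISTANT:"

inputs = processor(text=prompt, images=images, return_tensors="pt").to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```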
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
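That 2017 winning entry was built around top-down attention over pre-extracted image region features; the simplified PyTorch sketch below illustrates the attention-and-fusion step, with all dimensions, activations, and the classifier shape chosen for illustration rather than taken from the repository.

```python
# Simplified top-down attention fusion for VQA; dimensions are illustrative.
import torch
import torch.nn as nn

class TopDownAttentionVQA(nn.Module):
    def __init__(self, q_dim=512, v_dim=2048, hidden=1024, num_answers=3129):
        super().__init__()
        self.att = nn.Sequential(nn.Linear(q_dim + v_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.q_proj = nn.Linear(q_dim, hidden)
        self.v_proj = nn.Linear(v_dim, hidden)
        self.classifier = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, num_answers))

    def forward(self, v, q):
        # v: (batch, regions, v_dim) region features; q: (batch, q_dim) question encoding
        q_rep = q.unsqueeze(1).expand(-1, v.size(1), -1)
        weights = torch.softmax(self.att(torch.cat([v, q_rep], dim=-1)), dim=1)
        v_att = (weights * v).sum(dim=1)              # attended image feature
        joint = self.q_proj(q) * self.v_proj(v_att)   # elementwise fusion
        return self.classifier(joint)                 # answer logits

logits = TopDownAttentionVQA()(torch.randn(2, 36, 2048), torch.randn(2, 512))
```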
[PR23] The implementation of the paper ''Learning Visual Question Answering on Controlled Semantic Noisy Labels''
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph