Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
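For context, the core of what such an evaluation toolkit automates can be pictured as a loop that feeds each benchmark sample to a model and scores the reply. The sketch below is purely illustrative Python; `query_model` and the exact-match scoring are hypothetical placeholders, not the toolkit's actual API.

```python
# Purely illustrative benchmark-evaluation loop; query_model is a hypothetical
# placeholder standing in for any supported vision-language model.
from typing import Callable, Iterable, Tuple

def evaluate(samples: Iterable[Tuple[str, str, str]],
             query_model: Callable[[str, str], str]) -> float:
    """samples yields (image_path, question, gold_answer); returns exact-match accuracy."""
    correct = total = 0
    for image_path, question, gold in samples:
        prediction = query_model(image_path, question)
        correct += int(prediction.strip().lower() == gold.strip().lower())
        total += 1
    return correct / max(total, 1)
```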
Official Implementation of WACV 2024 Paper "HIDRO-VQA : High Dynamic Range Oracle for Video Quality Assessment"
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
An extension of the Planner-Actor-Reporter framework applied to autonomous vehicles in Highway-Env and CARLA.
OmniFusion — a multimodal model to communicate using text and images
Multimodal Instruction Tuning for Llama 3
Visual question answering prompting recipes for large vision-language models
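To give a flavor of what a prompting recipe can look like, here is a hypothetical zero-shot VQA prompt builder; the wording, fields, and function name are illustrative assumptions rather than the recipes from that repository.

```python
# Hypothetical zero-shot VQA prompt builder; wording and fields are assumptions.
from typing import List, Optional

def build_vqa_prompt(question: str, options: Optional[List[str]] = None) -> str:
    prompt = ("Look at the image carefully and answer the question.\n"
              f"Question: {question}\n")
    if options:  # multiple-choice variant: constrain the answer space
        prompt += "Options: " + ", ".join(options) + "\nAnswer with exactly one of the options."
    else:
        prompt += "Answer with a single word or short phrase."
    return prompt

print(build_vqa_prompt("What color is the bus?", ["red", "blue", "yellow"]))
```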
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
This package is a flexible Python implementation of the Quantum Approximate Optimization Algorithm / Quantum Alternating Operator Ansatz (QAOA), aimed at researchers who want to readily test the performance of a new ansatz, new classical optimizers, etc.
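To make the idea concrete, below is a minimal single-layer QAOA expectation for a one-edge MaxCut instance, computed directly with NumPy/SciPy rather than the package's own API (an assumption made purely for illustration); the coarse grid search merely stands in for a classical optimizer.

```python
# Minimal p=1 QAOA sketch for 2-qubit MaxCut (one edge), plain NumPy/SciPy.
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

# Cost Hamiltonian for the single edge (0, 1): C = (I - Z0 Z1) / 2, diagonal here.
C = 0.5 * (np.eye(4) - np.kron(Z, Z))
cost_diag = np.real(np.diag(C))            # cut values 0, 1, 1, 0

# Transverse-field mixer B = X0 + X1.
B = np.kron(X, I2) + np.kron(I2, X)

def qaoa_expectation(gamma: float, beta: float) -> float:
    """<C> for the p = 1 state exp(-i*beta*B) exp(-i*gamma*C) |++>."""
    state = np.full(4, 0.5, dtype=complex)          # |++> on two qubits
    state = np.exp(-1j * gamma * cost_diag) * state  # cost layer (diagonal)
    state = expm(-1j * beta * B) @ state             # mixer layer
    return float(np.real(np.vdot(state, cost_diag * state)))

# Coarse grid search standing in for a classical optimizer.
grid = np.linspace(0, np.pi, 50)
best = max(((qaoa_expectation(g, b), g, b) for g in grid for b in grid))
print("best <C> = %.3f at gamma=%.2f, beta=%.2f" % best)
```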
LLaVA inference with multiple images at once for cross-image analysis.
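A rough sketch of multi-image inference through the Hugging Face transformers LLaVA classes is shown below; the checkpoint name, prompt format, and the model's ability to reason across several `<image>` tokens are assumptions to verify against that repository, and a CUDA GPU is assumed.

```python
# Hedged sketch: two images passed to a LLaVA-style model via transformers.
# Checkpoint name and multi-<image> prompt handling are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16).to("cuda")

images = [Image.open("left.jpg"), Image.open("right.jpg")]
prompt = "USER: <image>\n<image>\nWhat differs between the two images? ASSISTANT:"

inputs = processor(text=prompt, images=images, return_tensors="pt").to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```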
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
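That 2017 winning entry was built around top-down attention over pre-extracted image region features; the simplified PyTorch sketch below illustrates the attention-and-fusion step, with all dimensions, activations, and the classifier shape chosen for illustration rather than taken from the repository.

```python
# Simplified top-down attention fusion for VQA; dimensions are illustrative.
import torch
import torch.nn as nn

class TopDownAttentionVQA(nn.Module):
    def __init__(self, q_dim=512, v_dim=2048, hidden=1024, num_answers=3129):
        super().__init__()
        self.att = nn.Sequential(nn.Linear(q_dim + v_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.q_proj = nn.Linear(q_dim, hidden)
        self.v_proj = nn.Linear(v_dim, hidden)
        self.classifier = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, num_answers))

    def forward(self, v, q):
        # v: (batch, regions, v_dim) region features; q: (batch, q_dim) question encoding
        q_rep = q.unsqueeze(1).expand(-1, v.size(1), -1)
        weights = torch.softmax(self.att(torch.cat([v, q_rep], dim=-1)), dim=1)
        v_att = (weights * v).sum(dim=1)              # attended image feature
        joint = self.q_proj(q) * self.v_proj(v_att)   # elementwise fusion
        return self.classifier(joint)                 # answer logits

logits = TopDownAttentionVQA()(torch.randn(2, 36, 2048), torch.randn(2, 512))
```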
[PR23] The implementation of the paper ''Learning Visual Question Answering on Controlled Semantic Noisy Labels''
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph