[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
This is the official implementation of the paper "Needle In A Multimodal Haystack"
[ACLW'24] LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
Seamlessly integrate state-of-the-art transformer models into robotics stacks
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Evaluating text-to-image/video/3D models with VQAScore
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
This repository contains the implementation for the paper "Revisiting Few Shot Object Detection with Vision-Language Models"
Vision Language Model fine-tuning for TIL AI BrainHack - Advanced Track
An open-source implementation of LLaVA-NeXT.
FreeVA: Offline MLLM as Training-Free Video Assistant
Grounded Multimodal Large Language Model with Localized Visual Tokenization
An official implementation of ShareGPT4V: Improving Large Multi-modal Models with Better Captions