[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Unified embedding generation and search engine. Also available as a cloud service at cloud.marqo.ai
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
A framework for small-scale large multimodal models
Official implementation of SEED-LLaMA (ICLR 2024).
A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".
Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models