Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Deep Learning for Computer Vision (深度學習於電腦視覺) by Frank Wang (王鈺強)
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
A Framework of Small-scale Large Multimodal Models
A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Code release for Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
[ICLR'24] Official code for "C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion"
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
Official repository of the paper "Learning to Prompt with Text Only Supervision for Vision-Language Models".
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
Vision Language Dataset Construction Library for Remote Sensing Domain
Official code of the paper ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling accepted at MICCAI 2024.
[CVPR'24 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Python scripts for captioning images with vision-language models (VLMs)