# vision-language

Here are 128 public repositories matching this topic...

marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Updated Jul 2, 2024
Python

OpenDriveLab / DriveLM

[ECCV 2024] DriveLM: Driving with Graph Visual Question Answering

autonomous-driving vision-language large-language-models llm prompt-engineering prompting chain-of-thought tree-of-thoughts graph-of-thoughts

Updated Jul 2, 2024
HTML

llm-jp / awesome-japanese-llm

Overview of Japanese LLMs

japanese generative-model japanese-language language-models language-model generative-models multimodal vision-and-language vision-language foundation-models large-language-models llm llms generative-ai large-language-model vision-language-model japanese-llm japanese-language-model llm-japanese

Updated Jul 1, 2024
TypeScript

pha123661 / NTU-2022Fall-DLCV

Deep Learning for Computer Vision, taught by Frank Wang (王鈺強)

computer-vision deep-learning cnn gan image-classification image-captioning image-generation image-segmentation point-cloud-segmentation self-supervised-learning adversarial-domain-adaptation vision-language novel-view-synthesis long-tailed-recognition zero-shot-classification

Updated Jun 30, 2024
Python

mees / calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

natural-language-processing computer-vision deep-learning robotics pytorch vision manipulation vision-and-language grounding vision-language

Updated Jun 29, 2024
Python

IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

open-world object-detection vision-language vision-language-transformer open-world-detection

Updated Jun 28, 2024
Python

ChenDelong1999 / RemoteCLIP

🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)

remote-sensing vision-language contrastive-language-image-pretraining

Updated Jun 27, 2024
Jupyter Notebook

OFA-Sys / ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

representation-learning multimodal vision-and-language contrastive-loss vision-language vision-transformer foundation-models audio-language

Updated Jun 27, 2024
Python

KAIST-Edlab / Study_Of_VL

KAIST medical VL research group

medical vision-language

Updated Jun 27, 2024

wjpoom / SPEC

[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

language computer-vision vision clip image-retrieval fine-grained robustness text-retrieval multimodal compositionality vision-language vision-language-model cvpr2024 compostional

Updated Jun 27, 2024
Jupyter Notebook

TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

nlp transformers llama vision-language llava large-multimodal-models tinyllama

Updated Jun 25, 2024
Python

longzw1997 / Open-GroundingDino

A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".

open-world object-detection vision-language open-world-detection

Updated Jun 25, 2024
Python

mbzuai-oryx / VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

chatbot clip image-encoder video-encoder multimodal dual-encoder vision-language vicuna gpt4 vision-language-pretraining llava video-conversation video-chatbot llama3 gpt4o phi-3-mini

Updated Jun 20, 2024
Python

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Jun 17, 2024
Python

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

ocr computer-vision artificial-intelligence text-recognition document text-detection document-analysis end-to-end-ocr multimodal scene-text-recognition multimodal-deep-learning scene-text-detection vision-language document-understanding scene-text-detection-recognition document-recognition document-intelligence documentai vision-language-transformer vision-language-model

Updated Jun 16, 2024
C++

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot llama clip mulit-modal vision-language vicuna gpt-4 vision-language-pretraining llava video-chatboat video-conversation

Updated Jun 16, 2024
Python

IRVLUTD / Proto-CLIP

Code release for Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning

robotics multimodal-learning few-shot-learning vision-language few-shot-classifcation prototype-learning

Updated Jun 15, 2024
Python

sonstory / Paper-Review

Read and review various papers in the field of Vision and Vision-Language.

computer-vision paper-review vision-language image-anomaly-detection

Updated Jun 15, 2024

WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

chatbot llama multi-modal clip vision-language gpt-4 foundation-models visual-prompting llava llama2 cvpr2024 gpt-4-vision

Updated Jun 12, 2024
Python

hee-suk-yoon / C-TPT

[ICLR'24] Official code for "C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion"

calibration clip vision-language test-time-adaptation iclr2024

Updated Jun 9, 2024
Python
