visual-language-models

Here are 35 public repositories matching this topic...

THUDM / CogVLM

A state-of-the-art open visual language model | Multimodal pretrained model

pretrained-models language-model multi-modal cross-modality visual-language-models

Updated May 29, 2024
Python

camel-ai / crab

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

multi-agent-systems gui-automation large-language-models language-model-agent visual-language-models

Updated May 30, 2025
Python

MiniMax-AI / One-RL-to-See-Them-All

The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning

rl vlm visual-language-models vlm-rl v-triune orsta

Updated May 31, 2025
Python

bilel-bj / ROSGPT_Vision

Commanding robots using only Language Models' prompts

robotics language-models ros2 robotic-vision large-language-models llm prompt-engineering chatgpt language-models-are-next robotic-design-patterns prompting-robotic-modalities visual-language-models

Updated Feb 16, 2025
Python

hk-zh / language-conditioned-robot-manipulation-models

https://arxiv.org/abs/2312.10807

reinforcement-learning imitation-learning robot-manipulation neural-symbolic foundation-models visual-language-models language-conditioned-learning large-languge-models

Updated Dec 1, 2024

xinyanghuang7 / Basic-Visual-Language-Model

Build a simple, basic multimodal large model from scratch. 🤖

visual-language-learning large-language-models visual-language-models multimodel-large-language-model

Updated Jun 19, 2024
Python

tianyu-z / VCR

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

benchmark deep-learning visual-language-models

Updated Feb 26, 2025
Python

AlignGPT-VL / AlignGPT

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

large-language-models multimodal-large-language-models visual-language-models

Updated Jul 12, 2024
Python

jaisidhsingh / CoN-CLIP

Implementation of the "Learn No to Say Yes Better" paper.

deep-learning pytorch multimodal compositionality image-captions image-text-matching visual-language-models

Updated May 28, 2025
Python

kesimeg / awesome-turkish-language-models

A curated list of Turkish AI models, datasets, papers

awesome turkish speech awesome-list turkish-language vlm turkish-nlp large-language-models llm visual-language-models

Updated May 23, 2025

BioMedIA-MBZUAI / FetalCLIP

Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

artificial-intelligence medical-imaging ultrasound-imaging foundation-models visual-language-models fetal-ultrasound fetalclip

Updated Mar 29, 2025
Python

Sid2697 / HOI-Ref

Code implementation for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

dataset dataset-generation vlm hand-object-interaction egocentric-vision large-language-models visual-language-models

Updated Apr 16, 2024
Python

amathislab / wildclip

Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models

behavior computer-vision clip camera-trap computervision visual-language-models

Updated Mar 8, 2024
Python

csebuetnlp / IllusionVQA

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

vqa vqa-dataset optical-illusions visual-language-models

Updated Apr 27, 2025
Jupyter Notebook

sduzpf / UAP_VLP

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

deep-neural-networks adversarial-attacks visual-language-models

Updated Mar 31, 2025
Python

declare-lab / Sealing

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality video-understanding video-question-answering visual-language-models naacl2024

Updated Jul 25, 2024
Python

CristianoPatricio / concept-based-interpretability-VLM

Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).

deep-learning medical-imaging clip interpretability explainable-ai skin-lesion-classification melanoma-diagnosis concept-based-explanations visual-language-models ieee-isbi

Updated Jun 5, 2024
Jupyter Notebook

GraphPKU / CoI

Chain of Images for Intuitively Reasoning

chatbot llama multimodal chatgpt llava visual-language-models gpt4v dalle3 chain-of-throught chain-of-image

Updated Nov 29, 2023
Python

NxtGenLegend / TreeHacks-ZoneOut

#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.

Updated Feb 16, 2025
JavaScript

AikyamLab / hallucinogen

A benchmark for evaluating hallucinations in large visual language models

ai aisafety visual-language-models hallucination-evaluation hallucination-detection medical-safety medical-visual-language-model

Updated Mar 18, 2025
Python
