Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Speech, Language, Audio, Music Processing with Large Language Model
A Gradio demo of MGIE
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
EVE: Encoder-Free Vision-Language Models
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
[Paper][ACM MM 2024] Making Large Language Models Perform Better in Knowledge Graph Completion
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v, etc. (see the sketch after this list)
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
Explore the Limits of Omni-modal Pretraining at Scale
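
For the finetuning codebase listed above, none of its actual entry points are shown on this page. As a rough illustration of what a single finetuning step on one of its supported models (llava-1.5) looks like, here is a minimal sketch using the Hugging Face transformers LLaVA classes; the checkpoint id, the dummy sample, and the use of transformers rather than that repo's own scripts are all assumptions.

```python
# A minimal single-step finetuning sketch, assuming the Hugging Face transformers
# LLaVA classes and the public "llava-hf/llava-1.5-7b-hf" checkpoint, not the
# listed repo's own training scripts.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Dummy sample; a real run would iterate over an image-text dataset.
image = Image.new("RGB", (336, 336))
prompt = "USER: <image>\nDescribe the image. ASSISTANT: A plain black square."

inputs = processor(images=image, text=prompt, return_tensors="pt")
# Supervise on the full token sequence; a real recipe would mask the prompt
# tokens to -100 so the loss covers only the assistant response.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
optimizer.step()
optimizer.zero_grad()
```

In practice such finetuning codebases wrap this loop with prompt masking, gradient accumulation, and parameter-efficient adapters such as LoRA.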