# multimodal-large-language-models

Here are 83 public repositories matching this topic...

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

multi-modality instruction-following in-context-learning large-language-models chain-of-thought instruction-tuning visual-instruction-tuning large-vision-language-model multimodal-instruction-tuning large-vision-language-models multimodal-large-language-models visual-in-context-learning multimodal-in-context-learning visual-chain-of-thought multimodal-chain-of-thought

Updated Jul 11, 2024

modelscope / modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

agent chatbot android-application multi-agents rag mobile-agents gpts llm multimodal-large-language-models qwen assistantapi chatglm-4 open-gpts mobile-agent

Updated Jul 20, 2024
Python

X-PLUG / MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

android agent harmony ios app gui automation mobile copilot multimodal mobile-agents mllm multimodal-large-language-models gpt4v multimodal-agent

Updated Jul 15, 2024
Python

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

computer-vision chatbot representation-learning clip dino large-language-models llms instruction-tuning mllm multimodal-large-language-models

Updated Jul 6, 2024
Python

YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

text-to-image image-editting large-language-models multimodal-large-language-models

Updated Jun 6, 2024
Jupyter Notebook

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

multimodal table-understanding document-understanding mllm multimodal-large-language-models chart-understanding

Updated Jul 16, 2024
Python

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Jul 14, 2024
Python

LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent tool-use large-language-models multimodal-large-language-models large-multimodal-models

Updated Feb 1, 2024
Python

richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning visual-question-answering multimodal-deep-learning large-language-models medical-report-generation multimodal-large-language-models large-multimodal-models

Updated Jul 18, 2024

rese1f / MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

computer-vision dataset llama large-language-models long-video-understanding multimodal-large-language-models

Updated Jun 16, 2024
Python

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

speech-processing audio-processing peft music-processing large-language-model multimodal-large-language-models

Updated Jul 19, 2024
Python

BradyFU / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

multimodality hallucination hallucinations large-language-models llm mllm multimodal-large-language-models

Updated Jun 17, 2024
Python

tsujuifu / pytorch_mgie

A Gradio demo of MGIE

pytorch image-editing vision-and-language multimodal-large-language-models iclr2024

Updated Feb 23, 2024
Python

Coobiw / MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

fine-tuning pipeline-parallelism pretraining model-parallel deepspeed mllm multimodal-large-language-models qwen video-large-language-models video-language-model

Updated Jul 16, 2024
Jupyter Notebook

HenryHZY / Awesome-Multimodal-LLM

Research Trends in LLM-guided Multimodal Learning.

multimodal-learning multimodal parameter-efficient-learning in-context-learning large-language-models llm parameter-efficient-tuning instruction-tuning multimodal-large-language-models

Updated Oct 17, 2023

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models text-to-3d multimodal-generation text-to-sound large-vision-language-models multimodal-large-language-models

Updated Jul 8, 2024
HTML

burglarhobbit / Awesome-Medical-Large-Language-Models

Curated papers on Large Language Models in Healthcare and Medical domain

large-language-models large-vision-language-models multimodal-large-language-models

Updated Jul 15, 2024

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

video mme large-language-models large-vision-language-models multimodal-large-language-models video-mme

Updated Jun 18, 2024

X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark video dataset chinese youku multimodal video-retrieval video-question-answering multimodal-pretraining mllm multimodal-large-language-models

Updated Jan 8, 2024
Python

AviSoori1x / seemore

From scratch implementation of a vision language model in pure PyTorch

deep-learning pytorch artificial-intelligence neural-networks multimodal-learning multimodal pytorch-implementation large-language-models llm vision-language-model llava multimodal-large-language-models

Updated May 6, 2024
Jupyter Notebook
