ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Updated Jul 20, 2024 - Python
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Speech, Language, Audio, Music Processing with Large Language Model
A Gradio demo of MGIE
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Moondream is a lightweight multimodal large language model
Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
This is the official implementation of the paper "Needle In A Multimodal Haystack"
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
EVE: Encoder-Free Vision-Language Models from BAAI
Matryoshka Multimodal Models