Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Updated Jun 18, 2024 · Python
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
[CVPR 2024] 🎬💭 Chat with over 10K frames of video!
Speech, Language, Audio, Music Processing with Large Language Model
A Gradio demo of MGIE
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Explore the Limits of Omni-modal Pretraining at Scale
An Easy-to-use Hallucination Detection Framework for LLMs.
Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
This is the official implementation of the paper "Needle In A Multimodal Haystack"
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
Moondream is a lightweight multimodal large language model
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"