Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Official implementation of "LLMGA: Multimodal Large Language Model based Generation Assistant" (ECCV 2024)
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
A collection of visual instruction tuning datasets (see the record-format sketch after this list).
EVE: Encoder-Free Vision-Language Models from BAAI
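
Many visual instruction tuning datasets, including several gathered in the collection above, follow the LLaVA-style JSON layout: each record pairs an image path with a multi-turn conversation, and an `<image>` placeholder marks where the visual tokens are spliced into the text stream. Below is a minimal sketch of that convention; the field names reflect common practice, while the specific paths and strings are illustrative assumptions rather than content from any one repository.

```python
import json

# A hypothetical record in the common LLaVA-style layout.
# Field names follow the widespread convention; values are made up.
sample = {
    "id": "000000123",                          # unique sample identifier
    "image": "coco/train2017/000000123.jpg",    # image path, relative to the image root
    "conversations": [
        {
            "from": "human",
            # "<image>" marks where visual tokens enter the text during training.
            "value": "<image>\nDescribe the objects in this photo.",
        },
        {
            "from": "gpt",
            "value": "A dog is sitting on a wooden bench next to a red bicycle.",
        },
    ],
}

# Datasets in this format are typically stored as a JSON list of such records.
print(json.dumps(sample, indent=2))
```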