
Multi-modal Large Language Model Collection 🦕

This is a curated list of Multi-modal Large Language Models (MLLM), Multimodal Benchmarks (MMB), Multimodal Instruction Tuning (MMIT), Multimodal In-context Learning (MMIL), Foundation Models (FM, e.g., the CLIP family), and the most popular Parameter-Efficient Tuning methods.

📒 Table of Contents

  • Alignment
  • Multi-modal Large Language Models (MLLM)
  • Multimodal Benchmarks (MMB)
  • Foundation Models (FM)
  • Parameter-Efficient Tuning Repo (PETR)

Alignment

  • MDPO: Conditional Preference Optimization for Multimodal Large Language Models [arXiv 2024/06/17] [Paper]
    University of Southern California; University of California, Davis; Microsoft Research

  • RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness [arXiv 2024/05/27] [Paper] [Code]
    Department of Computer Science and Technology, Tsinghua University; NExT++ Lab, School of Computing, National University of Singapore

  • RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback [CVPR 2024] [Paper] [Code] [Homepage]
    Tsinghua University; National University of Singapore; Shenzhen International Graduate School, Tsinghua University; Peng Cheng Laboratory, Shenzhen, China
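
The alignment work above builds on preference optimization: MDPO, for instance, adapts DPO-style objectives to condition preferences on the input image. As a quick orientation, here is a minimal sketch of the standard DPO loss these methods start from. This is our own PyTorch illustration, not code from any of the listed papers, and the arguments are assumed to be summed per-sequence log-probabilities.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective. Each argument is a (batch,) tensor of
    summed per-sequence log-probabilities under the trainable policy or
    the frozen reference model, for the preferred (chosen) and
    dispreferred (rejected) responses."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to rank chosen above rejected responses; the
    # reference terms implicitly regularize it toward the frozen model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```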

Multi-modal Large Language Models (MLLM)

  • ShareGPT4V: Improving Large Multi-Modal Models with Better Captions [ECCV 2024] [Paper] [Code] [Homepage]
    University of Science and Technology of China, Shanghai AI Laboratory

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection [arXiv 2024/02/12] [Paper] [Code]
    Peking University; Peng Cheng Laboratory; Sun Yat-sen University, Guangzhou; Tencent Data Platform; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School; FarReel AI Lab

  • Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models [arXiv 2024/02/12] [Paper] [Code] [Evaluation]
    Stanford, Toyota Research Institute

  • Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models [arXiv 2024/03/27] [Paper] [Code] [Project Page]
    The Chinese University of Hong Kong, SmartMore

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [arXiv 2024/01/15] [Paper] [Code]
    OpenGVLab, Shanghai AI Laboratory, Nanjing University, The University of Hong Kong, The Chinese University of Hong Kong, Tsinghua University, University of Science and Technology of China, SenseTime Research

  • GiT: Towards Generalist Vision Transformer through Universal Language Interface [arXiv 2024/03/14] [Paper]
    Peking University; Max Planck Institute for Informatics; The Chinese University of Hong Kong, Shenzhen; ETH Zurich; The Chinese University of Hong Kong

  • LLaMA: Open and Efficient Foundation Language Models [arXiv 2023] [Paper] [GitHub Repo]
    Meta AI

Multimodal Benchmarks (MMB)

  • MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs [arXiv 2024/06/17] [Paper] [Code] [Homepage] [Space🤗]
    Wuhan University; Shanghai AI Laboratory; The Chinese University of Hong Kong; MThreads, Inc.

Foundation Models (FM)

Parameter-Efficient Tuning Repo (PETR)

  • PEFT: Parameter-Efficient Fine-Tuning [HuggingFace 🤗] [Homepage] [Code]
    PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters. A minimal LoRA usage sketch follows this list.

  • LLaMA Efficient Tuning [Github Repo]
    Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen).

  • LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀[Code]
    Fine-tuning LLaMA to follow instructions within 1 hour using only 1.2M parameters (see the zero-gated prefix sketch after this list)

  • LLaMA2-Accessory 🚀[Code]
    An Open-source Toolkit for LLM Development

  • LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort [Code]
    Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)
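
As referenced in the PEFT entry above, the sketch below shows a minimal LoRA setup with the Hugging Face PEFT library. The base model name and hyperparameter values are placeholders for illustration, not recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder base model; any causal LM on the Hugging Face Hub works.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,       # scaling factor applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

Only the injected low-rank matrices are trained while the base weights stay frozen, which is what keeps the trainable-parameter count so small.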
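
The LLaMA-Adapter entry above fits its 1.2M-parameter budget by training small learnable prompts whose influence is gated by zero-initialized attention. Below is a deliberately simplified sketch of that idea (our own illustration, not the repository's actual code): prefix tokens blended in through a gate that starts at zero, so training begins from the frozen model's behavior.

```python
import torch
import torch.nn as nn

class ZeroGatedPrefix(nn.Module):
    """Learnable prefix tokens whose attention contribution starts at
    zero, so early training cannot disturb the frozen base model."""
    def __init__(self, num_prefix: int, dim: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(num_prefix, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init gating factor

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim). Each token attends to the prefix,
        # and the result is blended in via a tanh gate that starts at 0.
        batch = hidden.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        scores = hidden @ prefix.transpose(1, 2) / hidden.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return hidden + torch.tanh(self.gate) * (attn @ prefix)

layer = ZeroGatedPrefix(num_prefix=10, dim=512)
out = layer(torch.randn(2, 16, 512))  # identical to the input at init
```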
