
Multi-modal Large Language Model Collection 🦕

This is a curated list of Multi-modal Large Language Models (MLLM), Multimodal Benchmarks (MMB), Multimodal Instruction Tuning (MMIT), Multimodal In-context Learning (MMIL), Foundation Models (FM, e.g., the CLIP family), and the most popular Parameter-Efficient Tuning methods.

📒 Table of Contents

  • Alignment
  • Multi-modal Large Language Models (MLLM)
  • Multimodal Benchmarks (MMB)
  • Foundation Models (FM)
  • Parameter-Efficient Tuning Repo (PETR)

Alignment

  • MDPO: Conditional Preference Optimization for Multimodal Large Language Models [arXiv 2024/06/17] [Paper]
    University of Southern California; University of California, Davis; Microsoft Research

  • RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness [arXiv 2024/05/27] [Paper] [Code]
    Department of Computer Science and Technology, Tsinghua University; NExT++ Lab, School of Computing, National University of Singapore

  • RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback [CVPR 2024] [Paper] [Code] [Homepage]
    Tsinghua University; National University of Singapore; Shenzhen International Graduate School, Tsinghua University; Peng Cheng Laboratory, Shenzhen, China
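
The alignment work above builds on preference optimization: MDPO, for instance, adapts DPO-style objectives to condition preferences on the input image. As a quick orientation, here is a minimal sketch of the standard DPO loss these methods start from. This is our own PyTorch illustration, not code from any of the listed papers, and the arguments are assumed to be summed per-sequence log-probabilities.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective. Each argument is a (batch,) tensor of
    summed per-sequence log-probabilities under the trainable policy or
    the frozen reference model, for the preferred (chosen) and
    dispreferred (rejected) responses."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to rank chosen above rejected responses; the
    # reference terms implicitly regularize it toward the frozen model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```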

Multi-modal Large Language Models (MLLM)

  • ShareGPT4V: Improving Large Multi-Modal Models with Better Captions [ECCV 2024] [Paper] [Code] [Homepage]
    University of Science and Technology of China, Shanghai AI Laboratory

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection [arXiv 2024/02/12] [Paper] [Code]
    Peking University; Peng Cheng Laboratory; Sun Yat-sen University, Guangzhou; Tencent Data Platform; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School; FarReel AI Lab

  • Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models [arXiv 2024/02/12] [Paper] [Code] [Evaluation]
    Stanford, Toyota Research Institute

  • Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models [arXiv 2024/03/27] [Paper] [Code] [Project Page]
    The Chinese University of Hong Kong, SmartMore

  • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [arXiv 2024/01/15] [Paper] [Code]
    OpenGVLab, Shanghai AI Laboratory, Nanjing University, The University of Hong Kong, The Chinese University of Hong Kong, Tsinghua University, University of Science and Technology of China, SenseTime Research

  • GiT: Towards Generalist Vision Transformer through Universal Language Interface [arXiv 2024/03/14] [Paper]
    Peking University; Max Planck Institute for Informatics; The Chinese University of Hong Kong, Shenzhen; ETH Zurich; The Chinese University of Hong Kong

  • LLaMA: Open and Efficient Foundation Language Models [arXiv 2023] [Paper] [GitHub Repo]
    Meta AI

Multimodal Benchmarks (MMB)

  • MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs [arXiv 2024/06/17] [Paper] [Code] [Homepage] [Space🤗]
    Wuhan University; Shanghai AI Laboratory; The Chinese University of Hong Kong; MThreads, Inc.

Foundation Models (FM)

Parameter-Efficient Tuning Repo (PETR)

  • PEFT: Parameter-Efficient Fine-Tuning [HuggingFace 🤗] [Homepage] [Code]
    PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters. A minimal LoRA usage sketch follows this list.

  • LLaMA Efficient Tuning [Github Repo]
    Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen).

  • LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀[Code]
    Fine-tuning LLaMA to follow instructions within 1 hour using only 1.2M parameters (see the zero-gated prefix sketch after this list)

  • LLaMA2-Accessory 🚀[Code]
    An Open-source Toolkit for LLM Development

  • LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort [Code]
    Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)
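
As referenced in the PEFT entry above, the sketch below shows a minimal LoRA setup with the Hugging Face PEFT library. The base model name and hyperparameter values are placeholders for illustration, not recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder base model; any causal LM on the Hugging Face Hub works.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,       # scaling factor applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

Only the injected low-rank matrices are trained while the base weights stay frozen, which is what keeps the trainable-parameter count so small.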
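
The LLaMA-Adapter entry above fits its 1.2M-parameter budget by training small learnable prompts whose influence is gated by zero-initialized attention. Below is a deliberately simplified sketch of that idea (our own illustration, not the repository's actual code): prefix tokens blended in through a gate that starts at zero, so training begins from the frozen model's behavior.

```python
import torch
import torch.nn as nn

class ZeroGatedPrefix(nn.Module):
    """Learnable prefix tokens whose attention contribution starts at
    zero, so early training cannot disturb the frozen base model."""
    def __init__(self, num_prefix: int, dim: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(num_prefix, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init gating factor

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim). Each token attends to the prefix,
        # and the result is blended in via a tanh gate that starts at 0.
        batch = hidden.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        scores = hidden @ prefix.transpose(1, 2) / hidden.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return hidden + torch.tanh(self.gate) * (attn @ prefix)

layer = ZeroGatedPrefix(num_prefix=10, dim=512)
out = layer(torch.randn(2, 16, 512))  # identical to the input at init
```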
