zzw-zwzhang/Awesome-of-Multimodal-Dialogue-Models

Awesome-of-Multimodal-Dialogue-Models

A curated list of multimodal dialogue models and related resources.

Please feel free to open a pull request or an issue to add papers.

🔆 Updated 2023-07-07


Table of Contents

Type of Multimodal Dialogue Models

| Type | UIUO | MIUO | MIMO | L2V | V2L | Other |
| --- | --- | --- | --- | --- | --- | --- |
| Explanation | Unimodal Input & Unimodal Output | Multimodal Input & Unimodal Output | Multimodal Input & Multimodal Output | Language to Vision | Vision to Language | Other types |

arXiv

| Title | Date | Type | Code | Star |
| --- | --- | --- | --- | --- |
| What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? | 2023.07.05 | MIUO | PyTorch(Author) | Github stars |
| SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs | 2023.07.03 | MIUO | PyTorch(Author) | Github stars |
| LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | 2023.06.29 | MIUO | PyTorch(Author) | Github stars |
| AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | 2023.06.28 | MIUO | PyTorch(Author) | Github stars |
| KOSMOS-2: Grounding Multimodal Large Language Models to the World | 2023.06.27 | MIUO | PyTorch(Author) | Github stars |
| Aligning Large Multi-Modal Model with Robust Instruction Tuning | 2023.06.26 | MIUO | PyTorch(Author) | Github stars |
| Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | 2023.06.15 | MIUO | PyTorch(Author) | Github stars |
| Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | 2023.06.14 | MIMO | PyTorch(Author) | Github stars |
| Grounding Language Models to Images for Multimodal Inputs and Outputs | 2023.06.13 | MIMO | PyTorch(Author) | Github stars |
| MultiModal-GPT: A Vision and Language Model for Dialogue with Humans | 2023.06.13 | MIUO | PyTorch(Author) | Github stars |
| MIMIC-IT: Multi-Modal In-Context Instruction Tuning | 2023.06.08 | MIUO | PyTorch(Author) | Github stars |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 2023.06.08 | MIUO | PyTorch(Author) | Github stars |
| GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | 2023.05.30 | MIMO | PyTorch(Author) | Github stars |
| Controllable Text-to-Image Generation with GPT-4 | 2023.05.29 | MIUO | PyTorch(Author) | Github stars |
| Mindstorms in Natural Language-Based Societies of Mind | 2023.05.26 | MIMO | PyTorch(Author) | Github stars |
| Generating Images with Multimodal Language Models | 2023.05.26 | MIMO | PyTorch(Author) | Github stars |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | 2023.05.25 | MIMO | PyTorch(Author) | Github stars |
| ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst | 2023.05.25 | MIUO | PyTorch(Author) | Github stars |
| EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | 2023.05.24 | MIUO | PyTorch(Author) | Github stars |
| LMEye: An Interactive Perception Network for Large Language Models | 2023.05.19 | MIUO | PyTorch(Author) | Github stars |
| Visual Instruction Tuning | 2023.05.17 | MIUO | PyTorch(Author) | Github stars |
| VideoChat: Chat-Centric Video Understanding | 2023.05.10 | MIUO | PyTorch(Author) | Github stars |
| Caption Anything: Interactive Image Description with Diverse Multimodal Controls | 2023.05.08 | MIUO | PyTorch(Author) | Github stars |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | 2023.04.20 | MIUO | PyTorch(Author) | Github stars |
| TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | 2023.03.29 | MIUO | PyTorch(Author) | Github stars |
| GPT-4 Technical Report | 2023.03.27 | MIMO | PyTorch(Author) | Github stars |
| PandaGPT: One Model To Instruction-Follow Them All | 2023.03.25 | MIUO | PyTorch(Author) | Github stars |
| Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | 2023.03.08 | MIMO | PyTorch(Author) | Github stars |
| MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | 2023.03.20 | MIUO | PyTorch(Author) | Github stars |
| | 2023.06.13 | MIUO | PyTorch(Author) | Github stars |
| | 2023.06.13 | MIUO | PyTorch(Author) | Github stars |

2023

| Title | Venue | Type | Code | Star |
| --- | --- | --- | --- | --- |
| InstructPix2Pix: Learning to Follow Image Editing Instructions | CVPR-Highlight | MIUO | PyTorch(Author) | Github stars |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ICML | V2L | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |

2022

| Title | Venue | Type | Code | Star |
| --- | --- | --- | --- | --- |
| Flamingo: a Visual Language Model for Few-Shot Learning | NeurIPS | MIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |

2021

| Title | Venue | Type | Code | Star |
| --- | --- | --- | --- | --- |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |

2020

| Title | Venue | Type | Code | Star |
| --- | --- | --- | --- | --- |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |

Previous Venues

| Title | Venue | Type | Code | Star |
| --- | --- | --- | --- | --- |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |
| | NeurIPS | UIUO | PyTorch(Author) | Github stars |

Awesome Surveys

Awesome Blogs

Awesome Multimodal Datasets
