Awesome-of-Multimodal-Dialogue-Models

A curated list of multimodal dialogue models and related resources.

Please feel free to open a pull request or an issue to add papers.

🔆 Updated 2023-07-07

Title	Date	Type	Code
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?	2023.07.05	`MIUO`	PyTorch(Author)
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs	2023.07.03	`MIUO`	PyTorch(Author)
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding	2023.06.29	`MIUO`	PyTorch(Author)
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn	2023.06.28	`MIUO`	PyTorch(Author)
KOSMOS-2: Grounding Multimodal Large Language Models to the World	2023.06.27	`MIUO`	PyTorch(Author)
Aligning Large Multi-Modal Model with Robust Instruction Tuning	2023.06.26	`MIUO`	PyTorch(Author)
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration	2023.06.15	`MIUO`	PyTorch(Author)
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation	2023.06.14	`MIMO`	PyTorch(Author)
Grounding Language Models to Images for Multimodal Inputs and Outputs	2023.06.13	`MIMO`	PyTorch(Author)
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans	2023.06.13	`MIUO`	PyTorch(Author)
MIMIC-IT: Multi-Modal In-Context Instruction Tuning	2023.06.08	`MIUO`	PyTorch(Author)
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2023.06.08	`MIUO`	PyTorch(Author)
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction	2023.05.30	`MIMO`	PyTorch(Author)
Controllable Text-to-Image Generation with GPT-4	2023.05.29	`MIUO`	PyTorch(Author)
Mindstorms in Natural Language-Based Societies of Mind	2023.05.26	`MIMO`	PyTorch(Author)
Generating Images with Multimodal Language Models	2023.05.26	`MIMO`	PyTorch(Author)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face	2023.05.25	`MIMO`	PyTorch(Author)
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst	2023.05.25	`MIUO`	PyTorch(Author)
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought	2023.05.24	`MIUO`	PyTorch(Author)
LMEye: An Interactive Perception Network for Large Language Models	2023.05.19	`MIUO`	PyTorch(Author)
Visual Instruction Tuning	2023.05.17	`MIUO`	PyTorch(Author)
VideoChat: Chat-Centric Video Understanding	2023.05.10	`MIUO`	PyTorch(Author)
Caption Anything: Interactive Image Description with Diverse Multimodal Controls	2023.05.08	`MIUO`	PyTorch(Author)
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	2023.04.20	`MIUO`	PyTorch(Author)
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs	2023.03.29	`MIUO`	PyTorch(Author)
GPT-4 Technical Report	2023.03.27	`MIMO`	PyTorch(Author)
PandaGPT: One Model To Instruction-Follow Them All	2023.03.25	`MIUO`	PyTorch(Author)
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models	2023.03.08	`MIMO`	PyTorch(Author)
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action	2023.03.20	`MIUO`	PyTorch(Author)
	2023.06.13	`MIUO`	PyTorch(Author)
	2023.06.13	`MIUO`	PyTorch(Author)

2023

Title	Venue	Type	Code
InstructPix2Pix: Learning to Follow Image Editing Instructions	CVPR-Highlight	`MIUO`	PyTorch(Author)
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	ICML	`V2L`	PyTorch(Author)
	NeurIPS	`UIUO`	PyTorch(Author)