Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
A Video Chat Agent with Temporal Prior
Awesome list for attacks on large language models.
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
A collection of visual instruction tuning datasets.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
Awesome_Multimodel is a curated GitHub repository providing a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Unified Multi-modal Image Aesthetics Assessment (IAA) Baseline and Benchmark
Datasets, case studies, and benchmarks for extracting structured information from PDFs, HTML files, or images, created by the Parsee.ai team. Datasets are also available on Hugging Face: https://huggingface.co/parsee-ai
Large Chinese Language-and-Vision Assistant for BioMedicine (a Chinese medical multimodal large model)