Composition of Multimodal Language Models From Scratch
This is the official implementation (code, data) of the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?"
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
Awesome list for attacks on large language models.
Example code for fine-tuning multimodal large language models with LLaMA-Factory
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
A Video Chat Agent with Temporal Prior
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
MOSSBench: A webpage for an oversensitivity benchmark
A Chinese medical multimodal large language model: Large Chinese Language-and-Vision Assistant for BioMedicine
Undergraduate dissertation from Guilin University of Electronic Technology