Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Official implementation of "LLMGA: Multimodal Large Language Model based Generation Assistant" (ECCV 2024)
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
A collection of visual instruction tuning datasets (see the record-format sketch after this list).
EVE: Encoder-Free Vision-Language Models from BAAI
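
Many visual instruction tuning datasets, including several gathered in the collection above, follow the LLaVA-style JSON layout: each record pairs an image path with a multi-turn conversation, and an `<image>` placeholder marks where the visual tokens are spliced into the text stream. Below is a minimal sketch of that convention; the field names reflect common practice, while the specific paths and strings are illustrative assumptions rather than content from any one repository.

```python
import json

# A hypothetical record in the common LLaVA-style layout.
# Field names follow the widespread convention; values are made up.
sample = {
    "id": "000000123",                          # unique sample identifier
    "image": "coco/train2017/000000123.jpg",    # image path, relative to the image root
    "conversations": [
        {
            "from": "human",
            # "<image>" marks where visual tokens enter the text during training.
            "value": "<image>\nDescribe the objects in this photo.",
        },
        {
            "from": "gpt",
            "value": "A dog is sitting on a wooden bench next to a red bicycle.",
        },
    ],
}

# Datasets in this format are typically stored as a JSON list of such records.
print(json.dumps(sample, indent=2))
```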