# multimodal-large-language-models

Here are 56 public repositories matching this topic...

NotYuSheng / Multimodal-Large-Language-Model

Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.

multimodal large-language-models llm llava multimodal-large-language-models ollama visual-large-language-models

Updated Jul 19, 2024
Python

sitamgithub-MSIT / TechSage

chatbot artificial-intelligence gradio techbot gemini-api multimodal-data huggingface-spaces generative-ai multimodal-large-language-models gemini-pro-vision gemini-pro

Updated Jul 1, 2024
Python

surakku / cadence-gemma

Giving RecurrentGemma sight.

natural-language-processing multimodal-large-language-models

Updated Jul 16, 2024
Python

CKeibel / FHSWF-deep-learning

Multimodal RAG and comparisons between language models. (Project for Deep Learning Module at the FHSWF)

machine-learning deep-learning multimodal rag multimodal-large-language-models multimodal-rag

Updated Jul 19, 2024
Python

DistilledCode / mmrl

Multi-Modal Representational Learning for Social Media Popularity Prediction

neural-network embeddings data-pipeline multimodal-deep-learning praw-reddit airflow-dags chromadb multimodal-large-language-models

Updated Jun 30, 2024
Python

pipixin321 / Arcana

Implementation of "Arcana: Improving Multi-modal Large Language Model through Boosting Vision Capabilities"

visual perception lora clip multimodal-large-language-models

Updated Jun 7, 2024
Python

sitamgithub-MSIT / well-being

Reducing neonatal and under-5 mortality rates via an AI-driven awareness platform with a Gradio app, Gemini API integration, and essential project utilities. #AIForGood

chatbot artificial-intelligence gradio gemini-api multimodal-data huggingface-spaces generative-ai multimodal-large-language-models gemini-15-pro

Updated Jul 1, 2024
Python

philbertmukunzi / OmniSage

OmniSage: AI-Powered Discord Bot. OmniSage is a versatile Discord bot that leverages Large Language Models (LLMs) to generate intelligent responses, join voice channels, and provide text-to-speech functionality, and it includes an interactive, AI-powered trivia game. It's designed to be your all-knowing companion in Discord servers.

fun ai discord chatbot discord-bot llm chatgpt multimodal-large-language-models

Updated Jul 17, 2024
Python

sitamgithub-MSIT / streamlit-app-builder

A Streamlit-based AI assistant that generates custom Streamlit app code from user-provided images or text using the Google Gemini model.

artificial-intelligence code-generation gemini-api multimodal-data streamlit streamlit-webapp generative-ai multimodal-large-language-models gemini-15-pro

Updated Jun 29, 2024
Python

ntropy-ai / ntropy

Ntropy AI: unleash the power of multimodal agents

agents multimodal-large-language-models

Updated Jul 19, 2024
Python

hari-huynh / viVQA-voice-assistant

Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper

text-to-speech lora visual-question-answering llava multimodal-large-language-models audio-speech-recognition mistral-7b

Updated May 15, 2024
Python

patrick-tssn / MM-NIAVH

Pressure Testing Large Video-Language Models (LVLM): Doing multimodal retrieval from LVLM at any video length to measure accuracy

video-language llm pressure-testing multimodal-large-language-models

Updated Jun 21, 2024
Python

zjr2000 / REVERIE

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

dataset rationale vision-language visual-instruction-tuning multimodal-large-language-models

Updated Jul 17, 2024
Python

rohit901 / VANE-Bench

Contains code and documentation for our VANE-Bench paper.

benchmark-datasets multimodal-deep-learning video-anomaly-detection large-language-models multimodal-large-language-models large-multimodal-models

Updated Jun 18, 2024
Python

scofield7419 / EmpathyEar

Multimodal Empathetic Chatbot

empathetic-responses empathetic-ai multimodal-large-language-models

Updated Jul 16, 2024
Python

zjunlp / EasyDetect

[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.

natural-language-processing artificial-intelligence knowledge-graph generation multimodal hallucination aigc large-language-models generative-ai model-editing knowledge-editing multimodal-large-language-models knowlm easydetect hallucination-detection

Updated Jul 15, 2024
Python

patrick-tssn / VideoHallucer

VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

multimodal-large-language-models hallucination-detection video-language-model video-hallucination

Updated Jun 25, 2024
Python

richard-peng-xia / RULE

[arXiv'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

medical-image-analysis multimodal-large-language-models retrieval-augmented-generation medical-vision-language-model

Updated Jul 14, 2024
Python

eric-ai-lab / MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

evaluation video-understanding video-dataset multi-disciplinary multimodal-large-language-models world-model

Updated Jul 2, 2024
Python

bigai-nlco / LSTP-Chat

A Video Chat Agent with Temporal Prior

spatial-temporal video-language llm mllm visual-instruction-tuning multimodal-large-language-models

Updated Feb 28, 2024
Python
