Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.
Updated Jul 19, 2024 - Python
Giving RecurrentGemma sight.
Multimodal RAG and comparisons between language models. (Project for Deep Learning Module at the FHSWF)
Multi-Modal Representational Learning for Social Media Popularity Prediction
Implementation of "Arcana: Improving Multi-modal Large Language Model through Boosting Vision Capabilities"
Reducing neonatal and under-5 mortality rates via an AI-driven awareness platform with a Gradio app, Gemini API integration, and essential project utilities. #AIForGood
OmniSage: AI-Powered Discord Bot. OmniSage is a versatile Discord bot that leverages Large Language Models (LLMs) to generate intelligent responses, joins voice channels, provides text-to-speech functionality, and includes an interactive, AI-powered trivia game. It's designed to be your all-knowing companion in Discord servers.
A Streamlit-based AI assistant that generates custom Streamlit app code from user-provided images or text using the Google Gemini model.
Ntropy AI: unleash the power of multimodal agents
Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper
Pressure Testing Large Video-Language Models (LVLMs): Doing multimodal retrieval from LVLMs at any video length to measure accuracy
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Contains code and documentation for our VANE-Bench paper.
Multimodal Empathetic Chatbot
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
[arXiv'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
A Video Chat Agent with Temporal Prior