Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Demo Python script to interact with a llama.cpp server using the Whisper API, a microphone, and a webcam.
Kani extension for vision-language models (VLMs), with model-agnostic support for GPT-Vision and LLaVA.
Chain of Images for Intuitively Reasoning
Python-based WebSocket interface for LLaVA inference from the CLI.
Computer vision research on multimedia understanding during a 2023 internship at DSO National Laboratories, under the DSTA JC Scholarship.
A multimodal Discord bot with machine learning functions, including LLM chat, image generation, and speech generation.
⚗️ LLaVA 7B model repository, trained by liuhaotian and managed with DVC.
Image Classification Testing with LLMs
LLaVA base model for use with Autodistill.
Joint work from a bachelor's thesis on combining NLP and CV methods in multimodal approaches to combating hate speech in memes.
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Fine-tune LLaVA 1.5, based on an article by wandb.
A Discord chatbot built on the Mistral and LLaVA models.
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Tiny-scale experiment showing that CLIP models trained using detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained using the original alt-texts on a range of classification and retrieval tasks.