👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Updated Feb 29, 2024 - Python
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
This repository provides an interactive image colorization tool that leverages Stable Diffusion (SDXL) and BLIP for user-controlled color generation. With a retrained model using the ControlNet approach, users can upload images and specify colors for different objects, enhancing the colorization process through a user-friendly Gradio interface.
Image captioning using Python and BLIP
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Collection of OSS models that are containerized into a serving container
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and / or *your* own words!
SAM + CLIP + Diffusion for editing objects in images using plain text
oCaption: Leveraging OpenAI's GPT-4 Vision for Advanced Image Captioning
Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras.
Explore a project that develops a SLAM-based navigation system using vision-language data inputs. This project integrates natural language vocal instructions and image feeds to guide a differential drive robot equipped with a Kinect V2 sensor through dynamic environments.
MultiCLIP: A framework for multimodal, multilabel, multistage classification utilizing advanced pretrained models like CLIP and BLIP.
This advanced script lets you manipulate a folder of XML files to change GTA V blip positions and generate new XML.