[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Updated Oct 16, 2024 - Python
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image generation/editing/understanding 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈
A Unified Framework for Image-to-Graph Generation. Paper accepted @ ECCV22.
WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"
A deep learning project to tell a story with an image or a video.
This GitHub repository shows how to integrate OpenAI's GPT-3 language model and the ChatGPT API into a Unity project. It can be a useful way to add natural language processing capabilities to your application.
Collection of open datasets in computer vision.
A large-scale curated dataset of Visual.ly infographics with metadata and additional crowdsourced annotations for research applications in computer vision and natural language processing.
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
HumanVLM (LLaVA-based): Foundation for Human-Scene Vision-Language Model (Journal of Information Fusion 2025)
🖼️📄 E2E Multi-modal Document Preprocessing for Search Indexing with Azure Document Intelligence
A reimplementation of the paper "Human-Aligned Image Models Improve Visual Decoding from the Brain"
Annuncio generates product advertisements from user inputs, utilizing Aria for descriptions, Allegro for promotional videos, and hashtags for social media discoverability.
🏷 This repository contains the lab sheets for the Image Understanding & Processing (SE4130) module in Year 4, Semester 1.
2022-1 Image Understanding Assignments & Projects