GPT-4V in Wonderland: LMMs as Smartphone Agents
Updated Jul 17, 2024 · Python
Monitor the performance of OpenAI's GPT-4V model over time.
Vision utilities for web interaction agents 👀
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Control Any Computer Using LLMs
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Vision-Assisted Camera Orientation
Discover the GPT-4o multimodal model introduced at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure OpenAI Service. It's interactive, visual, and simple to use. Give it a try!
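The prototype's actual code is not shown here, but the snapshot-to-chat flow it describes can be sketched with the standard library alone: encode a captured JPEG frame as a base64 data URL and wrap it in the text-plus-image content-parts message shape used by GPT-4o/GPT-4V chat APIs. The function names below are hypothetical, not the repo's.

```python
import base64


def frame_to_data_url(jpeg_bytes: bytes) -> str:
    """Encode a captured JPEG frame (e.g. from OpenCV's imencode)
    as a base64 data URL for an image message part."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"


def build_vision_message(prompt: str, jpeg_bytes: bytes) -> dict:
    """Build one user message mixing text and the camera snapshot,
    in the content-parts format used by GPT-4o / GPT-4V chat APIs."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": frame_to_data_url(jpeg_bytes)},
            },
        ],
    }
```

A Flask route would typically call `build_vision_message` with the latest webcam frame and forward the result in the `messages` list of a chat-completion request.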
Large Chinese Language-and-Vision Assistant for BioMedicine (a Chinese medical multimodal large model)
AI Voiceover with GPT4V
The ultimate sketch-to-code app, made using GPT-4 Vision. Choose your desired framework (React, Next.js, React Native, Flutter) for your app; it will instantly generate code and a preview (sandbox) from a simple hand-drawn paper sketch captured from your webcam.
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
Convert different model APIs into the OpenAI API format out of the box.
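Translating between provider response shapes is the core of such an adapter. As a minimal sketch (the field names `output` and `stop_reason` are an assumed, hypothetical provider schema, not any specific vendor's), mapping one response onto the OpenAI chat-completion shape looks like:

```python
def to_openai_format(provider_response: dict, model: str) -> dict:
    """Map a hypothetical provider response of the form
    {"output": <text>, "stop_reason": <reason>} onto the
    OpenAI chat-completion response shape."""
    # Translate provider-specific stop reasons to OpenAI's vocabulary;
    # anything unrecognized falls back to "stop".
    finish_map = {"end_turn": "stop", "max_tokens": "length"}
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": provider_response["output"],
                },
                "finish_reason": finish_map.get(
                    provider_response.get("stop_reason"), "stop"
                ),
            }
        ],
    }
```

A real adapter would also map request parameters, streaming chunks, and error codes in both directions, but the response-shape translation above is the pattern each of those follows.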