A simple "Be My Eyes" web app with a llama.cpp/llava backend (JavaScript, updated Nov 28, 2023)
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Sample skill demonstrating the new Alexa Presentation Language (APL). The multimodal skill's functionality is the same as the Alexa Fact Skill template: when invoked, it selects a fact at random and tells it to the user, and it is compatible with devices that have a display.
Build and explore multimodal web interactives with pieces of paper!
Amazon Alexa Skill - "Alexa, ask Fork On The Road"
How to add semantic search to your applications. This sample shows how you can use a multimodal model to find images that are semantically similar to some text. New blog coming out soon.
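The image-to-text search the sample describes can be sketched as nearest-neighbor lookup by cosine similarity over embeddings. The toy vectors below stand in for the outputs of a real multimodal encoder (such as CLIP); the function names, dimensions, and values are illustrative assumptions, not taken from the sample itself.

```python
import numpy as np

def cosine_similarity(a, b):
    # Normalize rows to unit length, then take dot products.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def search(text_embedding, image_embeddings, top_k=2):
    # Rank images by cosine similarity to the query text embedding
    # and return the indices of the top_k closest images.
    sims = cosine_similarity(text_embedding[None, :], image_embeddings)[0]
    return np.argsort(-sims)[:top_k]

# Toy 4-dimensional embeddings standing in for real model outputs.
image_embeddings = np.array([
    [0.90, 0.10, 0.00, 0.00],  # e.g. "a dog on grass"
    [0.00, 0.80, 0.20, 0.00],  # e.g. "a city skyline"
    [0.85, 0.20, 0.10, 0.00],  # e.g. "a puppy playing"
])
text_embedding = np.array([1.0, 0.1, 0.0, 0.0])  # e.g. "photo of a dog"

print(search(text_embedding, image_embeddings))  # indices of best matches
```

In a real system, both the query text and each image are embedded into the same vector space by the multimodal model, so the same similarity ranking works across modalities.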
Three-level multimodal emotion recognition framework that detects emotions by combining inputs in different formats.
A vision-assistance multimodal application built on top of Google Gemini Pro Vision.
Turn yourself into a Halloween-styled character and get an original roast with the power of AI.
Web-Based Exercise Posture Evaluation and AI Voice Feedback System
[ICCV2021 Workshop] Multi-Modal Video Reasoning and Analyzing Competition
🧠 | Multimodal Integration of Oncology Data System
Our project enhances Trulens analytics through two key initiatives: developing an interactive visual node for integration in Jupyter notebooks, and creating a comprehensive RAG framework for Trulens documentation. These efforts aim to simplify and enrich the user experience with Trulens, making advanced data analysis more accessible and intuitive.
TerraWatch is a proof of concept system developed during the TUM AI Hackathon 2024 to detect deforestation from satellite images and reason out the causes and potential environmental effects using computer vision models and multimodal large language models.
A simple application that generates scripts for the user to read aloud. Based on the recorded audio, it scores the user's pronunciation and suggests ways to improve it.