Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
🦙 echoOLlama: A real-time voice AI platform powered by local LLMs. Features WebSocket streaming, voice interactions, and OpenAI API compatibility. Built with FastAPI, Redis, and PostgreSQL. Perfect for private AI conversations and custom voice assistants.
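The WebSocket-streaming pattern named in that description looks roughly like the minimal FastAPI sketch below: accept a connection, read a user message, and stream the reply back in chunks. The `/ws` route and the `stream_reply` helper are hypothetical names for illustration, not echoOLlama's actual code.

```python
# Minimal sketch of WebSocket streaming with FastAPI (illustrative; not echoOLlama's code).
# The /ws route and stream_reply helper are hypothetical names.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


async def stream_reply(prompt: str):
    """Hypothetical stand-in for a local LLM backend: yields the reply in small chunks."""
    for chunk in ("You ", "said: ", prompt):
        yield chunk


@app.websocket("/ws")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()       # wait for a user message
            async for chunk in stream_reply(prompt):      # stream the reply piece by piece
                await websocket.send_text(chunk)
            await websocket.send_text("[DONE]")           # simple end-of-turn marker
    except WebSocketDisconnect:
        pass  # client closed the connection
```

Served with `uvicorn`, a client can open the WebSocket, send a prompt, and receive the response incrementally instead of waiting for the full reply.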
Build a simple, basic multimodal large model from scratch. 🤖
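"A basic multimodal model from scratch" usually means a LLaVA-style design: image features are projected into the language model's embedding space and prepended to the text tokens. The PyTorch skeleton below is an illustrative sketch of that pattern; all class names and dimensions are assumptions, not this repository's architecture.

```python
# Illustrative LLaVA-style skeleton; not this repository's actual architecture.
import torch
import torch.nn as nn


class TinyMultimodalLM(nn.Module):
    def __init__(self, vision_dim=768, hidden_dim=512, vocab_size=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)        # stand-in for a ViT
        self.projector = nn.Linear(vision_dim, hidden_dim)             # maps image features into LM space
        self.token_embed = nn.Embedding(vocab_size, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)           # stand-in LM (causal mask omitted)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feats, input_ids):
        img = self.projector(self.vision_encoder(image_feats))         # (B, N_img, hidden)
        txt = self.token_embed(input_ids)                              # (B, N_txt, hidden)
        seq = torch.cat([img, txt], dim=1)                             # image tokens first, then text
        return self.lm_head(self.lm(seq))                              # logits over the vocabulary


model = TinyMultimodalLM()
logits = model(torch.randn(2, 16, 768), torch.randint(0, 32000, (2, 8)))
print(logits.shape)  # torch.Size([2, 24, 32000])
```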
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you query with text, an image, or both, it provides powerful and flexible image retrieval. Perfect for research and demos.
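The retrieval idea behind such an engine is typically embedding-based: every gallery image and every query (text, image, or both) is mapped into one shared vector space, and search is a nearest-neighbour lookup by cosine similarity. In the sketch below, `gme_embed()` is a hypothetical stand-in for the actual GME model call.

```python
# Embedding-based retrieval sketch; gme_embed() is a hypothetical stand-in for the GME model.
import numpy as np


def gme_embed(text=None, image=None) -> np.ndarray:
    """Hypothetical stand-in: returns a unit-norm embedding for text, an image, or both."""
    rng = np.random.default_rng(abs(hash((text, image))) % (2**32))  # deterministic fake embedding
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)


def build_index(image_paths):
    # Embed every gallery image once; shape (num_images, dim).
    return np.stack([gme_embed(image=p) for p in image_paths])


def search(index, image_paths, text=None, image=None, k=5):
    query = gme_embed(text=text, image=image)        # works for text-only, image-only, or both
    scores = index @ query                           # cosine similarity (embeddings are unit-norm)
    top = np.argsort(-scores)[:k]
    return [(image_paths[i], float(scores[i])) for i in top]


paths = ["cat.png", "dog.png", "sunset.png"]
index = build_index(paths)
print(search(index, paths, text="a dog playing outside", k=2))
```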
🎉 The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
This repository showcases a collection of innovative projects by Charan H U, focusing on cutting-edge technologies such as facial emotion recognition, fitness tracking, and multi-model applications. Each project demonstrates practical implementations of advanced AI/ML techniques, making it a valuable resource for developers and researchers.
An AI multi-model application built with RAG (retrieval-augmented generation) and LangChain.
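A typical shape for such a RAG pipeline is: index documents, retrieve the most relevant ones for a question, then pass them to the LLM as context. The sketch below is generic, not this repository's code, and assumes the `langchain-openai` and `langchain-community` packages with an in-memory FAISS index; the example documents are invented.

```python
# Generic RAG sketch with LangChain; not this repository's code.
# Assumes `pip install langchain-openai langchain-community faiss-cpu` and OPENAI_API_KEY set.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    "Seg-Zero couples a reasoning chain with a segmentation model.",
    "GME embeds text and images into a shared retrieval space.",
]

vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())            # 1. index the documents

question = "What does GME do?"
hits = vectorstore.similarity_search(question, k=2)                 # 2. retrieve relevant chunks
context = "\n".join(d.page_content for d in hits)

llm = ChatOpenAI(model="gpt-4o-mini")                               # 3. generate an answer from context
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```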
Evaluating ‘Graphical Perception’ with Multimodal Large Language Models
This repo contains the integration of LangChain with the Google Gemini LLM.
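A minimal sketch of what that integration typically looks like, assuming the `langchain-google-genai` package and a `GOOGLE_API_KEY` environment variable; the model name used here is an assumption:

```python
# Minimal LangChain + Gemini sketch; assumes `pip install langchain-google-genai`
# and GOOGLE_API_KEY set in the environment. The model name is an assumption.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
reply = llm.invoke("In one sentence, what is a multimodal large language model?")
print(reply.content)
```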
Uses MAIRA-2, a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays.
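MAIRA-2 is distributed on Hugging Face and loads through the standard `transformers` Auto classes with remote code enabled. The sketch below covers loading only; the model id and the report-generation preprocessing should be checked against the model card, so treat both as assumptions.

```python
# Loading sketch for MAIRA-2 (verify the model id and preprocessing against the model card).
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/maira-2"  # assumed Hugging Face model id
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Report generation then follows the processor's chest-X-ray preprocessing utilities
# described in the model card, followed by model.generate(...).
```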
Create a tool that uses a multimodal LLM to generate testing instructions for any digital product's features based on screenshots.
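One straightforward way to build such a tool is to send each screenshot to a vision-capable chat model as a base64 data URL together with a prompt asking for test steps. The sketch below assumes the OpenAI Python SDK (openai>=1.x) as the multimodal backend; any vision LLM with an equivalent API would do, and the model name is an assumption.

```python
# Screenshot -> testing-instructions sketch; assumes the OpenAI Python SDK (openai>=1.x)
# and OPENAI_API_KEY set. The model name is an assumption.
import base64
from openai import OpenAI

client = OpenAI()


def testing_instructions(screenshot_path: str) -> str:
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()        # encode the screenshot inline
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write step-by-step testing instructions for the feature shown in this screenshot."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```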