A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Updated May 24, 2024 (Python)
Orchestrate swarms of agents from any framework, such as OpenAI and LangChain, for business operation automation. Join our community: https://discord.gg/DbjBMJTSWD
Automatically updated paper list
React component library for crafting user-friendly and engaging conversational experiences
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Build real-time multimodal AI applications 🤖🎙️📹
Robust multimodal brain registration via keypoints
Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.
Paper list on multimodal and large language models, kept for personal use to record papers I read from the daily arXiv listings.
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Audio, Image, Video, Music and 3D content. 🔥
ALICE and its prior work. Paper and implementation of the Unity Package Voice2Action.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Awesome-Biomolecule-Language-Cross-Modeling: a curated list of resources for the paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey"
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
LLM-based multi-model framework for building AI apps.
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pre-trained models and a diffusion model toolbox. Offers high performance and flexibility.
Implementations of various ML tasks on the Kaggle platform with GPUs.
Official code for the paper "Mantis: Multi-Image Instruction Tuning"