multimodal

Here are 668 public repositories matching this topic...

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated May 24, 2024
Python

kyegomez / swarms

Orchestrate swarms of agents from any framework, such as OpenAI and Langchain, for business operation automation. Join our community: https://discord.gg/DbjBMJTSWD

Updated May 24, 2024
Python

isLinXu / paper-list

Auto-updating paper list

reinforcement-learning classification image-generation object-detection transfer-learning optical-flow object-tracking semantic-segmentation action-recognition audio-processing pose-estimation depth-estimation anomaly-detection multimodal scene-understanding graph-neural-networks llm

Updated May 24, 2024
Python

rustic-ai / ui-components

React component library for crafting user-friendly and engaging conversational experiences

chat ai reactjs mui reactjs-components conversational-ai multimodal

Updated May 23, 2024
TypeScript

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

ai openai gpt multimodal gpt-3 prompt-engineering stable-diffusion

Updated May 23, 2024
HTML

livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹

real-time video ai voice agents voice-assistant multimodal

Updated May 23, 2024
Python

alanqrwang / keymorph

Robust multimodal brain registration via keypoints

deep-learning neural-network pytorch affine registration robust keypoints brain interpretability multimodal

Updated May 23, 2024
Python

pixeltable / pixeltable

Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.

data-science machine-learning database ai computer-vision chatbot ml artificial-intelligence multimodal vector-database llm genai

Updated May 23, 2024
Python

Yangyi-Chen / Multimodal-AND-Large-Language-Models

Paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listings for personal needs.

machine-learning multimodal large-language-models general-purpose-model

Updated May 23, 2024

Yuan-ManX / ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Audio, Image, Video, Music and 3D content. 🔥

ai multi-modal deeplearning-ai multimodal multimodal-deep-learning llm

Updated May 23, 2024

yang-su2000 / Voice2Action

ALICE and its prior work. Paper and implementation of the Unity Package Voice2Action.

nlp ai unity virtual-reality multimodal unity-ml unity-package llms large-language-model llm-agent

Updated May 23, 2024
C#

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

visualization python rust computer-vision cpp robotics multimodal

Updated May 23, 2024
Rust

QizhiPei / Awesome-Biomolecule-Language-Cross-Modeling

Awesome-Biomolecule-Language-Cross-Modeling: a curated list of resources for the paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey"

natural-language-processing bioinformatics biomolecule multimodal

Updated May 23, 2024

InternLM / HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

application ocr robot pipeline dsl chatbot wechat assistance lark multimodal rag llm

Updated May 23, 2024
Python

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Updated May 23, 2024
Python

parvbhullar / superpilot

LLM-based multi-model framework for building AI apps.

ai prompt-toolkit ai-assistants ai-agents multimodal gpt-3 gpt4 midjourney stable-diffusion langchain llmops autogpt llama2 ai-agents-framework

Updated May 23, 2024
Python

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.