multimodal

Here are 1,018 public repositories matching this topic...

Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, and more.

Updated Mar 4, 2025
JavaScript

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot llama multimodal multi-modality gpt-4 foundation-models visual-language-learning chatgpt instruction-tuning vision-language-model llava llama2 llama-2

Updated Aug 12, 2024
Python

jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack

Updated Feb 27, 2025
Python

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Updated Mar 4, 2025
Python

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

multimodal unified-model any-to-any foundation-models llm vision-language-pretraining

Updated Feb 1, 2025
Python

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Mar 7, 2025
Python

mediar-ai / screenpipe

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

machine-learning ai computer-vision ml agi vision agents multimodal llm

Updated Mar 7, 2025
TypeScript

rerun-io / rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

visualization python rust computer-vision cpp robotics multimodal

Updated Mar 7, 2025
Rust

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Mar 6, 2025
Python

enricoros / big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated Mar 6, 2025
TypeScript

modelscope / ms-swift

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).

agent deploy llama lora embedding liger peft multimodal sft distill rft llm internvl qwen2-vl qwen2-5 llama3-3 deepseek-r1 grpo open-r1

Updated Mar 7, 2025
Python

SkalskiP / courses

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

nlp machine-learning natural-language-processing tutorial deep-neural-networks computer-vision deep-learning transformers generative-model multimodal mlops stable-diffusion

Updated Apr 22, 2024
Python

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Mar 3, 2025
Python

swyxio / ai-notes

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

ai openai gpt multimodal gpt-3 prompt-engineering stable-diffusion

Updated Feb 20, 2025
HTML

livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹

real-time video ai voice agents voice-assistant multimodal

Updated Mar 7, 2025
Python

TEN-framework / TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

Updated Mar 7, 2025
Python

kyegomez / swarms

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

Updated Mar 7, 2025
Python

kyegomez / tree-of-thoughts

Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%

deep-learning prompt artificial-intelligence multimodal gpt4 prompt-learning prompt-tuning prompt-engineering chatgpt

Updated Oct 29, 2024
Python

luban-agi / Awesome-AIGC-Tutorials

Curated tutorials and resources for Large Language Models, AI Painting, and more.

nlp awesome ai deep-learning tutorials multimodal courses-resource aigc llm midjourney prompt-engineering stable-diffusion chatgpt

Updated Mar 31, 2024

IDEA-CCNL / Fengshenbang-LM

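Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.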

transformers pytorch chinese-nlp pretrained-models distributed-training multimodal aigc

Updated Aug 13, 2024
Python
