Stars
A Datacenter-Scale Distributed Inference Serving Framework
Cost-efficient and pluggable infrastructure components for GenAI inference
DeepEP: an efficient expert-parallel communication library
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Collective communications library with various primitives for multi-machine training.
Optimized primitives for collective multi-GPU communication
Currently includes 203 large models, covering commercial models such as ChatGPT, GPT-4o, o3-mini, Google Gemini, Claude 3.5, Zhipu GLM-Zero, ERNIE Bot (文心一言), qwen-max, Baichuan, iFlytek Spark, SenseTime SenseChat, and MiniMax, as well as DeepSeek-R1, qwq-32b, deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, gemma3, mistral, InternLM (书生)…
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
Distributed Task Queue (development branch)
Simple, reliable, and efficient distributed task queue in Go
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
A cluster solution for the Janus WebRTC server, based on an API-proxy approach
A high-throughput and memory-efficient inference and serving engine for LLMs
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A generative speech model for daily dialogue.
Uses Selenium to crawl resources posted on Discuz-based forums, auto-comments to reveal hidden content, simulates slider-captcha dragging, and re-uploads files to Feimao Cloud (飞猫云)
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.