
Starred repositories
Open-source LLMOps platform for hosting and scaling AI in your own infrastructure 🏓🦙
KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale
Synthetic dataset generation workflow using local file resources for fine-tuning LLMs.
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Secure, high-performance AI infrastructure in Python.
Open Source LDAP Virtual Directory
High-performance, real-time-optimized, and statically typed embedded language implemented in C.
Data transformation framework for AI. Ultra-performant, with incremental processing.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
Wassette: A security-oriented runtime that runs WebAssembly Components via MCP
Serverless LLM Serving for Everyone.
An AI agent development platform with all-in-one visual tools that simplify agent creation, debugging, and deployment. Coze your way to AI agent creation.
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…
Whim is a simple and secure app for sharing secret messages anonymously. The messages are encrypted and vanish after being read. No account required.
Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No dependencies.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Lla…
Best practices for distilling large language models.
Hierarchical Reasoning Model Official Release
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Fast caching software with a focus on low latency and CPU efficiency.
The official implementation for the paper "Agentic-R1: Distilled Dual-Strategy Reasoning"