Huazhong University of Science and Technology
https://www.hust.edu.cn/
Stars
10 Lessons to Get Started Building AI Agents
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
Utilities intended for use with Llama models.
SGLang is a fast serving framework for large language models and vision language models.
High-speed Large Language Model Serving for Local Deployment
Unified KV Cache Compression Methods for Auto-Regressive Models
Running large language models on a single GPU for throughput-oriented scenarios.
A pure C++ cross-platform LLM acceleration library with Python bindings; chatglm-6B-class models reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices.
A self-learning tutorial for CUDA high-performance programming.
📚 200+ Tensor/CUDA Core kernels: ⚡️ flash-attn-mma, ⚡️ HGEMM with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
How to optimize common algorithms in CUDA.
Transformer: PyTorch Implementation of "Attention Is All You Need"
A PyTorch native library for large model training
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
CPU inference for the DeepSeek family of large language models in pure C++
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Sky-T1: Train your own O1 preview model within $450
My learning notes and code for ML systems (MLSys).