Starred repositories (pprp)

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 3,032 318 Updated Mar 27, 2025

A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size

Python 36 3 Updated Mar 23, 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

Python 116 2 Updated Mar 26, 2025

Build effective agents using Model Context Protocol and simple workflow patterns

Python 2,038 170 Updated Mar 26, 2025

ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory

Python 69 6 Updated Mar 20, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 3,294 221 Updated Mar 27, 2025

😎 A curated list of tensor decomposition resources for model compression.

55 5 Updated Mar 27, 2025

Academic Survey Paper Generation.

TeX 791 66 Updated Mar 19, 2025

YaRN: Efficient Context Window Extension of Large Language Models

Python 1,453 120 Updated Apr 17, 2024

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…

Python 551 69 Updated Mar 13, 2025

LLM KV cache compression made easy

Python 442 31 Updated Mar 19, 2025

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python 169 9 Updated Mar 27, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 808 50 Updated Mar 19, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,305 388 Updated Mar 27, 2025

🏠 Connect the Xiaoai speaker (小爱音箱) to ChatGPT and Doubao (豆包), and turn it into your own dedicated voice assistant.

TypeScript 10,503 1,327 Updated Mar 21, 2025

Systems for GenAI

123 8 Updated Mar 8, 2025

Math-specific large language models from the Qwen2 series.

Python 875 121 Updated Jan 11, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,422 386 Updated Mar 5, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,362 807 Updated Mar 27, 2025

Expert Parallelism Load Balancer

Python 1,107 174 Updated Mar 24, 2025

Analyze computation-communication overlap in V3/R1.

967 127 Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,670 281 Updated Mar 10, 2025

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 111 33 Updated Jan 2, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,097 531 Updated Mar 26, 2025

A pandoc LaTeX template to convert markdown files to PDF or LaTeX.

Shell 6,470 977 Updated Jan 18, 2025

A pandoc LaTeX template to convert markdown files to PDF or LaTeX.

TeX 44 3 Updated Dec 4, 2020

D^2-MoE: Delta Decompression for MoE-based LLMs Compression

Python 36 3 Updated Mar 25, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,317 677 Updated Mar 27, 2025

UltraScale Playbook, Chinese edition

Python 29 3 Updated Mar 15, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,379 810 Updated Mar 1, 2025