Peijie (Peyton) Dong (pprp)

Be open

Embrace changes.

702 followers · 502 following

Data Science and Analytics Thrust, Information Hub, HKUST(GZ)
Guangzhou
https://www.zhihu.com/people/peijieDong
https://pprp.github.io
https://scholar.google.com/citations?user=TqS6s4gAAAAJ

Lists (32)

Attention

40 repositories

C++

CSBasic

DataAug

Dataset

diffusion

Distill

40 repositories

GPT

45 repositories

🗡️ Graph

Graph structure learning

👹incremental

incremental learning

📥 interest

102 repositories

KAN

⭐ life

lightweight

👍 Meta

MLP

NAS

192 repositories

Object Detection

optimization

21 repositories

PEFT

🌟 Prune

quant

12 repositories

sparse_training

SPP

16 repositories

SSL

10 repositories

SSM

symbol

template

TestTimeAdaptation

utils

14 repositories

VIT

75 repositories

Digital Human (数字人)

Starred repositories

xlite-dev / CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda · 3,032 stars · 318 forks · Updated Mar 27, 2025

gabrielolympie / moe-pruner

A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size

Python · 36 stars · 3 forks · Updated Mar 23, 2025

mit-han-lab / x-attention

XAttention: Block Sparse Attention with Antidiagonal Scoring

Python · 116 stars · 2 forks · Updated Mar 26, 2025

lastmile-ai / mcp-agent

Build effective agents using Model Context Protocol and simple workflow patterns

Python · 2,038 stars · 170 forks · Updated Mar 26, 2025

liangyuwang / zo2

ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory

Python · 69 stars · 6 forks · Updated Mar 20, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust · 3,294 stars · 221 forks · Updated Mar 27, 2025

pvti / Awesome-Tensor-Decomposition

😎 A curated list of tensor decomposition resources for model compression.

55 stars · 5 forks · Updated Mar 27, 2025

IAAR-Shanghai / SurveyX

Academic Survey Paper Generation.

TeX · 791 stars · 66 forks · Updated Mar 19, 2025

jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models

Python · 1,453 stars · 120 forks · Updated Apr 17, 2024

ScienceOne-AI / DeepSeek-671B-SFT-Guide

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…

Python · 551 stars · 69 forks · Updated Mar 13, 2025

NVIDIA / kvpress

LLM KV cache compression made easy

Python · 442 stars · 31 forks · Updated Mar 19, 2025

NVlabs / COAT

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python · 169 stars · 9 forks · Updated Mar 27, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 808 stars · 50 forks · Updated Mar 19, 2025

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python · 2,305 stars · 388 forks · Updated Mar 27, 2025

idootop / mi-gpt

🏠 Connect the XiaoAi speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.

TypeScript · 10,503 stars · 1,327 forks · Updated Mar 21, 2025

fanlai0990 / CS598

Systems for GenAI

123 stars · 8 forks · Updated Mar 8, 2025

QwenLM / Qwen2.5-Math

A series of math-specific large language models of our Qwen2 series.

Python · 875 stars · 121 forks · Updated Jan 11, 2025

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Python · 4,422 stars · 386 forks · Updated Mar 5, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 8,362 stars · 807 forks · Updated Mar 27, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,107 stars · 174 forks · Updated Mar 24, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

967 stars · 127 forks · Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,670 stars · 281 forks · Updated Mar 10, 2025

fanshiqing / grouped_gemm

Forked from tgale96/grouped_gemm

PyTorch bindings for CUTLASS grouped GEMM.

Cuda · 111 stars · 33 forks · Updated Jan 2, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,097 stars · 531 forks · Updated Mar 26, 2025

Wandmalfarbe / pandoc-latex-template

A pandoc LaTeX template to convert markdown files to PDF or LaTeX.

Shell · 6,470 stars · 977 forks · Updated Jan 18, 2025

enhuiz / eisvogel

Forked from Wandmalfarbe/pandoc-latex-template

A pandoc LaTeX template to convert markdown files to PDF or LaTeX.

TeX · 44 stars · 3 forks · Updated Dec 4, 2020

lliai / D2MoE

D^2-MoE: Delta Decompression for MoE-based LLMs Compression

Python · 36 stars · 3 forks · Updated Mar 25, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 7,317 stars · 677 forks · Updated Mar 27, 2025

pprp / ultrascale-playbook-zh

UltraScale Playbook, Chinese edition

Python · 29 stars · 3 forks · Updated Mar 15, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ · 11,379 stars · 810 forks · Updated Mar 1, 2025

Starred topics

symbolic-regression

Emoji

Code quality

loss-landscape

scikit-learn

Deep learning

Awesome Lists

Python

Markdown

Machine learning
