DefTruth

🎯 #pragma unroll

@xlite-dev, @vipshop, LeetCUDA.

1.8k followers · 155 following

@xlite-dev, @vipshop
Guangzhou, China
https://github.com/xlite-dev

DefTruth/README.md

🏢 Group: Owner. @xlite-dev | @vipshop | Prev. @PaddlePaddle 🏰

🛠 Creator: lite.ai.toolkit | Awesome-LLM-Inference | LeetCUDA | ffpa-attn 🎧

🖥 HGEMM | 🤗cache-dit | Awesome-DiT-Inference | torchlm 🖱

🎉 Contributor: FastDeploy | vLLM | SGLang | Many Others ⚙️

✉️ Contact: qyjdef@163.com | GitHub: DefTruth | 知乎: DefTruth 🤖

♥️ I love open source, bro, and I think you do too. ♥️

Pinned

xlite-dev/LeetCUDA (Public)

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

Cuda · 4.9k stars · 532 forks

xlite-dev/lite.ai.toolkit (Public)

🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

C++ · 4.1k stars · 745 forks

xlite-dev/Awesome-LLM-Inference (Public)

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

Python · 4.2k stars · 287 forks

vllm-project/vllm (Public)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 50.6k stars · 8.3k forks

xlite-dev/ffpa-attn (Public)

⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA.

Cuda · 186 stars · 8 forks

vipshop/cache-dit (Public)

🤗CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers🔥

Python · 61 stars · 2 forks