🎯 #pragma unroll

DefTruth/README.md


Pinned

  1. lite.ai.toolkit Public

    🛠 A lite C++ toolkit of 100+ awesome AI models, supporting ORT, MNN, NCNN, TNN and TensorRT. 🎉🎉

    C++ · 4k stars · 736 forks

  2. vllm-project/vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 42.2k stars · 6.4k forks

  3. Awesome-LLM-Inference Public

    📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, MLA, Parallelism, Prefix-Cache, Chunked-Prefill, etc. 🎉🎉

    3.7k stars · 260 forks

  4. PaddlePaddle/FastDeploy Public

    ⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge, covering 20+ mainstream image, video, text and audio scenarios and 150+ SOTA models with end-to-end…

    C++ · 3.1k stars · 475 forks

  5. CUDA-Learn-Notes Public

    📚200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

    Cuda · 2.9k stars · 305 forks

  6. ffpa-attn-mma Public

    📚FFPA (Split-D): Yet another faster flash prefill attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.

    Cuda · 148 stars · 6 forks