xlite-dev

lite.ai.toolkit Public

🛠 A lite C++ AI toolkit: contains 100+ awesome AI models and supports MNN, NCNN, TNN, ONNXRuntime and TensorRT. 🎉🎉

Awesome-LLM-Inference Public

📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

CUDA-Learn-Notes Public

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 3k 324

statistic-learning-R-note Public

📒200-page PDF Notes for "Statistical Learning Methods-Li Hang", detailed explanations of various math formulas, implemented in R.🎉

torchlm Public

💎 A high-level Python library for face landmark detection: training, evaluation, export, inference (Python/C++) and 100+ data augmentations.

Python 255 24

ffpa-attn-mma Public

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

Cuda 157 7
