
xlite-dev

Developing ML/AI toolkits and ML/AI/CUDA learning resources.

Pinned

  1. lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: contains 100+ awesome AI models and supports MNN, NCNN, TNN, ONNXRuntime, and TensorRT. 🎉🎉

    C++ · 4k stars · 737 forks

  2. Awesome-LLM-Inference Public

    📖 A curated list of Awesome LLM/VLM Inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉

    3.7k stars · 263 forks

  3. CUDA-Learn-Notes Public

    📚 200+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, and ⚡️hgemm with WMMA, MMA, and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

    Cuda · 3k stars · 324 forks

  4. statistic-learning-R-note Public

    📒 200-page PDF notes on "Statistical Learning Methods" (Li Hang), with detailed explanations of the math formulas and implementations in R. 🎉

    443 stars · 55 forks

  5. torchlm Public

    💎 A high-level Python library for face landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

    Python · 255 stars · 24 forks

  6. ffpa-attn-mma Public

    📚 FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs. SDPA EA.

    Cuda · 157 stars · 7 forks

Repositories

Showing 10 of 19 repositories
  • CUDA-Learn-Notes Public

    📚 200+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, and ⚡️hgemm with WMMA, MMA, and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

    Cuda · 3,049 stars · GPL-3.0 · 324 forks · 6 open issues · 0 open PRs · Updated Mar 27, 2025
  • hgemm-mma Public

    ⚡️ Writing HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs to achieve peak ⚡️ performance.

    Cuda · 63 stars · GPL-3.0 · 3 forks · 0 open issues · 0 open PRs · Updated Mar 25, 2025
  • ffpa-attn-mma Public

    📚 FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs. SDPA EA.

    Cuda · 157 stars · GPL-3.0 · 7 forks · 4 open issues · 0 open PRs · Updated Mar 25, 2025
  • Awesome-LLM-Inference Public

    📖 A curated list of Awesome LLM/VLM Inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉

    3,730 stars · GPL-3.0 · 263 forks · 0 open issues · 0 open PRs · Updated Mar 25, 2025
  • lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: contains 100+ awesome AI models and supports MNN, NCNN, TNN, ONNXRuntime, and TensorRT. 🎉🎉

    C++ · 3,994 stars · GPL-3.0 · 737 forks · 0 open issues · 1 open PR · Updated Mar 25, 2025
  • Awesome-Diffusion-Inference Public

    📖 A curated list of Awesome Diffusion Inference papers with code: sampling, caching, multi-GPU, etc. 🎉🎉

    200 stars · GPL-3.0 · 13 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

    Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda · 0 stars · Apache-2.0 · 83 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
  • flashinfer Public Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 0 stars · Apache-2.0 · 263 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
  • statistic-learning-R-note Public

    📒 200-page PDF notes on "Statistical Learning Methods" (Li Hang), with detailed explanations of the math formulas and implementations in R. 🎉

    443 stars · GPL-3.0 · 55 forks · 2 open issues · 0 open PRs · Updated Feb 7, 2025
  • torchlm Public

    💎 A high-level Python library for face landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

    Python · 255 stars · MIT · 24 forks · 14 open issues · 0 open PRs · Updated Feb 7, 2025