
xlite-dev

Develops ML/AI toolkits and ML/AI/CUDA learning resources.

Pinned

  1. lite.ai.toolkit Public

🛠 A lite C++ toolkit with 100+ awesome AI models (Stable-Diffusion, FaceFusion, the YOLO series, face/object detection, segmentation, matting, etc.), with support for the MNN, ONNX Runtime (ORT), and TensorRT backends. 🎉🎉

C++ · 4k stars · 737 forks

  2. Awesome-LLM-Inference Public

📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, prefix caching, chunked prefill, prefill/decode (PD) disaggregation, etc. 🎉🎉 (A paged-KV-cache sketch follows this list.)

Python · 3.7k stars · 264 forks

  3. CUDA-Learn-Notes Public

📚 Modern CUDA learning notes with PyTorch: 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️HGEMM with the WMMA, MMA, and CuTe APIs, achieving 98%–100% of cuBLAS/FA2 TFLOPS. 🎉🎉 (A tiled-GEMM sketch follows this list.)

Cuda · 3.1k stars · 328 forks

  4. statistic-learning-R-note Public

📒 200-page PDF notes on Hang Li's "Statistical Learning Methods" (from principles to implementation), with detailed explanations of the math formulas and R implementations of many of the algorithms. 🎉

443 stars · 55 forks

  5. torchlm Public

💎 A high-level Python library for facial landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

Python · 255 stars · 24 forks

  6. ffpa-attn-mma Public

📚 FFPA (Split-D): yet another faster flash prefill attention, with O(1) GPU SRAM complexity for headdim > 256 and a ~2x speedup 🎉 vs. SDPA EA. (A split-D sketch follows this list.)

Cuda · 157 stars · 7 forks
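The techniques named in the Awesome-LLM-Inference entry are easier to grasp with a toy. Below is a minimal NumPy sketch of the idea behind PagedAttention: keys live in fixed-size blocks drawn from a shared pool, and a per-sequence block table maps logical token positions to physical blocks, so the cache never needs one large contiguous allocation per sequence. All names (BLOCK_SIZE, block_table, etc.) are illustrative, not taken from any repository listed here.

```python
# Toy paged KV cache (keys only) in the spirit of PagedAttention.
# BLOCK_SIZE, NUM_BLOCKS, and all names here are illustrative.
import numpy as np

BLOCK_SIZE = 16   # tokens per physical KV block
NUM_BLOCKS = 64   # size of the shared block pool
HEAD_DIM = 8

# One shared pool of blocks; a per-sequence block table maps logical
# block indices to physical blocks, so no large contiguous buffer is
# needed per sequence.
k_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
free_blocks = list(range(NUM_BLOCKS))
block_table = []  # logical block index -> physical block index

def append_key(pos, key):
    """Store the key for token position `pos`, allocating on demand."""
    logical, offset = divmod(pos, BLOCK_SIZE)
    if logical == len(block_table):        # first token of a new block
        block_table.append(free_blocks.pop())
    k_pool[block_table[logical], offset] = key

def gather_keys(seq_len):
    """Reassemble the contiguous (seq_len, HEAD_DIM) key matrix."""
    rows = [k_pool[block_table[p // BLOCK_SIZE], p % BLOCK_SIZE]
            for p in range(seq_len)]
    return np.stack(rows)

for pos in range(40):                      # 40 tokens -> 3 blocks in use
    append_key(pos, np.full(HEAD_DIM, float(pos)))
assert gather_keys(40)[17, 0] == 17.0
```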
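The HGEMM work in CUDA-Learn-Notes builds GEMM from Tensor Core fragments. The NumPy sketch below mirrors that decomposition on the CPU: C is assembled from 16x16x16 tile products with an FP32 accumulator, the fragment shape WMMA uses for FP16 inputs. This is a readability sketch of the tiling only, not a performance kernel.

```python
# Tiled HGEMM reference in NumPy, mirroring the 16x16x16 fragment
# decomposition used by WMMA/MMA Tensor Core kernels: FP16 inputs,
# FP32 accumulation.
import numpy as np

TILE = 16  # WMMA fragment size for FP16 is 16x16x16

def hgemm_tiled(a, b):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % TILE == 0 and n % TILE == 0 and k % TILE == 0
    c = np.zeros((m, n), dtype=np.float32)    # FP32 accumulator
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            acc = np.zeros((TILE, TILE), dtype=np.float32)
            for p in range(0, k, TILE):       # k-loop over fragments
                acc += (a[i:i+TILE, p:p+TILE].astype(np.float32)
                        @ b[p:p+TILE, j:j+TILE].astype(np.float32))
            c[i:i+TILE, j:j+TILE] = acc
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float16)
b = rng.standard_normal((128, 32)).astype(np.float16)
ref = a.astype(np.float32) @ b.astype(np.float32)
assert np.allclose(hgemm_tiled(a, b), ref, atol=1e-3)
```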
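FFPA's "O(1) GPU SRAM complexity" comes from splitting the head dimension: QK^T is a sum of partial products over head-dim tiles, and the output columns factor over V tiles the same way, so the per-tile working set does not grow with headdim. Below is a pure-NumPy reference of that split (tile size chosen arbitrarily), not the FFPA kernel itself; the real kernel fuses this with flash-attention-style streaming.

```python
# Split-D attention reference: tile the head dimension so the per-tile
# working set is constant even for large headdim. Conceptual only.
import numpy as np

def attention_ref(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def attention_split_d(q, k, v, tile=64):
    n, d = q.shape
    # Q @ K^T is a sum of partial products over head-dim tiles, so only
    # one (n, tile) slice of Q and K is live at a time.
    s = np.zeros((n, n))
    for d0 in range(0, d, tile):
        s += q[:, d0:d0 + tile] @ k[:, d0:d0 + tile].T
    s /= np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # Output columns are likewise independent per head-dim tile of V.
    o = np.empty((n, d))
    for d0 in range(0, d, tile):
        o[:, d0:d0 + tile] = p @ v[:, d0:d0 + tile]
    return o

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 320)) for _ in range(3))  # headdim 320 > 256
assert np.allclose(attention_ref(q, k, v), attention_split_d(q, k, v))
```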

Repositories

Showing 10 of 21 repositories
  • Awesome-LLM-Inference Public

📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, prefix caching, chunked prefill, prefill/decode (PD) disaggregation, etc. 🎉🎉

Python · 3,746 stars · 264 forks · 0 issues · 0 PRs · GPL-3.0 · Updated Mar 30, 2025
  • CUDA-Learn-Notes Public

📚 Modern CUDA learning notes with PyTorch: 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️HGEMM with the WMMA, MMA, and CuTe APIs, achieving 98%–100% of cuBLAS/FA2 TFLOPS. 🎉🎉

Cuda · 3,074 stars · 328 forks · 6 issues · 0 PRs · GPL-3.0 · Updated Mar 30, 2025
  • hgemm-tensorcores-mma Public

⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving near-peak ⚡️ performance (see the tiled-GEMM sketch after the pinned list above).

Cuda · 63 stars · 3 forks · 0 issues · 0 PRs · GPL-3.0 · Updated Mar 30, 2025
  • .github Public
1 star · 0 forks · 0 issues · 0 PRs · Updated Mar 30, 2025
  • lite.ai.toolkit Public

🛠 A lite C++ toolkit with 100+ awesome AI models (Stable-Diffusion, FaceFusion, the YOLO series, face/object detection, segmentation, matting, etc.), with support for the MNN, ONNX Runtime (ORT), and TensorRT backends. 🎉🎉

C++ · 3,998 stars · 737 forks · 0 issues · 0 PRs · GPL-3.0 · Updated Mar 29, 2025
  • ffpa-attn-mma Public

📚 FFPA (Split-D): yet another faster flash prefill attention, with O(1) GPU SRAM complexity for headdim > 256 and a ~2x speedup 🎉 vs. SDPA EA.

Cuda · 157 stars · 7 forks · 2 issues · 0 PRs · GPL-3.0 · Updated Mar 25, 2025
  • Awesome-Diffusion-Inference Public

📖 A curated list of awesome diffusion inference papers with code: sampling, caching, multi-GPU inference, etc. 🎉🎉

201 stars · 13 forks · 0 issues · 0 PRs · GPL-3.0 · Updated Mar 23, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

Quantized attention that achieves speedups of 2.1–3.1x and 2.7–5.1x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models. (An INT8-scores sketch follows this list.)

Cuda · 0 stars · 84 forks · 0 issues · 0 PRs · Apache-2.0 · Updated Mar 23, 2025
  • flashinfer Public Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

Cuda · 0 stars · 264 forks · 0 issues · 0 PRs · Apache-2.0 · Updated Mar 23, 2025
  • statistic-learning-R-note Public

📒 200-page PDF notes on Hang Li's "Statistical Learning Methods" (from principles to implementation), with detailed explanations of the math formulas and R implementations of many of the algorithms. 🎉

443 stars · 55 forks · 2 issues · 0 PRs · GPL-3.0 · Updated Feb 7, 2025
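For the SageAttention fork above, the core trick can be shown in a few lines of NumPy: quantize Q and K to INT8 with per-matrix scales, take QK^T in integer arithmetic, and dequantize the scores with the product of the scales. The real kernel is considerably more sophisticated (e.g., finer-grained scales); this is only an illustrative reference.

```python
# INT8-quantized attention scores in the spirit of SageAttention:
# per-matrix scales, integer QK^T, dequantized output.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def quantized_scores(q, k):
    q_i8, sq = quantize_int8(q)
    k_i8, sk = quantize_int8(k)
    # Accumulate the INT8 product in INT32, then dequantize the scores.
    s_i32 = q_i8.astype(np.int32) @ k_i8.astype(np.int32).T
    return s_i32.astype(np.float32) * (sq * sk)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 128)).astype(np.float32)
k = rng.standard_normal((64, 128)).astype(np.float32)
err = np.abs(quantized_scores(q, k) - q @ k.T).max()
print(f"max abs error of INT8 scores: {err:.3f}")  # small vs. score scale
```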