xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

    Cuda, 4.8k stars, 524 forks

  2. lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

    C++, 4.1k stars, 745 forks

  3. Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

    Python, 4.1k stars, 286 forks

  4. Awesome-DiT-Inference Public

    📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Caching, Quantization, Parallelism, etc.

    Python, 272 stars, 16 forks

  5. torchlm Public

    💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.

    Python, 259 stars, 24 forks

  6. ffpa-attn Public

    ⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA.

    Cuda, 186 stars, 8 forks

Repositories

Showing 10 of 25 repositories
  • Awesome-DiT-Inference Public

    📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Caching, Quantization, Parallelism, etc.

    Python, 272 stars, GPL-3.0, 16 forks, 0 open issues, 0 open pull requests, Updated Jun 22, 2025
  • LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

    Cuda, 4,833 stars, GPL-3.0, 524 forks, 4 open issues, 0 open pull requests, Updated Jun 21, 2025
  • cache-dit Public Forked from vipshop/cache-dit

    🤗CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers

    Python, 2 stars, 2 forks, 0 open issues, 0 open pull requests, Updated Jun 21, 2025
  • Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

    Python, 4,139 stars, GPL-3.0, 286 forks, 0 open issues, 0 open pull requests, Updated Jun 20, 2025
  • lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

    C++, 4,134 stars, GPL-3.0, 745 forks, 0 open issues, 0 open pull requests, Updated Jun 20, 2025
  • .github Public
    1 star, 0 forks, 0 open issues, 0 open pull requests, Updated Jun 20, 2025
  • torchlm Public

    💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.

    Python, 259 stars, MIT, 24 forks, 15 open issues, 0 open pull requests, Updated Jun 17, 2025
  • SpargeAttn Public Forked from thu-ml/SpargeAttn

    SpargeAttention: A training-free sparse attention that can accelerate any model inference.

    Cuda, 6 stars, Apache-2.0, 44 forks, 0 open issues, 0 open pull requests, Updated May 24, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

    Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda, 0 stars, Apache-2.0, 132 forks, 0 open issues, 0 open pull requests, Updated May 24, 2025
  • lihang-notes Public

    📚Notes on Li Hang's "Statistical Learning Methods" (统计学习方法): a 200-page PDF with detailed explanations of the formulas🎉

    Shell, 465 stars, GPL-3.0, 57 forks, 2 open issues, 0 open pull requests, Updated May 17, 2025
