xlite-dev
Pinned repositories
- Awesome-LLM-Inference Public
📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, prefix caching, chunked prefill, PD disaggregation, etc. 🎉🎉
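Several of the listed papers (FlashAttention in particular) rest on the online-softmax recurrence, which computes a softmax-weighted sum in one streaming pass without materializing all scores. A minimal pure-Python sketch, using scalar values for brevity (function names are illustrative, not from any of the repos):

```python
import math

def naive_attention_row(scores, values):
    # Two-pass reference: softmax over all scores, then the weighted sum.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / z

def online_attention_row(scores, values):
    # One pass: keep a running max m, normalizer z, and accumulator acc,
    # rescaling both whenever a new maximum appears. This is the recurrence
    # that lets FlashAttention process K/V in tiles with O(1) extra memory.
    m, z, acc = float("-inf"), 0.0, 0.0
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        w = math.exp(s - m_new)
        z = z * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / z
```

Both functions return the same value; the streaming variant simply never needs the full score vector at once.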
- CUDA-Learn-Notes Public
📚 Modern CUDA learning notes with PyTorch: 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with the WMMA, MMA, and CuTe APIs (achieving 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
- hgemm-tensorcores-mma Public
⚡️ HGEMM written from scratch on Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak ⚡️ performance.
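For readers new to the repo, the computation every HGEMM kernel there optimizes is just C = A × B. A scalar Python reference (the actual kernels do the same math in fp16 on Tensor Cores, tiling these loops across warps and WMMA/MMA fragments; this sketch is only the correctness baseline, not the repo's code):

```python
def gemm_ref(A, B):
    # Reference GEMM: C[M x N] = A[M x K] @ B[K x N], row-major lists of lists.
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):          # i-k-j order keeps B row access contiguous
            a = A[i][k]
            for j in range(N):
                C[i][j] += a * B[k][j]
    return C
```

Tensor Core kernels are typically validated against exactly this kind of reference, with a tolerance for fp16 rounding.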
- lite.ai.toolkit Public
🛠 A lite C++ toolkit containing 100+ awesome AI models (Stable-Diffusion, FaceFusion, the YOLO series, face/object detection, segmentation, matting, etc.), supporting the MNN, ORT, and TensorRT backends. 🎉🎉
- ffpa-attn-mma Public
📚 FFPA (Split-D): yet another faster flash prefill attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs. SDPA EA.
- Awesome-Diffusion-Inference Public
📖 A curated list of awesome diffusion inference papers with code: sampling, caching, multi-GPU parallelism, etc. 🎉🎉
- SageAttention Public Forked from thu-ml/SageAttention
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
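The core idea behind such quantized attention is cheap low-precision matmuls: map tensors to int8 with a per-tensor scale before the Q·K product, then dequantize. An illustrative symmetric-quantization sketch in pure Python (this shows the general technique only, not SageAttention's actual per-block scheme or smoothing tricks):

```python
def quantize_sym_int8(xs):
    # Symmetric per-tensor int8 quantization: x ≈ scale * q, with q in [-127, 127].
    amax = max(abs(x) for x in xs) or 1.0   # guard against an all-zero tensor
    scale = amax / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original values.
    return [scale * v for v in q]
```

Round-trip error is bounded by half the quantization step (scale / 2), which is why such schemes can leave end-to-end metrics essentially unchanged.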
- statistic-learning-R-note Public
📒 "Statistical Learning Methods (Li Hang): Notes from Principles to Implementation" (《统计学习方法-李航: 笔记-从原理到实现》): 200-page PDF notes with detailed explanations of the math formulas and R-language implementations of many of the algorithms. 🎉