forked from thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
xlite-dev/SageAttention
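The speedup numbers above come from quantizing the attention score computation: SageAttention smooths K (subtracting its per-channel mean, which shifts each row of QK^T by a constant that softmax ignores), runs QK^T in INT8, and keeps the PV product in higher precision. A minimal NumPy sketch of that idea follows; it is purely illustrative and not the repo's CUDA kernels, and the function names `quantize_int8` and `sage_like_attention` are hypothetical:

```python
import numpy as np

def quantize_int8(x):
    # Per-tensor symmetric INT8 quantization: scale maps max |x| onto 127.
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def sage_like_attention(Q, K, V):
    # Smooth K by removing its per-channel mean over tokens. This shifts
    # every score in a given row of QK^T by the same constant (Q[i] . mean),
    # so the row-wise softmax is unchanged in exact arithmetic, while K's
    # quantization range shrinks.
    K = K - K.mean(axis=0, keepdims=True)
    qQ, sQ = quantize_int8(Q)
    qK, sK = quantize_int8(K)
    # INT8 matmul accumulated in int32, then dequantized back to float.
    scores = (qQ.astype(np.int32) @ qK.astype(np.int32).T) * (sQ * sK)
    scores = scores / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    P = np.exp(scores)
    P /= P.sum(axis=-1, keepdims=True)
    # The PV product stays in full precision in this sketch.
    return P @ V
```

Because the smoothing step is exact and only Q/K are quantized, the output stays close to full-precision attention, which is why end-to-end metrics are preserved.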
Languages
- Cuda 62.2%
- Python 32.9%
- C++ 2.5%
- C 2.1%
- Shell 0.3%