
Quantized attention that achieves speedups of 2.1-3.1x over FlashAttention2 and 2.7-5.1x over xformers, without losing end-to-end accuracy across various models.
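The speedup comes from running the attention matmuls on quantized low-bit tensors instead of FP16. As a conceptual sketch only (not the library's CUDA kernels or its actual smoothing/per-block scheme; all names here are illustrative), symmetric per-tensor INT8 quantization of Q and K looks like this in NumPy:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus one float scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_int8_qk(q, k, v):
    """Attention with Q and K quantized to INT8; the Q·K^T scores are
    accumulated in int32 and dequantized before softmax. V stays in
    full precision here, which is a deliberate simplification."""
    d = q.shape[-1]
    q8, sq = quantize_int8(q)
    k8, sk = quantize_int8(k)
    scores = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64)).astype(np.float32)
k = rng.standard_normal((16, 64)).astype(np.float32)
v = rng.standard_normal((16, 64)).astype(np.float32)

ref = softmax((q @ k.T) / np.sqrt(64)) @ v   # full-precision reference
out = attention_int8_qk(q, k, v)
print(np.abs(out - ref).max())               # small quantization error
```

On hardware, the int8 matmul runs on fast integer tensor cores, which is where the measured speedup over FP16 attention comes from; the sketch above only demonstrates that the quantization error stays small.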

xlite-dev/SageAttention

Releases

No releases published

Packages

No packages published

Languages

  • Cuda 62.2%
  • Python 32.9%
  • C++ 2.5%
  • C 2.1%
  • Shell 0.3%