v0.1.11 2026-05-19
What's Changed
- chore: Update README.md by @DefTruth in #186
- chore: fix broken online docs rendering by @DefTruth in #187
- feat: specialized fwd/bwd cutedsl kernel for d512 by @Butterfingrz in #188
- chore: add 'why not ctas=2' for triton kernels by @DefTruth in #195
- fix: cutedsl bwd compile crash with quack-kernels 0.4.x by @Butterfingrz in #196
- chore: fix broken workflows by @DefTruth in #197
- test: consolidate cutedsl non-aligned forward tests with SDPA oracle by @Butterfingrz in #198
- kernel: support attn_mask & dropout for cuda fwd by @DefTruth in #199
- autotune: reduce max autotune configs by @DefTruth in #201
- feat: fp16/fp32 buffer and persistent kernel for dk/dv by @DefTruth in #202
- chore: unify triton sm80 fwd kernel name by @DefTruth in #203
- chore: Update README.md by @DefTruth in #204
New Contributors
- @Butterfingrz made their first contribution in #188
Full Changelog: v0.1.10...v0.1.11