v0.1.3 Backward
What's Changed
- chore: fix env docs by @DefTruth in #132
- chore: Update README.md by @DefTruth in #133
- chore: fix typo by @DefTruth in #134
- chore: update docs by @DefTruth in #135
- chore: add why not TMA section by @DefTruth in #136
- backward: reuse sdpa bwd for ffpa bwd by @DefTruth in #137
- chore: refactor ffpa-attn codebase by @DefTruth in #138
- [1/N] feat: support split-d native bwd (still slower than sdpa bwd) by @DefTruth in #139
- chore: add backward examples to docs by @DefTruth in #140
- chore: update README by @DefTruth in #141
- chore: simplify ffpa-attn codebase by @DefTruth in #142
- chore: support bwd backend args for bwd examples by @DefTruth in #143
- bwd: add triton bwd v2 kernel & autotune by @DefTruth in #144
- bwd: fix autotune for ffpa bwd pre kernel by @DefTruth in #145
- fwd: support ffpa triton fwd kernel by @DefTruth in #146
- chore: update default config for triton kernel by @DefTruth in #147
- feat: support D_CHUNK for triton bwd pre kernel by @DefTruth in #148
- feat: support D<=256 by flash-attention by @DefTruth in #149
- chore: fix ffpa_attn_func param docs by @DefTruth in #150
Full Changelog: v0.1.2...v0.1.3