Alpha Release

Latest

vhrabar released this 01 Jul 11:20

90630b4

Morphottention v0.2.0 (Alpha)

Added

- Backward pass: MorphoAttention is now fully differentiable. A fused CUDA backward kernel computes gradients for all inputs (x, W_phi, gate_q, gate_k, W_V), wired into autograd via MorphoAttentionFunction.
  - Backward pass signature and autograd wiring.
  - Shared-memory carve-out and data load/store paths for the backward kernel.
  - Central K/V loop ported from the forward pass with on-the-fly LSE recompute (no saved attention matrix).
  - Backward Phi projection GEMM.
  - Stage-1 and stage-2 gradient computation with the full SMEM carve and scratch layout.
- Matmul kernels: runtime-dynamic (RT-dyn) matmul and transpose support for frag_a, backing the backward GEMMs.
- Packaging: prebuilt wheels for additional CPython versions (3.12–3.14) and a build/release workflow.
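
As a rough illustration of what "on-the-fly LSE recompute (no saved attention matrix)" can mean, here is a minimal pure-Python sketch for a single row of plain softmax attention scores. It is not the MorphoAttention kernel: the function names are hypothetical, the gating and Phi projection are ignored, and the real implementation is a fused CUDA kernel. The point is only that the backward step can rebuild the log-sum-exp (LSE) and the probabilities from the scores instead of reading a stored attention matrix.

```python
import math


def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) for one row of scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def softmax_row(scores):
    """Forward: softmax of one row, expressed through the row's LSE."""
    lse = log_sum_exp(scores)
    return [math.exp(s - lse) for s in scores]


def softmax_backward_row(scores, grad_probs):
    """Backward: gradient w.r.t. scores, given the gradient w.r.t. probabilities.

    Nothing from the forward pass is stored (assumption of this sketch):
    the LSE and the probabilities are recomputed here from the scores.
    """
    lse = log_sum_exp(scores)                      # recomputed on the fly
    probs = [math.exp(s - lse) for s in scores]    # rebuilt, never saved
    dot = sum(p * g for p, g in zip(probs, grad_probs))
    return [p * (g - dot) for p, g in zip(probs, grad_probs)]
```

In this sketch the memory cost of the backward step is one scalar per row plus the row being processed, which is the usual motivation for recomputing rather than storing the attention matrix.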
