Releases: vhrabar/Morphottention
Alpha Release
Morphottention v0.2.0 (Alpha)
Added
- Backward pass: `MorphoAttention` is now fully differentiable. A fused CUDA backward kernel computes gradients for all inputs (`x`, `W_phi`, `gate_q`, `gate_k`, `W_V`), wired into autograd via `MorphoAttentionFunction`.
  - Backward pass signature and autograd wiring.
  - Shared-memory carve-out and data load/store paths for the backward kernel.
  - Central K/V loop ported from the forward pass with on-the-fly LSE recompute (no saved attention matrix).
  - Backward `Phi` projection GEMM.
  - Stage-1 and stage-2 gradient computation with the full SMEM carve and scratch layout.
- Matmul kernels: runtime-dynamic (`RT-dyn`) matmul and transpose support for `frag_a`, backing the backward GEMMs.
- Packaging: prebuilt wheels for additional CPython versions (3.12–3.14) and a build/release workflow.
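The "on-the-fly LSE recompute" item can be illustrated with a small pure-Python sketch. It is not the project's CUDA kernel: it assumes ordinary dot-product scores as a stand-in for the morphological score, handles a single query row, and the names `forward`, `backward`, and `dot` are hypothetical. The point it shows is that the forward pass only needs to keep the output and the log-sum-exp (LSE); the backward pass rebuilds each attention probability from those instead of reading a saved attention matrix.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(q, keys, values):
    """Return the output row and its log-sum-exp; probabilities are not kept."""
    # Assumption: plain dot-product scores stand in for the real score function.
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    out = [sum(math.exp(s - lse) * v[d] for s, v in zip(scores, values))
           for d in range(len(values[0]))]
    return out, lse

def backward(q, keys, values, out, lse, d_out):
    """Gradients w.r.t. q, keys, values using only (out, lse) from forward."""
    # delta = sum_j p_j * (d_out . v_j), which equals d_out . out.
    delta = dot(d_out, out)
    d_q = [0.0] * len(q)
    d_keys, d_values = [], []
    for k, v in zip(keys, values):
        p = math.exp(dot(q, k) - lse)        # probability recomputed from LSE
        d_s = p * (dot(d_out, v) - delta)    # softmax backward for this score
        d_values.append([p * g for g in d_out])
        d_keys.append([d_s * qi for qi in q])
        d_q = [a + d_s * ki for a, ki in zip(d_q, k)]
    return d_q, d_keys, d_values
```

The trade is extra arithmetic in the backward loop in exchange for not storing one probability per query/key pair, which is what keeps memory linear in sequence length.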
Pre-Alpha Release
Morphottention v0.1.0
First public release: Mathematical Morphology-based self-attention for PyTorch, built around a Flash-style fused CUDA kernel.
Pre-Alpha. The forward pass is implemented and usable for inference; the backward pass has not yet been implemented.
Highlights
- Fused forward kernel: morphological hypercube attention computed in a single Flash-style streaming pass (online softmax, no materialized score matrix).
- `MorphoAttention` `nn.Module`: drop-in attention layer with learnable projection, per-head gates, and value projection.
- Functional & autograd API: `morpho_attention(...)` and `MorphoAttentionFunction` for lower-level use.
- GPU support: kernels compiled for `sm_90`, `sm_100`, and `sm_120` (Hopper / Blackwell).
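For readers unfamiliar with the "online softmax, no materialized score matrix" idea in the first highlight, here is a minimal pure-Python sketch of a streaming softmax-weighted sum for one query row. It is an illustration only, not the fused kernel: the scores are passed in as plain numbers rather than computed morphologically, and the function names and the `block_size` parameter are hypothetical.

```python
import math

def streaming_attention_row(scores, values, block_size=2):
    """Softmax-weighted sum of `values`, visiting the scores block by block."""
    m = -math.inf                   # running maximum of the scores seen so far
    l = 0.0                         # running sum of exp(score - m)
    acc = [0.0] * len(values[0])    # running unnormalized output
    for start in range(0, len(scores), block_size):
        blk_s = scores[start:start + block_size]
        blk_v = values[start:start + block_size]
        m_new = max(m, max(blk_s))
        # Rescale earlier statistics to the new maximum (exp(-inf) is 0.0
        # on the first block, so the empty state contributes nothing).
        scale = math.exp(m - m_new)
        l *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(blk_s, blk_v):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * vi for a, vi in zip(acc, v)]
        m = m_new
    out = [a / l for a in acc]
    lse = m + math.log(l)           # log-sum-exp of all scores
    return out, lse

def reference_attention_row(scores, values):
    """Same result computed the usual way, with all weights held at once."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wj * v[d] for wj, v in zip(w, values)) / z
            for d in range(len(values[0]))]
```

Only the running maximum, the running sum, and the output accumulator are carried between blocks, so the full row of weights never has to exist at one time.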