Releases: vhrabar/Morphottention

Alpha Release

@vhrabar vhrabar released this 01 Jul 11:20
90630b4

Morphottention v0.2.0 (Alpha)

Added

  • Backward pass: MorphoAttention is now fully differentiable. A fused CUDA backward kernel computes gradients for all inputs (x, W_phi, gate_q, gate_k, W_V), wired into autograd via MorphoAttentionFunction.
    • Backward pass signature and autograd wiring.
    • Shared-memory carve-out and data load/store paths for the backward kernel.
    • Central K/V loop ported from the forward pass with on-the-fly LSE recompute (no saved attention matrix).
    • Backward Phi projection GEMM.
    • Stage-1 and stage-2 gradient computation with the full SMEM carve-out and scratch layout.
  • Matmul kernels: runtime-dynamic (RT-dyn) matmul and transpose support for frag_a, backing the backward GEMMs.
  • Packaging: prebuilt wheels for additional CPython versions (3.12–3.14) and a build/release workflow.
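
The "on-the-fly LSE recompute" item above refers to a common Flash-style pattern: the forward pass keeps only one log-sum-exp (LSE) value per query row, and the backward pass rebuilds each attention probability from it instead of reading a saved attention matrix. The following is a minimal pure-Python sketch of that idea for a single query row, assuming plain scalar scores and ordinary softmax. It is not the Morphottention CUDA kernel, it does not model the morphological scoring or the W_phi / gate gradients, and the names forward_row and backward_row are hypothetical.

```python
import math

def forward_row(scores, values):
    """Return the attention output for one query row plus its log-sum-exp."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    probs = [math.exp(s - lse) for s in scores]
    dim = len(values[0])
    out = [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
    return out, lse  # probs are discarded here; only the scalar LSE is kept

def backward_row(scores, values, out, lse, grad_out):
    """Gradients w.r.t. scores and values, recomputing probabilities from the LSE."""
    # delta = grad_out . out, the correction term from the softmax Jacobian
    delta = sum(g * o for g, o in zip(grad_out, out))
    grad_scores, grad_values = [], []
    for s, v in zip(scores, values):
        p = math.exp(s - lse)                         # recomputed on the fly
        dp = sum(g * x for g, x in zip(grad_out, v))  # dL/dp_j = grad_out . v_j
        grad_scores.append(p * (dp - delta))
        grad_values.append([p * g for g in grad_out])
    return grad_scores, grad_values
```

In this sketch the saved state per row is one float (the LSE) rather than one probability per key, which is the memory saving the release note alludes to; the cost is one extra exp per key in the backward pass.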

Pre-Alpha Release

Pre-release

@vhrabar vhrabar released this 29 Jun 14:43
5ff3582

Morphottention v0.1.0

First public release: Mathematical Morphology-based self-attention for PyTorch, built around a Flash-style fused CUDA kernel.

Pre-Alpha. The forward pass is implemented and usable for inference; the backward pass has not yet been implemented.

Highlights

  • Fused forward kernel: morphological hypercube attention computed in a single Flash-style streaming pass (online softmax, no materialized score matrix).
  • MorphoAttention nn.Module: drop-in attention layer with learnable projection, per-head gates, and value projection.
  • Functional & autograd API: morpho_attention(...) and MorphoAttentionFunction for lower-level use.
  • GPU support: kernels compiled for sm_90, sm_100, and sm_120 (Hopper / Blackwell).
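
To illustrate what "online softmax, no materialized score matrix" means in the first highlight, here is a minimal pure-Python sketch of a streaming softmax-weighted sum for one query row. It assumes generic finite scalar scores and standard softmax; it does not reproduce the morphological scoring or the actual CUDA kernel, and the function names are hypothetical.

```python
import math

def streaming_attention_row(scores, values):
    """softmax(scores) @ values for one query, consuming one key at a time."""
    running_max = float("-inf")   # largest score seen so far
    denom = 0.0                   # running sum of exp(score - running_max)
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of value vectors
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        # Rescale the earlier partial sums to the new reference maximum.
        scale = math.exp(running_max - new_max)
        w = math.exp(s - new_max)
        denom = denom * scale + w
        acc = [a * scale + w * x for a, x in zip(acc, v)]
        running_max = new_max
    out = [a / denom for a in acc]
    lse = running_max + math.log(denom)  # per-row log-sum-exp
    return out, lse

def reference_attention_row(scores, values):
    """Two-pass reference that builds every softmax weight before summing."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]
```

The streaming version keeps only a running maximum, a running denominator, and an accumulator, so the full row of weights never has to exist at once; both functions should agree up to floating-point rounding.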