Mathematical Morphology-based self-attention module for PyTorch (CUDA) using Flash-style kernel fusion.
This is a uv workspace:
packages/morphottention - the published kernel package (README).
attn-bench - benchmarks and dataset harnesses.
Prebuilt wheels (CPython 3.12–3.14; Linux x86_64/aarch64, Windows x86_64) require a
CUDA-enabled torch >= 2.12 already installed:
pip install morphottention

Drop-in self-attention module. Inputs must be CUDA tensors; the module defaults to float16.
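
The input constraints stated here (CUDA tensor, float16 by default, shape (B, N, D) with D equal to `dim`) can be checked before the forward pass. The following is a minimal sketch, not part of morphottention: `input_problems` is a hypothetical helper, and treating a dtype mismatch as a problem is an assumption, since the package may accept other dtypes. It works on plain values so it runs without a GPU.

```python
from typing import List, Sequence


def input_problems(
    shape: Sequence[int],
    dtype_name: str,
    device_type: str,
    dim: int,
    expected_dtype: str = "torch.float16",
) -> List[str]:
    """Return human-readable problems; an empty list means the input looks fine.

    Hypothetical helper for illustration, not part of morphottention.
    """
    problems = []
    if device_type != "cuda":
        problems.append(f"expected a CUDA tensor, got device type {device_type!r}")
    if dtype_name != expected_dtype:
        # Assumption: the input dtype should match the module dtype
        # (float16 unless configured otherwise).
        problems.append(f"expected dtype {expected_dtype}, got {dtype_name}")
    if len(shape) != 3:
        problems.append(f"expected shape (B, N, D), got {len(shape)} dimensions")
    elif shape[-1] != dim:
        problems.append(f"last dimension is {shape[-1]}, expected dim={dim}")
    return problems


# Same layout as a (2, 128, 256) float16 CUDA tensor with dim=256.
print(input_problems((2, 128, 256), "torch.float16", "cuda", dim=256))

# A CPU float32 tensor with the wrong width reports one message per issue.
for message in input_problems((2, 128, 64), "torch.float32", "cpu", dim=256):
    print(message)
```

With a real tensor `x`, the arguments would be `tuple(x.shape)`, `str(x.dtype)`, and `x.device.type`.
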
import torch
from morphottention import MorphoAttention
attn = MorphoAttention(
dim=256, # model dimension D
num_heads=8, # number of attention heads H
cube_m=16, # hypercube width per head
scale=1.0, # softmax temperature
causal=False, # causal masking flag
device="cuda"
)
x = torch.randn(2, 128, 256, dtype=torch.float16, device="cuda") # (B, N, D)
out = attn(x) # (B, N, D)
out.sum().backward()

Set up the workspace:
uv sync

Building the CUDA extension from source needs the CUDA 13.X toolkit (nvcc):
uv sync --package morphottention --no-dev --group build
uv build --package morphottention --wheel --no-build-isolation

Released under the MIT License. See LICENSE.
Copyright © 2026 Vedran Hrabar