Memory-bounded compressed sparse attention via streaming top-k. Triton kernels for the DeepSeek-V4 lightning indexer. 32x regime extension on a single H200 | by RightNow https://www.rightnowai.co/
cuda triton attention sparse-attention long-context deepseek deepseek-v4 compressed-sparse-attention lightning-indexer
Updated May 5, 2026 - Python