Fused Triton kernels for Transformer inference: RMSNorm+RoPE, Gated MLP, and FP8 GEMM.
Updated Apr 29, 2026 · Python
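To illustrate what "RMSNorm+RoPE" fusion means, here is a pure-Python sketch of the math, not the repository's Triton code: the function name `fused_rmsnorm_rope` and its parameters are hypothetical. The point of fusing the two ops is that each pair of values is normalized and rotated in one pass, so no intermediate normalized tensor is written to memory.

```python
import math

def fused_rmsnorm_rope(x, weight, pos, theta=10000.0, eps=1e-6):
    """RMSNorm a vector, then apply rotary position embedding (RoPE)
    to even/odd pairs -- one pass, no intermediate buffer."""
    d = len(x)
    # RMSNorm: scale by the reciprocal root-mean-square of the vector
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    out = [0.0] * d
    for i in range(0, d, 2):
        # Normalize the pair "in registers" instead of storing it back
        a = x[i] * inv_rms * weight[i]
        b = x[i + 1] * inv_rms * weight[i + 1]
        # RoPE: rotate the pair by a position-dependent angle
        ang = pos * theta ** (-i / d)
        c, s = math.cos(ang), math.sin(ang)
        out[i] = a * c - b * s
        out[i + 1] = a * s + b * c
    return out
```

A real Triton kernel would do the same arithmetic per block with `tl.load`/`tl.store`, but the data flow (normalize, then rotate, without a round trip to global memory) is the same.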
A specialized compiler that optimizes deep learning models for AI accelerators with operator fusion, memory optimization, and hardware-specific passes.
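Operator fusion, as such compilers apply it, collapses a chain of elementwise ops into a single loop so intermediate tensors never round-trip through memory. A minimal sketch of the idea (the helper names are hypothetical, not this compiler's API), using a bias-add followed by GELU:

```python
import math

def _gelu(v):
    # Exact GELU via the error function
    return v * 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def unfused_bias_gelu(x, bias):
    """Unfused: the bias-add materializes a full intermediate list
    before GELU reads it back."""
    t = [v + b for v, b in zip(x, bias)]  # op 1: bias add
    return [_gelu(v) for v in t]          # op 2: GELU

def fused_bias_gelu(x, bias):
    """Fused: one loop; each value stays live between the two ops."""
    return [_gelu(v + b) for v, b in zip(x, bias)]
```

Both functions compute identical results; the fused form simply skips the intermediate buffer, which is where memory-bound workloads recover bandwidth.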