kernel-programming

Here is 1 public repository matching this topic...

A high-performance Triton kernel implementation of multi-head attention. It focuses on minimizing memory overhead and maximizing throughput for large-scale transformer layers, and includes clean tensor layouts, head-grouping optimisations, and ready-to-benchmark code you can plug into custom models.

  • Updated Aug 12, 2025
  • Python
