kernel-programming

Here is 1 public repository matching this topic.

adityakamat24 / triton-fast-mha

A high-performance kernel implementation of multi-head attention using Triton. Focused on minimizing memory overhead and maximizing throughput for large-scale transformer layers. Includes clean tensor layouts, head-grouping optimisations, and ready-to-benchmark code you can plug into custom models.

transformers parallelism triton memory-efficiency gpu-optimization multi-head-attention kernel-programming flashattention

Updated Aug 12, 2025
Python
