📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Light-field imaging application for plenoptic cameras
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
Light field geometry estimator for plenoptic cameras
xKV: Cross-Layer SVD for KV-Cache Compression
An efficient and scalable attention module that reduces memory usage and improves inference speed in large language models, implementing Multi-Head Latent Attention (MLA) as a drop-in replacement for traditional multi-head attention (MHA); see the sketch after the repository list.
A Mixture of Experts model with latent attention designed for efficient training and inference.
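
As a rough illustration of what the MLA repositories above implement, here is a minimal PyTorch sketch of Multi-Head Latent Attention, assuming a DeepSeek-V2-style formulation in which keys and values are reconstructed from a small shared latent vector so the KV cache only stores that latent. All class and dimension names (MultiHeadLatentAttention, d_latent, etc.) are illustrative and not taken from any of the listed repositories; the decoupled RoPE key path and causal masking are omitted for brevity.

```python
# Minimal sketch of Multi-Head Latent Attention (MLA).
# Assumption: DeepSeek-V2-style down-projection of hidden states into a small
# KV latent that is cached, with per-head K/V reconstructed by up-projection.
# Names and dimensions are illustrative, not from the repositories above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a shared KV latent (this is what gets cached).
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-project the latent back to per-head keys and values.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_latent_cache=None):
        b, t, _ = x.shape
        # Compress new tokens into the latent and append to the cached latents.
        new_latent = self.kv_down(x)                      # (b, t, d_latent)
        if kv_latent_cache is not None:
            latent = torch.cat([kv_latent_cache, new_latent], dim=1)
        else:
            latent = new_latent

        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        # Standard scaled dot-product attention over the reconstructed K/V
        # (causal masking omitted in this sketch).
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, t, -1)
        # Return the latent as the updated KV cache.
        return self.out_proj(out), latent


# Usage: the cache holds d_latent = 64 floats per token instead of
# 2 * d_model = 1024 for a conventional MHA KV cache.
mla = MultiHeadLatentAttention()
x = torch.randn(2, 16, 512)
y, cache = mla(x)
```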