ShallowMLA

A PyTorch implementation of Multi-head Latent Attention (MLA).

log_dir/triton_1744234882
logdir/torch_1744234881
.gitignore
LICENSE
README.md
cache_manager.py
kernel.py
kerner_flash_attn.py
mla.py
test_mla.py
throughput_comparison.png
