llama.cpp fork with TurboQuant quantization (turbo2/3/4) and TriAttention GPU-accelerated KV cache pruning. 75 tok/s on Qwen3-8B / RTX 3080.
Updated Apr 9, 2026 - C++
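The description mentions KV-cache pruning. The actual TriAttention algorithm is not documented here, so purely as a hedged illustration of the general idea, a minimal score-based pruning pass over a KV cache might look like the sketch below; `KVEntry`, `prune_kv_cache`, and the accumulated-attention `score` field are hypothetical names invented for this example.

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// Hypothetical sketch, NOT the TriAttention algorithm: keep only the
// `keep` highest-scoring cache entries, where `score` stands in for some
// importance measure such as accumulated attention mass per position.
struct KVEntry {
    std::vector<float> key;
    std::vector<float> value;
    float score;  // importance of this cached position
};

std::vector<KVEntry> prune_kv_cache(std::vector<KVEntry> cache, std::size_t keep) {
    if (cache.size() <= keep) return cache;
    // Partial sort: the `keep` highest-scoring entries end up in front.
    std::nth_element(cache.begin(), cache.begin() + keep, cache.end(),
                     [](const KVEntry& a, const KVEntry& b) {
                         return a.score > b.score;
                     });
    cache.resize(keep);  // drop the low-importance tail
    return cache;
}
```

A real implementation would additionally have to keep the retained entries in their original sequence order and update any position indices used by the attention kernel; this sketch only shows the selection step.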