TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
Near-optimal vector quantization from Google's ICLR 2026 paper — 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
No BS theatrics. Real automated pentesting. Mac only.
TurboQuant‑style embedding compression for RAG: an SDK using fixed rotations, PolarQuant, and QJL residual sketches for compact storage and fast similarity search
KV Cache with PagedAttention vs PagedAttention + TurboQuant - experiments across token sizes comparing memory, latency, and accuracy.
TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU
Interactive benchmarking tool for TurboQuant KV cache compression. Supports 2-4-bit quantization with real-time metrics.
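The 2-4-bit settings above trade memory for accuracy, and the headline ratios quoted across this listing (4-7x, 5-8x) can be sanity-checked with simple arithmetic. This is a generic back-of-envelope sketch, not code from any listed repo; the assumption of one FP16 scale per group of 64 values is mine.

```python
def compression_ratio(bits: int, group_size: int = 64) -> float:
    """Effective FP16-to-quantized compression ratio.

    Assumes each group of `group_size` values stores `bits` bits per
    value plus one 16-bit FP16 scale for dequantization (group size
    is an illustrative assumption, not from any listed repo).
    """
    effective_bits = bits + 16 / group_size  # amortized scale overhead
    return 16 / effective_bits

for b in (2, 3, 4):
    print(f"{b}-bit: {compression_ratio(b):.2f}x")
# 2-bit ≈ 7.11x, 3-bit ≈ 4.92x, 4-bit ≈ 3.76x
```

Under these assumptions the 2-4-bit range spans roughly 3.8x to 7.1x, consistent with the 4-7x figures several of the repos above advertise.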
AI Code Review Memory - learns from your team's bug history and warns when similar patterns appear
Turbo Index
ChatMind: Semantic search for Discord & KakaoTalk chat messages. Search by meaning, not keywords. Powered by TurboQuant compression (ICLR 2026).
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
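The rotate-then-quantize pipeline the description above refers to can be illustrated with a minimal NumPy sketch: apply a random orthogonal rotation to spread energy across coordinates, then scalar-quantize each coordinate to a 3-bit grid. This is a generic baseline for illustration only, not the actual TurboQuant, PolarQuant, or QJL algorithm; the symmetric 7-level grid and per-vector scale are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Random orthogonal rotation (QR of a Gaussian matrix) spreads energy
# evenly across coordinates before per-vector scalar quantization.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize_3bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one vector to signed 3-bit codes in {-3..3} plus one scale."""
    z = Q @ x
    scale = np.abs(z).max() / 3.0              # map max magnitude to level 3
    codes = np.round(z / scale).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Invert the rotation after rescaling the integer codes."""
    return Q.T @ (codes.astype(np.float64) * scale)

x = rng.standard_normal(d)
codes, scale = quantize_3bit(x)
x_hat = dequantize(codes, scale)
cos = x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat))
print(f"cosine similarity after 3-bit round-trip: {cos:.3f}")
```

Even this naive grid keeps the reconstructed vector well aligned with the original; the papers' contribution is in doing much better than this baseline at the same bit budget.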
CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.
Near-optimal vector quantization with zero metadata overhead — PyTorch SDK based on Google Research ICLR 2026
First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression. pip install turbokv
TurboQuant (ICLR 2026) vector quantization for memory/RAG embedding compression | 5-8x compression, 98%+ recall | NumPy only, no GPU
TurboQuant KV cache compression evaluation on Apple M1 Pro 16GB. Two-round study: MLX path (100% needle at 16K) and llama.cpp Metal path. Five implementation bugs found and fixed.
AI-powered log anomaly detection CLI — learns normal patterns, detects anomalies with semantic embeddings, matches past incidents. Powered by TurboQuant 3-bit compression (ICLR 2026).
Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs FAISS PQ 13.3%.