TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
Near-optimal vector quantization from Google's ICLR 2026 paper — 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
No BS theatrics. Real automated pentesting. Mac only.
TurboQuant‑style embedding compression for RAG: an SDK using fixed rotations, PolarQuant, and QJL residual sketches for compact storage and fast similarity search
KV Cache with PagedAttention vs PagedAttention + TurboQuant - experiments across token sizes comparing memory, latency, and accuracy.
TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU
Interactive benchmarking tool for TurboQuant KV cache compression. Supports 2-4-bit quantization with real-time metrics.
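The 2-4-bit settings above trade memory for accuracy, and the headline ratios quoted across this listing (4-7x, 5-8x) can be sanity-checked with simple arithmetic. This is a generic back-of-envelope sketch, not code from any listed repo; the assumption of one FP16 scale per group of 64 values is mine.

```python
def compression_ratio(bits: int, group_size: int = 64) -> float:
    """Effective FP16-to-quantized compression ratio.

    Assumes each group of `group_size` values stores `bits` bits per
    value plus one 16-bit FP16 scale for dequantization (group size
    is an illustrative assumption, not from any listed repo).
    """
    effective_bits = bits + 16 / group_size  # amortized scale overhead
    return 16 / effective_bits

for b in (2, 3, 4):
    print(f"{b}-bit: {compression_ratio(b):.2f}x")
# 2-bit ≈ 7.11x, 3-bit ≈ 4.92x, 4-bit ≈ 3.76x
```

Under these assumptions the 2-4-bit range spans roughly 3.8x to 7.1x, consistent with the 4-7x figures several of the repos above advertise.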
AI Code Review Memory - learns from your team's bug history and warns when similar patterns appear
Turbo Index
ChatMind: Semantic search for Discord & KakaoTalk chat messages. Search by meaning, not keywords. Powered by TurboQuant compression (ICLR 2026).
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
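The rotate-then-quantize pipeline the description above refers to can be illustrated with a minimal NumPy sketch: apply a random orthogonal rotation to spread energy across coordinates, then scalar-quantize each coordinate to a 3-bit grid. This is a generic baseline for illustration only, not the actual TurboQuant, PolarQuant, or QJL algorithm; the symmetric 7-level grid and per-vector scale are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Random orthogonal rotation (QR of a Gaussian matrix) spreads energy
# evenly across coordinates before per-vector scalar quantization.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize_3bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one vector to signed 3-bit codes in {-3..3} plus one scale."""
    z = Q @ x
    scale = np.abs(z).max() / 3.0              # map max magnitude to level 3
    codes = np.round(z / scale).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Invert the rotation after rescaling the integer codes."""
    return Q.T @ (codes.astype(np.float64) * scale)

x = rng.standard_normal(d)
codes, scale = quantize_3bit(x)
x_hat = dequantize(codes, scale)
cos = x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat))
print(f"cosine similarity after 3-bit round-trip: {cos:.3f}")
```

Even this naive grid keeps the reconstructed vector well aligned with the original; the papers' contribution is in doing much better than this baseline at the same bit budget.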
CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.
Near-optimal vector quantization with zero metadata overhead — PyTorch SDK based on Google Research ICLR 2026
First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression. pip install turbokv
TurboQuant (ICLR 2026) vector quantization for memory/RAG embedding compression | 5-8x compression, 98%+ recall | NumPy only, no GPU
TurboQuant KV cache compression evaluation on Apple M1 Pro 16GB. Two-round study: MLX path (100% needle at 16K) and llama.cpp Metal path. Five implementation bugs found and fixed.
AI-powered log anomaly detection CLI — learns normal patterns, detects anomalies with semantic embeddings, matches past incidents. Powered by TurboQuant 3-bit compression (ICLR 2026).
Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs FAISS PQ 13.3%.