quantization

Here are 18 public repositories matching this topic...

wafer-ai / chipbenchmark

a platform for monitoring the chip situation

python benchmark cpp amd gpu nvidia high-performance-computing quantization large-language-models llm llms llm-benchmarking

Updated Jul 19, 2025
Shell

Intelligent-Microsystems-Lab / QuantizedLSTM

Models and training scripts for "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021).

lstm quantization kws

Updated Mar 25, 2021
Shell

hwdsl2 / docker-whisper-live

Docker image for a self-hosted WhisperLive real-time speech-to-text server, powered by faster-whisper. Provides WebSocket streaming for live audio transcription and an OpenAI-compatible REST API. Supports all Whisper models, VAD, NVIDIA GPU (CUDA) acceleration, offline mode, and multi-arch (amd64, arm64).

Updated Jun 6, 2026
Shell

daudix / Pixfect

A pretty way to compress images

bash imagemagick image video ffmpeg pixel-art image-processing dithering bash-script quantization dither

Updated Jul 9, 2023
Shell

hwdsl2 / whisper-install

Whisper speech-to-text server installer for Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL and Fedora. OpenAI-compatible transcription and translation APIs powered by faster-whisper. Supports all Whisper models, word-level timestamps, JSON/SRT/VTT output, SSE streaming and offline mode.

Updated Jun 5, 2026
Shell

hamr0 / coding-assistant

Coding assistant is a lightweight llama.cpp wrapper for quantized local SLM deployment

nodejs cli slm quantization coding-assistant llama-cpp local-llm

Updated May 22, 2026
Shell

custom-build-robots / tensorrt-llm-edge-prep

Build, run, and setup scripts for the complete TensorRT-LLM pipeline on RTX A6000 Ada (SM89). Reproducible path from HuggingFace checkpoint to deployable .engine file, with FP16 baseline and FP8 quantization. Companion material to the 4-part blog series on ai-box.eu — in preparation for the NVIDIA TensorRT Edge-LLM ecosystem.

inference nvidia quantization rtx fp16 ai-agents edge-ai llm ada-architecture fp8 tensorrt-llm

Updated May 16, 2026
Shell

GumbiiDigital / spark-nvfp4-lab

Local GPU inference experiments for NVFP4 quantization and Spark model workflows.

python shell gpu quantization local-inference

Updated Jun 2, 2026
Shell

DrizzleLouseTurbine / KVarN-807

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 6, 2026
Shell

ChainArtisan / KVarN-723

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 6, 2026
Shell

BookkeeperShelter / KVarN-674

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 6, 2026
Shell

Powderbatpatch / KVarN-700

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 6, 2026
Shell

timteh / timteh-forge

⚡ TIMTEH Model Forge — Uncensored, abliterated & reasoning-distilled GGUFs. Forged on 8×H200 SXM5 | 1.1TB VRAM

open-source machine-learning ai quantization uncensored h200 huggingface llm gguf abliteration

Updated Mar 27, 2026
Shell

zetta-app / llama.cpp_turboquant

LLM inference with 7x KV cache compression. Combines llama.cpp (production inference engine) with TurboQuant (KV quantization). Run 131K token context on 16GB VRAM. OpenAI-compatible API server. Supports 100+ model architectures.

machine-learning deep-learning gpu cuda inference neural-networks quantization language-model kv-cache openai-api llm llama-cpp local-ai memory-compression turboquant

Updated Apr 9, 2026
Shell

ApparitionEmperor / KVarN-204

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 5, 2026
Shell

NguyenPhamMC / whisperer

🎤 Record and transcribe voice dictation on Linux with push-to-talk functionality, injecting text directly into any focused application.

python docker elasticsearch machine-learning elasticstack translation transformer voice-recognition openai automatic-speech-recognition nessus quantization dictation hacktoberfest whisper transcribe tensorrt-llm

Updated Jun 6, 2026
Shell

BoatwrightLevel / KVarN-147

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

quantization kv-cache llm long-context vllm llm-inference agentic-ai

Updated Jun 6, 2026
Shell

H4RUming / gemma4-vllm-stack

vLLM serving stack for Gemma 4 31B on RTX PRO 6000 Blackwell, with FP8 KV cache, MTP speculative decoding, and an async FastAPI logging proxy in front.

quantization gemma fp8 vllm llm-inference speculative-decoding nvfp4

Updated May 24, 2026
Shell
