quantization

Here are 18 public repositories matching this topic...

Docker image for a self-hosted WhisperLive real-time speech-to-text server, powered by faster-whisper. Provides WebSocket streaming for live audio transcription and an OpenAI-compatible REST API. Supports all Whisper models, VAD, NVIDIA GPU (CUDA) acceleration, offline mode, and multi-arch (amd64, arm64).

  • Updated Jun 6, 2026
  • Shell

Whisper speech-to-text server installer for Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL and Fedora. OpenAI-compatible transcription and translation APIs powered by faster-whisper. Supports all Whisper models, word-level timestamps, JSON/SRT/VTT output, SSE streaming and offline mode.

  • Updated Jun 5, 2026
  • Shell

Build, run, and setup scripts for the complete TensorRT-LLM pipeline on RTX A6000 Ada (SM89). Reproducible path from HuggingFace checkpoint to deployable .engine file, with FP16 baseline and FP8 quantization. Companion material to the 4-part blog series on ai-box.eu — in preparation for the NVIDIA TensorRT Edge-LLM ecosystem.

  • Updated May 16, 2026
  • Shell
