A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
-
Updated
Aug 5, 2025 - Python
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Automagically synchronize subtitles with video.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
faster_whisper GUI with PySide6
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
Voice Activity Detector(VAD) from TEN: low-latency, high-performance and lightweight
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
An audio/acoustic activity detection and audio segmentation tool
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Fully Native Swift and CoreML. Efficient Speaker Diarization, VAD, and Speech-to-Text for realtime workloads
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.
Voice Activity Detection based on Deep Learning & TensorFlow
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
On-device voice activity detection (VAD) powered by deep learning
Python bindings of WebRTC Audio Processing
A statistical model-based Voice Activity Detection
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
Add a description, image, and links to the vad topic page so that developers can more easily learn about it.
To associate your repository with the vad topic, visit your repo's landing page and select "manage topics."