Transcription / Speech
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Advanced data structures for handling temporal segments with attached labels.
A toolkit for reproducible evaluation, diagnostics, and error analysis of speaker diarization systems
Mirror of hf.co/pyannote/speaker-diarization-3.1
kaldi-asr/kaldi is the official location of the Kaldi project.
Synchronized translation for videos (video dubbing)
Automatically create synchronised lyrics files in ASS and LRC formats with word-level timestamps, using Whisper and lyrics from online sources, plus anchor sequences and LLMs to auto-correct the transcription
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Speech, Language, Audio, Music Processing with Large Language Model
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…
Gradio WebUI for whisper, faster-whisper, and whisper-timestamped. Supports a YouTube downloader, vocal remover, and transcription.
Fast and accurate automatic speech recognition (ASR) for edge devices
VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration
An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard, though.
Useful resources for LLM-based Diarization and Transcription.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
A WebAssembly-powered Voice Activity Detection library for the browser.
Cog implementation of a transcription + diarization pipeline with Whisper & Pyannote
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
An audio/acoustic activity detection and audio segmentation tool
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization