fabiodr

Stars

Transcription / Speech

73 repositories

Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024

Python 82 5 Updated Jan 10, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 7,029 843 Updated Mar 6, 2025

Advanced data structures for handling temporal segments with attached labels.

Jupyter Notebook 110 44 Updated Feb 9, 2025

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems

Python 202 35 Updated Feb 19, 2025

Mirror of hf.co/pyannote/speaker-diarization-3.1

Python 20 2 Updated Jan 7, 2024

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 14,651 5,347 Updated Jan 28, 2025

Synchronized Translation for Videos. Video dubbing

Python 1,049 211 Updated Jan 30, 2025

Automatically create synchronised lyrics files in ASS and LRC with word-level timestamps, using Whisper and lyrics from online sources, with anchor sequences and LLMs to auto-correct transcription

Python 46 12 Updated Mar 12, 2025

A fast, local neural text to speech system

C++ 8,146 616 Updated Mar 3, 2025

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

Jupyter Notebook 4,570 398 Updated Apr 3, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 753 72 Updated Mar 11, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,853 195 Updated Nov 14, 2024

ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3

Python 483 116 Updated Jan 22, 2025

Identify speakers with stable voice timbre.

Python 28 3 Updated Jun 20, 2024
Jupyter Notebook 115 5 Updated Jun 26, 2024

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…

Python 3,460 261 Updated Mar 13, 2025

Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover and Transcription.

Python 42 4 Updated Nov 22, 2024

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,629 137 Updated Feb 26, 2025

VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration

Python 153 15 Updated Feb 8, 2025
Python 3 Updated Oct 1, 2024

An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though

TypeScript 524 39 Updated Feb 8, 2025

Useful resources for LLM-based Diarization and Transcription.

TypeScript 55 4 Updated Oct 15, 2024

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Python 616 31 Updated Dec 19, 2024

A WebAssembly-powered Voice Activity Detection library for the browser.

TypeScript 9 Updated Sep 30, 2024
TypeScript 3 Updated Sep 30, 2024

Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote

Python 193 57 Updated Feb 19, 2025

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Python 2,291 174 Updated Feb 14, 2025

An audio/acoustic activity detection and audio segmentation tool

Python 769 95 Updated Dec 11, 2024

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 1,779 148 Updated Mar 12, 2025