Fabio Dias Rollo fabiodr

Systems architect, passionate about tech and software development

190 followers · 797 following

@fabio_rollo

Stars

Transcription / Speech

73 repositories

huggingface / open_asr_leaderboard

Python 73 25 Updated Mar 13, 2025

joonaskalda / PixIT

Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024

Python 82 5 Updated Jan 10, 2025

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 7,029 843 Updated Mar 6, 2025

pyannote / pyannote-core

Advanced data structures for handling temporal segments with attached labels.

Jupyter Notebook 110 44 Updated Feb 9, 2025

pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems

Python 202 35 Updated Feb 19, 2025

pyannote / hf-speaker-diarization-3.1

Mirror of hf.co/pyannote/speaker-diarization-3.1

Python 20 2 Updated Jan 7, 2024

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 14,651 5,347 Updated Jan 28, 2025

R3gm / SoniTranslate

Synchronized Translation for Videos. Video dubbing

Python 1,049 211 Updated Jan 30, 2025

nomadkaraoke / python-lyrics-transcriber

Automatically create synchronised lyrics files in ASS and LRC with word-level timestamps, using Whisper and lyrics from online sources, with anchor sequences and LLMs to auto-correct transcription

Python 46 12 Updated Mar 12, 2025

rhasspy / piper

A fast, local neural text to speech system

C++ 8,146 616 Updated Mar 3, 2025

sanchit-gandhi / whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

Jupyter Notebook 4,570 398 Updated Apr 3, 2024

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 753 72 Updated Mar 11, 2025

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,853 195 Updated Nov 14, 2024

Bklieger / ScribeWizard

ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3

Python 483 116 Updated Jan 22, 2025

2noise / ChatEval

Identify speakers with stable voice timbre.

Python 28 3 Updated Jun 20, 2024

futo-org / whisper-acft

Jupyter Notebook 115 5 Updated Jun 26, 2024

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…

Python 3,460 261 Updated Mar 13, 2025