Starred repositories
A community-maintained Python framework for creating mathematical animations.
An open collection of annotated voices in the Japanese language
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
An unofficial PyTorch implementation of the audio LM VALL-E
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
This repository contains the Hugging Face Agents Course.
On-device voice activity detection (VAD) powered by deep learning
A repository of Open-WebUI tools to use with your favourite LLMs
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
SALMONN: Speech Audio Language Music Open Neural Network
Audio Dataset for training CLAP and other models
A set of scripts to grab public datasets from resources related to arXiv
SpeechGPT Series: Speech Large Language Models
🔊 Text-Prompted Generative Audio Model
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A curated list of awesome voice conversion projects and communities.
Easily train a good VC model with voice data <= 10 mins!
State-of-the-art audio codec with 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio.
GPT-4o-level, real-time spoken dialogue system.