A PyTorch-based Speech Toolkit
Updated Jul 17, 2025 - Python
End-to-End Speech Processing Toolkit
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
The PyTorch-based audio source separation toolkit for researchers
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Unofficial PyTorch implementation of Google AI's VoiceFilter system
A must-read paper for speech separation based on neural networks
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation"
UniSpeech - Large Scale Self-Supervised Learning for Speech
This repo summarizes tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch
The dataset of Speech Recognition
Tools for Speech Enhancement integrated with Kaldi
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Deep Recurrent Neural Networks for Source Separation
Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features, which enables a more efficient way of separating sources from mixtures.
Real-time GCC-NMF Blind Speech Separation and Enhancement
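The Conv-TasNet entry above mentions Permutation Invariant Training (PIT). As a rough illustration of the idea only, the sketch below scores every assignment of estimated sources to reference sources and keeps the cheapest one. It is not taken from any of the listed repositories: the names `mse` and `pit_loss`, the use of mean squared error as the pairwise loss, and the toy signals are all assumptions made for this example. Real implementations typically operate on batched tensors and often use SI-SNR instead.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, targets, pair_loss=mse):
    """Return (best_loss, best_perm) over all source assignments.

    best_perm[i] is the index of the target matched to estimates[i].
    Hypothetical helper for illustration; brute force over n! permutations,
    so only practical for a small number of sources.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the pairwise loss under this particular assignment.
        loss = sum(pair_loss(estimates[i], targets[p])
                   for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the model outputs the two sources in swapped order.
targets = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.4], [0.9, 0.1, -1.0]]
loss, perm = pit_loss(estimates, targets)
print(perm, loss)
```

Because the loss is taken as the minimum over assignments, the output order of the separated sources does not affect the score, which is the property PIT is meant to provide.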