Stars
Audio processing using a PyTorch 1D convolutional network
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…
Collection of audio-focused loss functions in PyTorch
Official PyTorch implementation of BigVGAN (ICLR 2023)
A PyTorch implementation of EfficientNet
A memory-efficient implementation of DenseNets
Official implementation for "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" (ICLR 2024 Spotlight), https://openreview.net/forum?id=JePfAI8fah
Code release for ConvNeXt V2 model
Noise Conditional Score Networks (NeurIPS 2019, Oral)
State-of-the-art deep learning library for time series and sequences in PyTorch / fastai
[NeurIPS 2024 Spotlight] Official repository of the CycleNet paper: "CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns". This work is developed by the Lab of Professor …
A Python wrapper for the high-quality vocoder "World"
Vector (and Scalar) Quantization, in PyTorch
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
PyTorch implementation of the CREPE pitch tracker
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
vits2 backbone with multilingual-bert
Code and slides of my YouTube series called "Audio Signal Processing for Machine Learning"
A feature-rich command-line audio/video downloader
Browser-based Amazon Mechanical Turk management console
Django-based clone of Amazon's Mechanical Turk service running in your local environment.
Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).
A Collection of Papers and Codes for CVPR2025/CVPR2024/ECCV2024 AIGC
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Python interface to the WebRTC Voice Activity Detector