
VPIDM: Variance-Preserving Interpolation Diffusion Model for Speech Enhancement

This is the official repository for the variance-preserving-based interpolation diffusion model (VPIDM) for speech enhancement, in which we apply a diffusion model to the speech enhancement (denoising) task.

Listening Demo

We provide a listening demo here, using the model trained on the DNS corpus.

Preparation

Install the requirements listed in requirements.txt via

pip install -r requirements.txt

Training from Scratch

Prepare your dataset directory in the following form:

.../mydataset/train/clean
.../mydataset/train/noisy
.../mydataset/valid/clean
.../mydataset/valid/noisy
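
The data loader is typically expected to pair clean and noisy utterances by filename, so it is worth verifying the layout before launching a run. Below is a minimal sanity-check sketch (a hypothetical helper, not part of this repo) that assumes the usual convention of identical filenames in the clean and noisy folders.

# check_dataset.py -- hypothetical helper, not part of this repo.
# Verifies that every noisy file has a same-named clean counterpart.
import os
import sys

def check_split(base_dir, split):
    clean = set(os.listdir(os.path.join(base_dir, split, "clean")))
    noisy = set(os.listdir(os.path.join(base_dir, split, "noisy")))
    missing = noisy - clean
    if missing:
        print(f"{split}: {len(missing)} noisy files lack a clean counterpart")
    else:
        print(f"{split}: {len(noisy)} paired files look OK")

if __name__ == "__main__":
    base = sys.argv[1]  # e.g. .../mydataset/
    for split in ("train", "valid"):
        check_split(base, split)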

Use the following command to train your VPIDM:

python train.py --base_dir <your dataset dir, e.g., .../mydataset/> \
                --gpus 4 \
                --no_wandb \
                --sde vpsde \
                --eta 1.5 \
                --beta-max 2 \
                --N 25 \
                --t_eps 4e-2 \
                --logdir <your log dir>
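
For intuition, --beta-max and --t_eps map onto the variance-preserving (VP) SDE of Song et al. that VPIDM builds on. The following minimal sketch shows only the standard zero-target VP perturbation kernel under a linear beta schedule (beta_min is Song et al.'s common default, not a flag above); in VPIDM the mean additionally interpolates from the clean spectrogram toward the noisy one as t grows, with the weight tied to --eta, and the exact interpolation schedule is given in our paper.

# vp_kernel.py -- illustrative sketch of the standard VP perturbation kernel
# (Song et al.); VPIDM replaces the zero-mean target with an interpolation
# toward the noisy spectrogram (see the paper for the exact schedule).
import math
import torch

def vp_marginal(x0, t, beta_min=0.1, beta_max=2.0):
    """Sample x_t ~ N(alpha_t * x0, (1 - alpha_t^2) I) for
    beta(t) = beta_min + t * (beta_max - beta_min), t in [t_eps, 1]."""
    # int_0^t beta(s) ds for the linear schedule
    integral = beta_min * t + 0.5 * t ** 2 * (beta_max - beta_min)
    alpha = math.exp(-0.5 * integral)     # mean coefficient alpha_t
    sigma = math.sqrt(1.0 - alpha ** 2)   # std; total variance stays at 1
    return alpha * x0 + sigma * torch.randn_like(x0)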

Our code is also compatible with SGMSE+:

python train.py --base_dir <your dataset dir, e.g., .../mydataset/> \
                --gpus 4 \
                --no_wandb \
                --sde ouve \
                --theta 1.5 \
                --N 30 \
                --t_eps 0.03 \
                --logdir <your log dir>
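
For reference, --sde ouve selects the Ornstein-Uhlenbeck variance-exploding (OUVE) SDE of SGMSE+: a mean-reverting drift that pulls the state toward the noisy spectrogram, plus a variance-exploding diffusion term. A minimal sketch follows; the sigma_min/sigma_max defaults are our recollection of the SGMSE+ configuration and should be checked against their code.

# ouve_sketch.py -- illustrative sketch of the OUVE SDE coefficients
# (Richter et al.); --theta above sets the drift stiffness theta.
import math

def ouve_drift(x, y, theta=1.5):
    # mean-reverting drift toward the noisy observation y
    return theta * (y - x)

def ouve_diffusion(t, sigma_min=0.05, sigma_max=0.5):
    # variance-exploding diffusion coefficient g(t)
    ratio = sigma_max / sigma_min
    return sigma_min * ratio ** t * math.sqrt(2.0 * math.log(ratio))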

Enhancing

For denoising via VPIDM:

python enhancement.py --test_dir <test_noisy_dir> \
                      --corrector_step 0 \
                      --N 25 \
                      --enhanced_dir <outputs_dir> \
                      --ckpt <your checkpoint best.ckpt>
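
With --corrector_step 0, enhancement reduces to a predictor-only reverse-SDE sampler. The sketch below shows a generic Euler-Maruyama predictor in the style of Song et al.; score_model, f, and g are placeholders for the trained network and the SDE drift and diffusion coefficients, which enhancement.py wires up internally.

# sampler_sketch.py -- illustrative predictor-only reverse sampler;
# not the repo's exact implementation.
import torch

@torch.no_grad()
def reverse_sample(score_model, x_T, f, g, N=25, T=1.0, t_eps=4e-2):
    x = x_T
    ts = torch.linspace(T, t_eps, N + 1)
    dt = (T - t_eps) / N
    for t in ts[:-1]:
        # reverse-time SDE: dx = [f(x,t) - g(t)^2 * score] dt + g(t) dw
        drift = f(x, t) - g(t) ** 2 * score_model(x, t)
        x = x - drift * dt + g(t) * dt ** 0.5 * torch.randn_like(x)
    return x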

For denoising via SGMSE+:

python enhancement.py --test_dir <test_noisy_dir> \
                      --corrector_step 1 \
                      --N 30 \
                      --enhanced_dir <outputs_dir> \
                      --ckpt <your checkpoint best.ckpt>
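
The extra --corrector_step 1 interleaves one annealed-Langevin corrector update with each predictor step. A minimal sketch of such an update follows; the SNR-based step size is in the style of Song et al., and r = 0.16 is a common choice there, not a flag of this repo.

# One annealed-Langevin corrector update, in the style of Song et al.
import torch

def corrector_update(score_model, x, t, r=0.16):
    s = score_model(x, t)
    z = torch.randn_like(x)
    eps = 2.0 * (r * z.norm() / s.norm()) ** 2  # step size from the target SNR r
    return x + eps * s + (2.0 * eps) ** 0.5 * z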

Checkpoints

We release the checkpoint trained on the VoiceBank+DEMAND dataset here.

We release the checkpoint trained on the DNS corpus (using only additive noise) here.

Thanks and Citations

This code is mainly built on SGMSE+; we are grateful for their open-source spirit. If you find this project helpful, please cite the following papers.

@inproceedings{guo23_interspeech,
  author={Zilu Guo and Jun Du and Chin-Hui Lee and Yu Gao and Wenbin Zhang},
  title={{Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={1065--1069},
  doi={10.21437/Interspeech.2023-1265}
}
@article{richter2023speech,
  title={Speech Enhancement and Dereverberation with Diffusion-based Generative Models},
  author={Richter, Julius and Welker, Simon and Lemercier, Jean-Marie and Lay, Bunlong and Gerkmann, Timo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={31},
  pages={2351--2364},
  year={2023},
  doi={10.1109/TASLP.2023.3285241}
}
