wangtianrui/DeepComplexUNetPyTorch

Deep Complex U-Net


Unofficial PyTorch implementation of "Phase-Aware Speech Enhancement with Deep Complex U-Net" (H. Choi et al., 2018)
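The gist of the approach: the network estimates a complex ratio mask (cRM) over the noisy STFT, so the enhancement corrects phase as well as magnitude. A stdlib-only toy sketch of complex masking on a single time-frequency bin (the helper name and the values are illustrative, not taken from this repo):

```python
import cmath

def apply_crm(noisy_bin: complex, mask: complex) -> complex:
    """Apply a complex ratio mask to one STFT bin.

    Because the mask is complex, it rescales the magnitude AND
    rotates the phase of the noisy bin.
    """
    return mask * noisy_bin

# Toy example: a clean bin, additive noise, and the ideal cRM.
clean = cmath.rect(1.0, 0.3)    # magnitude 1.0, phase 0.3 rad
noise = cmath.rect(0.5, -1.2)
noisy = clean + noise

ideal_mask = clean / noisy      # ideal cRM: S / Y
estimate = apply_crm(noisy, ideal_mask)   # recovers the clean bin
```

A real-valued mask could only scale the magnitude of `noisy`; the complex mask is what allows the phase error to be undone as well.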

Note

This is NOT the author's implementation.

Architecture


(TO BE)

Requirements

torch==1.1
soundfile==0.9.0
easydict==1.9
git+https://github.com/keunwoochoi/torchaudio-contrib@61fc6a804c941dec3cf8a06478704d19fc5e415a
git+https://github.com/sweetcocoa/PinkBlack@e45a65623c1b511181f7ea697ca841a7b2900f17
torchcontrib==0.0.2
git+https://github.com/vBaiCai/python-pesq
# gputil # if you need to execute multiple training process

Train


  1. Download the datasets and separate the train / test wavs.

  2. Downsample the wavs:

# prerequisite: ffmpeg
# sudo apt-get install ffmpeg (Ubuntu)
bash downsample.sh   # every .wav below $PWD is converted to .flac at a 16 kHz sample rate
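For orientation only, a hypothetical sketch of the kind of conversion downsample.sh performs (the real script's ffmpeg flags and file handling may differ):

```shell
# Hypothetical sketch, not the repo's actual script: convert every
# .wav below the current directory to a .flac at a 16 kHz sample rate.
if command -v ffmpeg >/dev/null 2>&1; then
  find . -name '*.wav' | while read -r f; do
    # "${f%.wav}.flac" strips the .wav suffix and appends .flac
    ffmpeg -y -loglevel error -i "$f" -ar 16000 "${f%.wav}.flac"
  done
fi
```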
  3. Train:
python train_dcunet.py --batch_size 12 \
                       --train_signal /path/to/train/clean/speech/ \
                       --train_noise /path/to/train/noisy/speech/ \
                       --test_signal /path/to/test/clean/speech/ \
                       --test_noise /path/to/test/noisy/speech/ \
                       --ckpt /path/to/save/checkpoint.pth \
                       --num_step 50000 \
                       --validation_interval 500 \
                       --complex

# You can check the other arguments in the source code. (Sorry for the lack of description.)
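The paper trains with a weighted-SDR loss. As a reference for what the training script optimizes, here is a plain-Python rendition of that formula (the repo's actual implementation operates on torch tensors and may differ in details):

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _neg_cos(a, b):
    # -<a, b> / (||a|| ||b||): reaches its minimum of -1 when a and b
    # are perfectly aligned.
    denom = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return -_dot(a, b) / denom if denom > 0 else 0.0

def wsdr_loss(mixture, clean, estimate):
    """Weighted-SDR loss from the Deep Complex U-Net paper (sketch).

    Balances the SDR of the estimated speech against the SDR of the
    implied noise estimate, weighted by their relative energies.
    """
    noise = [x - y for x, y in zip(mixture, clean)]
    noise_est = [x - y for x, y in zip(mixture, estimate)]
    energy_y = _dot(clean, clean)
    energy_z = _dot(noise, noise)
    alpha = energy_y / (energy_y + energy_z)
    return (alpha * _neg_cos(clean, estimate)
            + (1 - alpha) * _neg_cos(noise, noise_est))

# A perfect estimate drives the loss to its minimum of -1
# (up to float rounding).
mix = [1.0, 2.0, -0.5, 0.25]
target = [0.5, 1.5, 0.0, 0.25]
perfect = wsdr_loss(mix, target, target)
```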

Test


python estimate_directory.py --input_dir /path/to/noisy/speech/ \
                             --output_dir /path/to/estimate/dir/ \
                             --ckpt /path/to/saved/checkpoint.pth

Results


PESQ (cRMCn / cRMRn)   Paper         Mine*
DCUNet-10              2.72 / 2.51   3.03 / 3.07
DCUNet-20              3.24 / 2.74   3.12 / 3.11

  • cRMCn : complex-valued input/output
  • cRMRn : real-valued input/output

A direct comparison of the two sets of values above (the paper's and mine) is inappropriate, for the following reasons:

  • * I did not use the MATLAB code the author used to compute PESQ; I used pypesq instead.

  • * The model architecture differs slightly from the original paper (e.g., the kernel sizes of the convolution filters).

  • Mel-spectrogram comparison (image)

Notes


  • Log-amplitude estimation performs slightly worse than non-log amplitude.
  • The complex-valued network does not improve the metric.

Sample Wavs


Mixture        Estimated Speech   GT (Clean Speech)
mixture1.wav   Estimated1.wav     GroundTruth1.wav
mixture2.wav   Estimated2.wav     GroundTruth2.wav

Contact

