Unofficial implementation of phase reconstruction method based on recurrent phase unwrapping with DNNs
This repository provides an unofficial implementation of phase reconstruction based on recurrent phase unwrapping (RPU) with deep neural networks (DNNs) [1].
A weighted RPU is also implemented [2].
MIT licence.
Copyright (C) 2024 Akira Tamamori
We tested the implementation on Ubuntu 22.04 with Python 3.10.12. The following modules are required:
- hydra-core
- joblib
- librosa
- numpy
- progressbar2
- pydub
- pypesq
- pyroomacoustics
- pystoi
- scikit-learn
- scipy
- soundfile
- torch
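
Before running the scripts, it can help to confirm that every module in the list above is installed. The following is a minimal sketch and not part of the repository: the helper name `find_missing` is hypothetical, and the import names in the mapping (for example `hydra` for hydra-core and `sklearn` for scikit-learn) are assumptions based on each package's usual convention.

```python
from importlib.util import find_spec

# Map each pip package name from the list above to the module name it is
# imported under. The import names are assumptions based on each package's
# usual convention; adjust them if your installation differs.
REQUIRED = {
    "hydra-core": "hydra",
    "joblib": "joblib",
    "librosa": "librosa",
    "numpy": "numpy",
    "progressbar2": "progressbar",
    "pydub": "pydub",
    "pypesq": "pypesq",
    "pyroomacoustics": "pyroomacoustics",
    "pystoi": "pystoi",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "soundfile": "soundfile",
    "torch": "torch",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as torch.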
You need to prepare the following two datasets from the JSUT corpus:
- basic5000: for training
- onomatopee300: for evaluation

Then run the experiment as follows:

- Download the two datasets and put them in /root_dir/trainset_dir and /root_dir/evalset_dir/, respectively.
- Modify config.yaml according to your environment. It contains the settings for the experimental conditions. For immediate use, you mainly need to edit the directory paths.
- Run preprocess.py. It performs the preprocessing steps.
- Run training.py. It performs model training.
- Run evaluate_scores.py. It generates reconstructed audio data and computes objective scores (PESQ, STOI, LSC). In this script, the function compute_rpu implements RPU and weighted RPU.
- Run evaluate_scores_zerophase.py. It also generates reconstructed audio data and computes the same objective scores (PESQ, STOI, LSC), with the phase spectrum assumed to be zero (zero-phase).
- Run evaluate_scores_randomphase.py. It also generates reconstructed audio data and computes the same objective scores (PESQ, STOI, LSC), with the phase spectrum sampled uniformly between $-\pi$ and $\pi$.
- Run plot_boxplot.py. It draws box plots of the objective scores.
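
The zero-phase and random-phase baselines described above differ only in the phase attached to a given magnitude spectrum. The sketch below illustrates that idea on a single frame using the Python standard library. It is not taken from the repository's scripts, which may be organized differently, and the function names `zero_phase` and `random_phase` are hypothetical.

```python
import cmath
import math
import random


def zero_phase(magnitudes):
    """Attach a phase of 0 to every magnitude bin of one frame."""
    return [cmath.rect(m, 0.0) for m in magnitudes]


def random_phase(magnitudes, rng=None):
    """Attach a phase drawn uniformly from [-pi, pi] to every magnitude bin."""
    if rng is None:
        rng = random.Random()
    return [
        cmath.rect(m, rng.uniform(-math.pi, math.pi))
        for m in magnitudes
    ]


if __name__ == "__main__":
    # Toy magnitude values for one frame (illustrative, not real data).
    frame = [1.0, 0.5, 0.25]
    print(zero_phase(frame))
    print(random_phase(frame, random.Random(0)))
```

Both functions preserve the magnitudes exactly up to floating-point rounding, so any difference in the objective scores between the two baselines comes from the phase alone. Applying them to every frame of a magnitude spectrogram and taking an inverse STFT would yield the corresponding time-domain signals.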
[1] Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020.
[2] N. B. Thien, Y. Wakabayashi, K. Iwai and T. Nishiura, "Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1667-1680, 2023, doi: 10.1109/TASLP.2023.3268577.