WSJ Data Preparation
This repository provides some useful scripts for preparing WSJ data.
Install Necessary Tools
cd tools
make
How to Use
# convert sphere to waveform
bash wsj0/1_sph2wav.sh  # remember to change wsj0_dir and save_dir
# add noise
python wsj0/2_prep_noisy_data.py -h
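Mixing noise into clean speech only makes sense when both files share a sample rate, so it can help to check this before running the noise step. The sketch below is a minimal illustration using Python's standard `wave` module. It assumes the converted files are uncompressed PCM WAV. The helper names `wav_sample_rate` and `same_sample_rate` are hypothetical and are not part of this repository.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def same_sample_rate(clean_path, noise_path):
    """Return True if both WAV files report the same sample rate."""
    return wav_sample_rate(clean_path) == wav_sample_rate(noise_path)
```

If `same_sample_rate(clean_path, noise_path)` returns False, one of the two files needs resampling before the noise is added.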
Several public datasets are available, including noise corpora, room impulse responses (RIRs), and well-simulated noisy speech.
You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use
tools/resample.py to down-sample the clean speech or the noise. Some open-source noise corpora we can use: