Convert WSJ sphere format to waveform and do data simulation.

WSJ Data Preparation

This repository provides some useful scripts for preparing WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert sphere to waveform
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
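
For orientation, here is a minimal sketch of what adding noise at a target signal-to-noise ratio usually involves. It is not taken from wsj0/2_prep_noisy_data.py (run that script with -h for its real options). The function name mix_at_snr and its arguments are hypothetical, and signals are plain lists of float samples at the same sample rate.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration only; `clean` and `noise` are
    lists of float samples sharing one sample rate.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Loop the noise until it covers the clean signal, then trim it.
    repeats = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[:len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise is silent, cannot scale it to a target SNR")

    # Solve snr_db = 10 * log10(clean_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]
```

Calling mix_at_snr(clean, noise, 5.0) would return a noisy signal of the same length as clean, with the noise scaled to sit 5 dB below the speech power.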

Public Dataset

Several public datasets are available, covering noise, room impulse responses (RIRs), and simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use tools/resample.py to resample the clean speech or the noise so that the rates match. Some open-source noise corpora:

  1. Nonspeech100
  2. MUSAN
  3. freesound
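
Before mixing, it can help to confirm that a noise file and a clean file really share a sample rate. The sketch below uses only Python's standard wave module and assumes both files are PCM WAV. The helper names wav_sample_rate and same_sample_rate are hypothetical and are not part of this repository.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def same_sample_rate(clean_path, noise_path):
    """True if both WAV files have the same sample rate."""
    return wav_sample_rate(clean_path) == wav_sample_rate(noise_path)
```

If same_sample_rate returns False, resample one of the two files (for example with tools/resample.py) before mixing.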

Room Impulse Response (RIR)

  1. OpenSLR
  2. AcouSP

Noisy Speech Datasets

  1. SUPERSEDED