Update deep speech model with pure tensorflow API implementation (#4730)
1 parent 37ba230 · commit d90f558 · 8 changed files with 548 additions and 496 deletions.
# DeepSpeech2 Model

## Overview

This is an implementation of the [DeepSpeech2](https://arxiv.org/pdf/1512.02595.pdf) model. The current implementation is based on the code from the authors' [DeepSpeech code](https://github.com/PaddlePaddle/DeepSpeech) and the implementation in the [MLPerf Repo](https://github.com/mlperf/reference/tree/master/speech_recognition).

DeepSpeech2 is an end-to-end deep neural network for automatic speech recognition (ASR). It consists of 2 convolutional layers, 5 bidirectional RNN layers, and a fully connected layer. The input features are linear spectrograms extracted from the audio input. The network uses Connectionist Temporal Classification ([CTC](https://www.cs.toronto.edu/~graves/icml_2006.pdf)) as the loss function.
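One practical consequence of the two convolutional layers is that they downsample the spectrogram along the time axis, and CTC requires the downsampled output sequence to be at least as long as the label sequence. The sketch below shows the shape arithmetic; the kernel sizes and strides are illustrative assumptions, not this repository's exact hyperparameters.

```python
# Sketch of how the time dimension shrinks through the conv layers
# before the bidirectional RNN stack. Kernel/stride values below are
# illustrative assumptions, not the repo's exact hyperparameters.

def conv_output_length(length, kernel, stride, padding="same"):
    """Output length of a convolution along the time axis."""
    if padding == "same":
        return (length + stride - 1) // stride  # ceil(length / stride)
    return (length - kernel) // stride + 1      # "valid" padding

def downsampled_length(num_frames, conv_layers=((11, 2), (11, 1))):
    """Apply each (kernel, stride) conv layer in turn to the frame count."""
    for kernel, stride in conv_layers:
        num_frames = conv_output_length(num_frames, kernel, stride)
    return num_frames

# A 1000-frame spectrogram is halved by the stride-2 first conv:
print(downsampled_length(1000))  # 500
```

The CTC loss is only well-defined when `downsampled_length(num_frames)` is at least the transcript length, which is why very short utterances with long transcripts must be filtered out of the training data.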
## Dataset

The [OpenSLR LibriSpeech Corpus](http://www.openslr.org/12/) is used for model training and evaluation.

The training data is a combination of train-clean-100 and train-clean-360 (~130k examples in total). The validation set is dev-clean, which contains about 2.7k examples. The download script preprocesses the data into three columns: wav_filename, wav_filesize, and transcript. data/dataset.py parses the csv file and builds a tf.data.Dataset object to feed data. Within each epoch (except for the first if sortagrad is enabled), the training data is shuffled batch-wise.
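The preprocessing described above can be sketched as follows. This is a minimal illustration, not the repository's actual `data/dataset.py`: the function names and the csv delimiter are assumptions, and the real code feeds a tf.data.Dataset rather than plain lists.

```python
# Illustrative sketch: parse the three-column csv produced by the
# download script, order the first epoch by file size ("sortagrad"),
# then shuffle batch-wise for later epochs. Function names and the
# tab delimiter are assumptions, not the repo's API.
import csv
import random

def load_entries(csv_path):
    """Read (wav_filename, wav_filesize, transcript) rows from a csv."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [(r["wav_filename"], int(r["wav_filesize"]), r["transcript"])
                for r in reader]

def batchwise_shuffle(entries, batch_size, rng=random):
    """Keep each batch intact but shuffle the order of the batches."""
    batches = [entries[i:i + batch_size]
               for i in range(0, len(entries), batch_size)]
    rng.shuffle(batches)
    return [e for batch in batches for e in batch]

entries = [("a.wav", 10, "hi"), ("b.wav", 30, "yes"), ("c.wav", 20, "no")]
# sortagrad: first epoch ordered by ascending file size (shortest audio first)
first_epoch = sorted(entries, key=lambda e: e[1])
# later epochs: batches stay intact, but batch order is randomized
later_epoch = batchwise_shuffle(first_epoch, batch_size=2)
```

Sorting by file size in the first epoch presents shorter (easier) utterances first, while batch-wise shuffling in later epochs keeps similarly sized utterances batched together to reduce padding waste.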
## Running Code

### Configure Python path

Add the top-level /models folder to the Python path with the command:
```
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```
### Install dependencies

Install the shared dependencies before running the code:
```
pip3 install -r requirements.txt
```
or
```
pip install -r requirements.txt
```
### Download and preprocess dataset

To download the dataset, issue the following command:
```
python data/download.py
```
Arguments:
* `--data_dir`: Directory where the preprocessed data is downloaded and saved. By default, it is `/tmp/librispeech_data`.

Use the `--help` or `-h` flag to get a full list of possible arguments.
### Train and evaluate model

To train and evaluate the model, issue the following command:
```
python deep_speech.py
```
Arguments:
* `--model_dir`: Directory to save model training checkpoints. By default, it is `/tmp/deep_speech_model/`.
* `--train_data_dir`: Directory of the training dataset.
* `--eval_data_dir`: Directory of the evaluation dataset.
* `--num_gpus`: Number of GPUs to use (specify -1 to use all available GPUs).

There are other arguments controlling the DeepSpeech2 model and the training/evaluation process. Use the `--help` or `-h` flag to get a full list of possible arguments with detailed descriptions.
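The flags listed above can be sketched with argparse as shown below. This is an illustration of the documented interface only: the actual deep_speech.py defines many more flags, and any default not stated in this README (such as the `--num_gpus` default) is an assumption here.

```python
# Illustrative argparse sketch of the documented deep_speech.py flags.
# Defaults not stated in the README are assumptions, not the repo's values.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train/evaluate DeepSpeech2")
    parser.add_argument("--model_dir", default="/tmp/deep_speech_model/",
                        help="Directory to save model training checkpoints.")
    parser.add_argument("--train_data_dir", default=None,
                        help="Directory of the training dataset.")
    parser.add_argument("--eval_data_dir", default=None,
                        help="Directory of the evaluation dataset.")
    parser.add_argument("--num_gpus", type=int, default=1,  # assumed default
                        help="Number of GPUs (-1 uses all available GPUs).")
    return parser

# Example: request all available GPUs
args = build_parser().parse_args(["--num_gpus", "-1"])
```

Note that argparse accepts `-1` as a value here (rather than mistaking it for a flag) because none of the parser's option strings look like negative numbers.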