Implementation of a seq2seq model for Speech Recognition using the latest version of TensorFlow. Architecture similar to Listen, Attend and Spell.

Speech_Recognition_with_Tensorflow

Implementation of a seq2seq model for speech recognition. The architecture is similar to "Listen, Attend and Spell" (https://arxiv.org/pdf/1508.01211.pdf).

Example output, where "Created" is the model's prediction and "Actual" is the reference transcript:

Created: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
Actual: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
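
The token lists above suggest character-level targets with a dedicated <SPACE> symbol. A minimal sketch of such a conversion, assuming uppercase transcripts; the helper names text_to_tokens and tokens_to_text are hypothetical and not taken from the repository's utilities:

```python
SPACE_TOKEN = "<SPACE>"  # assumed marker, matching the example output


def text_to_tokens(text):
    """Split a transcript into characters, mapping blanks to <SPACE>."""
    return [SPACE_TOKEN if ch == " " else ch for ch in text.upper()]


def tokens_to_text(tokens):
    """Inverse of text_to_tokens: join characters and restore blanks."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in tokens)


tokens = text_to_tokens("seventeen twenty four")
print(tokens)
print(tokens_to_text(tokens))
```

Keeping the space as an explicit token lets the decoder learn word boundaries like any other character.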

Prerequisites

  • TensorFlow
  • numpy
  • pandas
  • librosa
  • python_speech_features

Datasets

I used the LibriSpeech dataset, which contains about 1000 hours of read English speech sampled at 16 kHz. It is available at http://www.openslr.org/12/
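
In the standard LibriSpeech release, each chapter directory holds the audio as .flac files plus one .trans.txt file whose lines consist of an utterance ID followed by the uppercase transcript. A minimal sketch of reading such lines, assuming that layout; the function name parse_transcript_lines is hypothetical and the sample line is made up for illustration, not copied from the dataset:

```python
def parse_transcript_lines(lines):
    """Map utterance IDs to transcripts from LibriSpeech-style lines."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The ID is everything up to the first space; the rest is the text.
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


# Made-up sample line in the assumed "ID TEXT" format.
sample = ["0000-000000-0000 SEVENTEEN TWENTY FOUR\n", "\n"]
print(parse_transcript_lines(sample))
```

The utterance ID can then be used to locate the matching audio file in the same directory.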

Code

I uploaded three .py files (SpeechRecognizer.py, sr_data_utils.py and sr_model_utils.py) and one .ipynb file (sr.ipynb). The .py files contain the network implementation and utilities. The Jupyter notebook is a demo of how to apply the model.

Architecture

Seq2Seq model
As mentioned above, the model architecture is similar to the one used in "Listen, Attend and Spell": the encoder uses pyramidal bidirectional LSTMs. Each pyramidal layer reduces the time resolution, which shortens the sequence the decoder has to attend over and improves performance on longer sequences.
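
The time reduction in a pyramidal layer can be sketched as follows. In "Listen, Attend and Spell", each pyramidal layer concatenates the outputs of two consecutive time steps before passing them on, halving the sequence length. The sketch below shows only that reshaping step on plain lists, without the LSTM itself; the function name halve_time_resolution, the choice to drop a trailing odd frame, and the use of three reductions are assumptions for illustration, not taken from this repository:

```python
def halve_time_resolution(frames):
    """Concatenate each pair of adjacent frames: (T, D) -> (T // 2, 2 * D)."""
    usable = len(frames) - len(frames) % 2  # drop a trailing odd frame
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


# Toy input: 8 time steps with 2 features each.
frames = [[float(t), float(t) + 0.5] for t in range(8)]
for layer in range(3):
    frames = halve_time_resolution(frames)
    print(f"after layer {layer + 1}: {len(frames)} steps x {len(frames[0])} features")
```

With three such reductions the 8 input steps shrink to a single step, so the attention mechanism has far fewer encoder states to score. Padding the sequence to an even length would be an alternative to dropping the last frame.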

  • Encoder-Decoder
  • Pyramidal Bidirectional LSTM
  • Bahdanau Attention
  • Adam Optimizer
  • Exponential or cyclic learning rate
  • Beam Search or Greedy Decoding
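
The difference between the two decoding options in the list above can be illustrated with a toy example. This is a minimal sketch in plain Python, not the TensorFlow decoder used in the repository: TOY_MODEL is a hypothetical table of next-token probabilities, and the sequences have a fixed length of two. Greedy decoding corresponds to a beam width of 1.

```python
import math

# Hypothetical toy "decoder": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"A": 0.55, "B": 0.45},
    ("B",): {"A": 0.9, "B": 0.1},
}


def beam_search(model, length, beam_width):
    """Return the best (tokens, log_prob) among the hypotheses kept."""
    beams = [((), 0.0)]
    for _ in range(length):
        # Extend every kept hypothesis by every possible next token.
        candidates = [
            (prefix + (tok,), score + math.log(p))
            for prefix, score in beams
            for tok, p in model[prefix].items()
        ]
        # Keep only the beam_width highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_tokens, greedy_score = beam_search(TOY_MODEL, length=2, beam_width=1)
beam_tokens, beam_score = beam_search(TOY_MODEL, length=2, beam_width=2)
print("greedy:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam:  ", beam_tokens, round(math.exp(beam_score), 2))
```

Greedy decoding commits to "A" first and ends with a sequence probability of 0.33, while a beam width of 2 keeps "B" alive and finds the sequence ("B", "A") with probability 0.36. The price is that each step scores beam_width times as many hypotheses.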