# Appendix

The purpose of this notebook is to give some orientation when exploring the code base.

## Getting started

The code was written for [Python](https://www.python.org/) `3.6.6` using [Anaconda](https://anaconda.org/) `4.5.8`.

### Dependencies
* This application uses [Pydub](http://pydub.com/) which means you will need **libav** or **ffmpeg** on your `PATH`. See the [Pydub Repository](https://github.com/jiaaro/pydub#installation) for further instructions.
* Visual C++ build tools, which are needed to build `webrtcvad` (search the web for a download link). They must be installed before the Python requirements (see below)!
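
To check the first dependency before running anything, a minimal sketch like the following may help. It only uses the standard library and looks for the executables `ffmpeg` and `avconv` (the libav binary) on the `PATH`. The function name `find_audio_backends` is made up for this example and is not part of the repository.

```python
import shutil

# Executable names to look for; "avconv" is the binary shipped with libav.
CANDIDATES = ("ffmpeg", "avconv")


def find_audio_backends(candidates=CANDIDATES):
    """Return a dict mapping each executable name to its full path, or None if missing."""
    return {name: shutil.which(name) for name in candidates}


found = find_audio_backends()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")

if not any(found.values()):
    print("Neither ffmpeg nor libav (avconv) was found on PATH.")
```

If both lines report `not found on PATH`, follow the Pydub installation instructions linked above before continuing.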

### Installation
1. Clone [the repository](https://github.com/tiefenauer/forced-alignment): `git clone git@github.com:tiefenauer/forced-alignment.git` 
2. Install Python requirements: `pip install -r requirements.txt`
3. Install [TensorFlow](https://www.tensorflow.org/install/): TF is not included in `requirements.txt` because you have to choose between `tensorflow` (no GPU acceleration) and `tensorflow-gpu` (with GPU acceleration). If your computer does not have a CUDA-supported GPU, install the former; if it does (mine has one), install the latter. Installing `tensorflow-gpu` on a computer without a GPU does not work (at least I did not get it to work).
4. Run Jupyter Notebook: `jupyter notebook`

### Jupyter Notebook extensions

The following extensions for Jupyter Notebook were used:

* [jupyter_contrib_nbextensions](https://github.com/ipython-contrib/jupyter_contrib_nbextensions): A collection of useful extensions (like a TOC) that also includes a manager for enabling/disabling individual extensions from the web interface
* [cite2c](https://github.com/takluyver/cite2c): For managing citations (works with [Zotero](https://www.zotero.org/))

## Code Structure

| Folder | Description |
|---|---|
| / | root folder containing some Bash scripts to train the RNN |
| assets | binary data (images, audio, etc...) used for the Jupyter Notebooks |
| demos | HTML-files to visualize the result of the alignment pipeline. |
| src | scripts and Python source files containing all application logic. The documentation is also stored here. |
| test | some unit tests |
| tmp | temporary folder, e.g. needed for the VAD stage. No persistent files should be stored here as this folder might be deleted at any time by application logic! |

## Important scripts

The following scripts exist in the `src` folder. Type

    python {script-name}.py -h
    
to see how to use the script (arguments, flags, default values, ...). The code is self-documenting as far as possible and has been supplemented with comments where necessary.
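
As an illustration of what `-h` prints, here is a minimal sketch of a command-line interface built with `argparse` from the standard library, which generates the `-h` help text automatically. That the scripts use `argparse` is an assumption, and the arguments `--corpus` and `--verbose` are hypothetical; the real scripts define their own.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical arguments (not the repository's actual ones)."""
    parser = argparse.ArgumentParser(description="Example script (hypothetical)")
    parser.add_argument("--corpus", default="rl", choices=["rl", "ls"],
                        help="corpus to use (default: %(default)s)")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress information")
    return parser


parser = build_parser()
# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = parser.parse_args(["--corpus", "ls"])
print(args.corpus, args.verbose)

# This is the text that "-h" would display.
print(parser.format_help())
```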

The most important scripts are:

* **`create_dataset.py`**: precompute audio features (MFCC, Mel-Spectrogram or Power-Spectrograms) of a corpus
* **`create_ls_corpus.py`**: (Re-)create the LibriSpeech corpus from raw data
* **`create_rl_corpus.py`**: (Re-)create the ReadyLingua corpus from raw data
* **`e2e_demo.py`**: (Re-)create the HTML- and JS-files needed to demonstrate the result of the processing pipeline (end-to-end). This works for both corpus entries and arbitrary combinations of audio/transcript.
* **`test_brnn.py`**: Evaluate a trained model by loading it from disk, making predictions and measuring some metrics for data that was not seen during training (test-set)
* **`train_brnn.py`**: Train the model used for the ASR stage (simplified DeepSpeech model)
* **`train_poc.py`**: Train the PoC (simple unidirectional RNN) with different features 
* **`vad_demo.py`**: Explore the VAD stage by splitting an audio file into speech segments (either using WebRTC or detecting silent intervals)
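
To give an idea of what "detecting silent intervals" means for the VAD stage, the following toy sketch splits a sequence of per-frame energy values into speech segments. It is not the implementation used in `vad_demo.py`. The function `speech_segments`, its parameters and the example values are all made up for illustration.

```python
def speech_segments(energies, threshold, min_silence_frames=3):
    """Return (start, end) frame index pairs (end exclusive) of speech segments.

    A frame counts as silent if its energy is below `threshold`. A run of at
    least `min_silence_frames` silent frames closes the current segment;
    shorter silent runs are treated as pauses inside the segment.
    """
    segments = []
    start = None        # first frame of the segment being built
    last_voiced = None  # most recent non-silent frame in that segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_silence_frames:
            # Enough consecutive silence: close the segment after the last voiced frame.
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:
        # The input ended while a segment was still open.
        segments.append((start, last_voiced + 1))
    return segments


# Made-up energy values: two bursts of speech separated by a long silence.
energies = [0, 0, 5, 6, 0, 7, 0, 0, 0, 0, 8, 9, 0]
print(speech_segments(energies, threshold=1))  # [(2, 6), (10, 12)]
```

The single silent frame at index 4 is shorter than `min_silence_frames`, so it stays inside the first segment, while the four silent frames starting at index 6 separate the two segments.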