# Appendix

This notebook provides some orientation for exploring the code base.

## Structure

| Folder | Description |
|---|---|
| / | root folder containing some Bash scripts to train the RNN |
| assets | binary data (images, audio, etc.) used by the Jupyter Notebooks |
| demos | HTML files that visualize the result of the alignment pipeline |
| src | scripts and Python source files containing all application logic. The documentation is also stored here |
| test | some unit tests |
| tmp | temporary folder, e.g. needed for the VAD stage. Do not store persistent files here, because the application logic may delete this folder at any time! |

## Important scripts

The following scripts exist in the `src` folder. Type

    python {script-name}.py -h
    
to see how to use the script (arguments, flags, default values, ...). The code is self-documenting as far as possible and is supplemented with comments where necessary.
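For readers unfamiliar with how a `-h` flag usually comes about in Python, the following is a minimal sketch using the standard `argparse` module. It assumes the scripts are built on `argparse`, which this notebook does not confirm. The argument names `--corpus` and `--limit` are hypothetical and chosen only for illustration; they are not taken from the actual scripts.

```python
import argparse


def build_parser():
    # Hypothetical parser: the argument names below are illustrative only
    # and are NOT taken from the actual scripts in the `src` folder.
    parser = argparse.ArgumentParser(
        description="Example script (hypothetical)",
        # Show default values in the text printed by -h
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--corpus", type=str, default="rl",
                        help="corpus to process")
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum number of entries to process")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch
# independent of how it is launched.
args = build_parser().parse_args(["--limit", "5"])
print(args.corpus, args.limit)
```

With a parser like this, `argparse` adds the `-h`/`--help` flag automatically and generates the usage text from the `help` strings and defaults.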

The most important scripts are:

* **`create_dataset.py`**: Precompute audio features (MFCC, Mel-spectrogram or power spectrogram) of a corpus
* **`create_ls_corpus.py`**: (Re-)create the LibriSpeech corpus from raw data
* **`create_rl_corpus.py`**: (Re-)create the ReadyLingua corpus from raw data
* **`e2e_demo.py`**: (Re-)create the HTML and JS files needed to demonstrate the result of the processing pipeline (end-to-end). This works for both corpus entries and arbitrary combinations of audio/transcript.
* **`smith-waterman.py`**: Implementation of the Smith-Waterman algorithm for local sequence alignment
* **`test_brnn.py`**: Evaluate a trained model by loading it from disk, making predictions and computing metrics on data that was not seen during training (test set)
* **`train_brnn.py`**: Train the model used for the ASR stage (simplified DeepSpeech model)
* **`train_poc.py`**: Train the PoC (simple unidirectional RNN) with different features 
* **`vad_demo.py`**: Explore the VAD stage by splitting an audio file into speech segments (either using WebRTC or detecting silent intervals)
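
To give an idea of what local sequence alignment with the Smith-Waterman algorithm looks like, here is a minimal textbook-style sketch. It is not the content of `smith-waterman.py`. The function name, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the linear gap penalty are assumptions made for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Illustrative sketch with a linear gap penalty; the scoring values
    are assumptions, not the ones used in the project.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; first row and column stay 0
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, which is what makes the alignment local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a cell with score 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, aligned_a, aligned_b = smith_waterman("xxhelloyy", "zzhellozz")
print(score, aligned_a, aligned_b)
```

Because scores are clamped at 0, only the best-matching region of the two sequences is aligned and unrelated prefixes and suffixes are ignored.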