# Week 6

__Goals for this week__

We will talk about best practices for deep learning development - how to use the cloud for training, how to manage your projects and how to gradually improve your model. Hopefully these will help you with your future projects.

__Feedback__

This lab is a work in progress. If you notice a mistake, let us know or even open a pull request. Please also fill in the [questionnaire](https://forms.gle/r27nBAvnMC7jbjJ58) after you finish this lab to give us feedback.


## Project reminder

- 9th/10th week is project checkpoint - data analysis, baseline model trained
- 11th/12th week is project final submission - completed experiments, thorough documentation of your work


## Google Cloud Platform

Hardware accelerators, such as GPUs, are necessary for deep learning projects, but not all of you have GPU time available. We have therefore partnered with Google to provide you with cloud-based GPU computing. Check [our tutorial](../project/gcp.ipynb) to see how to set up your own GCP-based virtual machine with a GPU. This should cover your compute requirements, but we are ready to provide you with additional resources if you use up the credit we gave you.

## Training in cloud

In this section we provide some tips on how to approach development and training of your model in the cloud. GPU time is expensive, so it does make sense to be careful with what you deploy and how you deploy it.

### 1. Use Docker

Docker is a great way to have a unified environment on your personal computer and in the cloud. TensorFlow publishes many different tags in its [DockerHub repository](https://hub.docker.com/r/tensorflow/tensorflow/). `tensorflow:2.0.0-gpu-py3` is the tag with the newest version: `gpu` grants you GPU support and `py3` makes it run on Python 3.6. The Docker image we use during our labs is derived from these.

Training can take several hours, so you want your Docker container to keep running even when you do not have it open in your terminal. You can detach from the container and keep your notebook running in the background by pressing `Ctrl+P` followed by `Ctrl+Q`, and then come back to it via `docker attach`. This [Microsoft tutorial](https://blogs.technet.microsoft.com/machinelearning/2018/03/15/demystifying-docker-for-data-scientists-a-docker-tutorial-for-your-deep-learning-projects/) shows how to use Docker for machine learning projects.

If you want more control, e.g. to add additional libraries, create your own `Dockerfile`. To learn how, check out [this tutorial](https://rominirani.com/docker-tutorial-series-writing-a-dockerfile-ce5746617cd) by Romin Irani, or look at the Dockerfile in the repository of this course (`backstage/docker/Dockerfile`).
 
### 2. Monitor GPU usage

TensorFlow code runs on a GPU by default when one is available; you do not need to change anything in your code. If you want to run it from Docker, you need to have `nvidia-docker` installed, otherwise the GPU is not visible from the container. See the [TensorFlow GPU guide](https://www.tensorflow.org/guide/gpu) for more information.

After you start your training, you can check how heavily your GPU is used with a tool called `nvidia-smi`. `GPU-Util` is the value you are most interested in. It tells you how much of your GPU you are actually using. If the utilization is low, e.g. below 20%, it is possible that the GPU keeps needlessly waiting for your data pre-processing. In that case you should optimize your input pipeline.
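If you would rather check the utilization from a script than watch the terminal, a minimal sketch could look like the one below. It assumes `nvidia-smi` is on your `PATH`; the helper names (`parse_gpu_utilization`, `query_gpu_utilization`, `is_underused`) are hypothetical and the 20% threshold is only the rule of thumb mentioned above.

```python
import subprocess


def parse_gpu_utilization(smi_output):
    """Parse one utilization percentage per GPU from nvidia-smi CSV output."""
    lines = (line.strip() for line in smi_output.splitlines())
    # Skip empty lines and values the driver cannot report (e.g. "[N/A]")
    return [int(line) for line in lines if line.isdigit()]


def query_gpu_utilization():
    """Ask nvidia-smi for the current utilization of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_utilization(result.stdout)


def is_underused(utilizations, threshold=20):
    """Flag a possible input pipeline bottleneck on any of the GPUs."""
    return any(value < threshold for value in utilizations)
```

Calling `is_underused(query_gpu_utilization())` a few times during training gives a rough signal. A single low reading is not conclusive, because utilization fluctuates between batches.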

### 3. Project folder structure

Below is a possible project structure. It is inspired by [Cookiecutter Data Science project structure](https://drivendata.github.io/cookiecutter-data-science).

<br/><br/>

```
├── .gitignore          <- You usually don't want to push your data, logs or models to your repo
├── README.md
│
├── data
│   ├── processed       <- The data prepared to be fed into your model
│   └── raw             <- The original data you got
│
├── docker
│   ├── Dockerfile      <- Dockerfile to build the image for your project
│   └── setup           <- Additional files needed for Dockerfile
│
├── logs                <- Saved evaluation results
│
├── models              <- Saved models
│
├── notebooks           <- Jupyter notebooks for data analysis and model interaction.
│
└── src
    │
    ├── data            <- Scripts that load your data, e.g. tf.data pipeline
    │   └── load_data.py
    │
    └── models         
        ├── model.py    <- Your model definition
        ├── predict.py  <- Makes prediction with trained model on new data
        └── train.py    <- Training loop

```
<br/><br/>

Note that you should keep your model definition, training loops and data processing pipeline in normal Python scripts. Jupyter notebook is a great tool for communication and data visualization, but I would not recommend implementing your whole project in it. This usually results in a so-called _Big Ass Script_ architecture, which is considered a development anti-pattern.
 
## Project management

In this section we provide several helpful tips on how to approach managing deep learning projects. Projects like these are quite specific, mainly because of the high computational requirements and the experimental nature of machine learning. One experimental run can take hours of GPU time even for smaller projects. Some state-of-the-art projects can even take months of GPU time to complete. It costs you time and, if you compute in the cloud, it also costs you money. For this reason we want to __make every run count__ and we want to avoid redoing any calculations.

Some of the tips here might seem like additional work, but they will very likely save you a lot of time down the road. They are also generally considered good practice and you should use them in any other machine learning project in the future.

### 4. Log your evaluation results

You should regularly measure various metrics and keep these records for future reference. These will help you compare different runs and they will also help you with early bug diagnosis.

__What should you log?__ At a minimum, you should record your _loss_ value and your _main evaluation metric_ value for both the _train_ and the _validation_ set. You can also measure some additional evaluation metrics (e.g. per-class precision and recall for classification), as well as memory (how much RAM is taken) and time (how long one batch takes) requirements.

__How often should you log?__ Most often you evaluate after each epoch. In some cases the dataset is so big that even a single epoch can take several hours. We then usually want some information about the training earlier and more often than once per epoch. We can simply log our metrics every X steps instead of at the end of an epoch.

__How to calculate the training set metrics?__ Calculating the metrics for the whole training set is usually quite costly, because training sets are much bigger than validation sets. You can either (1) calculate these results only for a subset of the training set or (2) calculate these metrics as you train and aggregate them at the end.
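Option (2), combined with logging every X steps, can be sketched in plain Python as follows. This is only an illustration: `RunningMean` and `log_every` are hypothetical helpers, and the per-batch losses are assumed to come from your own training loop.

```python
class RunningMean:
    """Accumulates a sample-weighted mean of a metric over batches."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is the mean of the metric over a batch of `n` samples
        self.total += value * n
        self.count += n

    def result(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total, self.count = 0.0, 0


def log_every(batch_losses, batch_size, every):
    """Return (step, mean loss since the last log) pairs, one per `every` steps."""
    meter, logged = RunningMean(), []
    for step, loss in enumerate(batch_losses, start=1):
        meter.update(loss, n=batch_size)
        if step % every == 0:
            logged.append((step, meter.result()))
            meter.reset()
    return logged
```

Keep in mind that metrics aggregated during training are computed with weights that change from batch to batch, so they are only an approximation of the metric of the final model. In TensorFlow, `tf.keras.metrics.Mean` plays a similar role to `RunningMean`.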

__How to implement logging?__ You can use TensorBoard. It provides a convenient API for logging your results and you get cool visualizations for free. `Keras` callbacks (seen in the Week 4 lab) provide basic functionality, but you can also log custom metrics; [check their tutorial](https://www.tensorflow.org/tensorboard/scalars_and_keras). If you use TensorBoard, make sure you create a logical system for naming your runs so you can identify them later.
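One possible naming scheme is to combine a timestamp with the hyperparameters of the run. The sketch below is just one convention, not a TensorBoard requirement, and `make_run_name` is a hypothetical helper.

```python
from datetime import datetime


def make_run_name(hparams, timestamp=None):
    """Build a sortable, descriptive run name from a timestamp and hyperparameters."""
    timestamp = timestamp or datetime.now()
    parts = [timestamp.strftime("%Y%m%d-%H%M%S")]
    # Sort the keys so the same configuration always produces the same suffix
    parts += [f"{key}={hparams[key]}" for key in sorted(hparams)]
    return "_".join(parts)
```

The resulting name can be used as a subdirectory of `logs`, e.g. as the `log_dir` passed to the Keras `TensorBoard` callback, so that each run shows up separately in the TensorBoard UI.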

### 5. Save your models

You should regularly save your model - its parameters - so you do not lose your progress. You can then restore your model anytime and continue with training or run additional evaluation on it.

__How often should you save your model?__ You can just save your model anytime you evaluate.

__How many snapshots should you keep?__ You can keep all the snapshots, but this approach can fill your disk quite easily for bigger models. In that case, as a bare minimum, keep only your last snapshot and your best-performing snapshot.

__How to implement saving?__ TensorFlow provides a convenient API for model saving and restoring. [Check their tutorial](https://www.tensorflow.org/tutorials/keras/save_and_load#manually_save_weights).
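The "last and best" retention policy mentioned above is independent of the saving API you use. A minimal sketch, assuming you keep a list of `(path, validation_score)` pairs in the order the snapshots were saved (the function names are hypothetical):

```python
def snapshots_to_keep(snapshots, higher_is_better=True):
    """Return the paths of the last and the best snapshot.

    `snapshots` is a list of (path, validation_score) pairs in saving order.
    """
    if not snapshots:
        return set()
    last_path = snapshots[-1][0]
    pick = max if higher_is_better else min
    best_path = pick(snapshots, key=lambda item: item[1])[0]
    return {last_path, best_path}


def snapshots_to_delete(snapshots, higher_is_better=True):
    """Return the paths that can be removed under the 'last and best' policy."""
    keep = snapshots_to_keep(snapshots, higher_is_better)
    return [path for path, _ in snapshots if path not in keep]
```

Set `higher_is_better=False` when the score is a loss rather than, say, accuracy.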

### 6. Hyperparameter management

__Reminder about hyperparameter tuning:__ Random search is generally the safest bet. However, it might still be too expensive for the resources you have available (depending on how difficult your project is training-wise). Look at the hyperparameters other people are using. You can tune them manually at first and then experiment at least with the most important hyperparameters - learning rate, batch size, optimizer - and then perhaps also with additional architecture parameters (e.g. network depth, hidden layer size).
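Random search itself takes only a few lines. The sketch below draws independent random configurations; the ranges and choices are illustrative assumptions, not recommendations for your project. The learning rate is sampled log-uniformly, which is a common choice because useful values span several orders of magnitude.

```python
import random


def sample_hparams(rng):
    """Draw one random configuration; all ranges here are illustrative."""
    return {
        # Log-uniform sample between 1e-4 and 1e-1
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([8, 16, 32, 64]),
        "optimizer": rng.choice(["sgd", "adam"]),
        "num_hidden_layers": rng.randint(1, 4),
    }


rng = random.Random(0)  # fixed seed so the search can be reproduced
trials = [sample_hparams(rng) for _ in range(5)]
```

Each dictionary in `trials` would then be passed to one training run.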

You should make a reasonable API for your hyperparameters; __do not rewrite them manually in a file all the time.__ Instead, it is good practice to be able to start the training from the command line, e.g.

```
python train.py --learning_rate 0.003 --batch_size 8
```

You can use the `argparse` Python module to easily parse arguments like these ([tutorial](https://docs.python.org/3/howto/argparse.html#id1)). However, you should have your default hyperparameter values stored in a config file, e.g. a YAML file:

```
learning_rate: 0.01
batch_size: 16
num_hidden_layers: 4
```

You can then parse it with the `pyyaml` library:

```
import yaml

# Open the file first; safe_load expects a stream or YAML text, not a file path
with open('hparams.yaml') as f:
    hparams = yaml.safe_load(f)
```
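The two ideas can be combined: defaults come from the config file and any of them can be overridden from the command line. The sketch below assumes the defaults are already available as a dictionary (here written inline instead of being loaded from YAML, so the example runs on its own); `parse_hparams` is a hypothetical helper.

```python
import argparse


def parse_hparams(defaults, argv=None):
    """Expose every default hyperparameter as a command-line flag of the same type."""
    parser = argparse.ArgumentParser(description="Train a model")
    for name, value in defaults.items():
        parser.add_argument(f"--{name}", type=type(value), default=value)
    # argv=None makes argparse read the real command line (sys.argv)
    return vars(parser.parse_args(argv))


# `defaults` stands in for the dictionary loaded from the YAML config file
defaults = {"learning_rate": 0.01, "batch_size": 16, "num_hidden_layers": 4}
hparams = parse_hparams(defaults, ["--learning_rate", "0.003", "--batch_size", "8"])
```

Deriving the flag type from the default value works for numbers and strings, but not for booleans (`bool("False")` is `True`), so boolean options need separate handling.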

Consider using the TensorBoard HParams extension, which lets you log the hyperparameters for the current run and then visualize the results. You can check the [TensorBoard](https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams) documentation or see the code from the Week 5 lab.


### 7. Keep experiment notes

Try keeping notes about the experiments you run. You have results and hyperparameters logged in various files, but you should also write down your findings and thoughts, e.g. when you find out that the model is very sensitive to some hyperparameter or that some technique seems to be beneficial for your experiment.

### 8. Early stopping

You should stop training when you detect that the run does not improve anymore. This technique is called _early stopping_ and it can save you lots of GPU time. You can stop training for the following reasons:

1. No significant progress was made in the previous X epochs.
2. Results are getting significantly worse.
3. The model performs worse than a baseline after a certain number of epochs. Some hyperparameter settings (such as a low learning rate or a large batch size) make the training slower, so make sure you do not penalize such runs.

This technique is recommended only after you get to know your model and how it behaves on the task. Otherwise, if you use early stopping too liberally, you can stop runs that could have achieved interesting results.
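The first criterion (no progress in the previous X epochs) is often implemented with a "patience" counter. Below is a framework-independent sketch; the class and the `epochs_run` helper are hypothetical, and the per-epoch validation losses are assumed to come from your own evaluation code.

```python
class EarlyStopping:
    """Signals a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as progress
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, patience):
    """Return how many epochs would run before early stopping kicks in."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)
```

If you train with Keras, the built-in `tf.keras.callbacks.EarlyStopping` callback offers the same idea through its `monitor`, `patience` and `min_delta` arguments.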

## Growing your model

There are a lot of decisions you need to make when you start a deep learning project. You have to design a model, choose an optimization algorithm, propose a data representation, pick an evaluation metric, choose some form of regularization, etc. The sheer number of decisions can be overwhelming. Even after you make them, it is hard to tell what to change if you fail to train your model.

Another problem with developing deep learning models is that they can fail silently. The fact that no exception was raised during the training does not mean that the training was done correctly. You can feed in wrong data or data in a wrong format, you can fail to calculate your loss or minimize it properly, you can miscalculate your evaluation metrics, etc. In this section we mention a few tricks that can help you with these problems.

### 9. Build infrastructure first

Before you start experimenting with your problem, make sure you have the code infrastructure ready: data processing pipeline, evaluation metrics calculation, results logging, model saving, hyperparameter management, etc. It is recommended to have this infrastructure set up before you start your training. This backbone can save you a tremendous amount of time that you would otherwise spend on debugging and redoing calculations.

### 10. From train to test

#### Starting with few batches
When I implement a model, before running a full-blown experiment, I like to make sure that it at least seems to be capable of learning. For this I try to __overfit it on a very small number (tens) of training samples.__ I also often use a small version of the model, i.e. relatively small hidden layer sizes etc. If the model is not able to achieve good performance or decrease the loss on a small number of samples, it won't work on the full training set either. It usually means that there is a logical error in the implementation that should be addressed. This preliminary test serves as a kind of sanity check and can be done locally on your personal computer.
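The check itself is simple: train repeatedly on the same tiny batch and verify that the loss goes to (almost) zero. The sketch below shows the idea without any framework. `TinyLinearModel` is a hypothetical stand-in for your real model, and the step count and loss threshold are arbitrary assumptions.

```python
def overfits_small_batch(train_step, batch, steps=500, target_loss=1e-3):
    """Train on the same tiny batch repeatedly; return (passed, losses)."""
    losses = [train_step(batch) for _ in range(steps)]
    return losses[-1] <= target_loss, losses


class TinyLinearModel:
    """Stand-in model y = w * x + b, trained by gradient descent on MSE."""

    def __init__(self, lr=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def train_step(self, batch):
        n = len(batch)
        loss = grad_w = grad_b = 0.0
        for x, y in batch:
            err = self.w * x + self.b - y
            loss += err * err / n
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b
        return loss  # loss measured before this update


# Three samples from y = 2x + 1; a working model must fit them almost exactly
batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
model = TinyLinearModel()
passed, losses = overfits_small_batch(model.train_step, batch)
```

With a real network you would pass your own training step and a handful of real samples. If `passed` stays `False`, look for a bug before spending GPU time.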

#### Training with full training set
After you establish that your model works, you can train it normally on your training dataset using GPUs. You should worry first about your __performance on the training set__. If you are not able to achieve good performance on your training data, you have no chance of achieving it on test data. Bad training set results might indicate that your model is too small (try making it bigger), that it is not trained properly (try changing hyperparameters), or even that you do not feed it data correctly (try checking what comes in and what comes out). You can assess what good results are by looking at what other people get with similar data or on similar tasks.

#### Improving results on test set
After you make sure that you can fit the training data, you can start worrying about __how well your model generalizes to the test set__. If your test set results lag significantly behind your training set results, you are probably overfitting, i.e. the model is memorizing the training data and what it has learned does not work on previously unseen data. You can solve this problem by using more training data, using data augmentation techniques on existing data, or by employing regularization techniques.

#### Summary

1. __Several training samples__ - Is the model working? Solve with debugging and code review.
2. __Training set__ - Does it have enough capacity? Solve by making the model bigger and playing with optimization hyperparameters.
3. __Testing set__ - Can it generalize well? Solve by adding more data (including using data augmentation) or by using regularization.
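
The three steps above can also be read as a simple decision procedure. The sketch below is only an illustration: `next_step` is a hypothetical helper, the scores are assumed to be "higher is better" metrics such as accuracy, and the `target_score` and `max_gap` thresholds are arbitrary and depend on your task.

```python
def next_step(overfits_small_sample, train_score, test_score,
              target_score=0.9, max_gap=0.05):
    """Suggest what to work on next, following the three steps in order."""
    if not overfits_small_sample:
        return "debug the implementation"
    if train_score < target_score:
        return "increase capacity or tune optimization hyperparameters"
    if train_score - test_score > max_gap:
        return "add data, augmentation or regularization"
    return "results look healthy"
```

For example, `next_step(True, 0.95, 0.80)` points to the generalization step, because the training score is fine but the gap to the test score is large.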

### 11. From simple to complex

You should start with a very simple model to establish a baseline you can improve on. It can be a linear regression, a simple convolutional network or something similar that you can code very easily. Only then start adding complexity, making the model bigger and adding additional parts.

## Further Reading

- Andrej Karpathy, a Slovak who currently leads the development of Tesla's autonomous driving, has a nice [blog](http://karpathy.github.io/2019/04/25/recipe/) about neural network development, where he discusses how to debug and grow your models.
- DrivenData published [their version](https://drivendata.github.io/cookiecutter-data-science) of what a data science working directory should look like.
- [Third course from Coursera's Deep Learning specialization](https://www.youtube.com/playlist?list=PLkDaE6sCZn6E7jZ9sN_xHwSHOdjUxUW_b) by Andrew Ng deals with how you should manage your machine learning projects.
- Han Xiao has a [nice blog](https://hanxiao.github.io/2017/12/21/Use-HParams-and-YAML-to-Better-Manage-Hyperparameters-in-Tensorflow/) about using YAML for hyperparameter management.
