# The universal workflow of machine learning

## Defining the problem and assembling a dataset

1. What will your input data be?
2. What are you trying to predict?
3. What type of problem are you facing?
    - Binary classification
    - Multiclass classification
    - Scalar regression
    - Vector regression
    - Multiclass, multilabel classification
    - Clustering
    - Generation
    - Reinforcement learning
    - etc.

## Note: Be aware of hypotheses you make at this stage

1. You hypothesize that your outputs can be predicted given your inputs.
2. You hypothesize that your available data is sufficiently informative to learn the relationship between inputs and outputs.

Not all problems can be solved; just because you’ve assembled examples of inputs X and targets Y doesn’t mean X contains enough information to predict Y.

Keep in mind that machine learning can only memorize patterns that are present in your training data: you can only recognize what you’ve seen before. Using a model trained on past data to predict the future assumes that the future will behave like the past, which often isn’t the case.

## Choosing a measure of success

- balanced-classification problems
    - accuracy, ROC AUC, etc.
- class-imbalanced problems
    - precision, recall, etc.
- ranking problems, multilabel classification
    - average precision
- etc.
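
To make the distinction between balanced and class-imbalanced problems concrete, here is a minimal sketch in plain Python. The data is made up for illustration and the helper names (`confusion_counts`, `accuracy`, `precision`, `recall`) are hypothetical, not taken from any library. It shows how a model that always predicts the majority class can score high accuracy while having zero recall.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Convention chosen here: report 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

On this data, accuracy alone would suggest a good model even though no positive example is ever found, which is why precision and recall are the more informative choices here.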

## Deciding on an evaluation protocol

- Hold-out validation
- K-fold cross-validation
- Iterated K-fold validation
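
The three protocols can be sketched as index-splitting helpers. This is an illustrative sketch in plain Python, not a library API: the function names, the `val_fraction` default, and the choice to let the last fold absorb any remainder are all assumptions.

```python
import random


def hold_out_split(indices, val_fraction=0.2):
    """Set aside the first `val_fraction` of the (already shuffled) indices."""
    num_val = int(len(indices) * val_fraction)
    return indices[num_val:], indices[:num_val]


def k_fold_splits(indices, k):
    """Yield (train, validation) index lists; each sample is validated once."""
    fold_size = len(indices) // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs the remainder when len(indices) % k != 0.
        end = len(indices) if fold == k - 1 else start + fold_size
        yield indices[:start] + indices[end:], indices[start:end]


def iterated_k_fold_splits(num_samples, k, iterations, seed=0):
    """Repeat K-fold several times, reshuffling the data before each pass."""
    rng = random.Random(seed)
    for _ in range(iterations):
        indices = list(range(num_samples))
        rng.shuffle(indices)
        yield from k_fold_splits(indices, k)


indices = list(range(10))
train_idx, val_idx = hold_out_split(indices)
folds = list(k_fold_splits(indices, k=3))
repeated = list(iterated_k_fold_splits(num_samples=10, k=3, iterations=2))
```

With K-fold, the final score would typically be the average of the K validation scores; iterated K-fold averages over `k * iterations` runs, which costs more compute but gives a steadier estimate when data is scarce.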

## Preparing your data

How should you format your data (assuming a deep neural network)?

- data formatted as tensors
- values taken by tensors scaled to small values, e.g. \[-1, 1\] or \[0, 1\]
- features normalized if they take values in different ranges
- feature engineering
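
As an illustration of the normalization step, the sketch below standardizes each feature to mean 0 and standard deviation 1. It uses only the Python standard library; the helper names and the sample values are made up. A common practice, assumed here, is to compute the statistics on the training data only and reuse them for validation and test data.

```python
import statistics


def fit_normalizer(train_rows):
    """Compute per-feature mean and standard deviation from training data only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply (x - mean) / std feature-wise."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up data: two features living in very different ranges.
train = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
test = [[2.0, 300.0]]

means, stds = fit_normalizer(train)
train_norm = normalize(train, means, stds)
test_norm = normalize(test, means, stds)  # reuses the training statistics
```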

## Developing a model that does better than a baseline

Your goal is to achieve *statistical power*: to develop a model capable of beating a dumb baseline.
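
One simple "dumb baseline" for classification is to always predict the most frequent training label. The sketch below is an illustrative assumption about what such a baseline might look like (the function name and the labels are made up); a model has statistical power only if it beats this number.

```python
from collections import Counter


def majority_class_baseline(train_labels, eval_labels):
    """Return the most common training label and its accuracy on eval_labels."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in eval_labels if y == majority_label)
    return majority_label, correct / len(eval_labels)


# Made-up labels for illustration.
train_labels = [0, 0, 0, 1]
val_labels = [0, 1, 0, 0]

label, baseline_accuracy = majority_class_baseline(train_labels, val_labels)
print(label, baseline_accuracy)  # 0 0.75
```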

Three key choices to make:
1. Last-layer activation
    - linear
    - sigmoid
    - softmax
    - etc.
2. Loss function
    - binary_crossentropy
    - mse
    - etc.
3. Optimization configuration
    - rmsprop
    - etc.

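Note that there isn't always a direct way to turn a metric into a loss function. For example, ROC AUC can't be optimized directly, so classification tasks commonly optimize a proxy such as crossentropy instead.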

Loss functions need to be:
- computable given only a mini-batch of data (ideally, as little as a single data point)
- differentiable (otherwise you can't use backpropagation to train the network)
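
The two requirements above can be illustrated by comparing accuracy with binary crossentropy on a tiny made-up mini-batch. This is a plain-Python sketch; the function names, the clipping constant `eps`, and the 0.5 threshold are assumptions chosen for illustration.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary crossentropy over a mini-batch of any size (even 1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_prob) if (p >= threshold) == (t == 1))
    return hits / len(y_true)


y_true = [1, 0, 1]
before = [0.60, 0.40, 0.70]
after = [0.61, 0.39, 0.71]  # slightly better predicted probabilities

# Accuracy is piecewise constant: the small improvement is invisible to it.
print(accuracy(y_true, before) == accuracy(y_true, after))  # True
# Crossentropy changes smoothly, so it gives a usable gradient signal.
print(binary_crossentropy(y_true, after) < binary_crossentropy(y_true, before))  # True
```

Because accuracy does not change under small changes to the predictions, it offers no gradient to follow, whereas crossentropy does; this is the sense in which crossentropy serves as a differentiable proxy.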

## Note: Choosing the right last-layer activation and loss function


<table style="width:100%">
  <tr>
    <th>Problem type</th>
    <th>Last-layer activation</th>
    <th>Loss function</th>
  </tr>
  <tr>
    <td>Binary classification</td>
    <td>sigmoid</td>
    <td>binary_crossentropy</td>
  </tr>
  <tr>
    <td>Multiclass, single-label classification</td>
    <td>softmax</td>
    <td>categorical_crossentropy</td>
  </tr>
  <tr>
    <td>Multiclass, multi-label classification</td>
    <td>sigmoid</td>
    <td>binary_crossentropy</td>
  </tr>
  <tr>
    <td>Regression to arbitrary values</td>
    <td>None</td>
    <td>mse</td>
  </tr>
  <tr>
    <td>Regression to values between 0 and 1</td>
    <td>sigmoid</td>
    <td>mse or binary_crossentropy</td>
  </tr>
</table>
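
The difference between the softmax and sigmoid rows in the table can be seen numerically. The sketch below implements both activations in plain Python (the input scores are made up): softmax produces one probability distribution across classes, which suits single-label problems, while sigmoid scores each class independently, which suits multi-label problems.

```python
import math


def sigmoid(x):
    """Squash a single score into (0, 1), independently of other scores."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # made-up raw scores for three classes

single_label = softmax(logits)              # classes compete: sums to 1
multi_label = [sigmoid(x) for x in logits]  # classes independent: sum is unconstrained

print(sum(single_label))
print(sum(multi_label))
```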


## Scaling up: developing a model that overfits

To figure out how big a model you need, you must develop a model that overfits.

1. Add layers.
2. Make the layers bigger.
3. Train for more epochs.

Always monitor the training loss and validation loss, as well as the training and validation values of any metrics you care about. When performance on the validation data starts to degrade, you have achieved overfitting.
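
One way to detect that point from recorded validation losses is sketched below. The function name, the `patience` rule (how many epochs without improvement count as "degrading"), and the loss values are all assumptions made for illustration.

```python
def first_overfitting_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the best validation loss once the loss has
    failed to improve for `patience` epochs in a row; None if that never happens."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Made-up curves: training loss keeps falling, validation loss turns around.
train_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.17]
val_loss = [0.95, 0.70, 0.58, 0.55, 0.57, 0.61, 0.66]

print(first_overfitting_epoch(val_loss))  # 4
```

If the function returns `None`, the model has not yet overfit under this rule, which suggests adding capacity or training for more epochs.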

## Regularizing your model and tuning your hyperparameters

- Add dropout
- Add L1 and/or L2 regularization
- Try different architectures: add or remove layers
- Try different hyperparameters
    - units per layer
    - learning rate
    - etc.
- Iterate on feature engineering
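
The first two items can be sketched in plain Python to show what they compute. This is an illustrative sketch, not a framework API: the function names are hypothetical, and the dropout shown is the "inverted" variant, which rescales the surviving activations at training time.

```python
import random


def l1_l2_penalty(weights, l1=0.0, l2=0.0):
    """Extra term added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and scale the
    survivors by 1 / (1 - rate) so the expected output is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


weights = [1.0, -2.0]  # made-up weights
penalty = l1_l2_penalty(weights, l1=0.1, l2=0.01)  # 0.1 * 3 + 0.01 * 5

rng = random.Random(0)  # seeded so the example is repeatable
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
```

Dropout is applied only during training; at evaluation time the activations pass through unchanged, which the rescaling above makes consistent in expectation.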

**Note**: Using feedback from your validation process to tune your model over many iterations may cause the model to overfit to the validation process.

**Note**: If performance on the test set is significantly worse than performance on the validation data, this may mean either that your **validation process wasn't reliable** or that your **model had overfitted to the validation process**. In that case, switch to a more reliable evaluation protocol, e.g. iterated K-fold validation.