# ML Strategy

- Understand why Machine Learning strategy is important
- Apply satisficing and optimizing metrics to set up your goal for ML projects
- Choose a correct train/dev/test split of your dataset
- Understand how to define human-level performance
- Use human-level performance to define your key priorities in ML projects
- Make the correct ML strategic decision based on observations of performance and dataset

## Introduction to ML Strategy

### Why ML Strategy

There are many options to try when improving an ML system:
- Collect more data
- Collect more diverse training set
- Train algorithm longer with gradient descent
- Try Adam instead of gradient descent
- Try bigger network
- Try smaller network 
- Try dropout
- Add $L_2$ regularization
- Network architecture
    - Activation function
    - Number of hidden units
    - etc.

If we choose poorly, we can lose six months of work with nothing to show for it.
We need a quick and effective way to figure out which of these ideas to pursue.

### Orthogonalization

**Chain of assumptions in ML**

Each assumption below has its own set of knobs to tune, ideally without affecting the others:

- Fit the training set well on cost function
    - Bigger network
    - Adam
- Fit the dev set well on cost function
    - Regularization
    - Bigger training set
- Fit the test set well on cost function
    - Bigger dev set
- Perform well in real world
    - Change dev set
    - Change cost function

## Setting up your goal

### Single number evaluation metric

We need a single real-number metric so that classifiers can be compared quickly.
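One common way to collapse two metrics into a single number is the F1 score, the harmonic mean of precision and recall. The sketch below assumes precision and recall are the two competing metrics; the classifier names and values are hypothetical and only illustrate the idea.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifiers, each described by (precision, recall).
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}

# One number per classifier makes the comparison unambiguous.
scores = {name: f1_score(p, r) for name, (p, r) in classifiers.items()}
best = max(scores, key=scores.get)
```

With a single score per classifier, picking the better one no longer requires weighing two numbers against each other by eye.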

### Satisficing and Optimizing metric

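One option is to combine the metrics into a single cost, e.g. $cost = accuracy - 0.5 \times runningTime$, but such a linear combination is somewhat artificial.

$\rightarrow$ A more natural formulation: maximize accuracy, subject to running time less than or equal to 100 ms.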
- Optimizing metric: Accuracy
- Satisficing metric: Running time (threshold)

If we have N metrics:
- pick 1 metric to be the optimizing metric
- N - 1 metrics to be satisficing metrics
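The selection rule above can be sketched as follows: filter out every model that violates a satisficing threshold, then pick the best remaining model on the optimizing metric. The function name `pick_model`, the dictionary keys, and the model values are hypothetical.

```python
from typing import Dict, List, Optional


def pick_model(models: List[Dict], max_running_time_ms: float) -> Optional[Dict]:
    """Maximize accuracy among models that satisfy the running-time threshold."""
    # Satisficing metric: running time only has to be "good enough".
    feasible = [m for m in models if m["running_time_ms"] <= max_running_time_ms]
    if not feasible:
        return None
    # Optimizing metric: among feasible models, higher accuracy wins.
    return max(feasible, key=lambda m: m["accuracy"])


# Hypothetical candidates.
models = [
    {"name": "A", "accuracy": 0.90, "running_time_ms": 80},
    {"name": "B", "accuracy": 0.92, "running_time_ms": 95},
    {"name": "C", "accuracy": 0.95, "running_time_ms": 1500},
]
chosen = pick_model(models, max_running_time_ms=100)
```

Here model C has the highest accuracy but is rejected by the 100 ms threshold, so B is chosen.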

**For example:**

Wake words/Trigger words:

Alexa, OK Google, Hey Siri, Nihao Baidu.

- Accuracy
- Number of false positives

$\rightarrow$ maximize accuracy, s.t. less than 1 false positive every 24h.

### Train/dev/test distributions

- The dev and test sets should come from the same distribution
- Randomly shuffle the data before splitting it into dev and test sets

Choose a dev set and test set to reflect data you expect to get in the future and consider important to do well on.

### Size of the dev and test sets
- With big data, the traditional 70/20/10 or 60/30/10 split should shift toward something like 99/0.5/0.5
- The test set needs to be big enough to give high confidence in the overall performance of the system
    - A test set that is used during development should really be called a dev set
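A minimal sketch of a shuffled 99/0.5/0.5 split, assuming all examples come from one pool so that a single shuffle gives the dev and test sets the same distribution. The function name `split_indices` and its parameters are illustrative, not from any library.

```python
import random


def split_indices(m, dev_frac=0.005, test_frac=0.005, seed=0):
    """Shuffle example indices 0..m-1 and split them into train/dev/test."""
    indices = list(range(m))
    # One shuffle feeds both dev and test, so they share a distribution.
    random.Random(seed).shuffle(indices)
    n_dev = round(m * dev_frac)
    n_test = round(m * test_frac)
    dev_idx = indices[:n_dev]
    test_idx = indices[n_dev:n_dev + n_test]
    train_idx = indices[n_dev + n_test:]
    return train_idx, dev_idx, test_idx


train_idx, dev_idx, test_idx = split_indices(100_000)
print(len(train_idx), len(dev_idx), len(test_idx))  # 99000 500 500
```

The fractions are parameters so the same helper covers a 60/30/10-style split on a small dataset.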

### When to change dev/test sets and metrics

Error: $\frac{1}{m_{dev}} \sum_{i=1}^{m_{dev}} \mathcal{I}\{y^{(i)}_{pred} \ne y^{(i)}\}$, where $\mathcal{I}$ is the indicator function (1 if the prediction is wrong, 0 otherwise).

The problem with this metric is that it treats all mistakes equally: a pornographic image misclassified as a cat counts the same as a bird picture misclassified as a cat. When that happens, the metric no longer ranks algorithms in the order we actually prefer.

One way to deal with this is to add a weight to each example:

- $w^{(i)} = 1$ if $x^{(i)}$ is non-porn
- $w^{(i)} = 10$ if $x^{(i)}$ is porn

Weighted error: $\frac{1}{\sum_{i=1}^{m_{dev}} w^{(i)}} \sum_{i=1}^{m_{dev}} w^{(i)} \mathcal{I}\{y^{(i)}_{pred} \ne y^{(i)}\}$
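The weighted error can be sketched in a few lines of Python. The labels, predictions, and the `is_porn` flags below are made up for illustration; the weights of 1 and 10 follow the scheme above.

```python
def weighted_error(y_pred, y_true, weights):
    """Sum of weights on misclassified examples, divided by the total weight."""
    total_weight = sum(weights)
    mistakes = sum(w for p, t, w in zip(y_pred, y_true, weights) if p != t)
    return mistakes / total_weight


# Hypothetical dev set: 1 = cat, 0 = not cat; the last example is porn.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]
is_porn = [False, False, False, True]
weights = [10 if flag else 1 for flag in is_porn]

# With all weights equal to 1 this reduces to the plain error rate.
unweighted = weighted_error(y_pred, y_true, [1] * len(y_true))
weighted = weighted_error(y_pred, y_true, weights)
```

Both mistakes count equally in the unweighted error (2/4), while the weighted error (11/13) is dominated by the porn image that was let through.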

The same weights can be added to the cost function $J$ used for training:

$J = \frac{1}{\sum_{i=1}^{m} w^{(i)}} \sum_{i=1}^{m} w^{(i)} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$

If the algorithm does well on the dev/test sets but poorly in the real world, change the metric and/or the dev/test sets.

## Comparing to human-level performance

### Why human-level performance?

Humans are quite good at a lot of tasks. As long as ML is worse than humans, we can:
- Get labeled data from humans
- Gain insight from manual error analysis: Why did a person get this right?
- Better analysis of bias/variance

### Avoidable bias

Error | Set A | Set B
--- | --- | ---
 **Human** $(\approx Bayes)$| 1% | 7.5%
**Training error** | 8% | 8%
**Dev error** | 10% | 10%
**Focus** | **Bias** | **Variance**

$\text{Avoidable bias} = \text{training error} - \text{human-level error}$

$\text{Variance} = \text{dev error} - \text{training error}$
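As a rough sketch, the two gaps can be computed and compared directly, taking avoidable bias as training error minus human-level error and variance as dev error minus training error. The function name `diagnose` and the simple "larger gap wins" rule are assumptions for illustration; the numbers are the Set A and Set B columns of the table.

```python
def diagnose(human_error, train_error, dev_error):
    """Return (avoidable_bias, variance, focus) for one set of error rates."""
    avoidable_bias = train_error - human_error  # gap to human-level (~Bayes) error
    variance = dev_error - train_error          # gap between training and dev error
    # Assumed rule of thumb: work on whichever gap is larger (ties go to variance).
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


set_a = diagnose(human_error=0.01, train_error=0.08, dev_error=0.10)
set_b = diagnose(human_error=0.075, train_error=0.08, dev_error=0.10)
```

For Set A the avoidable bias (7%) exceeds the variance (2%), so the focus is bias; for Set B the avoidable bias is only 0.5%, so the focus shifts to variance.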

### Understanding human-level performance

It is often harder to decide whether to focus on bias or variance as the model approaches human-level error.

### Surpassing human-level performance

Problems where ML significantly surpasses human-level performance:
- Online advertising
- Product recommendations
- Logistics (predicting transit time)
- Loan approvals

All of them involve **structured data**, are **not natural perception problems**, and have **lots of data**.

Some natural perception problems where ML can surpass humans:

- Speech recognition 
- Some image recognition
- Medical tasks
    - ECG, skin cancer, etc.
    
### Improving your model performance

Assumptions of supervised learning:
- Fit the training set pretty well -> low avoidable bias
- Generalize pretty well for dev/test set -> low variance

To reduce avoidable bias:
- Train a bigger model
- Train longer or use better optimization algorithms (momentum, RMSprop, Adam)
- NN architecture/hyperparameters search (CNN, RNN)

To reduce variance:
- More data
- Regularization (L2, dropout, data augmentation)
- NN architecture/hyperparameters search (CNN, RNN)

