# Why Human Level Performance?

Progress on a learning algorithm is usually rapid while it approaches human-level performance, but it often slows down once it surpasses human-level performance. However long we train, the model's error can never drop below a theoretical limit. We call this limit the **Bayes optimal error** (Bayes error for short), which is the lowest error any function mapping inputs to labels can achieve.

Why does the progress slow down? For many tasks, human-level performance is already close to the Bayes error, so little room for improvement remains once the algorithm passes it.

In addition, so long as the machine learning algorithm is worse than humans, we can still get labelled data from humans, gain insight from manual error analysis, and do a better analysis of the model's bias/variance. These tactics work especially well for tasks that humans do well, and they become harder to apply once the algorithm is better than humans.

# Avoidable Bias

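Sometimes we do not want to do too well on our training set, and human-level error tells us how well is reasonable. Suppose that the human-level error for the task of cat classification is **1%**, our model's training error is **8%**, and its dev error is, say, **10%**. The training error is far above what is achievable, so we would want to fit the training set better, i.e. reduce **bias**.

Now let's suppose that human-level error is actually **7.5%**, with the same training and dev errors. The training error is already close to the best achievable error, so the larger gap is between the training error and the dev error. Now we would want to reduce the **variance** of our model instead.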

Think of **human-level error** as an estimate for the **Bayes error**. This is an especially good approximation for tasks such as image recognition, where humans perform very well.

Depending on what we think is achievable, with the same training error and dev error, we decide to focus on different tactics to reduce either bias or variance.

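We call the difference between the Bayes error (estimated by human-level error) and the training error the **"avoidable bias"**. The difference between the training error and the dev error is a measure of the **variance**.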
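The comparison can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `diagnose` is hypothetical, all errors are given in percent, and the dev error of 10% is an assumed value added only to make the example complete.

```python
def diagnose(bayes_estimate, train_error, dev_error):
    """Return (avoidable_bias, variance, focus); all errors are in percent."""
    # Gap between what is achievable and what the model does on the training set.
    avoidable_bias = train_error - bayes_estimate
    # Gap between training performance and performance on unseen dev data.
    variance = dev_error - train_error
    # Focus on whichever gap is larger.
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Training error of 8% as in the text; the 10% dev error is an assumption.
print(diagnose(1.0, 8.0, 10.0))  # (7.0, 2.0, 'bias')
print(diagnose(7.5, 8.0, 10.0))  # (0.5, 2.0, 'variance')
```

With identical training and dev errors, only the estimate of the Bayes error changes, and that alone flips the recommendation.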

# Understanding Human Level Performance

Recall that the purpose of human-level performance is to estimate the Bayes error. Consider a medical image classification task on which different groups of people achieve the following error rates:

1. Typical human with 3% error
2. Typical doctor with 1% error
3. Experienced doctor with 0.7% error
4. Team of experienced doctors with 0.5% error

**What is the human level performance here?**

Given that a team of experienced doctors can debate and discuss a case to achieve 0.5% error, the Bayes error must be at most 0.5%. For the purpose of estimating the Bayes error, human-level performance is therefore 0.5%.

However, how much this choice matters depends on how well the model already fits the training set. If we are at 5% training error, the gap to any of the doctors' error rates is 4% or more, so the exact estimate hardly changes the conclusion that bias is the problem. However, if we are already at 0.7% training error, we should take 0.5% as the Bayes error estimate instead of 0.7%, otherwise we would wrongly conclude that no avoidable bias is left.

In summary, we compare the training error to the estimate of the Bayes error rather than to 0%.
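To see how much the choice of estimate matters, the sketch below computes the avoidable bias under each candidate. It is only an illustration: the names `human_errors`, `bayes_proxy` and `avoidable_bias` are hypothetical, and the values are the error rates from the list above, in percent.

```python
# Human error rates from the list above, in percent.
human_errors = {
    "typical human": 3.0,
    "typical doctor": 1.0,
    "experienced doctor": 0.7,
    "team of experienced doctors": 0.5,
}

# The lowest human error is the tightest available upper bound on the Bayes error.
bayes_proxy = min(human_errors.values())


def avoidable_bias(train_error, proxy):
    """Training error minus the Bayes error estimate, rounded to hide float noise."""
    return round(train_error - proxy, 2)


for train_error in (5.0, 0.7):
    for name, err in human_errors.items():
        print(f"train={train_error}%  proxy={name}: {avoidable_bias(train_error, err)}%")
```

At 5% training error the three doctor-based estimates give an avoidable bias between 4.0% and 4.5%, so the conclusion is the same either way. At 0.7% training error, using the experienced doctor's 0.7% gives 0.0%, while the team's 0.5% reveals a remaining gap of 0.2%.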

# Surpassing Human-level Performance

Suppose we have the following example:

<table>
    <tr>
        <th>Agent</th>
        <th>Error</th>
    </tr>
    <tr>
        <td>Team of Humans</td>
        <td>0.5%</td>
    </tr>
    <tr>
        <td>One Human</td>
        <td>1%</td>
    </tr>
    <tr>
        <td>Training Error</td>
        <td>0.6%</td>
    </tr>
    <tr>
        <td>Dev Error</td>
        <td>0.8%</td>
    </tr>
</table>

Observe that since our training error is lower than the error of one human, we should take the team of humans as our estimate of the Bayes error. The avoidable bias is then 0.6% - 0.5% = 0.1% and the variance is 0.8% - 0.6% = 0.2%.

Suppose we have another performance evaluation table:

<table>
    <tr>
        <th>Agent</th>
        <th>Error</th>
    </tr>
    <tr>
        <td>Team of Humans</td>
        <td>0.5%</td>
    </tr>
    <tr>
        <td>One Human</td>
        <td>1%</td>
    </tr>
    <tr>
        <td>Training Error</td>
        <td>0.3%</td>
    </tr>
    <tr>
        <td>Dev Error</td>
        <td>0.4%</td>
    </tr>
</table>

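Now how do we evaluate? The training error (0.3%) is already below the best human error (0.5%), so human-level error no longer tells us how far we are from the Bayes error, which could be 0.3%, 0.2%, 0.1% or lower. We do not have enough information to tell whether we should focus on reducing bias or variance. Once the algorithm outperforms the team of humans, it is even more difficult for human intuition to deduce how to improve the algorithm further. Some of the tools that previously pointed us in a certain direction will not work as well as before.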
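The two tables can be compared with a small helper. This is a sketch under stated assumptions: the function name `gaps` is hypothetical, errors are in percent, and returning `None` is just one way to signal that the human estimate is no longer informative.

```python
def gaps(human_errors, train_error, dev_error):
    """Return (avoidable_bias, variance) in percent.

    avoidable_bias is None when the model already beats the best human,
    because the human estimate then says nothing about the remaining bias.
    """
    best_human = min(human_errors)
    variance = round(dev_error - train_error, 2)
    if train_error < best_human:
        return None, variance
    return round(train_error - best_human, 2), variance


# First table: team 0.5%, one human 1%, training 0.6%, dev 0.8%.
print(gaps([0.5, 1.0], 0.6, 0.8))  # (0.1, 0.2)
# Second table: team 0.5%, one human 1%, training 0.3%, dev 0.4%.
print(gaps([0.5, 1.0], 0.3, 0.4))  # (None, 0.1)
```

In the second case the variance can still be measured, but there is nothing to compare it against, which is why the usual bias/variance trade-off analysis stops being a reliable guide.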

Here are some examples of tasks where machine learning already surpasses human-level performance:

1. Online advertising
2. Product recommendations
3. Logistics (transit time)
4. Loan approvals

Notice that all these examples are learned from **structured data**. These are not natural perception problems. Humans tend to be better at natural perception tasks.

# Improving Model Performance

Let's put it all together to have a recipe to improve our learning algorithm. Here are two fundamental assumptions of supervised learning:

1. We can fit the training set pretty well, i.e. low avoidable bias
2. The training set performance generalizes pretty well to dev/test set

So to improve the system,

1. Look at the difference between the training error and the human-level error - this is the avoidable bias
2. Look at the difference between the training error and the dev error - this is the variance

To reduce the avoidable bias found in step **1**, we can do things like:
1. Train a bigger model
2. Train longer/use a better optimization algorithm
3. Find better hyperparameters or a better NN architecture

To reduce the variance found in step **2**, we can:
1. Get more data - generalize better to dev set data that the algorithm did not see
2. Regularization - L2, dropout, data augmentation
3. Try new neural network architectures/hyperparameters
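The recipe above can be sketched as a small lookup. This is only an illustration of the decision logic: the names `BIAS_TACTICS`, `VARIANCE_TACTICS` and `suggest` are hypothetical, the tactic lists simply restate the items above, and errors are in percent.

```python
BIAS_TACTICS = [
    "train a bigger model",
    "train longer / use a better optimization algorithm",
    "search for better hyperparameters or NN architecture",
]

VARIANCE_TACTICS = [
    "get more data",
    "regularize (L2, dropout, data augmentation)",
    "search for better hyperparameters or NN architecture",
]


def suggest(human_error, train_error, dev_error):
    """Return (problem, tactics) for whichever gap is larger."""
    avoidable_bias = train_error - human_error  # step 1
    variance = dev_error - train_error          # step 2
    if avoidable_bias >= variance:
        return "avoidable bias", BIAS_TACTICS
    return "variance", VARIANCE_TACTICS


# Human-level 0.5%, training 0.6%, dev 0.8%: the variance gap is larger.
problem, tactics = suggest(0.5, 0.6, 0.8)
print(problem)
for tactic in tactics:
    print("-", tactic)
```

In practice both gaps are usually worked on over time; picking the larger one first is simply a way to prioritize.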