# Metrics

Evaluating your machine learning model is an important step.
Some see it as nothing more than looking at the accuracy, and then you're done.
However, it is a bit more complicated than that (who would have thought?).

In this notebook, you're going to see different metrics and evaluation strategies.

## Accuracy

Accuracy is a pretty simple metric.
It is the ratio of the number of correct predictions to the total number of input samples.

$$ accuracy = \frac{nb.\ correct\ predictions}{nb.\ input\ samples}$$

Assuming a binary classification problem (it also works with more classes),
using accuracy works pretty well when there is an equal number of samples of each class in the training set.
However, when the classes are imbalanced, for example 95 samples of class *A*
and 5 samples of class *B*, for a total of 100, the model can easily get **at least** 95\% accuracy.
It can do that just by always predicting class *A*, leading you to believe that the
model is doing great, even though it misclassified all the samples of class *B* (5 samples).
This is very problematic in fields such as medicine, when trying to find a rare disease for example.
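
To make the imbalance problem concrete, here is a minimal sketch in plain Python that reproduces the 95/5 example. The "model" is a stand-in that always predicts class *A*; the helper name `accuracy` and the variable names are chosen for illustration only.

```python
# Toy data set from the text: 95 samples of class A, 5 samples of class B.
y_true = ["A"] * 95 + ["B"] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = ["A"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


overall_accuracy = accuracy(y_true, y_pred)

# Accuracy restricted to the rare class B: how many B samples were found?
b_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == "B"]
class_b_accuracy = sum(t == p for t, p in b_pairs) / len(b_pairs)

print(overall_accuracy)  # 0.95
print(class_b_accuracy)  # 0.0
```

The overall number looks great, while not a single sample of class *B* was detected.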

## Confusion matrix

To get a finer granularity in the predictions, one can use a confusion matrix.

Let's again assume that we have a binary classification problem, and we can predict class *A* or class *B*.
Class *A* is (arbitrarily) labeled as the positive class, and thus class *B* as the negative class.

There are 4 possible cases:

- The true output of a sample is A, we predict A --> True Positive (TP)
- The true output of a sample is B, we predict B --> True Negative (TN)
- The true output of a sample is B, we predict A --> False Positive (FP)
- The true output of a sample is A, we predict B --> False Negative (FN)

This gives us a better idea of what went wrong in the model. Maybe we find that the model always predicts the positive class,
and so we could give the model more negative samples as input to balance it out. A lot of information can be
gathered from a simple confusion matrix.

![Confusion Matrix (Image)](./assets/confusion_matrix.png) ![Confusion Matrix Values (Image)](./assets/confusion_matrix_values.png)

In this data set, we have 100 samples, of which 51 (48 + 3) are positive, and 49 (42 + 7) are negative.
One can still get the accuracy by summing up the True Positives and the True Negatives, and dividing by the total number of samples.
We thus get $\frac{48 + 42}{100} = 90\%$.
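
As a rough sketch, the four cases can be counted with a few lines of plain Python. The data below is rebuilt from the counts in the example, assuming that 3 is the number of false negatives and 7 the number of false positives (this follows from "51 positive" and "49 negative" referring to the true classes). The function name `confusion_counts` is made up for this illustration.

```python
from collections import Counter

# Each pair is (true label, predicted label); "A" is the positive class.
# Counts follow the example above (assumption: FN = 3, FP = 7).
pairs = (
    [("A", "A")] * 48    # true positives
    + [("A", "B")] * 3   # false negatives
    + [("B", "B")] * 42  # true negatives
    + [("B", "A")] * 7   # false positives
)


def confusion_counts(pairs, positive="A"):
    """Count TP, FP, FN and TN for a binary problem."""
    counts = Counter()
    for true_label, predicted_label in pairs:
        if predicted_label == positive:
            counts["TP" if true_label == positive else "FP"] += 1
        else:
            counts["FN" if true_label == positive else "TN"] += 1
    return counts


counts = confusion_counts(pairs)
accuracy = (counts["TP"] + counts["TN"]) / len(pairs)

print(dict(counts))
print(accuracy)  # 0.9
```

In practice you would probably reach for a library helper such as scikit-learn's `confusion_matrix`, but the counting logic is the same.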

## ROC Curve (Receiver Operating Characteristic Curve)

A ROC curve is a graph showing the performance of a classification model at different classification thresholds.

**What is a classification threshold?**

Many machine learning algorithms are able to output probabilities of belonging to a given class instead of strict class labels.
This is useful because it can tell you the certainty or the uncertainty of the prediction.

The classification threshold is the limit you set: all elements falling
on one side of the threshold are labeled as one class, and the elements landing on the opposite side are labeled as the other class (in binary classification).
A common misconception is that the threshold should always be at 50\%, but the threshold is problem dependent.
Indeed, for a given problem, you might want to set the threshold to a different value. For example, in spam classification,
you'd usually prefer more false negatives (spam interpreted as not spam, also called ham) than false positives (ham interpreted as spam); for other problems, the reverse may hold.

![Classification Threshold (Image)](./assets/classification_threshold.png)
[Image source](https://www.researchgate.net/publication/327847657_One-class_Quantification)
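
The effect of moving the threshold can be sketched in a few lines. The probabilities below are invented for illustration, and `classify` is a hypothetical helper, not part of any library.

```python
# Hypothetical predicted probabilities of being spam (the positive class).
spam_probabilities = [0.05, 0.30, 0.45, 0.55, 0.70, 0.95]


def classify(probabilities, threshold):
    """Label a sample "spam" if its probability reaches the threshold."""
    return ["spam" if p >= threshold else "ham" for p in probabilities]


default_labels = classify(spam_probabilities, threshold=0.5)
strict_labels = classify(spam_probabilities, threshold=0.9)

print(default_labels)  # ['ham', 'ham', 'ham', 'spam', 'spam', 'spam']
print(strict_labels)   # ['ham', 'ham', 'ham', 'ham', 'ham', 'spam']
```

Raising the threshold makes the classifier more reluctant to say "spam", which trades false positives for false negatives.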

**How to plot the ROC curve?**

It is pretty easy.

For every classification threshold that you choose, you plot a point of the True Positive Rate (TPR) against the
False Positive Rate (FPR), where TPR is on the y-axis and FPR is on the x-axis.

The *true positive rate*, also called *recall* or *sensitivity*, has the following equation (based on the terms coined in the confusion matrix section):

**TPR/Recall/Sensitivity** $= \frac{TP}{TP+FN}.$

The *false positive rate* is the complement of *specificity* (the two always add up to 1). Be careful, *sensitivity* and *specificity* are similar in name, but different in meaning.
We have

**Specificity** = $\frac{TN}{TN+FP}$

and **FPR** = $1 - \text{ specificity } = \frac{FP}{TN+FP}$.
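
As a quick worked example, the three rates can be computed from the counts used in the confusion matrix section. This sketch assumes that, in that example, 3 is the number of false negatives and 7 the number of false positives.

```python
# Counts from the confusion matrix example (assumption: FN = 3, FP = 7).
TP, FN, TN, FP = 48, 3, 42, 7

tpr = TP / (TP + FN)          # true positive rate (recall, sensitivity)
specificity = TN / (TN + FP)  # true negative rate
fpr = FP / (TN + FP)          # false positive rate

print(round(tpr, 3))          # 0.941
print(round(specificity, 3))  # 0.857
print(round(fpr, 3))          # 0.143
```

Note that `fpr` and `specificity` add up to 1, as the formula states.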

You will rarely need to know the formulas by heart; this is just to give you a sense of how all of this is connected.
Finally, we have a graph of a ROC curve.
The closer the graph is to the *upper left* corner, the better the model is at distinguishing between classes.
In the graph below, you also see the *No Skill* line, which shows the skill of a random model, which cannot distinguish between classes.

To get a numeric value for such a curve, which makes it easier to compare multiple models, one can use the **AUC score**, which is simply the area
under the ROC curve.
The **A**rea **U**nder the **C**urve is one of the most popular evaluation metrics.
*One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example* [[Source]](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc)

![ROC Curve (Image)](assets/roc_curve.png)
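
The following is a minimal sketch of how the points of a ROC curve and the AUC score could be computed by hand. The labels and scores are made up, and the function names are hypothetical. It also checks the ranking interpretation quoted above by comparing every positive sample with every negative sample.

```python
# Hypothetical data: 1 is the positive class, scores are predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]


def roc_points(y_true, y_score):
    """Return one (FPR, TPR) point per threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_by_ranking(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


points = roc_points(y_true, y_score)
auc_area = auc_trapezoid(points)
auc_rank = auc_by_ranking(y_true, y_score)

print(auc_area)  # 0.75
print(auc_rank)  # 0.75
```

Both computations agree on this toy data. A library routine (for example in scikit-learn) is the more practical choice for real work; the sketch is only meant to show what is being measured.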

There are many more evaluation metrics, most notably **precision** and **recall** (which we've seen above), which can be graphed as a **Precision-Recall (PR) curve**.
A popular one is the **F1-Score**, which combines precision and recall into one numerical value. Similar to this is the **Matthews Correlation Coefficient (MCC)**.
I highly encourage you to read more about the topic and educate yourself further by looking at the additional reading material at the bottom of the notebook.
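
As a small taste of these metrics, here is a sketch that computes precision, recall, the F1-Score and the MCC from the counts of the confusion matrix example. The formulas are the standard textbook definitions, not something specific to this notebook, and the sketch assumes 3 false negatives and 7 false positives in that example.

```python
from math import sqrt

# Counts from the confusion matrix example (assumption: FN = 3, FP = 7).
TP, FN, TN, FP = 48, 3, 42, 7

precision = TP / (TP + FP)  # how many predicted positives are truly positive
recall = TP / (TP + FN)     # how many true positives were found

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# MCC uses all four cells of the confusion matrix and ranges from -1 to 1.
mcc = (TP * TN - FP * FN) / sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

print(round(precision, 3))  # 0.873
print(round(recall, 3))     # 0.941
print(round(f1, 3))         # 0.906
print(round(mcc, 3))        # 0.802
```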

## References and more reading material

[A Gentle Introduction to Threshold-Moving for Imbalanced Classification - Machine Learning Mastery](https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/)

[Classification Accuracy is Not Enough: More Performance Measures You Can Use - Machine Learning Mastery](https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/)

[Understanding AUC - ROC Curve - Towards Data Science](https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5)

[How to Use ROC Curves and Precision-Recall Curves for Classification in Python - Machine Learning Mastery](https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/)

[Matthews Correlation Coefficient is The Best Classification Metric You’ve Never Heard Of - Towards Data Science](https://towardsdatascience.com/the-best-classification-metric-youve-never-heard-of-the-matthews-correlation-coefficient-3bf50a2f3e9a)
