### Confusion matrix
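A table that summarizes the performance of a classification model by comparing predicted classes against actual classes. For a binary classifier, its four cells count the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).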


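* $\text{Precision} = \frac{TP}{TP + FP} = \frac{TP}{\text{Predicted Positives}}$

* $\text{Recall (Sensitivity)} = \frac{TP}{TP + FN} = \frac{TP}{\text{Actual Positives}}$

* $F_1 = \frac{2 \cdot P \cdot R}{P + R}$, where $P$ is precision and $R$ is recall

* $\text{TPR (Sensitivity/Recall)} = \frac{TP}{\text{Actual Positives}}$

* $\text{TNR (Specificity)} = \frac{TN}{\text{Actual Negatives}}$

* $\text{FPR} = 1 - \text{Specificity} = \frac{FP}{\text{Actual Negatives}}$

* $\text{FNR} = 1 - \text{Sensitivity} = \frac{FN}{\text{Actual Positives}}$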
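
The definitions above can be checked with a few lines of Python. This is a minimal sketch: the function name `binary_metrics` and the counts passed to it are made up for illustration, and it assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the rates listed above from the four confusion-matrix counts."""
    predicted_pos = tp + fp   # everything the model labeled positive
    actual_pos = tp + fn      # everything that really is positive
    actual_neg = tn + fp      # everything that really is negative
    precision = tp / predicted_pos
    recall = tp / actual_pos  # also called sensitivity or TPR
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "tnr": tn / actual_neg,   # specificity
        "fpr": fp / actual_neg,   # equals 1 - specificity
        "fnr": fn / actual_pos,   # equals 1 - sensitivity
    }

# Hypothetical counts, chosen only for illustration.
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(m)
```

Note that `fpr + tnr` and `fnr + recall` both come out to 1, which is why FPR pairs with specificity and FNR pairs with sensitivity.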



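`F-beta measure`: The F-beta score generalizes F1 with a parameter $\beta$ that controls the balance between precision and recall. When we care more about minimizing false positives than about minimizing false negatives, we select $\beta < 1$, which gives precision more weight than recall. Conversely, $\beta > 1$ gives recall more weight, and $\beta = 1$ recovers the F1 score.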

\begin{align}
F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}
\end{align}
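
To see the effect of $\beta$, the formula can be evaluated directly. The sketch below uses a hypothetical helper `f_beta` and made-up precision and recall values in which precision is the higher of the two.

```python
def f_beta(precision, recall, beta):
    """F-beta score computed exactly as in the formula above."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical model with high precision and low recall.
p, r = 0.9, 0.5

f_half = f_beta(p, r, 0.5)  # beta < 1: precision weighs more
f_one = f_beta(p, r, 1.0)   # beta = 1: the ordinary F1 score
f_two = f_beta(p, r, 2.0)   # beta > 1: recall weighs more

print(f_half, f_one, f_two)
```

Because this model's precision is better than its recall, the score falls as $\beta$ grows and more weight shifts to recall.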


### Multi-Class Confusion Matrix
| Predicted\Actual | Apple | Orange | Mango |
|-------------------|-------|--------|-------|
| Apple             |   7   |   8    |   9   |
| Orange            |   1   |   2    |   3   |
| Mango             |   3   |   2    |   1   |

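* For the Apple class: TP = 7 (the diagonal cell), FP = 8 + 9 = 17 (the rest of the Apple row: predicted Apple, actually another class), FN = 1 + 3 = 4 (the rest of the Apple column: actually Apple, predicted as another class), and TN = 2 + 3 + 2 + 1 = 8 (every cell outside the Apple row and column).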

 * `micro-averaged score`: Calculated from the total TP, total FP, and total FN summed over all classes. It does not consider each class individually; it computes the metric globally.
 
 * `macro-averaged score`: Calculates the metric for each class individually and then takes the unweighted mean of the per-class values.
 
 * `weighted-averaged score`: Like the macro average, it calculates the metric per class, but it takes a weighted mean. The weight for each class is the number of actual samples of that class.
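
The per-class counts and the three averages can be reproduced from the table above. This is an illustrative sketch in plain Python: the helper names are hypothetical, and F1 is used as the averaged metric, written as $2TP / (2TP + FP + FN)$, which is algebraically the same as $2PR / (P + R)$.

```python
labels = ["Apple", "Orange", "Mango"]
# Rows are predicted classes, columns are actual classes, as in the table.
matrix = [
    [7, 8, 9],
    [1, 2, 3],
    [3, 2, 1],
]

def per_class_counts(matrix, k):
    """One-vs-rest TP, FP, FN, TN for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                  # predicted k, actually another class
    fn = sum(row[k] for row in matrix) - tp   # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

n = len(labels)
counts = [per_class_counts(matrix, k) for k in range(n)]
f1_per_class = [f1_from_counts(tp, fp, fn) for tp, fp, fn, _ in counts]
# Support: number of actual samples of each class (column sums here).
support = [sum(row[k] for row in matrix) for k in range(n)]

macro_f1 = sum(f1_per_class) / n
weighted_f1 = sum(f * s for f, s in zip(f1_per_class, support)) / sum(support)
micro_f1 = f1_from_counts(
    sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts)
)

print(counts[0])  # Apple: (7, 17, 4, 8)
print(macro_f1, weighted_f1, micro_f1)
```

In a single-label problem like this one, every false positive for one class is a false negative for another, so the micro-averaged F1 equals the overall accuracy (10 correct out of 36).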
 

### AUC & ROC

* ROC curves and AUC are particularly useful for assessing the trade-off between the true positive rate and the false positive rate across different classification thresholds.
* An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters: TPR and FPR.
* An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
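* To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, an efficient sorting-based algorithm provides the same information: sort the examples by predicted score once, then sweep the threshold down through the sorted list, updating the TP and FP counts at each step. The area under the resulting curve is called AUC.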
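
A sorting-based sweep of this kind can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names and the toy labels and scores are made up, labels are assumed to be 0 or 1, and both classes are assumed to be present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points by lowering the threshold one example at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Sort once by descending score; each step moves the threshold below one more example.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy data, chosen only for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

points = roc_points(y_true, scores)
auc = auc_trapezoid(points)
print(points)
print(auc)  # 8/9 for this toy data
```

The single sort replaces re-scoring the data set once per threshold, since lowering the threshold past one example changes only one of the two counts.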

![image.png](attachment:image.png)

* AUC stands for "Area under the ROC Curve." That is, AUC measures the entire two-dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).
* AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. 
* AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0. A model that ranks examples at random has an expected AUC of 0.5.
* AUC is desirable for the following two reasons:

  * AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
  * AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.
  
* However, both these reasons come with caveats, which may limit the usefulness of AUC in certain use cases:
  * Scale invariance is not always desirable. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.
  * Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization.
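
The ranking interpretation and the scale invariance discussed above can be illustrated with a brute-force pairwise count. This is a sketch for small inputs only: the helper `auc_pairwise` and the toy data are hypothetical, and ties are counted as half a win, which is a common convention.

```python
import math

def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, chosen only for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = auc_pairwise(y_true, scores)

# A strictly increasing transform changes the values but not their order.
squashed = [math.sqrt(s) / 10 for s in scores]
auc_squashed = auc_pairwise(y_true, squashed)

print(auc, auc_squashed)  # both 8/9
```

The transformed scores are no longer plausible probabilities, yet the AUC is identical. That is the scale invariance in action, and it is also why AUC says nothing about calibration.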