## Confusion Matrix

- The confusion matrix traces back to the contingency table, which Karl Pearson introduced in 1904. It was later called a classification matrix before the name "confusion matrix" became standard in data science.
- The word "confusion" refers to how the matrix exposes where a model confuses one class with another. Several metrics can be derived from the same matrix, so choosing which one to prioritize while improving the model is a separate decision.

The **confusion matrix** is a square matrix of size *N × N*, where N denotes the number of output classes.
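Each row of the matrix counts the instances of an actual class and each column counts the instances of a predicted class, which is the layout used in the table below. Some references transpose this convention, so check the axis labels before reading a matrix.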

In [None]:
from tabulate import tabulate  # third-party package: pip install tabulate

confusion_matrix = [
    ["", "Predicted Positive", "Predicted Negative"],
    ["Actual Positive", "True Positive (TP)", "False Negative (Type II error, FN)"],
    ["Actual Negative", "False Positive (Type I error, FP)", "True Negative (TN)"]
]

print(tabulate(confusion_matrix, headers="firstrow", tablefmt="grid"))


+-----------------+-----------------------------------+------------------------------------+
|                 | Predicted Positive                | Predicted Negative                 |
+=================+===================================+====================================+
| Actual Positive | True Positive (TP)                | False Negative (Type II error, FN) |
+-----------------+-----------------------------------+------------------------------------+
| Actual Negative | False Positive (Type I error, FP) | True Negative (TN)                 |
+-----------------+-----------------------------------+------------------------------------+


Each prediction is either correct (TP, TN) or an error (FP, FN):
* **True Positive (TP)**: The actual value is positive, and the prediction is positive.
* **False Positive (FP)**: The prediction is positive, but the actual value is negative. This is also called a "Type I error".
* **True Negative (TN)**: The actual value is negative, and the prediction is negative.
* **False Negative (FN)**: The prediction is negative, but the actual value is positive. This is also called a "Type II error".
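
The four outcomes can be tallied directly from paired labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are made up for illustration, and the positive class is assumed to be encoded as `1`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II error)
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical labels, invented for this example only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # prints: 3 1 3 1
```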

### 📊 Metrics Derived from the Confusion Matrix

The confusion matrix is not only a table but also the foundation for a whole family of evaluation metrics and visualizations in classification problems.

#### Basic Metrics

In [None]:
from IPython.display import display, Math

formulas = [
    r"Accuracy = \frac{TP + TN}{TP + TN + FP + FN}",
    r"Error\ Rate = \frac{FP + FN}{TP + TN + FP + FN}"
]

for f in formulas:
    display(Math(f))


$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Error Rate} = \frac{FP + FN}{TP + TN + FP + FN}$$
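
As a quick numeric check of the two basic formulas, here is a small sketch with hypothetical counts. The function names and the choice to return `0.0` for an empty matrix are assumptions made for this example.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def error_rate(tp, tn, fp, fn):
    """Share of all predictions that are wrong."""
    total = tp + tn + fp + fn
    return (fp + fn) / total if total else 0.0


# Hypothetical counts, invented for illustration
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy   = {accuracy(tp, tn, fp, fn):.2f}")    # 85 of 100 correct
print(f"error rate = {error_rate(tp, tn, fp, fn):.2f}")  # 15 of 100 wrong
```

The two values always sum to 1, so reporting either one determines the other.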

#### Class-wise Metrics

In [None]:
from IPython.display import display, Math

formulas = [
    r"Precision \ (Positive \ Predictive \ Value) = \frac{TP}{TP + FP}",
    r"Recall \ (Sensitivity, \ True \ Positive \ Rate) = \frac{TP}{TP + FN}",
    r"Specificity \ (True \ Negative \ Rate) = \frac{TN}{TN + FP}",
    r"False \ Positive \ Rate \ (FPR) = \frac{FP}{FP + TN}",
    r"False \ Negative \ Rate \ (FNR) = \frac{FN}{FN + TP}"
]

for f in formulas:
    display(Math(f))


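$$\text{Precision (Positive Predictive Value)} = \frac{TP}{TP + FP}$$

$$\text{Recall (Sensitivity, True Positive Rate)} = \frac{TP}{TP + FN}$$

$$\text{Specificity (True Negative Rate)} = \frac{TN}{TN + FP}$$

$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN}$$

$$\text{False Negative Rate (FNR)} = \frac{FN}{FN + TP}$$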
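
The class-wise formulas translate almost line for line into code. The sketch below uses hypothetical counts; the helper `safe_div` and its convention of returning `0.0` for an empty denominator are assumptions, and other conventions (such as returning NaN) are equally defensible.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (a convention chosen here)."""
    return numerator / denominator if denominator else 0.0


def classwise_metrics(tp, tn, fp, fn):
    """Compute the class-wise metrics from the four confusion-matrix counts."""
    return {
        "precision": safe_div(tp, tp + fp),    # TP / (TP + FP)
        "recall": safe_div(tp, tp + fn),       # TP / (TP + FN)
        "specificity": safe_div(tn, tn + fp),  # TN / (TN + FP)
        "fpr": safe_div(fp, fp + tn),          # FP / (FP + TN)
        "fnr": safe_div(fn, fn + tp),          # FN / (FN + TP)
    }


# Hypothetical counts, invented for illustration
metrics = classwise_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

Note that FPR equals 1 - Specificity and FNR equals 1 - Recall, so the five values carry three independent pieces of information.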

#### Balanced Metrics

In [None]:
from IPython.display import display, Math

formulas = [
    # F1 Score
    r"F1\text{-}Score = \frac{2 \times (Precision \times Recall)}{Precision + Recall}",
    
    # Balanced Accuracy
    r"Balanced\ Accuracy = \frac{Sensitivity + Specificity}{2}",
    
    # G-Mean
    r"G\text{-}Mean = \sqrt{Sensitivity \times Specificity}",
    
    # Matthews Correlation Coefficient ("\ " forces the spaces that math mode would otherwise drop)
    r"Matthews\ Correlation\ Coefficient\ (MCC) = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}"
]

for f in formulas:
    display(Math(f))


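$$\text{F1-Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$

$$\text{Balanced Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$

$$\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}$$

$$\text{Matthews Correlation Coefficient (MCC)} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$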
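
The balanced metrics can be computed from the same four counts. This is a sketch under stated assumptions: the function name `balanced_metrics` is hypothetical, the counts are invented, and any ratio with a zero denominator is set to `0.0` by convention.

```python
import math


def balanced_metrics(tp, tn, fp, fn):
    """Compute F1, balanced accuracy, G-mean and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # same as recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # MCC denominator: square root of the product of the four marginal totals
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "mcc": mcc,
    }


# Hypothetical counts, invented for illustration
scores = balanced_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name:18s} {value:.4f}")
```

F1 ignores true negatives entirely, while MCC uses all four cells, which is one reason MCC is often preferred when both classes matter.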

#### Probabilistic Metrics (with threshold variation)

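* ROC-AUC (area under the Receiver Operating Characteristic curve)
    - Plots TPR (Recall) against FPR as the decision threshold varies.
    - The area under the curve summarizes how well the model separates the classes: 1.0 means every positive is ranked above every negative, and 0.5 is no better than random ranking.
* PR-AUC (area under the Precision-Recall curve)
    - Plots Precision against Recall as the decision threshold varies.
    - More informative than ROC-AUC when the positive class is rare, because neither axis depends on the number of true negatives.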
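
To make the threshold sweep concrete, the following sketch builds ROC points by hand and integrates them with the trapezoidal rule. It is an illustration in plain Python, not a library implementation: `roc_points` and `auc` are hypothetical names, labels are assumed to be `0`/`1`, both classes are assumed to be present, and the scores are invented.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points, one per distinct score used as a threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_score), reverse=True):
        # A sample is predicted positive when its score reaches the threshold
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and model scores, invented for illustration
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(y_true, y_score)
print(curve)
print(auc(curve))  # prints: 0.75
```

Each threshold yields its own confusion matrix, so the ROC curve can be read as a sequence of confusion matrices summarized by their TPR and FPR.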

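#### Chance-Corrected Agreement Metrics

* Cohen’s Kappa
    - Agreement between predicted and actual labels, corrected for the agreement expected by chance: (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the chance agreement computed from the row and column totals.
* Informedness (Bookmaker Informedness)
    - Sensitivity + Specificity - 1. A value of 0 corresponds to guessing and 1 to perfect prediction.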
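
Both chance-corrected metrics follow from the four counts. The sketch below uses the standard definitions, kappa = (p_o - p_e) / (1 - p_e) and informedness = sensitivity + specificity - 1; the function names and counts are hypothetical, and both classes are assumed to appear in the data.

```python
def cohens_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    total = tp + tn + fp + fn
    p_observed = (tp + tn) / total
    # Chance agreement: product of the predicted and actual marginals, per class
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    if p_expected == 1:
        return 0.0  # convention chosen here: only one class present on both sides
    return (p_observed - p_expected) / (1 - p_expected)


def informedness(tp, tn, fp, fn):
    """Bookmaker informedness: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical counts, invented for illustration
tp, tn, fp, fn = 40, 45, 5, 10
print(f"kappa        = {cohens_kappa(tp, tn, fp, fn):.3f}")
print(f"informedness = {informedness(tp, tn, fp, fn):.3f}")
```

With these counts the classes are perfectly balanced, which is why the two values coincide; on imbalanced data they generally differ.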

#### Data Distribution Metric

* Prevalence
    - The share of actual positives in the data: (TP + FN) / (TP + TN + FP + FN).

#### Predictive Association Metric

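* Markedness
    - Precision + Negative Predictive Value - 1, where Negative Predictive Value = TN / (TN + FN). It mirrors Informedness, but is computed over the predictions rather than the actual classes.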

#### Baseline / Benchmark Metric

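* Null Error Rate
    - The error rate of a baseline that always predicts the majority class. A useful model should have an error rate below this value.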
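
Prevalence, markedness and the null error rate can all be read off the same four counts. The sketch below assumes the usual definitions: prevalence is the share of actual positives, markedness is precision plus negative predictive value minus 1, and the null error rate is the error made by always predicting the majority class. Function names and counts are hypothetical.

```python
def prevalence(tp, tn, fp, fn):
    """Share of samples that are actually positive."""
    return (tp + fn) / (tp + tn + fp + fn)


def markedness(tp, tn, fp, fn):
    """Precision (PPV) + negative predictive value (NPV) - 1."""
    ppv = tp / (tp + fp)  # assumes at least one positive prediction
    npv = tn / (tn + fn)  # assumes at least one negative prediction
    return ppv + npv - 1


def null_error_rate(tp, tn, fp, fn):
    """Error rate of a baseline that always predicts the majority class."""
    actual_positives = tp + fn
    actual_negatives = tn + fp
    # Always predicting the majority class misclassifies every minority sample
    return min(actual_positives, actual_negatives) / (actual_positives + actual_negatives)


# Hypothetical counts with a minority positive class, invented for illustration
tp, tn, fp, fn = 15, 70, 10, 5
print(f"prevalence      = {prevalence(tp, tn, fp, fn):.3f}")
print(f"markedness      = {markedness(tp, tn, fp, fn):.3f}")
print(f"null error rate = {null_error_rate(tp, tn, fp, fn):.3f}")
```

Comparing the model's error rate against the null error rate shows whether it improves on the trivial majority-class baseline.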