# Unit 3: Classification Metrics


Classification metrics measure how well a model performs on classification tasks such as detecting spam emails or diagnosing diseases. By the end of this lesson, you'll understand:

* **Confusion Matrix** and its interpretation.
* **Accuracy**, **Precision**, and **Recall**.
* **F1-score**
* How to compute these metrics using **Python** and **scikit-learn**.

Let's dive in!

## Confusion Matrix

A **Confusion Matrix** describes the performance of a classification model. In the context of a confusion matrix, a positive prediction is predicting the class labeled `1`, and a negative prediction is predicting the class labeled `0`. The confusion matrix is a 2x2 table (for binary classification) that shows:

* **True Positives (TP):** The number of **correct** positive predictions (predicted `1`, actually `1`).
* **True Negatives (TN):** The number of **correct** negative predictions (predicted `0`, actually `0`).
* **False Positives (FP):** The number of **incorrect** positive predictions (predicted `1`, actually `0`).
* **False Negatives (FN):** The number of **incorrect** negative predictions (predicted `0`, actually `1`).

Imagine that you need to classify emails as spam (1) or not-spam (0). Let's define an example of predictions and then create our confusion matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Sample classification dataset
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])  # True labels
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])  # Predicted labels

# Calculating confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")
```

**Output:**

```
Confusion Matrix:
[[3 2]
 [1 4]]
```

This tells us:

  * **True Positives (TP):** 4 (model correctly predicted spam four times)
  * **True Negatives (TN):** 3 (model correctly predicted not spam three times)
  * **False Positives (FP):** 2 (model incorrectly predicted spam two times)
  * **False Negatives (FN):** 1 (model incorrectly predicted not spam one time)

Note that scikit-learn places true labels in rows and predicted labels in columns, both in sorted order (`0` first, then `1`), so the values are stored this way:

```
[[TN FP]
 [FN TP]]
```
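
To see where each number comes from, here is a minimal sketch that counts the four outcomes directly with NumPy boolean masks and checks the result against scikit-learn's `confusion_matrix`. It reuses the example labels from above; the names `tp`, `tn`, `fp`, `fn`, and `manual` are only illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Same labels as in the example above (1 = spam, 0 = not spam)
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Count each outcome with element-wise boolean masks
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted spam, actually spam
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted not spam, actually not spam
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted spam, actually not spam
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted not spam, actually spam

# Arrange the counts in scikit-learn's [[TN, FP], [FN, TP]] layout
manual = np.array([[tn, fp],
                   [fn, tp]])
print(manual)
print(np.array_equal(manual, confusion_matrix(y_true, y_pred)))  # True
```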

## What is Accuracy?

**Accuracy** is the ratio of correctly predicted instances out of all instances. It's useful but can be misleading for imbalanced datasets.

**Formula:** $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Let's compute accuracy using **scikit-learn**:

```python
from sklearn.metrics import accuracy_score

# Calculating accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy}")
```

**Output:**

```
Accuracy: 0.7
```
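
As a quick sanity check, the same value follows from plugging the confusion matrix counts into the formula. This sketch assumes the same `y_true` and `y_pred` as above and unpacks the counts with `ravel()`, which flattens the matrix row by row into TN, FP, FN, TP.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Flatten [[TN, FP], [FN, TP]] into four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy = correct predictions / all predictions = (4 + 3) / 10
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Manual accuracy: {manual_accuracy}")                 # 0.7
print(f"accuracy_score:  {accuracy_score(y_true, y_pred)}")  # 0.7
```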

Our model is 70% accurate. But sometimes accuracy alone isn't enough. Accuracy can be deceptive in imbalanced datasets where one class significantly outnumbers the other. For example, if 95% of emails are not spam and only 5% are spam, a model that never classifies any email as spam would still be 95% accurate. Thus, accuracy doesn't always reflect the real performance on minority classes.
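
The 95% scenario is easy to reproduce. The sketch below builds a hypothetical dataset of 95 not-spam and 5 spam labels, plus a "model" that always predicts `0`: accuracy looks excellent, while recall reveals that every spam email is missed.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced data: 95 not-spam emails (0) and 5 spam emails (1)
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that never predicts spam
y_pred = np.zeros_like(y_true)

print(f"Accuracy: {accuracy_score(y_true, y_pred)}")  # 0.95
print(f"Recall:   {recall_score(y_true, y_pred)}")    # 0.0, every spam email is missed
```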

In such cases of imbalanced data, we need to use other metrics. Let's look at the other options: the **Precision** and **Recall** metrics.

## What is Precision?

**Precision** is the ratio of correctly predicted positive cases out of all predicted positives. It's crucial when false positives are costly (e.g., spam detection).

**Formula:** $\text{Precision} = \frac{TP}{TP + FP}$

Here's how to calculate precision:

```python
from sklearn.metrics import precision_score

# Calculating precision
precision = precision_score(y_true, y_pred)
print(f"Precision: {precision}")
```

**Output:**

```
Precision: 0.6666666666666666
```

Approximately 67% of instances predicted as spam were actually spam.
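
Precision depends only on the emails the model flagged as spam. A small sketch that makes this denominator explicit, reusing the example arrays (`flagged` is just an illustrative name):

```python
import numpy as np

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Keep only the emails the model flagged as spam (TP + FP)
flagged = y_pred == 1
print(f"Flagged as spam: {flagged.sum()}")                   # 6
print(f"Actually spam among them: {y_true[flagged].sum()}")  # 4

# Fraction of flagged emails that really are spam = TP / (TP + FP)
precision = y_true[flagged].mean()
print(f"Precision: {precision:.4f}")  # 0.6667
```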

Use precision when false positives are costly. **Example:** In spam detection, marking an important email as spam (a false positive) can cause the user to miss critical information. Therefore, we aim for high precision to minimize false positives.

## What is Recall?

**Recall** is the ratio of correctly predicted positive cases out of all actual positives. It's essential when false negatives are costly (e.g., disease detection).

**Formula:** $\text{Recall} = \frac{TP}{TP + FN}$

Let's compute recall:

```python
from sklearn.metrics import recall_score

# Calculating recall
recall = recall_score(y_true, y_pred)
print(f"Recall: {recall}")
```

**Output:**

```
Recall: 0.8
```

80% of actual spam emails were correctly predicted as spam.
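
Recall, in contrast, depends only on the emails that really are spam. The mirror-image sketch, again on the example arrays (`actual_spam` is an illustrative name):

```python
import numpy as np

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Keep only the emails that really are spam (TP + FN)
actual_spam = y_true == 1
print(f"Actual spam emails: {actual_spam.sum()}")           # 5
print(f"Caught by the model: {y_pred[actual_spam].sum()}")  # 4

# Fraction of real spam the model caught = TP / (TP + FN)
recall = y_pred[actual_spam].mean()
print(f"Recall: {recall}")  # 0.8
```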

Use recall when the cost of false negatives is high. This metric is essential in situations where missing actual positive cases is more detrimental than having false positives. **Example:** In disease diagnosis, failing to identify a disease (a false negative) can have severe consequences on patient health. In such cases, we aim for high recall to ensure as many actual positive cases as possible are correctly identified.

## F1-score

Sometimes you want to pay attention to both Precision and Recall, finding an optimal balance between them. In these cases, we use the **F1-Score** metric.

**F1-Score** is the harmonic mean of Precision and Recall. It balances the two metrics to provide a single measure of a model's performance.

**Formula:** $\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

The F1-Score is high only if both Precision and Recall are high. It's particularly useful for imbalanced datasets where a high score for one metric might be misleading without considering the other.
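
To illustrate why the harmonic mean rewards balance, the sketch below compares it with the plain arithmetic mean for a hypothetical lopsided model (precision 1.0, recall 0.1). `f1_from` is an illustrative helper, not a scikit-learn function, and returning 0.0 when both inputs are zero is an assumption made here to avoid dividing by zero.

```python
def f1_from(precision, recall):
    """Illustrative helper: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical lopsided model: every flagged email is spam, but it flags very few
precision, recall = 1.0, 0.1
print(f"Arithmetic mean: {(precision + recall) / 2:.2f}")    # 0.55
print(f"F1 (harmonic):   {f1_from(precision, recall):.2f}")  # 0.18
```

The arithmetic mean would suggest a mediocre model; the F1-Score stays close to the weaker metric.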

Here's how to calculate the F1-Score:

```python
from sklearn.metrics import f1_score

# Calculating F1-Score
f1 = f1_score(y_true, y_pred)
print(f"F1-Score: {f1}")
```

**Output:**

```
F1-Score: 0.7272727272727273
```

An F1-Score of approximately 0.73 falls between Precision (0.67) and Recall (0.80), slightly closer to the lower value, as a harmonic mean always is. It offers a single, more comprehensive measure of the model's performance in scenarios where both false positives and false negatives matter.
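
The value can also be checked by hand. Substituting the Precision and Recall definitions into the F1 formula simplifies it to $\frac{2TP}{2TP + FP + FN}$. The sketch below computes both forms from the confusion matrix counts and compares them with `f1_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Form 1: harmonic mean of precision and recall
precision = tp / (tp + fp)  # 4 / 6
recall = tp / (tp + fn)     # 4 / 5
f1_from_pr = 2 * precision * recall / (precision + recall)

# Form 2: the same quantity expressed directly in counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)  # 8 / 11

print(f"From precision and recall: {f1_from_pr:.4f}")                # 0.7273
print(f"From counts:               {f1_from_counts:.4f}")            # 0.7273
print(f"f1_score:                  {f1_score(y_true, y_pred):.4f}")  # 0.7273
```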

## Lesson Summary

We've covered:

  * **Confusion Matrix:** Breakdown of predictions.
  * **Accuracy:** Ratio of correct predictions.
  * **Precision:** Correct positive predictions out of all positive predictions.
  * **Recall:** Correct positive predictions out of all actual positives.
  * **F1-score:** Combination of Precision and Recall.
  * The pitfalls of using **Accuracy** with imbalanced datasets.
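
To tie these together, here is a minimal sketch of a hypothetical helper, `classification_metrics`, that derives accuracy, precision, recall, and F1 from a single confusion matrix. It assumes binary labels `0` and `1`, passes `labels=[0, 1]` so the matrix stays 2x2 even if a class is missing, and returns 0.0 for any metric whose denominator is zero (a choice made here, similar in spirit to `zero_division=0` in scikit-learn).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def classification_metrics(y_true, y_pred):
    """Hypothetical helper: derive the lesson's metrics from the 2x2 confusion matrix."""
    # labels=[0, 1] keeps the matrix 2x2 even if one class never appears
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```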

These metrics help evaluate different aspects of your model's performance. Now it's your turn! You'll compute classification metrics on new datasets, reinforcing your understanding. Ready to practice? Let's go!


## Confusion Matrix Values in Spam Classification

Stellar work, Space Explorer! Let’s take it a step further. Complete the TODO to finish the classification code and print the confusion matrix values. This will help you understand how to extract specific values from the confusion matrix.

May cosmic knowledge guide you!

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Sample classification dataset
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

# Calculating confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)

# TODO: Calculate and assign TP, FP, TN, and FN

print(f"TP: {TP}, FP: {FP}, TN: {TN}, FN: {FN}")

```

Alright, Space Explorer! Let's extract those confusion matrix values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Sample classification dataset
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

# Calculating confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)

# Assign TP, FP, TN, and FN by indexing the confusion matrix.
# The confusion matrix is structured as:
# [[TN, FP],
#  [FN, TP]]

TN = conf_matrix[0, 0]
FP = conf_matrix[0, 1]
FN = conf_matrix[1, 0]
TP = conf_matrix[1, 1]

print(f"Confusion Matrix:\n{conf_matrix}")
print(f"TP: {TP}, FP: {FP}, TN: {TN}, FN: {FN}")
```
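
A more compact alternative, used later in this unit, is to flatten the matrix with `ravel()`. One caveat worth knowing: if only one class appears in both `y_true` and `y_pred`, `confusion_matrix` returns a 1x1 matrix and the four-way unpacking fails; passing `labels=[0, 1]` keeps the 2x2 shape. A sketch on the exercise data (`all_negative` is an illustrative name):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

# ravel() flattens [[TN, FP], [FN, TP]] row by row into TN, FP, FN, TP
TN, FP, FN, TP = confusion_matrix(y_true, y_pred).ravel()
print(f"TP: {TP}, FP: {FP}, TN: {TN}, FN: {FN}")  # TP: 4, FP: 1, TN: 4, FN: 1

# Edge case: only one class present, so the default matrix is 1x1
all_negative = np.array([0, 0, 0])
print(confusion_matrix(all_negative, all_negative).shape)                 # (1, 1)
print(confusion_matrix(all_negative, all_negative, labels=[0, 1]).shape)  # (2, 2)
```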

## Calculate Precision and Recall

Great job so far, Stellar Navigator!

Now it's time to fill in the missing pieces. Implement the missing code to calculate Precision and Recall based on the email classification results. Your goal here is to calculate precision and recall directly from their formulas, without using the built-in scikit-learn functions. This will help you better understand what these metrics actually mean.

Remember, Precision is the ratio of correctly predicted positive cases to all predicted positives, and Recall is the ratio of correctly predicted positive cases to all actual positives.

Happy coding!

```python
from sklearn.metrics import confusion_matrix

# Assume y_true and y_pred are given
y_true = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # True labels
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]  # Predicted labels

# Use sklearn's confusion matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# TODO: Calculate and print Precision and Recall

```

Great job so far, Stellar Navigator! Let's calculate Precision and Recall using the fundamental formulas.

```python
from sklearn.metrics import confusion_matrix

# Assume y_true and y_pred are given
y_true = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # True labels
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]  # Predicted labels

# Use sklearn's confusion matrix to get TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Calculate Precision
# Formula: Precision = TP / (TP + FP)
if (tp + fp) == 0:
    precision = 0.0 # Handle division by zero if no positive predictions
else:
    precision = tp / (tp + fp)

# Calculate Recall
# Formula: Recall = TP / (TP + FN)
if (tp + fn) == 0:
    recall = 0.0 # Handle division by zero if no actual positives
else:
    recall = tp / (tp + fn)

print(f"True Positives (TP): {tp}")
print(f"False Positives (FP): {fp}")
print(f"False Negatives (FN): {fn}")
print(f"True Negatives (TN): {tn}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
```
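
It's worth confirming that the hand-written formulas agree with scikit-learn's built-in functions. A minimal check on the same data, where both precision and recall come out to 4/5:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Same formulas as above, with the zero-denominator guard written inline
manual_precision = tp / (tp + fp) if (tp + fp) else 0.0
manual_recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"Precision: manual={manual_precision:.4f}, sklearn={precision_score(y_true, y_pred):.4f}")
print(f"Recall:    manual={manual_recall:.4f}, sklearn={recall_score(y_true, y_pred):.4f}")
```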

## Calculate Accuracy for Email Classification

Hey Space Explorer!

Let's explore a scenario with an unbalanced dataset. Modify the given code to calculate and print the precision and recall instead of the accuracy score. This will show you that even when accuracy looks acceptable, a model can still fail to identify the minority class.

Go ahead and make the change!

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Email classification labels: 1 is spam, 0 is not spam
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1])  # Actual labels
y_pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # Predicted labels (dumb model)

# Calculate accuracy for the email classification model
accuracy = accuracy_score(y_true, y_pred)

print(f"Accuracy: {accuracy}")

```

Hey Space Explorer!

With unbalanced datasets, accuracy can be very misleading. Let's modify the code to calculate and print precision and recall instead, to get a clearer picture of our "dumb model's" performance.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Email classification labels: 1 is spam, 0 is not spam
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1])  # Actual labels
y_pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # Predicted labels (dumb model)

# Calculate precision for the email classification model
# zero_division=0 returns 0 without a warning if the denominator is zero
# (no predicted positives for precision, no actual positives for recall)
precision = precision_score(y_true, y_pred, zero_division=0)

# Calculate recall for the email classification model
recall = recall_score(y_true, y_pred, zero_division=0)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")

# Optional: Print Confusion Matrix to see the breakdown
# tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# print(f"Confusion Matrix:\n{confusion_matrix(y_true, y_pred)}")
# print(f"TP: {tp}, FP: {fp}, TN: {tn}, FN: {fn}")
```

### Explanation of the Output:

When you run this code, you'll observe:

  * **Precision:** 0.0. Precision only considers the emails the model flags as spam. In this example, the model predicts '1' (spam) only once, and that prediction is incorrect (the actual label is '0'). Hence, TP is 0 and FP is 1, leading to Precision = 0 / (0 + 1) = 0.
  * **Recall:** 0.0. The model fails to identify *any* of the actual spam emails (the `1`s in `y_true`). Both are missed, so TP is 0 and FN is 2, giving Recall = TP / (TP + FN) = 0 / (0 + 2) = 0.

This clearly demonstrates why looking only at accuracy can be deceptive with unbalanced datasets. Our model has a seemingly reasonable accuracy of 70% (7 of 10 predictions are correct) because it predicts the majority class (not spam) most of the time. However, it completely fails at its core task of identifying the minority class (spam), as the zero precision and recall show. This is a crucial lesson in evaluating classification models!
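
To see the accuracy figure next to the other two metrics, here is a sketch that prints the confusion matrix counts, accuracy, precision, and recall for the same "dumb model":

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
y_pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN: {tn}, FP: {fp}, FN: {fn}, TP: {tp}")  # TN: 7, FP: 1, FN: 2, TP: 0

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")                     # 0.70
print(f"Precision: {precision_score(y_true, y_pred, zero_division=0):.2f}")  # 0.00
print(f"Recall:    {recall_score(y_true, y_pred, zero_division=0):.2f}")     # 0.00
```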

## Calculate Classification Metrics

Greetings, Space Voyager!

You're doing great. Now it's your turn to fill in the blanks and calculate the metrics that show the model's performance. Complete the TODOs to import the necessary functions and compute the confusion matrix, accuracy, precision, and recall.

May the stars guide you!

```python
# TODO: import necessary functions

# True labels and predicted labels for spam classification
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

# TODO: Calculate and print the Confusion Matrix

# TODO: Calculate and print the accuracy, precision, and recall

```

```python
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# True labels and predicted labels for spam classification
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

# Calculate and print the Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)

# Calculate and print the accuracy, precision, and recall
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(f"\nAccuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
```
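
For this dataset you can verify the printed values by hand. There are no false positives, so every spam prediction is correct and precision is 1.00, while the single missed spam email (index 4) lowers recall to 0.80. A sketch that plugs the counts into the lesson's formulas:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=5, FP=0, FN=1, TP=4

# Plug the counts into the lesson's formulas
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 9 / 10
precision = tp / (tp + fp)                  # 4 / 4
recall = tp / (tp + fn)                     # 4 / 5
print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")
```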

## Compute Confusion Matrix and F1-Score

Good job learning classification metrics, Space Voyager!

Let's compute the F1-score for our spam email classifier. Replace the precision and recall calculation with the necessary code to compute the F1-score, representing both metrics with one value.

May your analytics soar through the cosmos!

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# True labels and predicted labels
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Compute the confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")

# TODO: Replace the precision and recall calculation with F1-score
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"Precision: {precision}")
print(f"Recall: {recall}")

```

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# True labels and predicted labels
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# Compute the confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")

# Compute the F1-score, which combines precision and recall into one value
f1 = f1_score(y_true, y_pred)
print(f"F1-score: {f1}")
```
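
If you need precision, recall, and F1 together, scikit-learn also provides `precision_recall_fscore_support`, which returns all three (plus support) in one call. With `average="binary"` it reports scores for the positive class (`pos_label=1` by default) and returns `None` for support. A sketch on the same data:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# average="binary" scores only the positive class; support is None in this mode
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1: {f1:.4f}")
```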