# Load Shedding Classifier Accuracy Calculation

This notebook evaluates a load shedding classifier by comparing its predicted load shedding values with the actual recorded values. It reports accuracy, recall, F1 score, and the number of false negatives.

## Steps:
1. Load the predicted and actual load shedding datasets.
2. Parse the timestamps in both datasets into a common `date_time` column.
3. Merge the datasets on the timestamp to align actual and predicted values.
4. Derive a binary actual label from the load shedding stage, then extract the actual and predicted values.
5. Compute accuracy, recall, and F1 score.
6. Compute false positives and false negatives.
7. Return the computed metrics.
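
Steps 5 and 6 can be illustrated without any libraries. The following is a minimal sketch, not the notebook's implementation: `binary_metrics` is a hypothetical helper, and the two label lists are made-up toy data chosen only to keep the arithmetic easy to follow.

```python
def binary_metrics(y_true, y_pred):
    """Count confusion-matrix cells for 0/1 labels and derive the metrics from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # shedding predicted and observed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no shedding predicted or observed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # shedding predicted, none observed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # shedding observed but missed

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "recall": recall, "f1": f1}


# Toy labels (illustrative only, not taken from the datasets used below).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]

m = binary_metrics(y_true, y_pred)
print(m["tp"], m["tn"], m["fp"], m["fn"])  # 3 2 2 1
print(m["accuracy"], m["recall"])          # 0.625 0.75
```

Recall depends only on the true positives and false negatives, which is why it can stay high even when many false positives pull accuracy and F1 down.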


In [None]:
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix
from data_ingestion import read_from_CSV

# Load the datasets
loadshedding_predicted_path = "./loadshedding_pred.csv"
loadshedding_actual_path = "./loadshedding_jan_mar_2024.csv"

loadshedding_pred = read_from_CSV(loadshedding_predicted_path)
loadshedding_actual = read_from_CSV(loadshedding_actual_path)

# Parse timestamps into a common date_time column so the two datasets can be joined
loadshedding_pred["date_time"] = pd.to_datetime(loadshedding_pred["date_time"])
loadshedding_actual["date_time"] = pd.to_datetime(
    loadshedding_actual["Date"] + " " + loadshedding_actual["Hour"].astype(str) + ":00",
    format="%Y-%m-%d %H:%M",
)

# Merge datasets on date_time to align actual and predicted values
merged_data = pd.merge(loadshedding_pred, loadshedding_actual, on="date_time", how="left")

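# Drop helper columns that are no longer needed and normalise column names to lowercase
merged_data.drop(columns=["Date", "Hour", "Unnamed: 0"], inplace=True)
merged_data.columns = merged_data.columns.str.lower()

# Binary actual label: 1 for any load shedding stage, 0 for "Unknown" or "No Load Shedding".
# Caveat: a row with no match in the actual data (NaN stage after the left merge) is also labelled 1.
merged_data["load_shedding"] = merged_data["load_shedding_stage"].apply(lambda x: 1 if x not in ["Unknown", "No Load Shedding"] else 0)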

# Extract actual and predicted values
y_true = merged_data["load_shedding"]
y_pred = merged_data["estimated_loadshedding"]

# Compute error metrics
accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Unpack the confusion matrix; for binary 0/1 labels ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Collect the metrics; the dict is displayed as the cell output
results = {
    "Accuracy": accuracy,
    "recall": recall,
    "f1_score": f1,
    # False negatives get the most attention: the scraped positive (actual load shedding)
    # records are the only load shedding data we trust.
    "False Negatives": fn
}
results
# merged_data["estimated_loadshedding"].value_counts()

{'Accuracy': 0.38278388278388276,
 'recall': 0.7724550898203593,
 'f1_score': 0.16064757160647572,
 'False Negatives': 38}
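
The four reported numbers are enough to recover the full confusion matrix, which helps explain why recall is fairly high while F1 is low. The sketch below inverts the standard definitions of recall, F1, and accuracy. It is an inference from the printed output, not something the notebook computes, and it assumes the printed values are exact up to floating-point rounding.

```python
# Metrics exactly as printed in the cell output above.
accuracy = 0.38278388278388276
recall = 0.7724550898203593
f1 = 0.16064757160647572
fn = 38

# recall = tp / (tp + fn)             =>  tp = fn * recall / (1 - recall)
tp = round(fn * recall / (1 - recall))
# f1 = 2*tp / (2*tp + fp + fn)        =>  fp = 2*tp / f1 - 2*tp - fn
fp = round(2 * tp / f1 - 2 * tp - fn)
# accuracy = (tp + tn) / (tp + tn + fp + fn), solved for tn
tn = round((accuracy * (tp + fp + fn) - tp) / (1 - accuracy))

total = tp + tn + fp + fn
precision = tp / (tp + fp)

print(tp, fp, fn, tn)  # 129 1310 38 707
print(total)           # 2184
```

Under these assumptions, 1310 of the 1439 positive predictions are false positives, so precision is roughly 0.09. That is what drags F1 down to 0.16 even though about 77% of the actual load shedding hours are caught. The total of 2184 rows is consistent with 91 days of hourly data (91 × 24), matching the January to March 2024 range in the actual dataset's filename.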