This notebook processes raw electricity data to estimate load shedding stages. It performs the following steps:
1. Load and clean the raw electricity data.
2. Filter the data to start from 01/01/2024, the date from which entries in the database begin.
3. Apply rule-based functions to estimate load shedding stages.
4. Save the processed data to a CSV file.
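Steps 1 and 2 can be sketched as a standalone function and run on a small synthetic frame. This is a minimal illustration, not the notebook's own code. The helper name `clean_and_filter`, the raw headers `Date Time` and `Residual Demand`, and the sample values are all assumptions, because the schema of ESK15160.csv is not shown here. The sketch also derives the UNIX-second join keys for the weather data in a vectorised way, instead of using a row-by-row `.apply`.

```python
import pandas as pd


def clean_and_filter(raw: pd.DataFrame, start: str = "2024-01-01") -> pd.DataFrame:
    """Hypothetical helper: normalise headers, parse timestamps, drop rows before `start`."""
    df = raw.copy()
    # Lowercase the headers and replace spaces with underscores ("Date Time" -> "date_time")
    df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)
    # Parse day-first strings such as "31/12/2023 23:00"
    df["date_time"] = pd.to_datetime(df["date_time"], format="%d/%m/%Y %H:%M")
    # Keep rows on or after the start date; no row has to match the start exactly
    df = df[df["date_time"] >= pd.Timestamp(start)].reset_index(drop=True)
    # UNIX seconds (naive times treated as UTC) as integer join keys
    df["entry_id"] = (df["date_time"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
    return df


# Synthetic example rows (assumed headers and values, for illustration only)
raw = pd.DataFrame({
    "Date Time": ["31/12/2023 23:00", "01/01/2024 00:00", "01/01/2024 01:00"],
    "Residual Demand": [25000.0, 24000.0, 23500.0],
})
out = clean_and_filter(raw)
print(out)
```

The 2023 row is dropped and the index restarts at 0. Comparing against a `pd.Timestamp` rather than looking up an exact matching row means the filter still works if the dataset has no entry at exactly 00:00 on the start date.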


In [5]:
import pandas as pd
from det_loadshedding import tune_threshold, estimate_loadshedding
from data_ingestion import read_from_CSV

# Load the raw electricity data
df = read_from_CSV("ESK15160.csv")  # The CSV file must be in the same folder as the notebook

# Normalise the column titles: lowercase, with spaces replaced by underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

# Parse the date_time strings (day-first, e.g. "01/01/2024 00:00") into pandas datetimes
df['date_time'] = pd.to_datetime(df['date_time'], format="%d/%m/%Y %H:%M")
# Optional: a column of UNIX timestamps could be used as the joining key for the weather data
# df['entry_id'] = df["date_time"].apply(lambda x: int(x.timestamp()))

# Entries in the database start from 01/01/2024, so keep only rows from that date onwards.
# Comparing against a Timestamp avoids an IndexError if no row falls exactly on midnight.
start_date = pd.Timestamp("2024-01-01 00:00")
df = df[df["date_time"] >= start_date]

# Reset the index so it runs from 0 again after filtering
df = df.reset_index(drop=True)

# Apply the rule-based functions: an hour-of-day threshold, then a row-wise load shedding estimate
df["load_shedding_threshold"] = df["date_time"].dt.hour.apply(tune_threshold)
df["estimated_loadshedding"] = df.apply(estimate_loadshedding, axis=1)

# Keep only the required columns and save them to a CSV file
df = df[["date_time", "estimated_loadshedding"]]
df.to_csv("./loadshedding_pred.csv", index=False)


2025-02-16 08:05:11,786 - data_ingestion - INFO - CSV file read successfully from the path.
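The functions `tune_threshold` and `estimate_loadshedding` are imported from `det_loadshedding`, so their actual rules are not visible in this notebook. The following is a purely hypothetical sketch of the pattern in step 3: map the hour of day to a threshold, then flag rows whose demand falls below it. The function names, the `residual_demand` column, the hour bands and the MW values are invented placeholders. The output is a 0/1 flag, whereas the real `estimate_loadshedding` may return a stage number.

```python
import pandas as pd


def hypothetical_threshold(hour: int) -> float:
    """HYPOTHETICAL hour-of-day threshold in MW; placeholder values, not from det_loadshedding."""
    if 17 <= hour <= 21:  # assumed evening peak band
        return 28000.0
    if 6 <= hour <= 16:   # assumed daytime band
        return 26000.0
    return 22000.0        # assumed overnight band


def flag_shedding(df: pd.DataFrame, demand_col: str = "residual_demand") -> pd.DataFrame:
    """Flag rows whose demand is below the hour's threshold (assumed rule)."""
    out = df.copy()
    out["load_shedding_threshold"] = out["date_time"].dt.hour.apply(hypothetical_threshold)
    # Vectorised column comparison instead of a row-wise df.apply(..., axis=1)
    out["estimated_loadshedding"] = (out[demand_col] < out["load_shedding_threshold"]).astype(int)
    return out


# Synthetic example rows (illustration only)
df = pd.DataFrame({
    "date_time": pd.to_datetime(["2024-01-01 03:00", "2024-01-01 18:00"]),
    "residual_demand": [21000.0, 29000.0],
})
result = flag_shedding(df)
print(result[["date_time", "load_shedding_threshold", "estimated_loadshedding"]])
```

When a rule can be written as a comparison between columns, expressing it that way avoids calling a Python function once per row. On a year of hourly data this is usually much faster than `df.apply(..., axis=1)`.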
