# Benchmark Forecast Evaluation

This notebook is part of the baseline validation process for hourly and daily electricity price forecasts in the Netherlands.

The **forecast data** originates from [energie.oxygent.nl](https://energie.oxygent.nl), while the **actual day-ahead prices** were obtained from the [ENTSO-E Transparency Platform](https://transparency.entsoe.eu/). Both datasets span the evaluation period from **24 February to 29 March 2025**.
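In the ENTSO-E actuals, each hour is an interval string in the `MTU (UTC)` column, and prices are in EUR/MWh. The sketch below parses two hypothetical rows in the `start - end` layout implied by the notebook's format string (`%d/%m/%Y %H:%M:%S`) and converts the price to EUR/kWh. It is illustrative only, and the real export may contain additional columns.

```python
import pandas as pd

# Hypothetical rows in the layout the notebook expects from the ENTSO-E export:
# a "start - end" interval string (UTC) and a day-ahead price in EUR/MWh.
sample = pd.DataFrame({
    "MTU (UTC)": [
        "24/02/2025 00:00:00 - 24/02/2025 01:00:00",
        "24/02/2025 01:00:00 - 24/02/2025 02:00:00",
    ],
    "Day-ahead Price (EUR/MWh)": [95.5, 88.0],
})

# Split each interval: element 0 is the start of the hour, element 1 the end.
bounds = sample["MTU (UTC)"].str.split(" - ")
fmt = "%d/%m/%Y %H:%M:%S"
sample["interval_start"] = pd.to_datetime(bounds.str[0], format=fmt)
sample["interval_end"] = pd.to_datetime(bounds.str[1], format=fmt)

# 1 MWh = 1000 kWh, so dividing the EUR/MWh price by 1000 gives EUR/kWh.
sample["price_kwh"] = sample["Day-ahead Price (EUR/MWh)"] / 1000

print(sample[["interval_start", "interval_end", "price_kwh"]])
```

The notebook uses element 1, the interval end, as its `date_anchor`.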

## Objective

The goal of this notebook is to compute and present benchmark error metrics, primarily RMSE, for electricity price forecasts across forecast horizons of 1 to 7 days ahead. The results form a baseline against which other models (e.g. naïve, SARIMA, or deep learning models) will later be compared.
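For reference, the metrics can be written as follows. These definitions describe what the code in this notebook appears intended to compute. Here $i$ runs over all $n$ (forecast, actual) pairs across every horizon, $y_i$ is the actual price, $\hat{y}_i$ is the forecast (the `y` column), and $a_1, \dots, a_T$ are the actual hourly prices in chronological order, with one value per hour. The zero-price exclusion in MAPE mirrors the `np.where(y_true != 0, ...)` guard in the code. The MASE scale is the one-hour naïve error over the evaluation period itself, whereas the textbook MASE scales by the naïve error on a separate training sample.

$$
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|
$$

$$
\mathrm{MAPE}=\frac{100}{\lvert\mathcal{I}\rvert}\sum_{i\in\mathcal{I}}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,\qquad \mathcal{I}=\{\,i : y_i\neq 0\,\}
$$

$$
\mathrm{MASE}=\frac{\mathrm{MAE}}{\frac{1}{T-1}\sum_{t=2}^{T}\left|a_t-a_{t-1}\right|}
$$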

## Structure

### 1. Data Preparation
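- Forecast and actual prices are preprocessed and aligned on the target hour (hourly resolution, UTC).
- Only forecasts with a horizon of 1 to 7 days are retained. The horizon is the number of calendar days between a forecast's issuance date and its target date.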
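The horizon logic can be illustrated on a few hypothetical rows. As in the notebook, the sketch assumes that `fetch_dt` is the moment a forecast was retrieved and `target_hour` is the hour it predicts. The horizon counts whole calendar days between their dates, so the time of day of the fetch does not matter.

```python
import pandas as pd

# Hypothetical forecasts, all fetched at the same moment, for three target hours.
fc = pd.DataFrame({
    "fetch_dt": pd.to_datetime(["2025-02-23 14:05"] * 3),
    "target_hour": pd.to_datetime(["2025-02-23 20:00", "2025-02-24 08:00", "2025-03-03 08:00"]),
    "y": [0.11, 0.09, 0.10],
})

# Horizon = whole calendar days between the issuance date and the target date.
fc["issuance_date"] = fc["fetch_dt"].dt.normalize()
fc["target_date"] = fc["target_hour"].dt.normalize()
fc["horizon"] = (fc["target_date"] - fc["issuance_date"]).dt.days

# Keep horizons 1..7 (inclusive); same-day (0) and longer (8+) forecasts are dropped.
kept = fc[fc["horizon"].between(1, 7)]
print(kept[["target_hour", "horizon"]])
```

February 2025 has 28 days, so the third row lands at horizon 8 and is filtered out along with the same-day row.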

### 2. Metric Evaluation
- Overall error metrics are computed for the entire dataset:  
  **RMSE**, **MAE**, **MAPE**, and **MASE**.
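The overall metrics can be sketched as a single function. This is a minimal illustration, assuming a DataFrame with the notebook's column names: `target_hour`, `price_kwh` for the actual price, and `y` for the forecast. The function `overall_metrics` and the toy values are hypothetical. Two choices are worth noting:

- MAPE skips rows whose actual price is exactly zero, because Dutch day-ahead prices can reach zero.
- The MASE scale uses one actual value per hour, because the merged data repeats each hour once per forecast.

```python
import numpy as np
import pandas as pd

def overall_metrics(df: pd.DataFrame) -> dict:
    """Pooled RMSE, MAE, MAPE (%) and MASE for 'price_kwh' (actual) vs 'y' (forecast).

    Assumes one row per (forecast, target_hour) pair, so an hour may appear several times.
    """
    err = df["price_kwh"] - df["y"]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Exclude zero actual prices from MAPE to avoid division by zero.
    nonzero = df["price_kwh"] != 0
    mape = float(np.mean(np.abs(err[nonzero] / df.loc[nonzero, "price_kwh"])) * 100)
    # MASE scale: mean absolute one-hour change of the actual series, one value per hour.
    actual = df.drop_duplicates("target_hour").set_index("target_hour")["price_kwh"].sort_index()
    scale = float(np.mean(np.abs(actual.diff().dropna())))
    mase = mae / scale if scale != 0 else float("nan")
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "MASE": mase}

# Hypothetical toy data: hour 01:00 has two forecasts (different horizons), 03:00 a zero price.
toy = pd.DataFrame({
    "target_hour": pd.to_datetime(["2025-03-01 00:00", "2025-03-01 01:00", "2025-03-01 01:00",
                                   "2025-03-01 02:00", "2025-03-01 03:00"]),
    "price_kwh": [10.0, 20.0, 20.0, 10.0, 0.0],
    "y":         [12.0, 20.0, 16.0, 10.0, 1.0],
})
metrics = overall_metrics(toy)
print(metrics)
```

On the toy data, the scale from one value per hour (10, 20, 10, 0) is 10, giving a MASE of 0.14. Differencing the duplicated rows (10, 20, 20, 10, 0) instead would give a scale of 7.5 and inflate MASE to about 0.19.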

### 3. Aggregation
- Forecast errors are aggregated in two ways:
  - **Hourly Matrix**: Rows = hour of day, Columns = forecast horizon  
  - **Daily Matrix**: Rows = calendar date, Columns = forecast horizon  
- Each matrix also includes the average actual price and a total score (`Total_rmse`), computed as the unweighted mean of the per-horizon RMSEs.
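The hourly matrix can be sketched as follows, assuming the same column names plus an integer `horizon`. The function `hourly_rmse_matrix` and the toy rows are hypothetical. The daily matrix follows the same pattern, with the target date in place of the hour.

```python
import numpy as np
import pandas as pd

def hourly_rmse_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Rows = hour of day (0-23); columns = Actuals, horizons 1-7, Total_rmse."""
    sq = df.assign(hour=df["target_hour"].dt.hour,
                   sq_err=(df["price_kwh"] - df["y"]) ** 2)
    # RMSE per (hour, horizon): square root of the mean squared error in that cell.
    rmse = np.sqrt(sq.groupby(["hour", "horizon"])["sq_err"].mean())
    matrix = rmse.unstack("horizon").reindex(index=range(24), columns=range(1, 8))
    # Average actual price per hour (over all rows, as in the notebook).
    matrix["Actuals"] = sq.groupby("hour")["price_kwh"].mean()
    # Total_rmse: unweighted mean of the per-horizon RMSEs, skipping empty cells.
    matrix["Total_rmse"] = matrix[list(range(1, 8))].mean(axis=1, skipna=True)
    return matrix[["Actuals"] + list(range(1, 8)) + ["Total_rmse"]]

# Hypothetical toy rows, all at 08:00; forecast errors are 1, 3 and 7.
toy = pd.DataFrame({
    "target_hour": pd.to_datetime(["2025-03-01 08:00", "2025-03-01 08:00", "2025-03-02 08:00"]),
    "horizon": [1, 2, 1],
    "price_kwh": [10.0, 10.0, 20.0],
    "y": [9.0, 7.0, 13.0],
})
m = hourly_rmse_matrix(toy)
print(m.loc[8])
```

In the toy data, the hour-8 row has an RMSE of 5.0 at horizon 1 and 3.0 at horizon 2, so `Total_rmse` is 4.0. Pooling all three hour-8 errors into one RMSE would instead give about 4.43.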

### 4. Visualization
- The overall metrics and both matrices are displayed as interactive Plotly tables, together with a line chart of forecast vs. actual hourly prices, for easy inspection and model comparison.
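Plotly's `go.Table` takes `cells.values` as a list of columns rather than rows. The sketch below is a minimal pandas-only helper that turns a matrix into that layout, using empty strings for missing cells as the notebook does. The name `table_columns` is hypothetical. Its output would be passed as `go.Table(header=dict(values=header), cells=dict(values=columns))`.

```python
import numpy as np
import pandas as pd

def table_columns(matrix: pd.DataFrame, index_label: str, index_fmt, decimals: int = 4):
    """Return (header, columns) in the column-major layout used by go.Table cells.values."""
    header = [index_label] + [str(c) for c in matrix.columns]
    index_col = [index_fmt(i) for i in matrix.index]
    # One list of display strings per column; NaN cells become empty strings.
    value_cols = [
        [f"{v:.{decimals}f}" if pd.notna(v) else "" for v in matrix[c]]
        for c in matrix.columns
    ]
    return header, [index_col] + value_cols

# Hypothetical two-hour matrix with one missing actual.
m = pd.DataFrame({"Actuals": [0.1234567, np.nan], 1: [0.02, 0.031]}, index=[0, 1])
header, cols = table_columns(m, "Hour", lambda h: f"{h:02d}:00")
print(header)
print(cols)
```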

## Notes

- This notebook does **not yet include** SARIMA or naïve forecasts; those will be added in a later phase.
- All transformations and calculations are spelled out step by step so that the baseline model can be benchmarked carefully before more complex methods are applied.
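The naïve benchmark mentioned in the notes is not defined in this notebook. One common choice for hourly prices is a seasonal naïve forecast that repeats the actual price from the same hour one week earlier. Assuming that day-ahead prices for the issuance day are already published when a forecast is issued, that value is known for every horizon from 1 to 7 days. The sketch below is hypothetical (`seasonal_naive` is not part of the notebook) and assumes an hourly actual series indexed by timestamp.

```python
import numpy as np
import pandas as pd

def seasonal_naive(actuals: pd.Series, target_hours: pd.Series, period: str = "7D") -> pd.Series:
    """Forecast each target hour with the actual price one period (default 7 days) earlier.

    `actuals` is an hourly price series indexed by timestamp; hours without history give NaN.
    """
    lagged_hours = pd.DatetimeIndex(target_hours - pd.Timedelta(period))
    return pd.Series(actuals.reindex(lagged_hours).to_numpy(), index=target_hours.index)

# Hypothetical actuals: eight days of hourly values 0, 1, 2, ... starting 24 February.
idx = pd.date_range("2025-02-24", periods=24 * 8, freq="h")
actuals = pd.Series(np.arange(len(idx), dtype=float), index=idx)
targets = pd.Series(pd.to_datetime(["2025-03-03 05:00", "2025-02-25 05:00"]))
pred = seasonal_naive(actuals, targets)
print(pred)
```

The second target has no data one week earlier, so it returns NaN. Those rows would drop out of an evaluation that calls `dropna`, as the notebook's merge step does.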

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# ====================
# 1. Load and Prepare the Data
# ====================

# File paths
forecast_path = "/Users/redouan/Downloads/Price_Preds_Processed_20250407.csv"
actuals_path  = "/Users/redouan/Downloads/GUI_ENERGY_PRICES_202501010000-202601010000.csv"

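# Evaluation period (inclusive; end_date is the last hourly slot of 29 March,
# so the filters below keep all hours of that day)
start_date = pd.Timestamp("2025-02-24")
end_date   = pd.Timestamp("2025-03-29 23:00")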

# --- Forecast Data ---
forecast_df = pd.read_csv(forecast_path)

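# ADJUSTED CONVERSION: x (described as minutes since the epoch) × 100,000 gives milliseconds.
# Note that one minute is 60,000 ms, so verify the unit of x against the data source.
forecast_df["forecast_datetime"] = pd.to_datetime(forecast_df["x"] * 100000, unit='ms', utc=True)
# Drop the timezone; values stay in UTC, matching the actuals' "MTU (UTC)" column
forecast_df["forecast_datetime"] = forecast_df["forecast_datetime"].dt.tz_localize(None)
forecast_df["target_hour"] = forecast_df["forecast_datetime"].dt.floor("h")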

# Parse the timestamp column (forecast fetch time): date strings or the same scaled epoch value as x
if forecast_df["timestamp"].dtype == object and isinstance(forecast_df["timestamp"].iloc[0], str):
    forecast_df["fetch_dt"] = pd.to_datetime(forecast_df["timestamp"]).dt.tz_localize(None)
else:
    forecast_df["fetch_dt"] = pd.to_datetime(forecast_df["timestamp"] * 100000, unit='ms', utc=True).dt.tz_localize(None)

# Derive issuance date, target date and horizon (whole calendar days between them)
forecast_df["issuance_date"] = forecast_df["fetch_dt"].dt.normalize()
forecast_df["target_date"] = forecast_df["target_hour"].dt.normalize()
forecast_df["horizon"] = (forecast_df["target_date"] - forecast_df["issuance_date"]).dt.days

# Keep horizons of 1-7 days and target hours within the evaluation period
forecast_df = forecast_df[(forecast_df["horizon"] >= 1) & (forecast_df["horizon"] <= 7)]
forecast_df = forecast_df[(forecast_df["target_hour"] >= start_date) & 
                         (forecast_df["target_hour"] <= end_date)].copy()

# --- Actuals Data ---
actuals_df = pd.read_csv(actuals_path, encoding="utf-8-sig")
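# Anchor each row on the END of its MTU interval (second part of "start - end").
# If the forecast timestamps mark the START of the hour, use .str[0] to avoid a one-hour shift.
actuals_df["date_anchor"] = actuals_df["MTU (UTC)"].astype(str).str.split(" - ").str[1]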

# Convert to datetime (unparseable values become NaT)
actuals_df["actual_datetime"] = pd.to_datetime(
    actuals_df["date_anchor"], format="%d/%m/%Y %H:%M:%S", errors="coerce"
)

# Convert the price from EUR/MWh to EUR/kWh and derive target_hour
actuals_df["price_kwh"] = actuals_df["Day-ahead Price (EUR/MWh)"] / 1000
actuals_df["target_hour"] = actuals_df["actual_datetime"].dt.floor("h").dt.tz_localize(None)

# Restrict to the evaluation period
actuals_df = actuals_df[(actuals_df["target_hour"] >= start_date) & 
                        (actuals_df["target_hour"] <= end_date)].copy()

# Merge forecasts and actuals on target_hour
merged = pd.merge(forecast_df, actuals_df, on="target_hour", how="inner")
merged = merged.dropna(subset=["y", "price_kwh"]).copy()
merged = merged.sort_values("target_hour")

# ====================
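# 2. Evaluation: Compute Error Metrics
# ====================
# Overall metrics, pooled over all forecast horizons and hours
y_true = merged["price_kwh"]
y_pred = merged["y"]

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae  = mean_absolute_error(y_true, y_pred)
# MAPE skips hours with an actual price of exactly zero (division by zero);
# np.nanmean ignores those NaN entries instead of returning NaN overall
mape = np.nanmean(np.abs((y_true - y_pred) / np.where(y_true != 0, y_true, np.nan))) * 100
# MASE scale: mean absolute one-hour change of the actual price series.
# merged holds several forecasts (horizons) per target_hour, so keep one actual per hour first;
# differencing the duplicated rows would add spurious zeros and shrink the scale.
actual_hourly = merged.drop_duplicates("target_hour").set_index("target_hour")["price_kwh"].sort_index()
mase_scale = np.mean(np.abs(actual_hourly.diff().dropna()))
mase = mae / mase_scale if mase_scale != 0 else np.nan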

# ====================
# 3. Aggregation into Summary Matrices
# ====================

# --- Aggregated Hourly Matrix ---
merged["hour"] = merged["target_hour"].dt.hour

# RMSE per (hour of day, horizon); selecting the value columns keeps the grouping keys out of apply
rmse_hour_df = (
    merged.groupby(["hour", "horizon"])[["price_kwh", "y"]]
    .apply(lambda g: np.sqrt(np.mean((g["price_kwh"] - g["y"]) ** 2)))
    .reset_index(name="rmse")
)

# Pivot: rows = hour, columns = horizon; make sure every hour 0-23 and horizon 1-7 is present
rmse_hour_pivot = (
    rmse_hour_df.pivot(index="hour", columns="horizon", values="rmse")
    .reindex(index=range(24), columns=range(1, 8))
)

# Average actual price per hour of day over the evaluation period
rmse_hour_pivot["Actuals"] = merged.groupby("hour")["price_kwh"].mean()

# Total_rmse: unweighted mean of the per-horizon RMSEs (not a pooled RMSE over all errors)
horizon_columns = list(range(1, 8))
rmse_hour_pivot["Total_rmse"] = rmse_hour_pivot[horizon_columns].mean(axis=1, skipna=True)

# Column order: Actuals, horizons 1-7, Total_rmse
rmse_hour_pivot = rmse_hour_pivot[["Actuals"] + horizon_columns + ["Total_rmse"]]

# --- Aggregated Daily Matrix ---
merged["target_date"] = merged["target_hour"].dt.normalize()

# RMSE per (target date, horizon)
rmse_day_df = (
    merged.groupby(["target_date", "horizon"])[["price_kwh", "y"]]
    .apply(lambda g: np.sqrt(np.mean((g["price_kwh"] - g["y"]) ** 2)))
    .reset_index(name="rmse")
)

# Pivot: rows = target_date, columns = horizon 1-7
rmse_day_pivot = (
    rmse_day_df.pivot(index="target_date", columns="horizon", values="rmse")
    .reindex(columns=range(1, 8))
    .sort_index()
)

# Average actual price per day
rmse_day_pivot["Actuals"] = merged.groupby("target_date")["price_kwh"].mean()

# Total_rmse: unweighted mean of the per-horizon RMSEs (not a pooled RMSE over all errors)
horizon_columns = list(range(1, 8))
rmse_day_pivot["Total_rmse"] = rmse_day_pivot[horizon_columns].mean(axis=1, skipna=True)

# Column order: Actuals, horizons 1-7, Total_rmse
rmse_day_pivot = rmse_day_pivot[["Actuals"] + horizon_columns + ["Total_rmse"]]

#%%
# ====================
# 4. Visualization
# ====================

# --- 4.1 Metrics Overview ---
metrics_fig = go.Figure(data=[go.Table(
    header=dict(
        values=["Metric", "Value"],
        fill_color='#B3E5FC',
        align='center',
        font=dict(size=14, color='black')
    ),
    cells=dict(
        values=[
            ["RMSE", "MAE", "MAPE (%)", "MASE"],
            [f"{rmse:.4f}", f"{mae:.4f}", f"{mape:.2f}", f"{mase:.4f}"]
        ],
        fill_color='#F5F5F5',
        align='center',
        font=dict(size=12)
    ))
])

metrics_fig.update_layout(
    title="Overall Forecast Evaluation Metrics",
    width=500,
    height=200,
    margin=dict(l=20, r=20, t=60, b=20)
)

metrics_fig.show()

# --- 4.2 Plotly Time Series Graph ---
# For readability, keep one forecast per target_hour: the one with the shortest horizon
timeseries_data = (
    merged.sort_values(["target_hour", "horizon"])
    .drop_duplicates(subset=["target_hour"])
)

# Line chart of actual vs. forecast prices
timeseries_fig = go.Figure()

# Trace for the actual prices
timeseries_fig.add_trace(go.Scatter(
    x=timeseries_data["target_hour"],
    y=timeseries_data["price_kwh"],
    mode="lines+markers",
    name="Actual price (EUR/kWh)",
    line=dict(color="blue")
))

# Trace for the forecast prices
timeseries_fig.add_trace(go.Scatter(
    x=timeseries_data["target_hour"],
    y=timeseries_data["y"],
    mode="lines+markers",
    name="Forecast price (EUR/kWh)",
    line=dict(color="orange", dash="dash")
))

# Layout update
timeseries_fig.update_layout(
    title="⚡ Electricity Price: Forecast vs. Actual (Hourly Data)",
    xaxis_title="Date & hour",
    yaxis_title="Price (EUR/kWh)",
    paper_bgcolor="white",
    plot_bgcolor="white",
    xaxis=dict(showgrid=True, gridcolor="lightgray"),
    yaxis=dict(showgrid=True, gridcolor="lightgray"),
    width=1200,  # Custom width
    height=600
)

timeseries_fig.show()

# --- 4.3 Plotly Tables voor Matrices ---
def prepare_for_plotly_table(df, is_hourly=True):
    """Rename, reorder and format a matrix as display strings for a Plotly table.

    Returns the formatted DataFrame and the formatted index labels
    ("HH:00" for the hourly matrix, "DD-MM-YYYY" for the daily matrix).
    """
    plot_df = df.copy()

    # Rename columns to their display names
    column_mapping = {
        "Actuals": "actual_price_kwh",
        "Total_rmse": "RMSE_total"
    }

    # Add the horizon columns to the mapping
    for i in range(1, 8):
        if i in plot_df.columns:
            column_mapping[i] = f"RMSE_day{i}"

    # Rename the columns that are present
    plot_df = plot_df.rename(columns=column_mapping)

    # Enforce column order: actual price, RMSE per horizon, total
    ordered_cols = (["actual_price_kwh"]
                    + [f"RMSE_day{i}" for i in range(1, 8) if f"RMSE_day{i}" in plot_df.columns]
                    + ["RMSE_total"])
    ordered_cols = [col for col in ordered_cols if col in plot_df.columns]
    plot_df = plot_df[ordered_cols]

    # Format the index labels
    if is_hourly:
        index_values = [f"{hour:02d}:00" for hour in plot_df.index]
    else:
        index_values = [date.strftime("%d-%m-%Y") for date in plot_df.index]

    # Format the actual price in euros with 4 decimals
    if "actual_price_kwh" in plot_df.columns:
        plot_df["actual_price_kwh"] = plot_df["actual_price_kwh"].apply(lambda x: f"€{x:.4f}" if pd.notna(x) else "")

    # Format the RMSE columns with 4 decimals
    for col in plot_df.columns:
        if col.startswith("RMSE_"):
            plot_df[col] = plot_df[col].apply(lambda x: f"{x:.4f}" if pd.notna(x) else "")

    return plot_df, index_values

# Prepare both matrices for display in Plotly tables
rmse_hour_plotly, hour_index_values = prepare_for_plotly_table(rmse_hour_pivot, is_hourly=True)
rmse_day_plotly, day_index_values = prepare_for_plotly_table(rmse_day_pivot, is_hourly=False)

# Table colours
header_color = '#B3E5FC'  # Light blue
cell_color = '#F5F5F5'    # Light grey

# Create the Plotly table for the hourly matrix
hourly_table = go.Figure(data=[go.Table(
    header=dict(
        values=['Hour'] + list(rmse_hour_plotly.columns),
        fill_color=header_color,
        align='center',
        font=dict(size=12, color='black')
    ),
    cells=dict(
        values=[hour_index_values] + [rmse_hour_plotly[col] for col in rmse_hour_plotly.columns],
        fill_color=cell_color,
        align=['center'] + ['right'] * len(rmse_hour_plotly.columns),
        font=dict(size=11)
    )
)])

# Layout for the hourly table
hourly_table.update_layout(
    title="Hourly Electricity Price Forecast Evaluation",
    width=1000,
    height=600,
    margin=dict(l=20, r=20, t=60, b=20)
)

# Show the hourly table
hourly_table.show()

# Create the Plotly table for the daily matrix
daily_table = go.Figure(data=[go.Table(
    header=dict(
        values=['Date'] + list(rmse_day_plotly.columns),
        fill_color=header_color,
        align='center',
        font=dict(size=12, color='black')
    ),
    cells=dict(
        values=[day_index_values] + [rmse_day_plotly[col] for col in rmse_day_plotly.columns],
        fill_color=cell_color,
        align=['center'] + ['right'] * len(rmse_day_plotly.columns),
        font=dict(size=11)
    )
)])

# Layout for the daily table
daily_table.update_layout(
    title="Daily Electricity Price Forecast Evaluation",
    width=1000,
    height=800,
    margin=dict(l=20, r=20, t=60, b=20)
)

# Show the daily table
daily_table.show()
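The notebook keys each actual price on the end of its MTU interval. The timestamp convention behind the forecast column `x` is not documented here. If forecasts mark the start of the hour, every comparison is off by one hour.

One hedged diagnostic is to recompute the RMSE with the actual series shifted by one hour in either direction. A clearly lower RMSE at a nonzero shift would suggest revisiting the alignment. The helper `rmse_by_shift` is hypothetical, and the example uses synthetic data rather than the notebook's results.

```python
import numpy as np
import pandas as pd

def rmse_by_shift(forecast: pd.Series, actual: pd.Series, shifts=(-1, 0, 1)) -> dict:
    """RMSE of `forecast` against `actual` shifted by k hours (both hourly, indexed by timestamp).

    Shift k compares the forecast for hour t with the actual value recorded at t + k hours.
    """
    out = {}
    for k in shifts:
        shifted = actual.copy()
        shifted.index = shifted.index - pd.Timedelta(hours=k)
        joined = pd.concat([forecast.rename("f"), shifted.rename("a")], axis=1, join="inner").dropna()
        out[k] = float(np.sqrt(np.mean((joined["a"] - joined["f"]) ** 2)))
    return out

# Synthetic check: a perfect forecast keyed on the interval start, and the same prices
# keyed on the interval end; the error vanishes only at shift +1.
hours = pd.date_range("2025-03-01", periods=48, freq="h")
price = pd.Series(np.sin(np.arange(48) / 3.0), index=hours)
forecast = price
actual_end_keyed = price.copy()
actual_end_keyed.index = hours + pd.Timedelta(hours=1)
res = rmse_by_shift(forecast, actual_end_keyed)
print(res)
```

On the real data, one could pass `timeseries_data.set_index("target_hour")["y"]` as the forecast and one actual price per hour as `actual`. Because prices are autocorrelated, treat the result as a hint rather than proof.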