# Energy Price Forecasting Script

This script performs energy price forecasting using a SARIMA model. It processes a dataset of day-ahead energy prices from the year 2022, optimizes model parameters, and predicts the next week's energy prices.

**Key Features and Goals:**
- **Data Integrity and Leakage Prevention:**  
  The script always performs a careful train-test split (for example, using a 3-week training period and a 1-week forecasting period) to ensure that no future data leaks into the training set.
  
- **Iterative and Modular Workflow:**  
  Each step—from data loading and preprocessing to model fitting, forecasting, and evaluation—is separated into modular steps. This allows you to adjust date ranges, test data splits, and parameters at every stage.
  
- **Interactive Visualizations:**  
  Using Plotly, the script provides interactive visualizations for exploratory data analysis (EDA) and forecast evaluation. You can zoom, hover, and dynamically explore the data and forecast results.
  
- **Model Optimization and Parameter Evaluation:**  
  The SARIMA model parameters are optimized using Auto-ARIMA with a stepwise search, with the option to let auto_arima choose an optimal Box–Cox transformation.  
  Additionally, details of all candidate hyperparameter configurations evaluated (via a candidate parameters summary) are logged, allowing you to analyze why a particular configuration was chosen and compare it with other model types (e.g., SARIMAX, ML, or deep learning models).

- **Comprehensive Logging:**  
  The experiment results—including date ranges, model type, selected parameters, error metrics (MAE, MSE, RMSE, MAPE, and MdAPE), and global forecast accuracy—are saved in two ways:
  - A CSV log file named using the scheme: `SARIMA_used_log_YYYYMMDD_HHMM.csv` (stored in the designated Log folder).
  - A cumulative SQLite database (`experiment_results.db`) stored in the designated Data folder, enabling robust querying and long-term tracking of all experiments.

**Steps in This Script:**
1. **Import & Setup Libraries:**  
   Checks and imports all required libraries, ensuring that Plotly is configured for interactive visualizations.
2. **Load and Preprocess Dataset:**  
   Reads the CSV file, extracts timestamps from the "MTU (CET/CEST)" column, converts them to datetime, renames the target column ("Day-ahead (EUR/MWh)" to "Energy Price"), and resamples the data to a continuous hourly frequency.
3. **Exploratory Data Analysis (EDA) with Interactive Visualizations:**  
   Provides summary statistics, outlier detection, and interactive plots (time series, histogram, box plot) to explore the dataset.
4. **Define Date Range and Split Data (Training & Testing):**  
   Allows you to adjust the training (e.g., 3 weeks) and test (e.g., 1 week) periods, with an explicit check for data leakage.
5. **Enhanced Interactive Visualization of Train-Test Split:**  
   Creates an interactive Plotly chart that displays the full dataset with vertical lines and annotations marking the start of training, end of training, and end of test periods.
6. **Fit Optimized SARIMA Model:**  
   Uses Auto-ARIMA to select optimal model parameters and then trains a SARIMA model using the best-found configuration (with a fallback to default parameters if necessary).
7. **Forecast the Next 7 Days and Visualize Results (Interactive):**  
   Generates a 7-day forecast with confidence intervals, inverts any Box–Cox transformation if applied, and displays an interactive forecast chart with vertical markers.
8. **Verify Forecast vs. Test Data and Evaluate Accuracy Metrics (Interactive):**  
   Compares forecasted and actual values, computes overall error metrics (including MAE, MSE, RMSE, MAPE, and MdAPE), calculates a global forecast accuracy metric (by aggregating actual and forecast values), and provides interactive visualizations of both hourly and day-by-day performance.
9. **Log Experiment Results:**  
   Saves all experiment details (including candidate parameters evaluated) to both a CSV log file and a cumulative SQLite database for future analysis and model comparison.

**Dataset Information:**
- **Source:** Day-ahead energy prices (Year: 2022)
- **Frequency:** Hourly data
- **Target Column:** "Energy Price" (renamed from "Day-ahead (EUR/MWh)")
- **Seasonality:** Daily (m=24)

**Modeling Approach:**
- **Primary Model:** SARIMA (with Auto-ARIMA optimization)
- **Fallback Option:** Manual SARIMA parameters if Auto-ARIMA fails
- **Additional Considerations:** The candidate parameter search summary is logged to help understand the selection process and support future comparisons with other models (e.g., SARIMAX, ML, or deep learning).

This comprehensive and modular approach ensures robust model development, prevents data leakage, provides interactive insights, and enables detailed tracking and analysis of all experiments.

# 1️⃣ Step 1: Import & Setup Libraries

In this step, we check for and import all the required libraries. If a library is missing, the script will attempt to install it. This cell ensures that all dependencies are loaded and that Plotly is configured to render charts in the notebook.

In [109]:
# Step 1: Import & Setup Libraries
import os
import sys
import subprocess

# List of required libraries
REQUIRED_LIBRARIES = [
    "pandas", "numpy", "plotly", "scikit-learn", "statsmodels", "pmdarima"
    #, "psutil"
]

def check_and_install_libraries():
    """Checks and installs missing libraries."""
    for lib in REQUIRED_LIBRARIES:
        try:
            __import__(lib)
            print(f"✅ {lib} is installed.")
        except ImportError:
            print(f"📌 {lib} is missing. Installing now...")
            subprocess.call([sys.executable, "-m", "pip", "install", lib])
            print(f"✅ {lib} installed successfully.")

# Run the function to check/install libraries
check_and_install_libraries()

# Import libraries
try:
    import pandas as pd
    import numpy as np
    import plotly.express as px
    import plotly.graph_objects as go
    import plotly.io as pio
    # from sklearn.model_selection import train_test_split
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from pmdarima import auto_arima
    # from statsmodels.tsa.stattools import adfuller
    # from statsmodels.stats.diagnostic import acorr_ljungbox
    # import scipy.stats as stats
    # import psutil

    # Configure Plotly to render in the notebook
    pio.renderers.default = "notebook"
    
    print("\n✅ All libraries imported successfully!\n")
except Exception as e:
    print(f"❌ Error during import: {e}")

✅ pandas is installed.
✅ numpy is installed.
✅ plotly is installed.
📌 scikit-learn is missing. Installing now...
✅ scikit-learn installed successfully.
✅ statsmodels is installed.
✅ pmdarima is installed.

✅ All libraries imported successfully!



# 2️⃣ Step 2: Load and Preprocess Dataset

In this step, we load the dataset and preprocess it. Note that the first column, "MTU (CET/CEST)", contains a date range in the format:  
`"dd/mm/yyyy HH:MM:SS - dd/mm/yyyy HH:MM:SS"`.  
We extract the starting timestamp (the portion before " - ") and convert it to a datetime object using the format `%d/%m/%Y %H:%M:%S`.

After parsing the timestamps, we:
- Drop any rows with invalid timestamps.
- Set the timestamp as the DataFrame index.
- Rename the "Day-ahead (EUR/MWh)" column to "Energy Price" (which is our target variable).
- Keep only the "Energy Price" column.
- Ensure that the index has a continuous hourly frequency (filling missing timestamps if needed).

Adjust the file path or format as necessary.

In [111]:
# Step 2: Load and Preprocess Dataset

import sys

# Define file path components (adjust file name as needed)
# file_path = os.path.join(DOWNLOAD_DIR, f"power-gen-consolidated-data-2022-2024.csv")  # Change extension as needed
DATA_FOLDER = "/Users/sgawde/work/eaisi-code/main-branch-apr/ENEXIS/workspaces/sandeep/data"
FILE_NAME = "power-gen-consolidated-data-2022-2024.csv"
FILE_PATH = os.path.join(DATA_FOLDER, FILE_NAME)

# Check if file exists
if not os.path.exists(FILE_PATH):
    print(f"❌ Error: File not found at {FILE_PATH}")
    sys.exit(1)

print(f"✅ File found: {FILE_PATH}")

# Load the raw dataset and print its shape and columns
df_raw = pd.read_csv(FILE_PATH)
print(f"✅ Raw data shape: {df_raw.shape}")
print("✅ Raw data columns:", df_raw.columns.tolist())
print("✅ Raw data preview:")
print(df_raw.head())

df_raw["date"] = df_raw["date_x"].copy()


# Extract the start time from the "MTU (CET/CEST)" column.
# The column is in the format: "dd/mm/yyyy HH:MM:SS - dd/mm/yyyy HH:MM:SS" 2022-01-01 00:00:00+00:00
df_raw["date"] = df_raw["date"].str.split(" ").str[0]

# Parse the "Timestamp" column using the explicit format
# df_raw["date"] = pd.to_datetime(df_raw["date"], format="%d/%m/%Y %H:%M:%S", errors="coerce")
# print("\n✅ date after parsing:")
# print(df_raw["date"].head())

# Drop any rows with invalid timestamps (NaT)
df_raw = df_raw.dropna(subset=["date"])
print(f"\n✅ Data shape after dropping invalid date: {df_raw.shape}")

# Set the "Timestamp" column as the index and sort the DataFrame
df_raw.set_index("date", inplace=True)
df_raw.sort_index(inplace=True)
print("\n✅ Index range after sorting:")
print(f"   Start: {df_raw.index.min()}")
print(f"   End:   {df_raw.index.max()}")

# Drop duplicate timestamps (keep the first occurrence)
df_raw = df_raw[~df_raw.index.duplicated(keep="first")]
print("\n✅ Data shape after dropping duplicate timestamps:", df_raw.shape)

# Rename the energy price column:
# Use the "Day-ahead (EUR/MWh)" column as the target variable.
#if "Day-ahead (EUR/MWh)" in df_raw.columns:
#    df_raw.rename(columns={"Day-ahead (EUR/MWh)": "Energy Price"}, inplace=True)
#else:
#    print("⚠️ Warning: 'Day-ahead (EUR/MWh)' column not found.")

# Keep only the "Energy Price" column
#if "Price" in df_raw.columns:
#    df_processed = df_raw[["Price"]]
#else:
#    print("❌ ERROR: 'Price' column is missing. Check column names.")
#    sys.exit(1)

# Ensure the index has a continuous hourly frequency (filling missing timestamps via forward fill)
# df_processed = df_processed.asfreq("H").ffill()
print("\n✅ Final preprocessed data shape:", df_raw.shape)
print("✅ Final index range:")
print(f"   Start: {df_raw.index.min()}")
print(f"   End:   {df_raw.index.max()}")

# Assign the processed DataFrame to 'df' for use in subsequent steps
# df = df_raw


✅ File found: /Users/sgawde/work/eaisi-code/main-branch-apr/ENEXIS/workspaces/sandeep/data/power-gen-consolidated-data-2022-2024.csv
✅ Raw data shape: (26303, 76)
✅ Raw data columns: ['date_x', 'Load', 'Price', 'Flow_BE_to_NL', 'Flow_NL_to_BE', 'Flow_DE_to_NL', 'Flow_NL_to_DE', 'Flow_GB_to_NL', 'Flow_NL_to_GB', 'Flow_DK_to_NL', 'Flow_NL_to_DK', 'Flow_NO_to_NL', 'Flow_NL_to_NO', 'Flow_BE', 'Flow_DE', 'Flow_GB', 'Flow_DK', 'Flow_NO', 'Total_Flow', 'index', 'date_y', 'temperature_2m', 'apparent_temperature', 'cloud_cover', 'wind_speed_10m', 'diffuse_radiation', 'direct_normal_irradiance', 'shortwave_radiation', 'wind_speed_100m', 'location', 'capacity_0', 'production_all', 'capacity_1', 'production_wind', 'capacity_2', 'production_solar', 'capacity_4', 'production_heatpump', 'capacity_8', 'production_cofiring', 'capacity_9', 'production_geothermal', 'capacity_10', 'production_other', 'capacity_11', 'production_waste', 'capacity_12', 'production_biooil', 'capacity_13', 'production_biomass'

# 3️⃣ Step 3: Exploratory Data Analysis (EDA)

In this step, we explore the preprocessed dataset in detail. We will:
- Provide an overview of the dataset (number of rows, columns, data types, missing values).
- Display descriptive statistics and distributions for the numerical columns.
- Detect potential outliers using the Interquartile Range (IQR) method.

This analysis helps us understand the data quality and guides any necessary preprocessing adjustments before moving forward.

In [112]:
# Step 3: Exploratory Data Analysis (EDA) with Interactive Visualizations

def dataset_summary(data):
    """Prints an overview of the dataset, including column quality, data types, and missing values."""
    if data.empty:
        print("⚠️ Warning: The dataset is empty. Please check the preprocessing steps.")
        return
    print("\n🔍 Dataset Overview:")
    print(f"- Number of rows: {data.shape[0]}")
    print(f"- Number of columns: {data.shape[1]}")
    
    print("\n📌 Column Information:")
    try:
        summary_df = pd.DataFrame({
            "Data Type": data.dtypes,
            "Missing Values": data.isnull().sum(),
            "Unique Values": data.nunique(),
            "First Value": data.iloc[0],
            "Last Value": data.iloc[-1]
        })
        print(summary_df)
    except IndexError:
        print("⚠️ Cannot display first/last values because the dataset is empty.")

def column_distribution(data):
    """Displays descriptive statistics for numerical columns."""
    if data.empty:
        print("⚠️ Dataset is empty. Skipping descriptive statistics.")
    else:
        print("\n📊 Column Distribution & Summary Statistics:")
        print(data.describe())

def detect_outliers(data):
    """Identifies potential outliers using the IQR method."""
    if data.empty:
        print("⚠️ Dataset is empty. Skipping outlier detection.")
    else:
        print("\n🚨 Outlier Detection:")
        numerical_cols = data.select_dtypes(include=['number'])
        Q1 = numerical_cols.quantile(0.25)
        Q3 = numerical_cols.quantile(0.75)
        IQR = Q3 - Q1
        outliers = ((numerical_cols < (Q1 - 1.5 * IQR)) | (numerical_cols > (Q3 + 1.5 * IQR))).sum()
        # Only print columns with detected outliers
        outliers = outliers[outliers > 0]
        if outliers.empty:
            print("✅ No outliers detected.")
        else:
            print(outliers)

# Run summary functions on the preprocessed DataFrame (df)
dataset_summary(df)
column_distribution(df)
detect_outliers(df)

# Interactive Visualizations using Plotly Express

import plotly.express as px
'''
# 1. Interactive Time Series Plot
fig_line = px.line(
    df,
    x=df.index,
    y="Price",
    title="Interactive Energy Price Time Series",
    labels={"x": "Time", "Energy Price": "Price (EUR/MWh)"},
    template="plotly_dark"
)
fig_line.update_xaxes(rangeslider_visible=True)
fig_line.show()

# 2. Interactive Histogram of Energy Price
fig_hist = px.histogram(
    df,
    x="Price",
    nbins=50,
    title="Distribution of Energy Price",
    labels={"Energy Price": "Price (EUR/MWh)"},
    template="plotly_dark"
)
fig_hist.update_layout(bargap=0.1)
fig_hist.show()

# 3. Interactive Box Plot for Energy Price
fig_box = px.box(
    df,
    y="Price",
    title="Box Plot of Energy Price",
    labels={"Energy Price": "Price (EUR/MWh)"},
    template="plotly_dark"
)
fig_box.show()
'''


🔍 Dataset Overview:
- Number of rows: 1096
- Number of columns: 72

📌 Column Information:
                           Data Type  Missing Values  Unique Values  \
index                          int64               0           1096   
Load                         float64               0           1034   
Price                        float64               0           1009   
Flow_BE_to_NL                float64               0            406   
Flow_NL_to_BE                float64               0            609   
...                              ...             ...            ...   
production_CHP_total         float64            1096              0   
capacity_50                  float64            1096              0   
production_solarthermal      float64            1096              0   
capacity_51                  float64            1096              0   
production_allconsuminggas   float64            1096              0   

                            First Value  Last Value  
in

'\n# 1. Interactive Time Series Plot\nfig_line = px.line(\n    df,\n    x=df.index,\n    y="Price",\n    title="Interactive Energy Price Time Series",\n    labels={"x": "Time", "Energy Price": "Price (EUR/MWh)"},\n    template="plotly_dark"\n)\nfig_line.update_xaxes(rangeslider_visible=True)\nfig_line.show()\n\n# 2. Interactive Histogram of Energy Price\nfig_hist = px.histogram(\n    df,\n    x="Price",\n    nbins=50,\n    title="Distribution of Energy Price",\n    labels={"Energy Price": "Price (EUR/MWh)"},\n    template="plotly_dark"\n)\nfig_hist.update_layout(bargap=0.1)\nfig_hist.show()\n\n# 3. Interactive Box Plot for Energy Price\nfig_box = px.box(\n    df,\n    y="Price",\n    title="Box Plot of Energy Price",\n    labels={"Energy Price": "Price (EUR/MWh)"},\n    template="plotly_dark"\n)\nfig_box.show()\n'

# 4️⃣ Step 4: Train-Test Split

In this step, we select specific date ranges for training and testing:
- **Training Period:** 3 weeks (e.g., from 2023-06-01 to 2023-06-21)
- **Test Period:** The following 1 week (e.g., from 2023-06-22 to 2023-06-28)

This ensures that our training data does not leak into our test data. If necessary, adjust the date ranges to suit your analysis.

In [113]:
# Step 4: Train-Test Split

# Define training and testing date ranges (adjust as needed)
train_start = "2022-01-01"
train_end = "2022-12-31"   # 3 weeks of training data
test_start  = "2023-10-01"
test_end    = "2023-10-31"   # 1 week of test/forecasting data

df["location"] = df["location"].replace("DeBilt", 1.0)

# Filter the DataFrame to select the training and test periods
train = df.loc[train_start:train_end]
test = df.loc[test_start:test_end]

# Check for data leakage: ensure that the training set ends before the test set begins
if train.index.max() < test.index.min():
    print("✅ No data leakage: Training data ends before Test data begins.")
else:
    print("Data leakage detected: Training data overlaps with Test data!")
    # raise ValueError("❌ Data leakage detected: Training data overlaps with Test data!")

# Print summary of the train-test split
print("\n✅ Train Data Summary:")
print(f"   Start: {train.index.min()}, End: {train.index.max()}, Total: {len(train)} records")
print("✅ Test Data Summary:")
print(f"   Start: {test.index.min()}, End: {test.index.max()}, Total: {len(test)} records")

✅ No data leakage: Training data ends before Test data begins.

✅ Train Data Summary:
   Start: 2022-01-01, End: 2022-12-31, Total: 365 records
✅ Test Data Summary:
   Start: 2023-10-01, End: 2023-10-31, Total: 31 records


# 5️⃣ Step 5: Interactive Data Visualization (Plotly)

In this step, we create interactive visualizations using Plotly. The visualization displays the full time series with the "Energy Price" and highlights the train-test split. You can use the range slider and selectors to zoom in or filter the data.

In [114]:
# Step 5: Enhanced Interactive Data Visualization (Plotly)

# Ensure Plotly renders inside the notebook
pio.renderers.default = "notebook"

# Print a summary of the training and testing datasets
print("\n🔍 **Train & Test Dataset Overview:**")
print(f"- Train Records: {len(train)}, Test Records: {len(test)}")
print(f"- Train Range: {train.index.min()} → {train.index.max()}")
print(f"- Test Range: {test.index.min()} → {test.index.max()}")
'''
# Create an interactive line plot of the full dataset
fig = px.line(
    df,
    x=df.index,
    y="Price",
    title="📈 Interactive Energy Price Time Series (Train & Test Split)",
    labels={"Price": "Price (EUR/MWh)", "index": "Time"},
    template="plotly_dark"
)

# 1. Add vertical line for train start
fig.add_vline(
    x=train.index[0],
    line_width=3,
    line_dash="dash",
    line_color="green"
)

# 2. Add vertical line for train end
fig.add_vline(
    x=train.index[-1],
    line_width=3,
    line_dash="dash",
    line_color="red"
)

# 3. Add vertical line for test end
fig.add_vline(
    x=test.index[-1],
    line_width=3,
    line_dash="dash",
    line_color="blue"
)

# Customize layout to include range selectors, slider, and annotations
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=7, label="1W", step="day", stepmode="backward"),
                dict(count=30, label="1M", step="day", stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    ),
    annotations=[
        # Train Start Annotation
        dict(
            x=train.index[0],
            y=df["Price"].max(),
            text="Train Start",
            showarrow=True,
            arrowhead=2,
            arrowcolor="green"
        ),
        # Train End Annotation
        dict(
            x=train.index[-1],
            y=df["Price"].max(),
            text="Train End",
            showarrow=True,
            arrowhead=2,
            arrowcolor="red"
        ),
        # Test End Annotation
        dict(
            x=test.index[-1],
            y=df["Price"].max(),
            text="Test End",
            showarrow=True,
            arrowhead=2,
            arrowcolor="blue"
        )
    ]
)

# Show the interactive plot
fig.show()
'''


🔍 **Train & Test Dataset Overview:**
- Train Records: 365, Test Records: 31
- Train Range: 2022-01-01 → 2022-12-31
- Test Range: 2023-10-01 → 2023-10-31


'\n# Create an interactive line plot of the full dataset\nfig = px.line(\n    df,\n    x=df.index,\n    y="Price",\n    title="📈 Interactive Energy Price Time Series (Train & Test Split)",\n    labels={"Price": "Price (EUR/MWh)", "index": "Time"},\n    template="plotly_dark"\n)\n\n# 1. Add vertical line for train start\nfig.add_vline(\n    x=train.index[0],\n    line_width=3,\n    line_dash="dash",\n    line_color="green"\n)\n\n# 2. Add vertical line for train end\nfig.add_vline(\n    x=train.index[-1],\n    line_width=3,\n    line_dash="dash",\n    line_color="red"\n)\n\n# 3. Add vertical line for test end\nfig.add_vline(\n    x=test.index[-1],\n    line_width=3,\n    line_dash="dash",\n    line_color="blue"\n)\n\n# Customize layout to include range selectors, slider, and annotations\nfig.update_layout(\n    xaxis=dict(\n        rangeselector=dict(\n            buttons=list([\n                dict(count=7, label="1W", step="day", stepmode="backward"),\n                dict(count=30, l

# 6️⃣ Step 6: Fit SARIMA Model (Optimized for Speed)

In this step, we use Auto‐ARIMA to search for the best SARIMA parameters on the training data. We use a stepwise search (which is more efficient) and allow auto_arima to choose an optimal Box–Cox transformation if needed. 

After obtaining the optimal orders, we train a SARIMA model using the selected parameters. 

*Note:* If Auto‐ARIMA fails for any reason, default parameters will be used.

In [116]:
import time
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

print("\n⏳ Running Optimized Auto-ARIMA to select best parameters...")

# ✅ Step 1: Ensure 'sales' is a 1D array
y = df['Price'].values.ravel()

# ✅ Step 2: Use Auto-ARIMA to Find the Best Model
auto_model = auto_arima(
    y,
    seasonal=True,  # Use seasonal ARIMA
    m=7,  # Seasonal period (weekly data)
    trace=True,  # Show fitting process
    error_action="ignore",
    suppress_warnings=True
)

# ✅ Step 3: Print the Best ARIMA Order
print(f"Best ARIMA Order: {auto_model.order}, Best Seasonal Order: {auto_model.seasonal_order}")

# ✅ Step 4: Fit SARIMAX Using the Best Parameters
best_order = auto_model.order
best_seasonal_order = auto_model.seasonal_order

sarima_model = SARIMAX(y, order=best_order, seasonal_order=best_seasonal_order)
sarima_results = sarima_model.fit()

# ✅ Step 5: Forecast Next 30 Days
future_steps = 30
forecast = sarima_results.forecast(steps=future_steps)

# ✅ Step 6: Plot the Results
# plt.figure(figsize=(10,5))
# plt.plot(df.index, y, label="Actual Data", color="blue")
# plt.gcf().autofmt_xdate()  # Rotate x-axis labels to avoid formatting issues

# Compute evaluation metrics
mae_ltsm = mean_absolute_error(y, forecast)
rmse_ltsm = np.sqrt(mean_squared_error(y, forecast))
mape_ltsm = mean_absolute_percentage_error(y, forecast)
# smape_ltsm = symmetric_mape(y, forecast)
# aic_ltsm = compute_aic(y, forecast, num_params=train.shape[1] + 1)

print(mae_ltsm, rmse_ltsm, mape_ltsm)

'''
start_arima = time.time()
try:
    auto_model = auto_arima(
        train,
        seasonal=True,
        m=24,                 # Daily seasonality in hourly data
        stepwise=True,        # Use heuristic search for efficiency
        trace=True,           # Show search progress (disable later if desired)
        suppress_warnings=True,
        # n_jobs is ignored in stepwise mode
        max_p=2, max_q=2,     # Limit non-seasonal AR & MA terms
        max_P=1, max_Q=1,     # Limit seasonal terms
        d=1, D=1,
        max_order=10,         # Restrict total number of parameters
        trend="t",
        lambda_="auto",       # Let auto_arima choose an optimal Box–Cox transformation (if beneficial)
        information_criterion="aic"
    )

    best_order, best_seasonal_order = auto_model.order, auto_model.seasonal_order
    elapsed_arima = time.time() - start_arima
    print(f"✅ Best SARIMA order: {best_order}, Seasonal Order: {best_seasonal_order}")
    print(f"⏱ Auto-ARIMA completed in {elapsed_arima:.2f} seconds")
except Exception as e:
    print(f"⚠️ Auto-ARIMA failed. Using default SARIMA parameters. Error: {e}")
    best_order, best_seasonal_order = (2, 1, 1), (1, 1, 1, 24)


# Check if a Box–Cox transformation was applied and print the lambda value if available
if hasattr(auto_model, 'lambda_') and auto_model.lambda_ is not None:
    print(f"📐 Optimal Box–Cox lambda: {auto_model.lambda_:.4f}")
else:
    print("📐 No Box–Cox transformation was applied.")

print("\n⏳ Training Optimized SARIMA Model...")

sarima_model = SARIMAX(
    train,
    order=best_order,
    seasonal_order=best_seasonal_order,
    enforce_stationarity=False,
    enforce_invertibility=False
)
sarima_result = sarima_model.fit(disp=False)
print("✅ Optimized SARIMA model trained successfully!")
'''


⏳ Running Optimized Auto-ARIMA to select best parameters...
Performing stepwise search to minimize aic



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


 ARIMA(2,1,2)(1,0,1)[7] intercept   : AIC=11091.410, Time=2.89 sec
 ARIMA(0,1,0)(0,0,0)[7] intercept   : AIC=11251.259, Time=0.05 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,0)(1,0,0)[7] intercept   : AIC=11205.898, Time=0.20 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(0,1,1)(0,0,1)[7] intercept   : AIC=11152.611, Time=0.83 sec
 ARIMA(0,1,0)(0,0,0)[7]             : AIC=11249.262, Time=0.04 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(0,0,1)[7] intercept   : AIC=11092.305, Time=1.69 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(1,0,0)[7] intercept   : AIC=11092.243, Time=1.13 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(2,0,1)[7] intercept   : AIC=11092.595, Time=2.66 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(1,0,2)[7] intercept   : AIC=11092.615, Time=2.68 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(0,0,0)[7] intercept   : AIC=11090.868, Time=0.78 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,2)(0,0,0)[7] intercept   : AIC=11089.149, Time=0.46 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,2)(1,0,0)[7] intercept   : AIC=11090.678, Time=0.73 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,2)(0,0,1)[7] intercept   : AIC=11090.724, Time=0.80 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,2)(1,0,1)[7] intercept   : AIC=11089.811, Time=1.43 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(0,1,2)(0,0,0)[7] intercept   : AIC=11089.542, Time=0.26 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,1)(0,0,0)[7] intercept   : AIC=11093.916, Time=0.33 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(0,0,0)[7] intercept   : AIC=11086.793, Time=0.72 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(1,0,0)[7] intercept   : AIC=11089.962, Time=0.97 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(0,0,1)[7] intercept   : AIC=11088.790, Time=1.71 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(1,0,1)[7] intercept   : AIC=11089.194, Time=1.97 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(0,1,3)(0,0,0)[7] intercept   : AIC=11088.501, Time=0.49 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,3)(0,0,0)[7] intercept   : AIC=11089.919, Time=0.88 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,4)(0,0,0)[7] intercept   : AIC=11090.117, Time=0.66 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(0,1,4)(0,0,0)[7] intercept   : AIC=11088.975, Time=0.47 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,4)(0,0,0)[7] intercept   : AIC=11088.537, Time=1.50 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(0,0,0)[7]             : AIC=11084.794, Time=0.36 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(1,0,0)[7]             : AIC=11086.793, Time=0.71 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(0,0,1)[7]             : AIC=11086.793, Time=0.73 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,3)(1,0,1)[7]             : AIC=11087.197, Time=0.73 sec
 ARIMA(0,1,3)(0,0,0)[7]             : AIC=11086.505, Time=0.11 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,2)(0,0,0)[7]             : AIC=11087.153, Time=0.40 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,3)(0,0,0)[7]             : AIC=11087.921, Time=0.40 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(1,1,4)(0,0,0)[7]             : AIC=11088.120, Time=0.26 sec
 ARIMA(0,1,2)(0,0,0)[7]             : AIC=11087.544, Time=0.09 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.


'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(0,1,4)(0,0,0)[7]             : AIC=11086.978, Time=0.17 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,2)(0,0,0)[7]             : AIC=11088.871, Time=0.27 sec



'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.



 ARIMA(2,1,4)(0,0,0)[7]             : AIC=11086.635, Time=0.92 sec

Best model:  ARIMA(1,1,3)(0,0,0)[7]          
Total fit time: 31.621 seconds
Best ARIMA Order: (1, 1, 3), Best Seasonal Order: (0, 0, 0, 7)


ValueError: Found input variables with inconsistent numbers of samples: [1096, 30]

# 7️⃣ Step 7: Forecast the Next 7 Days and Visualize Results

In this step, we:
- Use the trained SARIMA model to forecast the next 7 days (168 hours) of energy prices.
- Create a forecast index based on the start of the test period.
- Extract the forecasted mean and confidence intervals.
- **Important:** If a Box–Cox transformation was applied (via `lambda_="auto"` in Auto‐ARIMA), we invert the transformation for the forecast output.
- Plot the forecast along with the training and test data to visually compare model predictions.

In [102]:
import plotly.graph_objects as go
from datetime import datetime

# Define forecast horizon: 7 days * 24 hours = 168 hours
forecast_steps = 24 * 7

# Create forecast results using the SARIMA model
forecast_obj = sarima_results.get_forecast(steps=forecast_steps)
forecast_mean = forecast_obj.predicted_mean
forecast_conf_int = forecast_obj.conf_int()

# Create a forecast index starting at the beginning of the test set
forecast_index = pd.date_range(start=test.index.min(), periods=forecast_steps, freq="H")

# Check if a Box–Cox transformation was applied in Auto‐ARIMA
lambda_val = None
if hasattr(auto_model, 'lambda_'):
    lambda_val = auto_model.lambda_

# If a transformation was applied (and lambda is not 1), invert the Box–Cox transformation:
# Inverse Box–Cox: x = (y*lambda + 1)**(1/lambda)
if lambda_val is not None and lambda_val != 1:
    forecast_mean = (forecast_mean * lambda_val + 1) ** (1 / lambda_val)
    forecast_conf_int = (forecast_conf_int * lambda_val + 1) ** (1 / lambda_val)
    print(f"📐 Inverting Box–Cox transform using lambda: {lambda_val:.4f}")
else:
    print("📐 No Box–Cox transformation to invert.")

# Build a DataFrame for the forecast results
forecast_df = pd.DataFrame({"Forecasted Price": forecast_mean}, index=forecast_index)

# Convert test.index.min() to a Python datetime object for add_vline
forecast_start_dt = test.index.min().to_pydatetime()

# -----------------------------
# Build Interactive Plotly Figure
# -----------------------------
fig = go.Figure()

# 1. Training Data
fig.add_trace(
    go.Scatter(
        x=train.index,
        y=train["Energy Price"],
        mode="lines",
        name="Training Data",
        line=dict(color="blue")
    )
)

# 2. Test Data
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=test["Energy Price"],
        mode="lines",
        name="Test Data",
        line=dict(color="orange")
    )
)

# 3. Forecast
fig.add_trace(
    go.Scatter(
        x=forecast_df.index,
        y=forecast_df["Forecasted Price"],
        mode="lines",
        name="Forecast",
        line=dict(color="red", dash="dash")
    )
)

# 4. Confidence Interval (fill between upper & lower bounds)
fig.add_trace(
    go.Scatter(
        x=list(forecast_conf_int.index) + list(forecast_conf_int.index[::-1]),
        y=list(forecast_conf_int.iloc[:, 1]) + list(forecast_conf_int.iloc[:, 0][::-1]),
        fill='toself',
        fillcolor='rgba(255, 192, 203, 0.3)',  # light pink
        line=dict(color='rgba(255, 192, 203, 0)'),
        name="95% Confidence Interval"
    )
)

# 5. Vertical line for forecast start (without annotation in add_vline)
fig.add_vline(
    x=forecast_start_dt,
    line_width=2,
    line_dash="dash",
    line_color="black"
)

# 6. Add a separate annotation for the forecast start line
fig.add_annotation(
    x=forecast_start_dt,
    y=df["Energy Price"].max(),
    text="Forecast Start",
    showarrow=True,
    arrowhead=2,
    arrowcolor="black",
    yanchor="bottom"
)

# Update Layout with range slider and selectors
fig.update_layout(
    title="Energy Price Forecast (Next 7 Days)",
    xaxis_title="Time",
    yaxis_title="Energy Price (EUR/MWh)",
    template="plotly_dark",
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=7, label="1W", step="day", stepmode="backward"),
                dict(count=30, label="1M", step="day", stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    )
)

# Show the interactive figure
fig.show()

📐 No Box–Cox transformation to invert.



'H' is deprecated and will be removed in a future version, please use 'h' instead.



AttributeError: 'str' object has no attribute 'to_pydatetime'

# 8️⃣ Step 8: Verify Forecast vs. Test Data and Evaluate Accuracy Metrics

In this step, we compare the forecasted energy prices with the actual test data by:

- Combining the test data and forecasted data into a single DataFrame.
- Printing a summary table of Actual vs. Forecast values.
- Calculating overall error metrics (MAE, MSE, RMSE, MAPE) for the entire forecast period.
- Computing day-by-day error metrics to assess forecast accuracy on a daily basis.
- Visualizing the actual versus forecasted values.

This allows you to evaluate both the overall performance and the daily accuracy of the forecast.

In [None]:
# Step 8: Verify Forecast vs. Test Data and Evaluate Accuracy Metrics

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Use a small epsilon to avoid division by zero
epsilon = 1e-10

# Create a DataFrame that combines actual test values with forecasted values
compare_df = pd.DataFrame({
    "Actual": test["Energy Price"],
    "Forecast": forecast_df["Forecasted Price"]
})

# Calculate Hourly Accuracy (%) for each time point:
# Accuracy = (1 - |Actual - Forecast|/(Actual)) * 100
compare_df["Hourly Accuracy (%)"] = (1 - np.abs((compare_df["Actual"] - compare_df["Forecast"]) / (compare_df["Actual"] + epsilon))) * 100

# Print the first few rows of the comparison DataFrame
print("\n📊 Comparison of Test Data and Forecast (first 5 rows):")
print(compare_df.head())

# -----------------------
# Overall Error Metrics
# -----------------------
overall_mae = mean_absolute_error(compare_df["Actual"], compare_df["Forecast"])
overall_mse = mean_squared_error(compare_df["Actual"], compare_df["Forecast"])
overall_rmse = np.sqrt(overall_mse)
overall_mape = np.mean(np.abs((compare_df["Actual"] - compare_df["Forecast"]) / (compare_df["Actual"] + epsilon))) * 100
overall_mdape = np.median(np.abs((compare_df["Actual"] - compare_df["Forecast"]) / (compare_df["Actual"] + epsilon))) * 100

# Global Forecast Accuracy (%) computed by aggregating the values:
sum_actual = compare_df["Actual"].sum()
sum_forecast = compare_df["Forecast"].sum()
global_accuracy = (1 - np.abs(sum_actual - sum_forecast) / (sum_actual + epsilon)) * 100

print("\n📉 Overall Forecast Error Metrics:")
print(f"- MAE: {overall_mae:.4f}")
print(f"- MSE: {overall_mse:.4f}")
print(f"- RMSE: {overall_rmse:.4f}")
print(f"- MAPE: {overall_mape:.2f}%")
print(f"- MdAPE: {overall_mdape:.2f}%")
print(f"- Global Forecast Accuracy: {global_accuracy:.2f}%")

# -----------------------
# Day-by-Day Error Metrics
# -----------------------
def compute_metrics(actual, forecast):
    mae = mean_absolute_error(actual, forecast)
    mse = mean_squared_error(actual, forecast)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs((actual - forecast) / (actual + epsilon))) * 100
    accuracy = max(0, 100 - mape)
    return pd.Series({"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "Accuracy (%)": accuracy})

daily_metrics = compare_df.resample("D").apply(lambda x: compute_metrics(x["Actual"], x["Forecast"]))
print("\n📊 Day-by-Day Forecast Accuracy:")
print(daily_metrics)

# -----------------------
# Interactive Plotly Plot with Dual Y-Axes
# -----------------------
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add actual test data trace on primary y-axis
fig.add_trace(
    go.Scatter(
        x=compare_df.index, 
        y=compare_df["Actual"],
        mode="lines",
        name="Actual Test Data",
        line=dict(color="orange")
    ),
    secondary_y=False
)

# Add forecast data trace on primary y-axis
fig.add_trace(
    go.Scatter(
        x=compare_df.index,
        y=compare_df["Forecast"],
        mode="lines",
        name="Forecast",
        line=dict(color="red", dash="dot")
    ),
    secondary_y=False
)

# Add hourly accuracy trace on secondary y-axis
fig.add_trace(
    go.Scatter(
        x=compare_df.index,
        y=compare_df["Hourly Accuracy (%)"],
        mode="lines",
        name="Hourly Accuracy (%)",
        line=dict(color="blue", dash="dash")
    ),
    secondary_y=True
)

# Update layout for interactivity
fig.update_layout(
    title="Interactive Comparison: Actual vs. Forecast & Hourly Accuracy",
    xaxis_title="Time",
    yaxis_title="Energy Price (EUR/MWh)",
    template="plotly_dark"
)

# Set secondary y-axis title
fig.update_yaxes(title_text="Hourly Accuracy (%)", secondary_y=True)

# Add range slider and selectors
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1, label="1D", step="day", stepmode="backward"),
                dict(count=7, label="1W", step="day", stepmode="backward"),
                dict(count=30, label="1M", step="day", stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    )
)

fig.show()

# 9️⃣ Step 9: Log Experiment Results

In this step, we log the experiment results to two locations:

1. **CSV Log File:**  
   - The log file is named using the format:  
     `SARIMA_used_log_YYYYMMDD_HHMM.csv`  
   - This file is stored in the Log folder:  
     `/Users/redouan/Library/CloudStorage/OneDrive-DANAnalytics/EAISI/Script/Logfiles`
   - It contains one row per experiment with columns for the experiment timestamp, model type, training and test date ranges, selected SARIMA parameters, and various error metrics.

2. **SQLite Database:**  
   - The cumulative results are stored in a SQLite database file named `experiment_results.db`  
   - This file is located in the Data folder:  
     `/Users/redouan/Library/CloudStorage/OneDrive-DANAnalytics/EAISI/Script/Data`
   - This database allows you to query and analyze results across multiple experiments.

This two-file approach ensures traceability and flexibility when comparing model performance across different experiments.

In [None]:
import csv
import sqlite3
from datetime import datetime
import os

# Ensure that overall_accuracy and overall_mdape are defined (if not, compute them)
if "overall_accuracy" not in globals():
    overall_accuracy = compare_df["Hourly Accuracy (%)"].mean()
if "overall_mdape" not in globals():
    overall_mdape = np.median(np.abs((compare_df["Actual"] - compare_df["Forecast"]) / (compare_df["Actual"] + 1e-10))) * 100

# -----------------------------
# Define folder paths for logs and results
# -----------------------------
log_folder = "/Users/redouan/Library/CloudStorage/OneDrive-DANAnalytics/EAISI/Script/Logfiles"
db_folder = "/Users/redouan/Library/CloudStorage/OneDrive-DANAnalytics/EAISI/Script/Data"

# Ensure the folders exist
os.makedirs(log_folder, exist_ok=True)
os.makedirs(db_folder, exist_ok=True)

# -----------------------------
# Gather experiment results
# -----------------------------
# For candidate_params, we capture the full auto_arima summary as a string.
try:
    candidate_params = str(auto_model.summary())
except Exception as e:
    candidate_params = f"Auto-ARIMA search failed; used default parameters. Error: {e}"

results = {
    "experiment_timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "model_type": "SARIMA",  # update dynamically as needed
    "train_start": train.index.min().strftime("%Y-%m-%d %H:%M:%S"),
    "train_end": train.index.max().strftime("%Y-%m-%d %H:%M:%S"),
    "test_start": test.index.min().strftime("%Y-%m-%d %H:%M:%S"),
    "test_end": test.index.max().strftime("%Y-%m-%d %H:%M:%S"),
    "best_order": str(best_order),
    "best_seasonal_order": str(best_seasonal_order),
    "AIC": sarima_result.aic,
    "BIC": sarima_result.bic,
    "overall_MAE": overall_mae,
    "overall_MSE": overall_mse,
    "overall_RMSE": overall_rmse,
    "overall_MAPE": overall_mape,
    "overall_mdape": overall_mdape,
    "overall_accuracy": overall_accuracy,
    "candidate_params": candidate_params
}

# -----------------------------
# Option 1: Write results to a CSV Log File
# -----------------------------
# Create a filename with the current timestamp (YYYYMMDD_HHMM)
csv_filename = f"SARIMA_used_log_{datetime.now().strftime('%Y%m%d_%H%M')}.csv"
csv_filepath = os.path.join(log_folder, csv_filename)

# Define the header based on the keys of the results dictionary
header = list(results.keys())

# Write the header and current results to the CSV file using semicolon as the delimiter
with open(csv_filepath, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=";")
    writer.writeheader()
    writer.writerow(results)

print(f"✅ Experiment results logged to CSV file: {csv_filepath}")

# -----------------------------
# Option 2: Insert results into a cumulative SQLite Database
# -----------------------------
db_filepath = os.path.join(db_folder, "experiment_results.db")
conn = sqlite3.connect(db_filepath)
cursor = conn.cursor()

# Create table if it doesn't exist, including new fields overall_mdape and candidate_params
create_table_query = """
CREATE TABLE IF NOT EXISTS experiments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    experiment_timestamp TEXT,
    model_type TEXT,
    train_start TEXT,
    train_end TEXT,
    test_start TEXT,
    test_end TEXT,
    best_order TEXT,
    best_seasonal_order TEXT,
    AIC REAL,
    BIC REAL,
    overall_MAE REAL,
    overall_MSE REAL,
    overall_RMSE REAL,
    overall_MAPE REAL,
    overall_mdape REAL,
    overall_accuracy REAL,
    candidate_params TEXT
);
"""
cursor.execute(create_table_query)

# Insert the current experiment results into the database
insert_query = """
INSERT INTO experiments (
    experiment_timestamp, model_type, train_start, train_end, test_start, test_end,
    best_order, best_seasonal_order, AIC, BIC, overall_MAE, overall_MSE, overall_RMSE,
    overall_MAPE, overall_mdape, overall_accuracy, candidate_params
)
VALUES (
    :experiment_timestamp, :model_type, :train_start, :train_end, :test_start, :test_end,
    :best_order, :best_seasonal_order, :AIC, :BIC, :overall_MAE, :overall_MSE, :overall_RMSE,
    :overall_MAPE, :overall_mdape, :overall_accuracy, :candidate_params
);
"""
cursor.execute(insert_query, results)
conn.commit()
conn.close()

print(f"✅ Experiment results inserted into SQLite database: {db_filepath}")