# Parametric Methods for Time Series Forecasting

The following forecasting methods are implemented in this notebook:
- Naive Forecasting Method
- Moving Averages (MA)
- Autoregressive Integrated Moving Average (ARIMA)

The objective of this exercise is to implement these techniques and compare their forecasts. At the end, you are asked to define parameter spaces for the models, experiment with them, and note how performance changes.

## Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error

## Load Dataset

In [None]:
def load_dataset(filename):
    filename = f"./{filename}"
    building_df = pd.read_excel(filename)
    return building_df


def train_test_split(series, train_ratio=0.8):
    """Split chronologically (no shuffling): the first `train_ratio` of rows train, the rest test."""
    split_index = int(len(series) * train_ratio)  # index of the first test row
    train = series.iloc[:split_index]
    test = series.iloc[split_index:]
    return train, test  # same type as the input (DataFrames in this notebook)

## Naive Forecast (Last Value)
The **Last Value Naive Forecast** is the simplest forecasting method. It assumes that the most recent observed value will remain the same for all future time steps.

$$
\hat{y}_{t+h} = y_t
$$


In [None]:
def naive_forecast(df, column='energy', forecast_horizon=6):
    
    # last value from the target variable
    last_value = df[column].iloc[-1]
    
    forecast = np.full(shape=forecast_horizon, fill_value=last_value)
    
    return forecast
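
As a quick check of the formula, the sketch below applies the last-value rule to a small made-up series. The values and the name `toy_df` are illustrative assumptions and are not taken from the building dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the 'energy' column name matches the notebook.
toy_df = pd.DataFrame({"energy": [10.0, 12.0, 11.0, 15.0]})

# Last observed value y_t
last_value = toy_df["energy"].iloc[-1]

# Repeat it for every step of the horizon: y_hat_{t+h} = y_t for h = 1..3
toy_forecast = np.full(shape=3, fill_value=last_value)

print(toy_forecast)  # [15. 15. 15.]
```

The forecast is a flat line at the last observation, so its error grows whenever the series moves away from that level.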

## Moving Average
The **Moving Average Forecast** is a simple method that uses the average of the last $ k $ observations to predict future values. It assumes that the recent past provides the best estimate of near-future behavior.

$$
\hat{y}_{t+h} = \frac{1}{k} \sum_{i=0}^{k-1} y_{t-i}
$$

In [None]:
def moving_average_forecast(df, column='energy', window_size=3, forecast_horizon=6):
    if len(df) < window_size:
        raise ValueError("Not enough data for the given window size.")

    # compute the moving average over the last `window_size` values
    recent_values = df[column].iloc[-window_size:]
    forecast_value = recent_values.mean()

    return np.full(forecast_horizon, forecast_value) # forecasted values (oneshot - same value repeated for horizon)
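
To see how the window size $ k $ changes the forecast, the sketch below averages the last few values of a made-up series for several windows. The data and the names `toy_series` and `ma_by_window` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not from the building dataset.
toy_series = pd.Series([10.0, 12.0, 12.0, 15.0, 15.0])

# Average of the last k observations for a few window sizes
ma_by_window = {k: toy_series.iloc[-k:].mean() for k in (2, 3, 5)}

for k, value in ma_by_window.items():
    print(f"k={k}: forecast value = {value:.1f}")
# k=2: forecast value = 15.0
# k=3: forecast value = 14.0
# k=5: forecast value = 12.8
```

A small window tracks the most recent level closely, while a large window smooths over more history and reacts more slowly to a recent rise.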

## Autoregressive Integrated Moving Average (ARIMA)

**ARIMA** (AutoRegressive Integrated Moving Average) is a classical time series forecasting method used for **univariate data**. It combines three components:

- **AR (AutoRegression)**: uses past values
- **I (Integrated)**: differencing to make the series stationary
- **MA (Moving Average)**: uses past forecast errors

$$
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t
$$

Where:
- $ y'_t $: differenced version of the original series $ y_t $
- $ \phi_i $: AR coefficients
- $ \theta_j $: MA coefficients
- $ \epsilon_t $: error term at time $ t $; the lagged terms $ \epsilon_{t-j} $ are past forecast errors
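
The equation can be evaluated by hand for a single step. The sketch below does this for an ARIMA(1, 1, 1) model; the coefficients and recent values are made up for illustration and are not fitted to any data.

```python
# Hypothetical ARIMA(1, 1, 1) parameters and recent values, chosen for illustration.
c = 0.0        # constant
phi_1 = 0.5    # AR coefficient
theta_1 = 0.3  # MA coefficient

y_prev, y_last = 18.0, 20.0   # two most recent observations on the original scale
diff_last = y_last - y_prev   # most recent difference y'_t = 2.0
eps_last = 1.0                # most recent one-step forecast error

# Expected next difference; the future error has mean zero and drops out
diff_next = c + phi_1 * diff_last + theta_1 * eps_last

# Undo the differencing to return to the original scale
y_next = y_last + diff_next

print(round(diff_next, 2), round(y_next, 2))  # 1.3 21.3
```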

ARIMA is best used when:
- The series is **univariate**
- The data shows **autocorrelation**
- The series is (or can be made) **stationary**
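
The "can be made stationary" condition refers to differencing, the **I** component. As a minimal sketch on a made-up trending series, a first difference removes a linear trend, and the operation is reversible given the first value:

```python
import numpy as np

# Hypothetical series with a linear trend (its mean changes over time)
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# First difference (d=1): y'_t = y_t - y_{t-1}
y_diff = np.diff(y)
print(y_diff)  # [2. 2. 2. 2.]

# Differencing is reversible: a cumulative sum plus the first value restores y
y_restored = np.concatenate(([y[0]], y[0] + np.cumsum(y_diff)))
print(np.array_equal(y, y_restored))  # True
```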

In [None]:
def check_acf_pacf_stationarity(train_df, column='energy', lags=40, title="Energy Series"):
    
    series = train_df[column].dropna()
    
    plt.figure(figsize=(14, 5))

    # Plot ACF
    plt.subplot(1, 2, 1)
    plot_acf(series, lags=lags, ax=plt.gca())
    plt.title(f"ACF: {title}")

    # Plot PACF
    plt.subplot(1, 2, 2)
    plot_pacf(series, lags=lags, ax=plt.gca())
    plt.title(f"PACF: {title}")
    plt.tight_layout()
    plt.show()

    # Run ADF test for stationarity
    adf_result = adfuller(series)
    print("ADF Statistic:", adf_result[0])
    print("p-value:", adf_result[1])
    
def run_arima_forecast(train_df, column='energy', order=(1, 1, 1), forecast_horizon=6):
    
    series = train_df[column].dropna()
    
    model = ARIMA(series, order=order)
    model_fit = model.fit()

    forecast = model_fit.forecast(steps=forecast_horizon)

    return np.asarray(forecast)

def run_arima_sarimax(train_df, column='energy', order=(1,1,1), seasonal_order=(1,1,1,24), forecast_horizon=6):
    series = train_df[column].dropna()
    model = SARIMAX(
        series,
        order=order,                    # non-seasonal (p, d, q)
        seasonal_order=seasonal_order,  # seasonal (P, D, Q, s)
    )
    
    model_fit = model.fit()
    forecast = model_fit.forecast(steps=forecast_horizon)
    
    return np.asarray(forecast)

## Model Evaluation

In [None]:
def evaluate_model(model_name, y_test, y_pred):
    
    #calculate MSE
    mse = mean_squared_error(y_test, y_pred)
    
    print(f"{model_name} - Multi-step Forecast MSE: {mse:.2f}")
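
MSE squares each error, so a single large miss dominates the score. As a rough illustration on made-up numbers, the sketch below computes MSE next to MAE and RMSE with plain NumPy (scikit-learn also provides `mean_absolute_error` in `sklearn.metrics`).

```python
import numpy as np

# Hypothetical actuals and predictions, for illustration only
y_true = np.array([10.0, 12.0, 14.0])
y_hat = np.array([11.0, 12.0, 17.0])

errors = y_true - y_hat            # [-1.  0. -3.]
mse = np.mean(errors ** 2)         # (1 + 0 + 9) / 3
mae = np.mean(np.abs(errors))      # (1 + 0 + 3) / 3
rmse = np.sqrt(mse)                # back in the units of the target

print(f"MSE: {mse:.2f}, MAE: {mae:.2f}, RMSE: {rmse:.2f}")
# MSE: 3.33, MAE: 1.33, RMSE: 1.83
```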

## Execution

In [None]:
# load and split the data
building_df = load_dataset("sim_building_data.xlsx")
train_df, test_df = train_test_split(building_df)


In [None]:
# define forecasting window
forecast_horizon = 24 

# make prediction using naive model (last value)
naive_y_pred = naive_forecast(train_df, forecast_horizon=forecast_horizon)

# make predictions using moving average
moving_average_y_pred = moving_average_forecast(train_df, forecast_horizon=forecast_horizon)

In [None]:
# plot the autocorrelation (ACF) and partial autocorrelation (PACF), then run the ADF stationarity test

check_acf_pacf_stationarity(train_df)

In [None]:
arima_y_pred = run_arima_forecast(train_df, forecast_horizon=forecast_horizon, order=(1, 1, 1))

In [None]:
sarima_y_pred = run_arima_sarimax(train_df, forecast_horizon=forecast_horizon)

In [None]:
# actual values for the first `forecast_horizon` steps of the test set
y_test = np.asarray(test_df['energy'].iloc[:forecast_horizon])

In [None]:

evaluate_model("Naive Forecast", y_test, naive_y_pred)

evaluate_model("Moving Average", y_test, moving_average_y_pred)

evaluate_model("ARIMA", y_test, arima_y_pred)

evaluate_model("SARIMA", y_test, sarima_y_pred)

# Further Tasks
### 1. Try different forecast horizons (e.g., 3, 12)
### 2. Compare performance using MAE
### BONUS: Train an ML model and compare its performance with the simple baselines and ARIMA
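
One possible starting point for the horizon and MAE tasks is a loop over horizons that scores the two simple baselines. The sketch below runs on a synthetic series (a 24-step cycle plus noise, an assumption standing in for the energy data) and uses hypothetical names throughout; it is not the notebook's reference solution.

```python
import numpy as np

# Synthetic stand-in for the energy series (assumption: 24-step cycle plus noise)
rng = np.random.default_rng(42)
t = np.arange(24 * 20)
series = 50 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=1.0, size=t.size)

# Chronological 80/20 split, as in the notebook
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

results = {}
for horizon in (3, 12, 24):
    y_true = test[:horizon]
    naive_pred = np.full(horizon, train[-1])        # last value repeated
    ma_pred = np.full(horizon, train[-3:].mean())   # mean of last 3 values repeated
    results[horizon] = {
        "naive_mae": np.mean(np.abs(y_true - naive_pred)),
        "ma_mae": np.mean(np.abs(y_true - ma_pred)),
    }

for horizon, scores in results.items():
    print(horizon, {name: round(float(v), 2) for name, v in scores.items()})
```

The same loop can be extended with the ARIMA and SARIMA forecasts, or with different `window_size` and `order` values, to cover a wider parameter space.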