# Potsdam PM2.5 Traditional Forecasting

* Data collected by station DEBB021 between 2013 and 2023 is used.
* To increase the accuracy of PM2.5 estimation, NO2, O3, SO2, and PM10 pollutant data accepted by the EEA were added.


In [None]:
# imports
import os
import sys

# Add the parent of the working directory to the path so local modules in it can be imported.
sys.path.append(os.path.dirname(os.getcwd()))

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
import model_base as mb
import traditional as td

## Data Exploration

* Load Data


In [None]:
df_hourly, df_daily, df_weekly, df_monthly = mb.read_date_freq()

# HOURLY 
mb.set_start_date_time_index(df_hourly)

# DAILY 
mb.set_start_date_time_index(df_daily)

# WEEKLY 
mb.set_start_date_time_index(df_weekly)

# MONTHLY 
mb.set_start_date_time_index(df_monthly)

# SARIMAX

# Model Creation and Evaluation

## Diagnose Fitted Model
* Standardized residual: The residual errors seem to fluctuate around a mean of zero and have a uniform variance.
* Histogram: The density plot suggests a normal distribution with the mean slightly shifted to the right.
* Theoretical Quantiles: Most of the dots do not fall on the red line; such deviations imply that the distribution is skewed.
* Correlogram: The correlogram (ACF plot) shows that the residual errors are not autocorrelated. Any autocorrelation would imply that there is some pattern in the residual errors that is not explained by the model.
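
The checks listed above can also be expressed numerically. The following is a minimal sketch, not the code used by `td.sarimax_train_and_evolve`: the helper name `residual_diagnostics`, the synthetic series, and the approximate 95% white-noise band of ±1.96/√n are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random


def residual_diagnostics(residuals, max_lag=10):
    """Summarize residuals: mean, skewness, and lags with notable autocorrelation."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    variance = sum(c * c for c in centered) / n

    # Sample skewness: close to 0 for a symmetric distribution.
    skewness = (sum(c ** 3 for c in centered) / n) / variance ** 1.5

    # Sample autocorrelation at lags 1..max_lag.
    acf = [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / (n * variance)
        for lag in range(1, max_lag + 1)
    ]

    # Approximate 95% band for white noise (an assumption of this sketch).
    bound = 1.96 / math.sqrt(n)
    significant_lags = [
        lag for lag, value in enumerate(acf, start=1) if abs(value) > bound
    ]
    return {
        "mean": mean,
        "skewness": skewness,
        "acf": acf,
        "bound": bound,
        "significant_lags": significant_lags,
    }


# Synthetic residuals for illustration only (not the notebook's data).
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

# An AR(1)-style series: each value depends on the previous one,
# so a pattern remains that a model has not explained.
correlated = [0.0]
for _ in range(499):
    correlated.append(0.8 * correlated[-1] + rng.gauss(0.0, 1.0))

print(residual_diagnostics(white_noise)["significant_lags"])
print(residual_diagnostics(correlated)["significant_lags"])
```

Lags reported for the second series indicate structure left in the residuals, which is the situation the correlogram is meant to reveal.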

## Residuals
Residuals should have a mean of approximately zero. If the residual mean is not zero, the forecasts are biased. Adjusting for bias is easy: add the residual mean to all forecasts. Forecasts whose residuals lack these characteristics have room for improvement; adding additional terms to the ETS or ARIMA model may alleviate the issue.
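
As a rough illustration of the bias adjustment, the sketch below shifts forecasts by the mean residual. It assumes residuals are defined as actual minus forecast; the function name `bias_correct` and the numbers are made up for this example and are not part of the `traditional` module.

```python
from statistics import mean


def bias_correct(actual, forecast, future_forecast):
    """Shift future forecasts by the mean in-sample residual (actual - forecast)."""
    residuals = [a - f for a, f in zip(actual, forecast)]
    bias = mean(residuals)
    return bias, [f + bias for f in future_forecast]


# Made-up numbers for illustration only.
actual = [12.0, 15.0, 11.0, 14.0]
forecast = [10.0, 13.0, 9.0, 12.0]  # consistently too low

bias, corrected = bias_correct(actual, forecast, [11.0, 16.0])
print(bias, corrected)
```

A constant shift only removes a systematic offset; it does not address autocorrelation or non-constant variance in the residuals.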


In [None]:
# HOURLY
td.sarimax_train_and_evolve(df_hourly)

In [None]:
# DAILY
td.sarimax_train_and_evolve(df_daily, 'D')

In [None]:
# WEEKLY
td.sarimax_train_and_evolve(df_weekly, 'W')

In [None]:
# MONTHLY
td.sarimax_train_and_evolve(df_monthly, 'M')

## Timestamp Data Load 

In [None]:
df_hourly_ts, df_daily_ts, df_weekly_ts, df_monthly_ts = mb.read_timestamp_freq()

# HOURLY 
mb.set_start_time_index(df_hourly_ts)

# DAILY 
mb.set_start_time_index(df_daily_ts)

# WEEKLY 
mb.set_start_time_index(df_weekly_ts)

# MONTHLY 
mb.set_start_time_index(df_monthly_ts)

# Support Vector Regression (SVR) forecasting  

In [None]:
# Hourly
td.svr_train_and_evolve(df_hourly_ts)

# Daily
td.svr_train_and_evolve(df_daily_ts, 'D')

# Weekly
td.svr_train_and_evolve(df_weekly_ts, 'W')

# Monthly
td.svr_train_and_evolve(df_monthly_ts, 'M')


# Multiple Linear Regression

In [None]:
# Hourly
td.linear_train_and_evolve(df_hourly_ts)

# Daily
td.linear_train_and_evolve(df_daily_ts, 'D')

# Weekly
td.linear_train_and_evolve(df_weekly_ts, 'W')

# Monthly
td.linear_train_and_evolve(df_monthly_ts, 'M')

# Hyperparameter Tuning

## HOURLY

In [None]:
# HOURLY HYPERPARAMETER TUNING

hourly_sarimax_best_params = td.tune_sarimax(df_hourly)
hourly_svr_best_params = td.svr_tune_and_evaluate(df_hourly_ts)

## DAILY

In [None]:
# DAILY HYPERPARAMETER TUNING

daily_sarimax_best_params = td.tune_sarimax(df_daily)
daily_svr_best_params = td.svr_tune_and_evaluate(df_daily_ts)

## WEEKLY

In [None]:
# WEEKLY HYPERPARAMETER TUNING

weekly_sarimax_best_params = td.tune_sarimax(df_weekly)
weekly_svr_best_params = td.svr_tune_and_evaluate(df_weekly_ts)

## MONTHLY

In [None]:
# MONTHLY HYPERPARAMETER TUNING

monthly_sarimax_best_params = td.tune_sarimax(df_monthly)
monthly_svr_best_params = td.svr_tune_and_evaluate(df_monthly_ts)