# Potsdam PM2.5 Baseline (Naive Approach) Forecasting

This notebook uses data collected by station DEBB021 between 2013 and 2023.

To improve the accuracy of PM2.5 estimation, data for the pollutants NO2, O3, SO2, and PM10, as accepted by the EEA, were added.


In [None]:
# Imports
import os
import sys

# Make the parent directory importable so that model_base can be found
sys.path.append(os.path.dirname(os.getcwd()))

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import model_base as mb

## Data Exploration

* Load Data


In [None]:
df_hourly_ts, df_daily_ts, df_weekly_ts, df_monthly_ts = mb.read_timestamp_freq()


# HOURLY 
mb.set_start_time_index(df_hourly_ts)

# DAILY 
mb.set_start_time_index(df_daily_ts)

# WEEKLY 
mb.set_start_time_index(df_weekly_ts)

# MONTHLY 
mb.set_start_time_index(df_monthly_ts)

## Naive Approach

In [None]:
# Naive forecast: the prediction for each period is the previous period's observed value

# Hourly
df_hourly_ts['Forcasted-PM2.5-Value'] = df_hourly_ts['PM2.5-Value'].shift(1)

# daily
df_daily_ts['Forcasted-PM2.5-Value'] = df_daily_ts['PM2.5-Value'].shift(1)

# weekly
df_weekly_ts['Forcasted-PM2.5-Value'] = df_weekly_ts['PM2.5-Value'].shift(1)

# monthly
df_monthly_ts['Forcasted-PM2.5-Value'] = df_monthly_ts['PM2.5-Value'].shift(1)
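
To make the effect of `shift(1)` concrete, here is a minimal sketch on a toy series. The values and dates are made up for illustration and are not taken from the DEBB021 data; only the column names mirror the ones used above.

```python
import pandas as pd

# Toy daily series; values are invented for illustration only
demo = pd.DataFrame(
    {"PM2.5-Value": [8.0, 9.5, 7.0, 11.0]},
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
)

# Naive forecast: each row is predicted by the previous row's observation
demo["Forcasted-PM2.5-Value"] = demo["PM2.5-Value"].shift(1)
print(demo)
```

The first row has no previous observation, so its forecast is `NaN`. This is why rows with missing values need to be dropped before any error metric is computed.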


In [None]:

# Hourly
mb.plot_pm_true_predict(df_hourly_ts, df_hourly_ts['Forcasted-PM2.5-Value'], 'Naive Hourly')

# daily
mb.plot_pm_true_predict(df_daily_ts, df_daily_ts['Forcasted-PM2.5-Value'], 'Naive Daily')

# weekly
mb.plot_pm_true_predict(df_weekly_ts, df_weekly_ts['Forcasted-PM2.5-Value'], 'Naive Weekly')

# monthly
mb.plot_pm_true_predict(df_monthly_ts, df_monthly_ts['Forcasted-PM2.5-Value'], 'Naive Monthly')

## Error Metrics

* Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values. Lower MAE values indicate better accuracy.

* Mean Squared Error (MSE): MSE measures the average squared difference between predicted and actual values. It penalizes larger errors more heavily than MAE. Lower MSE values indicate better accuracy.

* Root Mean Squared Error (RMSE): RMSE is the square root of the MSE. It provides an interpretable measure in the same units as the original data. Lower RMSE values indicate better accuracy.

* Mean Absolute Percentage Error (MAPE): MAPE calculates the average absolute percentage difference between predicted and actual values. Lower MAPE values indicate better accuracy. MAPE is undefined when an actual value is zero and becomes very large when actual values are close to zero, so interpret it with care on such data.

* Mean Absolute Scaled Error (MASE): MASE measures the accuracy of a forecasting model relative to a naive forecast (e.g., using the previous period's value). A MASE below 1 suggests that the model outperforms the naive forecast, while a value above 1 suggests it performs worse.
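
The definitions above can be sketched in a few lines of NumPy. This is an illustrative implementation only and is not necessarily how `model_base` computes its metrics: the function names `error_metrics` and `mase` are hypothetical, the toy values are made up, and skipping zero actual values in MAPE is one possible convention among several.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and MAPE (in percent) as a dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # Assumption: points with a zero actual value are skipped to avoid division by zero
    nonzero = y_true != 0
    mape = np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


def mase(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of a one-step naive forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # One-step naive errors on the training data are the successive differences
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae


# Toy values, invented for illustration only
y_train = [10.0, 12.0, 11.0, 15.0]
y_true = [14.0, 16.0, 13.0]
y_pred = [15.0, 14.0, 16.0]

metrics = error_metrics(y_true, y_pred)
print(metrics)
print("MASE:", mase(y_true, y_pred, y_train))
```

With these toy numbers the absolute errors are 1, 2, and 3, so the MAE is 2.0, and the naive in-sample MAE is 7/3, giving a MASE of 6/7.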


In [None]:
def naive_evolve(df):
    """Report naive-forecast error metrics on the validation and test splits."""
    # Drop rows with missing values (the first row has no forecast after shift(1))
    df = df.dropna()

    # Split the data into train, validation, and test sets
    train_data, validation_data, test_data = mb.split_data(df)

    # Validation error metrics
    mb.evolve_error_metrics(validation_data['PM2.5-Value'], validation_data['Forcasted-PM2.5-Value'])
    mb.naive_mean_absolute_scaled_error(validation_data['PM2.5-Value'], validation_data['Forcasted-PM2.5-Value'])
    # Test error metrics
    mb.evolve_error_metrics(test_data['PM2.5-Value'], test_data['Forcasted-PM2.5-Value'])
    mb.naive_mean_absolute_scaled_error(test_data['PM2.5-Value'], test_data['Forcasted-PM2.5-Value'])
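
The internals of `mb.split_data` are not shown in this notebook. For time series, a split is usually chronological (no shuffling) so that validation and test data lie strictly after the training data. The sketch below shows one way this could look; the function name `chronological_split` and the 70/15/15 proportions are assumptions for illustration, not the behavior of `model_base`.

```python
import pandas as pd


def chronological_split(df, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered DataFrame into train/validation/test without shuffling."""
    n = len(df)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Slicing by position keeps the original time order in every part
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


# Toy frame with a daily index; values are invented for illustration only
toy = pd.DataFrame(
    {"PM2.5-Value": range(20)},
    index=pd.date_range("2023-01-01", periods=20, freq="D"),
)
train, val, test = chronological_split(toy)
print(len(train), len(val), len(test))
```

Because the parts are contiguous slices, every validation timestamp follows every training timestamp, and every test timestamp follows every validation timestamp.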

In [None]:
# HOURLY
naive_evolve(df_hourly_ts)

In [None]:
# DAILY
naive_evolve(df_daily_ts)

In [None]:
# WEEKLY
naive_evolve(df_weekly_ts)

In [None]:
# MONTHLY
naive_evolve(df_monthly_ts)