# H.05 | Time Series



You will be asked to implement the following functions in `time_series.py`: 

1. **local_seasonal_decompose**: Implement the seasonal decomposition of a time series. The function takes the time series data, along with options such as the period or the model type (`'additive'` in the cell below), and returns the trend, seasonal, and residual components.
2. **difference**: Implement differencing of a given order to remove the trend from the time series data.
3. **is_stationary**: Use the Augmented Dickey-Fuller test to check whether the time series data is stationary at a given significance level.
4. **walk_forward_validation_arima**: Implement the walk-forward validation method for the ARIMA model. The function takes the training data, the test data, and the order of the ARIMA model, and returns the forecasts for the test period.


*Note: Our dataset comes from "Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A." by Keeling et al. (2004).*

In [None]:
%load_ext autoreload
%autoreload 2

from statsmodels import datasets as statsmodels_datasets
import pandas as pd
import plotly.graph_objects as go

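# Load the weekly Mauna Loa CO2 series and keep the last 500 observations.
data = statsmodels_datasets.co2.load_pandas().data
data = data.iloc[-500:]
# Fill missing readings by linear interpolation.
data = data.interpolate()
data = data.reset_index()
data.columns = ['date', 'co2']
data["date"] = pd.to_datetime(data["date"])
# Aggregate to month-end means. resample() returns a new frame that is
# already indexed by date, so the result has to be assigned.
data = data.resample('ME', on='date').mean()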

train = data.iloc[:int(0.8*len(data))]
test = data.iloc[int(0.8*len(data)):]

fig = go.Figure()
fig.add_trace(go.Scatter(x=train.index, y=train['co2'], mode='lines', name='Train & Validation Period'))
fig.add_trace(go.Scatter(x=test.index, y=test['co2'], mode='lines', name='Test Period'))
fig.update_layout(title='CO2 Dataset', height=600, width=1200, template='plotly_white', showlegend=True)
fig.show()

### Seasonal Decomposition

Seasonal decomposition splits a time series into three components: trend, seasonality, and residuals. In an additive decomposition the components sum to the observed series; in a multiplicative decomposition they multiply to it. Decomposition is useful for understanding the underlying patterns in the time series data.
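
To make the additive case concrete, here is a minimal from-scratch sketch of a classical moving-average decomposition in plain Python. The function name `additive_decompose` and the synthetic series are illustrative assumptions, not the assignment's reference solution, and the sketch expects the series to span at least two full periods.

```python
import math


def additive_decompose(values, period):
    """Split `values` into trend, seasonal, and residual lists (additive model).

    Hypothetical helper for illustration; assumes len(values) >= 2 * period.
    """
    n = len(values)
    half = period // 2
    # Centered moving-average weights. For an even period the two end points
    # get half weight so the window stays symmetric around the current point.
    if period % 2 == 0:
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    weights = [w / period for w in weights]

    # Trend: NaN wherever the window does not fit inside the series.
    trend = [math.nan] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))

    # Seasonal: average the detrended values at each position in the cycle,
    # then shift the averages so they sum to zero over one period.
    detrended = [v - tr for v, tr in zip(values, trend)]
    means = []
    for k in range(period):
        bucket = [d for d in detrended[k::period] if not math.isnan(d)]
        means.append(sum(bucket) / len(bucket))
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]

    # Residual: whatever is left after removing trend and seasonality.
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


# Synthetic example: a linear trend plus a zero-sum pattern of period 4.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, residual = additive_decompose(series, period=4)
print(seasonal[:4])
```

On this noise-free series the seasonal component recovers the injected pattern and the residual is zero (up to rounding) wherever the trend is defined. The first and last `period // 2` trend values are NaN because the centered window runs off the ends of the series.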

In [None]:
from time_series import local_seasonal_decompose
import plotly.graph_objects as go
from plotly.subplots import make_subplots

result = local_seasonal_decompose(train['co2'], model='additive')

fig = make_subplots(rows=2, cols=2, shared_xaxes=False, shared_yaxes=False, subplot_titles=("Observed", "Trend", "Seasonal", "Residual"))
fig.add_trace(go.Scatter(x=result.observed.index, y=result.observed, mode='lines', marker=dict(color = "blue")), row=1, col=1)
fig.add_trace(go.Scatter(x=result.trend.index, y=result.trend, mode='lines', marker=dict(color = "blue")), row=1, col=2)
fig.add_trace(go.Scatter(x=result.seasonal.index, y=result.seasonal, mode='lines', marker=dict(color = "blue")), row=2, col=1)
fig.add_trace(go.Scatter(x=result.resid.index, y=result.resid, mode='lines', marker=dict(color = "blue")), row=2, col=2)
fig.update_layout(height=600, width=1200, title_text="Seasonal Decomposition", template='plotly_white', showlegend = False)
fig.show()

### Stationarity & Differencing

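A time series is stationary when its statistical properties, such as its mean and variance, do not change over time. Stationarity is important for time series modeling because many time series models assume that the data is stationary. If the time series is not stationary, differencing (replacing each value with its change from the previous value) can be used to remove trends and make the time series stationary. The Augmented Dickey-Fuller test takes non-stationarity (a unit root) as its null hypothesis, so a p-value below the chosen significance level (0.05 in the cell below) is read as evidence that the series is stationary.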
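
As a small illustration of how differencing removes a trend, the sketch below applies first differences repeatedly to a plain Python list. The name `difference_sketch` is hypothetical and chosen to avoid clashing with the assignment's `difference`. The stationarity test itself is not reproduced here. In practice it is commonly run with `adfuller` from `statsmodels.tsa.stattools`, which returns the p-value as the second element of its result.

```python
def difference_sketch(values, order=1):
    """Apply first differencing `order` times.

    Each pass replaces the series with consecutive changes, so the result
    is `order` elements shorter than the input.
    """
    diffed = list(values)
    for _ in range(order):
        diffed = [b - a for a, b in zip(diffed, diffed[1:])]
    return diffed


# A quadratic trend: one difference leaves a linear trend,
# a second difference leaves a constant.
series = [t * t for t in range(6)]      # [0, 1, 4, 9, 16, 25]
first = difference_sketch(series, 1)    # [1, 3, 5, 7, 9]
second = difference_sketch(series, 2)   # [2, 2, 2, 2]
print(first, second)
```

Because each pass drops one observation, a plot of the differenced data has to skip the first `order` timestamps.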

In [None]:
from time_series import difference, is_stationary

fig = make_subplots(rows=2, cols=1, subplot_titles=("Original Data", "Differenced Data"), shared_xaxes=True)
fig.add_trace(go.Scatter(x=train.index, y=train['co2'], mode='lines', name=f'Original Data | Stationary = {is_stationary(train["co2"], 0.05)}'), row=1, col=1)
for i in range(1, 3):
    differenced = difference(train['co2'], i)
    fig.add_trace(go.Scatter(x=train.index[i:], y=differenced, mode='lines', name=f'Differenced (d={i}) | Stationary = {is_stationary(differenced, 0.05)}'), row=2, col=1)
fig.update_layout(title='CO2 Levels After Differencing', height=600, width=1200, template='plotly_white', showlegend=True)
fig.show()

### ARIMA Modeling

ARIMA (AutoRegressive Integrated Moving Average) is a popular time series model that combines autoregression, differencing, and a moving average of past forecast errors. The ARIMA model is specified by three parameters: `p`, `d`, and `q`. The `p` parameter is the autoregressive order (the number of lagged values used), the `d` parameter is the differencing order (the number of times the series is differenced), and the `q` parameter is the moving average order (the number of lagged forecast errors used).
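
Walk-forward validation evaluates such a model by forecasting one step ahead, revealing the true value, and refitting on the extended history. The sketch below shows only that loop. The callable `fit_and_forecast` and the `drift_forecast` placeholder are hypothetical names, and the placeholder is a simple drift rule rather than ARIMA. With statsmodels, the callable could plausibly wrap something like `ARIMA(history, order=order).fit().forecast()`, but that is an assumption about how the assignment might be solved, not a requirement.

```python
def walk_forward(train, test, fit_and_forecast):
    """One-step-ahead walk-forward validation.

    `fit_and_forecast(history)` is a hypothetical callable that fits a model
    on `history` and returns its forecast for the next time step.
    """
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(fit_and_forecast(history))
        # Reveal the true value only after it has been forecast.
        history.append(actual)
    return predictions


def drift_forecast(history):
    """Placeholder model (not ARIMA): last value plus the most recent change."""
    return history[-1] + (history[-1] - history[-2])


def mean_squared_error(predictions, actuals):
    """Average squared difference between forecasts and observations."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(predictions)


train = [1, 2, 3, 4]
test = [5, 6, 8]
predictions = walk_forward(train, test, drift_forecast)  # [5, 6, 7]
error = mean_squared_error(predictions, test)            # (0 + 0 + 1) / 3
print(predictions, round(error, 3))
```

Every forecast uses only values observed before the step being predicted, which is what keeps the resulting error an honest out-of-sample estimate.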

In [None]:
from time_series import walk_forward_validation_arima, mse

arima_1_1_0_predictions = walk_forward_validation_arima(train['co2'], test['co2'], order = (1, 1, 0))
arima_4_1_0_predictions = walk_forward_validation_arima(train['co2'], test['co2'], order = (4, 1, 0))

fig = go.Figure()
fig.add_trace(go.Scatter(x=train.index, y=train['co2'], mode='lines', name='Training / Validation Period'))
fig.add_trace(go.Scatter(x=test.index, y=test['co2'], mode='lines', name='Testing Period'))
fig.add_trace(go.Scatter(x=test.index, y=arima_1_1_0_predictions, mode='lines', name='ARIMA (1, 1, 0) | MSE = ' + str(round(mse(arima_1_1_0_predictions, test['co2']), 2))))
fig.add_trace(go.Scatter(x=test.index, y=arima_4_1_0_predictions, mode='lines', name='ARIMA (4, 1, 0) | MSE = ' + str(round(mse(arima_4_1_0_predictions, test['co2']), 2))))
fig.update_layout(height=600, width=1000, title_text="ARIMA Forecast", template='plotly_white')
fig.show()