# Time series: classical methods

### 1. Introduction to Time Series
* Definition of time series
* Importance and applications of time series analysis
* Types of time series data (univariate, multivariate)
* Time series components (trend, seasonality, cyclicity, irregularity)
### 2. Exploratory Data Analysis for Time Series
* Importing time series data into Python
* Visualizing time series data (line plots, seasonal plots)
* Handling missing values and outliers
* Stationarity testing (Augmented Dickey-Fuller test)
* Decomposing time series components
### 3. Traditional Time Series Forecasting Methods
* Simple methods (Naive, Average, Drift)
* Exponential Smoothing (Simple, Holt's, Holt-Winters)
* Autoregressive Integrated Moving Average (ARIMA) models




## 1. Introduction to Time Series
**Definition of Time Series**

A time series is a collection of data points indexed in chronological order, typically with uniform time intervals. It represents a sequence of observations or measurements taken at successive points in time, often capturing the evolution of a particular phenomenon over a period.

**Importance and Applications of Time Series Analysis**

Time series analysis is crucial in various fields, including finance, economics, meteorology, signal processing, and many others. It enables us to:

* Understand patterns and trends in data over time
* Identify seasonality, cyclicity, and other recurring patterns
* Forecast future values based on historical data
* Detect anomalies or unusual behavior
* Make informed decisions and predictions
**Types of Time Series Data**
1. **Univariate Time Series**: This type of time series consists of a single variable observed over time, such as stock prices, temperature readings, or sales figures.
2. **Multivariate Time Series**: This involves multiple variables observed over the same time period, such as various economic indicators or sensor measurements from different sources.

**Time Series Components**
Time series data can often be decomposed into the following components:
1. **Trend**: The underlying long-term pattern or movement, representing the overall direction of the data (increasing, decreasing, or stable).
2. **Seasonality**: Periodic fluctuations or patterns that repeat over fixed intervals of time, such as daily, weekly, monthly, or yearly cycles.
3. **Cyclicity**: Longer-term fluctuations or cycles that are not strictly periodic, often driven by economic or business cycles.
4. **Irregularity (Residual)**: The remaining random or irregular variations in the data that cannot be explained by the other components.
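
To make the components concrete, the sketch below builds a synthetic monthly series as the sum of a trend, a yearly seasonal pattern, and random noise. All names and numbers here (`trend`, `seasonal`, `noise`, the slope of 0.5, the amplitude of 10) are illustrative assumptions and do not come from the datasets used later.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # fixed seed so the example is reproducible
n_months = 120                          # ten years of monthly observations (assumed)
index = pd.date_range("2000-01-01", periods=n_months, freq="MS")
t = np.arange(n_months)

trend = 100 + 0.5 * t                           # slow upward movement
seasonal = 10 * np.sin(2 * np.pi * t / 12)      # pattern repeating every 12 months
noise = rng.normal(scale=2.0, size=n_months)    # irregular component

# Additive model: observation = trend + seasonality + irregularity
synthetic = pd.Series(trend + seasonal + noise, index=index, name="value")
print(synthetic.head())
```

A cyclical component is left out on purpose: because its period is not fixed, it is usually harder to simulate and to separate from the trend.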

## 2. Exploratory Data Analysis for Time Series
### **Importing Time Series Data into Python**

Python provides several libraries for working with time series data, such as Pandas, NumPy, and datetime. Here's an example of importing a time series dataset from a CSV file:

In [None]:
import pandas as pd

url = 'https://github.com/jbrownlee/Datasets/raw/master/airline-passengers.csv'

# Load data from CSV
airline_data = pd.read_csv(url, index_col='Month', parse_dates=True)

In [None]:
airline_data.head()

**Exercise 1**:
Load the Daily Minimum Temperatures dataset. Replace the blanks with the correct information.

**Instructions**:
* Replace the first blank with the dataset URL https://raw.githubusercontent.com/upul/WhiteBoard/master/data/daily-minimum-temperatures-in-me.csv.
* Fill in the second blank with the index column name, which is **'Date'**.
* Set the number of rows to nrows=3650.
* Rename the temperature column to **'Temperature'**.
* Clean column **'Temperature'** by removing **'?'** and converting it to float.


In [None]:
daily_min_temp_url = 'https://raw.githubusercontent.com/upul/WhiteBoard/master/data/daily-minimum-temperatures-in-me.csv'

nrows = 3650
# Load the Daily Minimum Temperatures dataset
temp_data = pd.read_csv(, nrows=nrows, index_col=, parse_dates=True)
print(temp_data.head())

In [None]:
# Rename the columns to ['Temperature']
temp_data.columns = 
# Remove '?' from the Temperature column by replacing it with ''
temp_data['Temperature'] = temp_data['Temperature'].str.replace(, )
# Convert the Temperature column to float using astype(float)
temp_data['Temperature'] = 
# Print the first 5 rows of the dataset using head()
temp_data.

In [None]:
temp_data.info()

### **Visualizing Time Series Data**

Visualizing time series data is essential for understanding patterns and trends. Common visualization techniques include:
#### 1. **Line Plots**: Plotting the data points over time to see the overall trend and patterns.

In [None]:
import matplotlib.pyplot as plt

# Line plot for Airline Passengers
airline_data.plot()
plt.show()

**Exercise 2**:
Create a line plot for the Daily Minimum Temperatures dataset.
**Instructions**:
* Replace the blank with plot() to generate the line plot.

In [None]:
# Line plot for Daily Minimum Temperatures
temp_data.____()
plt.show()

#### 2. **Seasonal Plots**: Visualizing the seasonal patterns by plotting the data over a specific time period (e.g., months or days of the week).

In [None]:
# Seasonal plot (monthly)
airline_data.groupby(airline_data.index.month).mean().plot()
plt.show()

**Exercise 3**:
Create a seasonal plot for the Daily Minimum Temperatures dataset.
**Instructions**:
* Replace the blanks with the correct information to generate the seasonal plot.

In [None]:
temp_data.groupby().mean().plot()

### **Handling Missing Values and Outliers**

Missing values and outliers can significantly impact time series analysis. Python libraries like Pandas provide methods for handling missing data, such as interpolation or filling with specific values.

In [None]:
# Interpolate missing values
airline_data.interpolate(method='linear', inplace=True)

**Exercise 4**:
Handle missing values in the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blank with the correct method to fill missing values using linear interpolation.

In [None]:
# Interpolate missing values in the temperature dataset
temp_data.

Outlier detection and removal can be performed using statistical techniques or domain-specific knowledge.
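
As one possible statistical technique, the sketch below flags outliers with the interquartile range (IQR) rule and then fills the flagged points by linear interpolation. The helper name `flag_outliers_iqr`, the multiplier `k=1.5`, and the toy values are assumptions chosen for illustration; other rules (for example z-scores or rolling medians) may suit a given dataset better.

```python
import numpy as np
import pandas as pd

def flag_outliers_iqr(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Toy series with one injected spike (hypothetical values)
values = pd.Series([10.0, 11.0, 9.5, 10.5, 50.0, 10.2, 9.8, 10.1])
mask = flag_outliers_iqr(values)

# Replace flagged points with NaN, then fill them by linear interpolation
cleaned = values.mask(mask).interpolate(method="linear")
print(cleaned)
```

For strongly seasonal data, a rule applied to the raw values can flag legitimate seasonal peaks, so it may be preferable to apply it to the residual component after decomposition.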

### **Stationarity Testing**

Stationarity is an important concept in time series analysis because many forecasting models assume it. A time series is stationary if its statistical properties, such as mean, variance, and autocorrelation, are constant over time. In practice, this means a stationary series shows no trend or seasonal effects. The assumption matters for predictive modeling: parameters estimated on past data remain valid for future data only if the process generating the data does not change.

#### The ADF Test Explained

The ADF test checks for the presence of a unit root in a time series, which would indicate non-stationarity. It is an extension of the Dickey-Fuller test and provides a more robust approach by accommodating higher order autoregressive processes.

The test is performed by estimating the following regression:

$$ \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots + \delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t $$

- $ y_t $ is the time series.
- $ \Delta y_t $ is the first difference of the series (i.e., $ y_t - y_{t-1} $).
- $ \alpha $ is a constant term.
- $ \beta t $ represents a deterministic trend.
- $ \gamma $ is the coefficient on $ y_{t-1} $, which is the key to the test. The null hypothesis is $ \gamma = 0 $, indicating a unit root (non-stationary). The alternative hypothesis is $ \gamma < 0 $, indicating stationarity.
- $ \delta_1, \cdots, \delta_{p-1} $ are the coefficients of the lagged difference terms.
- $ \varepsilon_t $ is the error term.
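
The regression above can be sketched with ordinary least squares. The example below is a simplified illustration and not the `statsmodels` implementation: it keeps the constant, drops the trend term and the lagged differences (the plain Dickey-Fuller case), and the helper name `df_statistic` is hypothetical. The resulting t-statistic for $\gamma$ has to be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level for the constant-only case in large samples), not with the usual t-distribution.

```python
import numpy as np

def df_statistic(y):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # left-hand side: first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of the estimates
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
white_noise = rng.normal(size=500)        # stationary by construction
random_walk = np.cumsum(white_noise)      # has a unit root by construction

print(f"white noise statistic: {df_statistic(white_noise):.2f}")
print(f"random walk statistic: {df_statistic(random_walk):.2f}")
```

The expected pattern is a strongly negative statistic for the white-noise series and a much less negative one for the random walk.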

#### Hypothesis in the ADF Test

- Null Hypothesis ($ H_0 $): The series has a unit root (non-stationary).
- Alternative Hypothesis ($ H_1 $): The series does not have a unit root (stationary).

#### Interpreting the ADF Test Result

- **ADF Statistic**: The statistic is compared with the test's critical values. The more negative it is, the stronger the evidence against the null hypothesis of a unit root.
- **p-value**: If the p-value is less than a significance level (commonly 0.05), the null hypothesis is rejected, suggesting the series is stationary.

#### Implementation in Python

The ADF test can be performed in Python using the `adfuller` function from the `statsmodels` library. The function returns several outputs, including the ADF statistic and the p-value, which are used to determine stationarity.

In [None]:
! pip3 install statsmodels

In [None]:
from statsmodels.tsa.stattools import adfuller

# ADF test
result = adfuller(airline_data['Passengers'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

**Exercise 5**:
Perform the Augmented Dickey-Fuller test on the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blanks with the correct method to perform the ADF test.

In [None]:
# ADF test
temp_result = 
print(f'ADF Statistic: {temp_result[0]}')
print(f'p-value: {temp_result[1]}')

### **Decomposing Time Series Components**

Time series decomposition involves separating the data into its trend, seasonal, and residual components. The trend is typically estimated with a moving average, and the components are combined either additively ($y_t = T_t + S_t + R_t$) or multiplicatively ($y_t = T_t \times S_t \times R_t$).
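
To show what a moving-average decomposition does, here is a minimal hand-written additive version. It is a sketch under stated assumptions, not the `statsmodels` algorithm: the function name `decompose_additive` is hypothetical, the period is assumed to be even (such as 12), and the toy series is noise-free so the components can be recovered exactly.

```python
import numpy as np

def decompose_additive(values, period):
    """Classical additive decomposition; assumes an even `period` (e.g. 12)."""
    y = np.asarray(values, dtype=float)
    n, half = len(y), period // 2
    # Centered moving average with half weights at both ends (a 2 x period MA)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)                       # undefined at both edges
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Average the detrended values at each position within the cycle
    position = np.arange(n) % period
    pattern = np.array([np.nanmean(detrended[position == k]) for k in range(period)])
    pattern -= pattern.mean()                        # seasonal effects sum to zero
    seasonal = pattern[position]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Toy series: linear trend plus a repeating 12-step pattern (illustrative values)
t = np.arange(48)
true_pattern = 5 * np.sin(2 * np.pi * np.arange(12) / 12)
y = 50 + 2 * t + true_pattern[t % 12]

trend, seasonal, residual = decompose_additive(y, period=12)
print(np.round(seasonal[:12], 2))
```

The trend is undefined for the first and last half period, which is why library implementations also return missing values at the edges of the trend and residual.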

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition
result = seasonal_decompose(airline_data, model='additive')

# Plot components
result.plot()

**Exercise 6**:
Decompose the Daily Minimum Temperatures dataset into its components.

**Instructions**:
* Fill in the blank with the seasonal period, i.e., the number of observations in one seasonal cycle.

In [None]:
# Additive decomposition
temp_result = seasonal_decompose(temp_data, model='additive', period=)

# Plot components
temp_result.plot()

## 3. Traditional Time Series Forecasting Methods
**Simple Methods**

1. **Naive Method**: This method assumes that the future value will be the same as the last observed value. It is a baseline method often used for comparison.

In [None]:
from sklearn.metrics import mean_squared_error

# Naive forecast: each value is predicted by the previous observation
naive_forecast = airline_data.shift(1)

# Skip the first row, which has no previous observation to forecast from
naive_forecast_mse = mean_squared_error(airline_data[1:], naive_forecast[1:])
print(f'Naive forecast MSE: {naive_forecast_mse}')


**Exercise 7**:
Implement the Naive method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blank with the correct method to implement the Naive method.

In [None]:
# Naive forecast
temp_naive_forecast = 

temp_naive_forecast_mse = 
print(f'Naive forecast MSE: {temp_naive_forecast_mse}')

2. **Average Method**: This method forecasts the future value as the mean of all past observations.

In [None]:
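# Average forecast: the mean of all past observations
# (shifted by one step so the value being forecast is excluded)
average_forecast = airline_data.expanding().mean().shift(1)

# Skip the first row, which has no past observations
average_forecast_mse = mean_squared_error(airline_data[1:], average_forecast[1:])

print(f'Average forecast MSE: {average_forecast_mse}')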

**Exercise 8**:
Implement the Average method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blank with the correct method to implement the Average method.

In [None]:
# Average forecast: the mean of all past observations
# (shifted by one step so the value being forecast is excluded)
temp_average_forecast = temp_data.expanding().mean().shift(1)

# Skip the first row, which has no past observations
temp_average_forecast_mse = mean_squared_error(temp_data[1:], temp_average_forecast[1:])

print(f'Average forecast MSE: {temp_average_forecast_mse}')

3. **Drift Method**: This method extends the naive forecast by adding the average change per period (the drift) observed in the historical data, so the forecasts follow a straight line with a constant slope.

In [None]:
import numpy as np

# Drift forecast: estimate the drift as the slope of a least-squares line through the data
drift = np.polyfit(np.arange(len(airline_data)), airline_data['Passengers'], 1)[0]
# One-step forecast: previous observation plus the drift
drift_forecast = airline_data.shift(1) + drift

drift_forecast_mse = mean_squared_error(airline_data[1:], drift_forecast[1:])
print(f'Drift forecast MSE: {drift_forecast_mse}')
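
The cells above evaluate the three simple methods one step ahead on the full series. As a complement, the sketch below shows how each method produces a multi-step forecast from a training set. The function names and the toy data are illustrative assumptions. Note that the drift here uses the classical definition, the average change between the first and last observation, which differs from the least-squares slope used above.

```python
import numpy as np

def naive_forecast(train, h):
    """Repeat the last observed value h times."""
    return np.full(h, train[-1], dtype=float)

def average_forecast(train, h):
    """Repeat the mean of the training data h times."""
    return np.full(h, np.mean(train), dtype=float)

def drift_forecast(train, h):
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + slope * np.arange(1, h + 1)

train = np.array([10.0, 12.0, 13.0, 15.0, 18.0])   # toy data, purely illustrative
print(naive_forecast(train, 3))    # [18. 18. 18.]
print(average_forecast(train, 3))  # [13.6 13.6 13.6]
print(drift_forecast(train, 3))    # [20. 22. 24.]
```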

**Exercise 9**:
Implement the Drift method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blanks with the correct method to implement the Drift method.

In [None]:
# Drift forecast
temp_drift = 
temp_drift_forecast = temp_data.shift(1) + temp_drift

temp_drift_forecast_mse = mean_squared_error(temp_data[1:], temp_drift_forecast[1:])
print(f'Drift forecast MSE: {temp_drift_forecast_mse}')

### **Exponential Smoothing Methods**

#### 1. **Simple Exponential Smoothing**: 

This method assigns exponentially decreasing weights to past observations, with more recent observations carrying higher weights.
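
The weighting scheme can be written as a one-line recursion, $l_t = \alpha y_t + (1 - \alpha) l_{t-1}$, where $l_t$ is the smoothed level. The sketch below implements it by hand; the fixed `alpha` and the initialization with the first observation are simplifying assumptions, whereas a library fit would normally estimate both from the data.

```python
import numpy as np

def simple_exp_smoothing(y, alpha):
    """Return the smoothed level after each observation."""
    level = np.empty(len(y))
    level[0] = y[0]                      # assumed initialization: first observation
    for t in range(1, len(y)):
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

y = np.array([3.0, 5.0, 4.0, 6.0])      # toy data
level = simple_exp_smoothing(y, alpha=0.5)
print(level)                             # [3. 4. 4. 5.]
```

The forecast for every future horizon is the last level (here 5.0), so Simple Exponential Smoothing always produces a flat forecast.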

In [None]:
airline_data.index.freq = 'MS'

In [None]:
# Calculate the index to split the data
split_idx = int(len(airline_data) * 0.8)

# Split the data into training and testing sets
airline_train = airline_data.iloc[:split_idx]
airline_test = airline_data.iloc[split_idx:]

In [None]:
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Assuming `airline_train` is your training dataset
model = SimpleExpSmoothing(airline_train)
fitted_model = model.fit()

# Forecast the next steps equivalent to the test set size
steps = len(airline_test)  # Assuming you have a separate test set
ses_forecast = fitted_model.forecast(steps=steps)

In [None]:


# Calculate MSE
ses_forecast_mse = mean_squared_error(airline_test, ses_forecast)
print(f'SES forecast MSE: {ses_forecast_mse}')

**Exercise 10**:
Implement the Simple Exponential Smoothing method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Define temp_train as the training data and temp_test as the test data.
* Replace the blanks with the correct method to implement Simple Exponential Smoothing.


In [None]:
# Calculate the index to split the data
split_idx = int(len(temp_data) * 0.8)

# Split the data into training and testing sets
temp_train = temp_data.iloc[:split_idx]
temp_test = temp_data.iloc[split_idx:]

In [None]:
# Assuming `temp_train` is your training dataset
model = 
fitted_model = model.fit()

# Forecast the next steps equivalent to the test set size
steps = 
temp_ses_forecast = fitted_model.forecast(steps=steps)

In [None]:
# Calculate MSE
temp_ses_forecast_mse = mean_squared_error(temp_test, temp_ses_forecast)
print(f'SES forecast MSE: {temp_ses_forecast_mse}')

####  2. **Holt's Method**: 


#####  Holt's Linear Trend Method

Holt's Linear Trend method extends Simple Exponential Smoothing to allow forecasting of data with a trend. This method is suitable for time series data where not only the level of the series is important but also the trend. The method involves two equations: one for the level and one for the trend.

##### Components of Holt's Method:

1. **Level**: The level equation is similar to Simple Exponential Smoothing, except that the previous level is first advanced by the previous trend before it is combined with the new observation.

2. **Trend**: The trend equation estimates the trend in the data, which is the increase or decrease in the level component from one period to the next.

##### Equations:

The method consists of two equations:

1. **Level Equation**:
   $$ l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + b_{t-1}) $$
   Here, $l_t$ is the estimated level at time $t$, $y_t$ is the actual value at time $t$, $l_{t-1}$ is the estimated level at time $t-1$, and $b_{t-1}$ is the estimated trend at time $t-1$. The term $l_{t-1} + b_{t-1}$ is the one-step-ahead forecast of $y_t$ made at time $t-1$. The parameter $\alpha$ controls the smoothing of the level.

2. **Trend Equation**:
   $$ b_t = \beta^*(l_t - l_{t-1}) + (1 - \beta^*)b_{t-1} $$
   Here, $b_t$ is the estimated trend at time $t$, $l_t$ is the estimated level at time $t$, and $\beta^*$ is the parameter controlling the smoothing of the trend.

##### Forecasting:

The forecast for $m$ periods ahead is given by:
$$ \hat{y}_{t+m} = l_t + mb_t $$
where $l_t$ is the current level, $b_t$ is the current trend, and $m$ is the number of periods ahead for the forecast.
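
The level and trend recursions, followed by the forecast equation, can be sketched by hand as follows. The function name `holt_forecast`, the fixed values of `alpha` and `beta`, and the initialization from the first two observations are assumptions made for illustration; a library fit would normally estimate them.

```python
import numpy as np

def holt_forecast(y, alpha, beta, steps):
    """Run the level/trend recursions over y, then extrapolate `steps` periods."""
    level, trend = y[0], y[1] - y[0]     # assumed initialization from the first two points
    for t in range(1, len(y)):
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast m periods ahead: level + m * trend
    return level + trend * np.arange(1, steps + 1)

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # perfectly linear toy series
print(holt_forecast(y, alpha=0.8, beta=0.2, steps=3))
```

On a perfectly linear series the recursions reproduce the line, so the forecasts continue it (12, 14, 16, up to floating-point rounding) regardless of the smoothing parameters.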

##### Implementation in Python:

Holt's method can be implemented using the `Holt` class from the `statsmodels.tsa.holtwinters` module in Python:

In [None]:
from statsmodels.tsa.holtwinters import Holt

# Holt's Method
model = Holt(airline_train)
fitted_model = model.fit()
steps = len(airline_test)
holt_forecast = fitted_model.forecast(steps=steps)

In [None]:
holt_mse_forecast = mean_squared_error(airline_test, holt_forecast)

print(f'Holt forecast MSE: {holt_mse_forecast}')

**Exercise 11**:
Implement Holt's Linear Trend method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Apply Holt's method to the temperature data.

In [None]:
# Holt's Method
model = 
fitted_model = model.fit()

# Forecast the next steps equivalent to the test set size
steps = 
temp_holt_forecast = fitted_model.forecast(steps=steps)

In [None]:
temp_holt_forecast_mse = mean_squared_error(temp_test, temp_holt_forecast)

print(f'Holt forecast MSE: {temp_holt_forecast_mse}')

####  3. **Holt-Winters Method**: 




The Holt-Winters method is a time series forecasting technique that accounts for seasonality and trend. It has three components: level, trend, and seasonality.

**Components and smoothing parameters**
* Level, with smoothing parameter α (0 ≤ α ≤ 1)
* Trend, with smoothing parameter β (0 ≤ β ≤ 1)
* Seasonality, with smoothing parameter γ (0 ≤ γ ≤ 1)

**Formulas**:
**Additive Holt-Winters**
- Level: $\ell_t = \alpha(y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$
- Trend: $b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)b_{t-1}$
- Seasonality: $s_t = \gamma(y_t - \ell_t) + (1 - \gamma)s_{t-m}$
- Forecast: $\hat{y}_{t+h} = \ell_t + hb_t + s_{t-m+h}$


**Multiplicative Holt-Winters**
- Level: $\ell_t = \alpha\frac{y_t}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})$
- Trend: $b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)b_{t-1}$
- Seasonality: $s_t = \gamma\frac{y_t}{\ell_t} + (1 - \gamma)s_{t-m}$
- Forecast: $\hat{y}_{t+h} = (\ell_t + hb_t)s_{t-m+h}$

Where:

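- $y_t$ is the actual value at time $t$
- $\ell_t$, $b_t$, and $s_t$ are the estimated level, trend, and seasonal component at time $t$
- $m$ is the number of periods per season (e.g., 12 for monthly data with a yearly pattern)
- $h$ is the number of periods ahead to forecast; the seasonal index $s_{t-m+h}$ as written applies for $h \le m$, and for longer horizons the most recent estimate for the same position in the season is reused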
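
The additive recursions can be sketched by hand as shown below. This is a minimal illustration rather than the `statsmodels` algorithm: the function name `holt_winters_additive`, the fixed smoothing parameters, and the crude initialization from the first two seasons are all assumptions, and the series must contain at least two full seasons.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta, gamma, steps):
    """Additive Holt-Winters with fixed parameters; needs len(y) >= 2 * m."""
    y = np.asarray(y, dtype=float)
    # Assumed initialization from the first two seasons
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = y[:m] - level               # season[k]: latest estimate for position k
    for t in range(m, len(y)):
        k = t % m
        prev_level = level
        level = alpha * (y[t] - season[k]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[k] = gamma * (y[t] - level) + (1 - gamma) * season[k]
    h = np.arange(1, steps + 1)
    # Reuse the latest seasonal estimate for the matching position in the season
    return level + h * trend + season[(len(y) - 1 + h) % m]

# Toy series: constant level of 20 plus a repeating 4-step pattern
pattern = np.array([1.0, -2.0, 3.0, -2.0])
y = 20 + np.tile(pattern, 3)
forecast = holt_winters_additive(y, m=4, alpha=0.3, beta=0.1, gamma=0.2, steps=6)
print(forecast)
```

Indexing the seasonal estimates by position within the season (`% m`) is what lets the forecast horizon exceed one season.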

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt-Winters Method
model = ExponentialSmoothing(airline_train, trend='add', seasonal='add', seasonal_periods=12)
fitted_model = model.fit()


steps = len(airline_test)
holtwinter_forecast = fitted_model.forecast(steps=steps)

In [None]:
holtwinter_forecast_mse = mean_squared_error(airline_test, holtwinter_forecast)
print(f'Holt-Winters forecast MSE: {holtwinter_forecast_mse}')

**Exercise 12**:
Implement the Holt-Winters method for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Apply the Holt-Winters method to the temperature data.

In [None]:
model = 
fitted_model = model.fit()

steps = 
temp_holtwinter_forecast = fitted_model.forecast(steps=steps)

In [None]:
temp_holtwinter_forecast_mse = mean_squared_error(temp_test, temp_holtwinter_forecast)
print(f'Holt-Winters forecast MSE: {temp_holtwinter_forecast_mse}')

**Autoregressive Integrated Moving Average (ARIMA) Models**


ARIMA is a time series forecasting method that combines autoregressive (AR), differencing (I), and moving average (MA) components. It is denoted as ARIMA(p, d, q), where:
* p: Order of the autoregressive term
* d: Degree of differencing
* q: Order of the moving average term

**Formulas**:
**Autoregressive (AR) Term**
$AR(p): y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \varepsilon_t$

**Differencing (I)**
$\nabla^d y_t = (1 - B)^d y_t$

where $B$ is the backshift operator: $B^i y_t = y_{t-i}$

**Moving Average (MA) Term**
$MA(q): y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \ldots + \theta_q \varepsilon_{t-q}$

**Combined ARIMA(p, d, q) Model**
$\phi(B)(1 - B)^d y_t = c + \theta(B)\varepsilon_t$

where:
* $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$ is the autoregressive operator
* $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q$ is the moving average operator
* $\varepsilon_t$ is white noise (error term)
* $c$ is a constant

The ARIMA model is fitted by estimating the parameters $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q$ using maximum likelihood estimation or other optimization techniques.
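
To connect the formulas, the sketch below handles the special case ARIMA(1, 1, 0) by hand: it differences the series once, estimates $c$ and $\phi_1$ by ordinary least squares on the differences, forecasts the differences recursively, and cumulates them back to the original scale. Least squares is used here as a simple stand-in for the likelihood-based estimation mentioned above, and the function names and simulated data are illustrative assumptions.

```python
import numpy as np

def fit_ar1_on_differences(y):
    """Estimate c and phi_1 in: d_t = c + phi_1 * d_{t-1} + e_t, with d = diff(y)."""
    d = np.diff(np.asarray(y, dtype=float))          # (1 - B) y_t, i.e. d = 1
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, d[1:], rcond=None)
    return c, phi

def forecast_arima_110(y, c, phi, steps):
    """Forecast differences recursively, then cumulate them back to the level of y."""
    last_diff = y[-1] - y[-2]
    diffs = []
    for _ in range(steps):
        last_diff = c + phi * last_diff              # future errors are set to zero
        diffs.append(last_diff)
    return y[-1] + np.cumsum(diffs)

# Simulate an ARIMA(1, 1, 0) path with phi_1 = 0.6 (illustrative values)
rng = np.random.default_rng(1)
d = np.zeros(2000)
for t in range(1, len(d)):
    d[t] = 0.6 * d[t - 1] + rng.normal()
y = np.cumsum(d)

c_hat, phi_hat = fit_ar1_on_differences(y)
print(f"estimated phi_1: {phi_hat:.3f}")
print(forecast_arima_110(y, c_hat, phi_hat, steps=3))
```

With a series this long, the estimate of $\phi_1$ should land close to the value of 0.6 used in the simulation.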

In [None]:
from statsmodels.tsa.arima.model import ARIMA

# ARIMA model
model = ARIMA(airline_train, order=(1, 1, 1))
fitted_model = model.fit()

steps = len(airline_test)
arima_forecast = fitted_model.forecast(steps=steps)

In [None]:
arima_forecast_mse = mean_squared_error(airline_test, arima_forecast)
print(f'ARIMA forecast MSE: {arima_forecast_mse}')

**Exercise 13**:
Implement an ARIMA model for forecasting the Daily Minimum Temperatures dataset.

**Instructions**:
* Replace the blanks with the correct method to implement the ARIMA model.

In [None]:
model = 
fitted_model = model.fit()

steps = 
temp_arima_forecast = fitted_model.forecast(steps=steps)

In [None]:
temp_arima_forecast_mse = mean_squared_error(temp_test, temp_arima_forecast)
print(f'ARIMA forecast MSE: {temp_arima_forecast_mse}')