# Time Series

### Jack Bennetto
#### June 16, 2017

(based heavily on Matt Drury's notebook)

## Objectives

By the end of the day you should be able to:

  - Define "time series" and "time-series data".
  - Identify fundamental concepts in a time series: trend, seasonality, stickiness.
  - Use the classical decomposition to decompose and then describe a time series.
  - Define stationarity, contrast with independence.
  - Identify a stationary time series.
  - Fit ARIMA models to forecast a stationary time series.

## Agenda

#### Morning
 * Some examples
 * Trend
 * Seasonality
 * White noise & Stationarity
 * Differencing
 * Random walks
 * MA series

#### Afternoon
 * Autocorrelation
 * Partial autocorrelation
 * Autoregressive series
 * ARIMA models
 * Model selection

In [None]:
%matplotlib inline

import os
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# The old statsmodels.tsa.arima_model module was removed in statsmodels 0.13.
from statsmodels.tsa.arima.model import ARIMA, ARIMAResults
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy import signal
import scipy.stats as scs

# This one is only available in the development version of statsmodels
# Run:
#    pip install git+https://github.com/statsmodels/statsmodels.git
# to install the development version.
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Always make it pretty.
plt.style.use('ggplot')

# We get some integer -> float typecast warnings from sklearn below; this keeps them out of our hair.
import warnings
warnings.filterwarnings('ignore')

# Time Series - Fundamental Concepts - Morning Lecture

A *time series* is a specific type of data, where measurements of a single quantity are taken over time.

When speaking in equations, we will generally represent time with an index $i$, and the observations from the series as $y_i$.  So the time series as a whole is

$$y_1, y_2, y_3, \ldots $$
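In pandas, a series like this is commonly stored as a `pd.Series` whose index holds the observation times. The following is a minimal sketch with made-up values; the dates and numbers are hypothetical and are not taken from the lecture data.

```python
import pandas as pd

# Hypothetical weekly observations y_1, ..., y_5 (values invented for illustration).
weeks = pd.date_range("2017-01-01", periods=5, freq="W")
y = pd.Series([12.0, 15.0, 14.0, 18.0, 21.0], index=weeks)

print(y)
print(y.iloc[0])   # y_1, the first observation
```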

### Examples

A good source for time series data is [Google Trends](https://www.google.com/trends/), where you can find how the popularity of a search term varies over time.

We've provided some selected Google Trends series in the `data` directory.

In [None]:
def load_google_trend_data(name):
    """Load saved data for a google trend.
    
    NOTE: For this to work you need to have started your notebook from the
    lecture directory.
    """
    file_name = os.path.join('.', 'data', ''.join(['search-index-', name, '.txt']))
    df = pd.read_csv(file_name)
    df = df.set_index(pd.DatetimeIndex(df.week))
    del df['week']
    return pd.Series(df[name], df.index)

Let's plot these trends and look for some interesting patterns.

In [None]:
def plot_trend_data(ax, name, series):
    ax.plot(series.index, series)
    ax.set_title("Google Search Trend For {}".format(name))
    #ax.set_ylim((0, 100))

In [None]:
google_search_names = ['baseball', 'python', 'pokemon', 'taxes', 'gdp', 'gmail', 'blackberry']

google_trends = {
    name: load_google_trend_data(name)
    for name in google_search_names
}

In [None]:
fig, axs = plt.subplots(len(google_trends), figsize=(14, 12))

for ax, name in zip(axs, google_search_names):
    plot_trend_data(ax, name, load_google_trend_data(name))
    
plt.tight_layout()

Some observations:

  - Searches for **baseball** show very strong oscillations aligned with the baseball season.  Search volume increases quickly as the season begins, and then falls off as the season goes on.
  - Searches for **python** show a general increase over time, probably influenced by the popularity of the language itself.  There may be evidence that searches decrease at specific periods of the year, most noticeably the end of year, when people are not working.
  - Searches for **pokemon** spike suddenly at a recent date, pointing to a one-time event greatly increasing interest.
  - Searches for **taxes** spike regularly at two specific points of the year, with the interest being heightened between those two dates.  At other times of the year, searches revert to a low rate.
  - Searches for **gdp** show a more random pattern than the others.  There is some evidence for *stickiness* in the trend: low values beget low subsequent values, and high values beget high subsequent values.
  - Searches for **gmail** show a pattern reminiscent of python, but there are sudden shifts where the baseline level jumps to a new value.
  - Searches for **blackberry** have a non-linear trend over time: they increase to a peak popularity, then decrease to almost nothing.

**Activity:** Load and plot the trend data for "data-science", what patterns do you see?

In [None]:
fig = plt.figure(figsize=(14,2))
ax = plt.subplot(111)
plot_trend_data(ax, 'data-science', load_google_trend_data('data-science'))

## Basic Concepts

Various different time series often show common patterns.  Attaching words to these patterns allows us to build a common language to discuss time series.

### Trends

A **trend** in a time series is a gradual change in average level as time moves on.  A trend can be *increasing*, *decreasing*, or *neither* (if, for example, a trend changes direction at some point in time).

Searches for **python** show a steady increasing trend.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 2))
plot_trend_data(ax, 'python', google_trends['python'])

Searches for **blackberry** show a non-linear trend.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 2))
plot_trend_data(ax, 'blackberry', google_trends['blackberry'])

**Question:** How would you describe the trend in `data-science`?

You can often use a regression model to capture a general trend in the series.

Let's try to capture the linear trend in the python search data by fitting a simple linear model and using its predictions on the training data.

In [None]:
def to_col_vector(arr):
    """Convert a one dimensional numpy array to a column vector."""
    return arr.reshape(-1, 1)

def make_design_matrix(arr):
    """Construct a design matrix from a numpy array, including an intercept term."""
    return sm.add_constant(to_col_vector(arr), prepend=False)

In [None]:
def fit_linear_trend(series):
    """Fit a linear trend to a time series.  Return the fit trend as a numpy array."""
    X = make_design_matrix(np.arange(len(series)) + 1)
    linear_trend_ols = sm.OLS(series.values, X).fit()
    linear_trend = linear_trend_ols.predict(X)
    return linear_trend

In [None]:
def plot_linear_trend(ax, name, series):
    linear_trend = fit_linear_trend(series)
    plot_trend_data(ax, name, series)
    ax.plot(series.index, linear_trend)

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 2))
plot_linear_trend(ax, 'python', google_trends['python'])
plt.tight_layout()

If we subtract out the fit trend from the original series, we get the **detrended series**.

In [None]:
python_series = google_trends['python']
python_linear_trend = fit_linear_trend(python_series)
python_series_detrended = python_series - python_linear_trend

fig, ax = plt.subplots(1, figsize=(14, 2))
ax.plot(python_series_detrended.index, python_series_detrended)
ax.set_title("Google Searches for Python, Detrended")
plt.tight_layout()

Detrending a series is often a first step in analysing a time series.

Time series often show a more sophisticated trend than a simple linear increase or decrease; in these cases a more sophisticated detrending procedure is needed.

In the following two cases, a linear detrending would be clearly inappropriate.

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_linear_trend(axs[0], 'blackberry', google_trends['blackberry'])
plot_linear_trend(axs[1], 'gmail', google_trends['gmail'])
plt.tight_layout()

**Question:** Can you describe precisely what is wrong with the linear model in both of these cases?

A very general approach that can be used for detrending data is to compute a **moving average**.

The moving average estimate of the trend at a data point $y_i$ is

$$ \hat y_i = \frac{1}{2w + 1} \sum_{j = -w}^{w} y_{i + j} $$

We essentially slide a *window* of a fixed size across our data, and average the values of the series within the window.

The parameter $w$ controls how far to the left and to the right of $y_i$ we look when averaging the nearby points; this determines the size of the **window**.
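The formula can be checked by hand on a small array. This is a minimal sketch assuming a half-width of $w = 1$ (a three-point window); `centered_moving_average` is a hypothetical helper written only for this illustration.

```python
import numpy as np
import pandas as pd

def centered_moving_average(y, w):
    """Average y[i-w], ..., y[i+w] for every i that has a full window.

    Positions without a full window are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    for i in range(w, len(y) - w):
        out[i] = y[i - w:i + w + 1].mean()
    return out

y = np.array([1.0, 2.0, 6.0, 4.0, 8.0])
by_hand = centered_moving_average(y, w=1)

# pandas takes the total window width 2w + 1 rather than the half-width w.
with_pandas = pd.Series(y).rolling(3, center=True).mean().to_numpy()

print(by_hand)       # values: nan, 3.0, 4.0, 6.0, nan
print(with_pandas)   # same values
```

Note that pandas' `rolling` expects the total number of points in the window, so a half-width of $w$ corresponds to `window = 2 * w + 1`.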

In [None]:
def fit_moving_average_trend(series, window=6):
    # pd.rolling_mean was removed from pandas; use the .rolling() accessor instead.
    return series.rolling(window, center=True).mean()

def plot_moving_average_trend(ax, name, series, window=6):
    moving_average_trend = fit_moving_average_trend(series, window)
    plot_trend_data(ax, name, series)
    ax.plot(series.index, moving_average_trend)

In [None]:
fig, axs = plt.subplots(3, figsize=(14, 6))
plot_moving_average_trend(axs[0], 'python', google_trends['python'])
plot_moving_average_trend(axs[1], 'blackberry', google_trends['blackberry'])
plot_moving_average_trend(axs[2], 'gmail', google_trends['gmail'])
plt.tight_layout()

Varying the window parameter changes the quality of the moving average fit.

In [None]:
fig, axs = plt.subplots(6, figsize=(14, 12))

gmail_series = google_trends['gmail']
for i, window in enumerate([4, 8, 12, 16, 20, 24]):
    axs[i].plot(gmail_series.index, gmail_series)
    axs[i].plot(gmail_series.index, fit_moving_average_trend(gmail_series, window=window))
    axs[i].set_title("Moving Average Smoothed Gmail Search Data with window = {} weeks".format(window))
    
plt.tight_layout()

We see that:

  - Smaller values of `window` will tend to be influenced by noise or other non-trend patterns in the series.
  - Large values of `window` produce smoother estimates of the general trend in the data.
  
For this reason, when estimating the trend component of a time series, we generally prefer larger windows.

**Activity:** Fit a moving average smooth to the data science series for various windows, and plot the results.

When we have data that aligns with calendar regularities (quarterly, weekly, yearly), it is a good idea to choose the window so that an entire annual cycle is used in the smooth.  This will average out any **seasonal** patterns in the data, as we will discuss below.
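To see why a full-cycle window matters, here is a small sketch on synthetic data (an assumption for illustration, not one of the Google series): a pure annual cycle sampled weekly, smoothed with a 52-week and a 26-week window.

```python
import numpy as np
import pandas as pd

# Synthetic weekly data: a pure annual cycle with period 52 (no trend, no noise).
t = np.arange(4 * 52)
seasonal = pd.Series(10 * np.sin(2 * np.pi * t / 52))

full_year = seasonal.rolling(52, center=True).mean()
half_year = seasonal.rolling(26, center=True).mean()

# A window covering a whole cycle averages the seasonal pattern away ...
print(full_year.abs().max())   # essentially zero
# ... while a half-year window still lets much of it through.
print(half_year.abs().max())
```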

In [None]:
fig, axs = plt.subplots(3, figsize=(14, 6))
plot_moving_average_trend(axs[0], 'python', google_trends['python'], window=52)
plot_moving_average_trend(axs[1], 'blackberry', google_trends['blackberry'], window=12)
plot_moving_average_trend(axs[2], 'gmail', google_trends['gmail'], window=52)
plt.tight_layout()

### Seasonality

A **seasonal** pattern in a time series is one that tends to appear regularly, and aligns with features of the calendar.

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_trend_data(axs[0], 'baseball', google_trends['baseball'])
plot_trend_data(axs[1], 'taxes', google_trends['taxes'])
plt.tight_layout()

Just like we can de**trend** a time series, we can also **deseasonalize** a time series.

The simplest method is to create dummy variables at regular intervals of the calendar:
  - A dummy for each month.
  - A dummy for each season.
  
and then fit a linear regression to the series using these dummy variables.

In [None]:
def create_monthly_dummies(series):
    month = series.index.month
    # Only take 11 of the 12 dummies to avoid strict colinearity.
    # .ix was removed from pandas; .iloc selects the first 11 columns by position.
    return pd.get_dummies(month).iloc[:, :11]

In [None]:
def fit_seasonal_trend(series):
    dummies = create_monthly_dummies(series)
    X = sm.add_constant(dummies.values, prepend=False)
    seasonal_model = sm.OLS(series.values, X).fit()
    return seasonal_model.predict(X)

In [None]:
def plot_seasonal_trend(ax, name, series):
    seasons_average_trend = fit_seasonal_trend(series)
    plot_trend_data(ax, name, series)
    ax.plot(series.index, seasons_average_trend)

Here are linear regression models fit to the baseball and taxes trends with monthly dummy variables.

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 5))

plot_seasonal_trend(axs[0], 'baseball', google_trends['baseball'])
plot_seasonal_trend(axs[1], 'taxes', google_trends['taxes'])
plt.tight_layout()

To deseasonalize, we simply subtract out the seasonal predictions.

In [None]:
baseball_series = google_trends['baseball']
baseball_seasonal_trend = fit_seasonal_trend(baseball_series)
baseball_seasonal_detrended = baseball_series - baseball_seasonal_trend

fig, ax = plt.subplots(1, figsize=(14, 2))
ax.plot(baseball_series.index, baseball_seasonal_detrended)
ax.set_title("Google Searches for Baseball, Deseasonalized")

In [None]:
taxes_series = google_trends['taxes']
taxes_seasonal_trend = fit_seasonal_trend(taxes_series)
taxes_seasonal_detrended = taxes_series - taxes_seasonal_trend

fig, ax = plt.subplots(1, figsize=(14, 2))
ax.plot(taxes_series.index, taxes_seasonal_detrended)
ax.set_title("Google Searches for Taxes, Deseasonalized")

**Question:** Have we removed the seasonality from these series?  What more could we do?

**Activity:** Deseasonalize the `taxes` series.

## Trend-Seasonal-Residual Decomposition

The **Classical Trend-Seasonal-Residual Decomposition** expresses a time series as the sum of three components:

$$ y_t = T_t + S_t + R_t $$

and is accomplished as follows.

Suppose, for definiteness, that we are working with *weekly* data, so that every $52$ observations form a calendar year.  Then, the series is decomposed as follows:

1. Compute the trend component $T_t$ using a moving average with window width $52$ (or $12$ for monthly data).  Then detrend the series.
2. Compute the seasonal component $S_t$ of the detrended series $y_t - T_t$ by averaging together the observations that fall in the same week (or month, if monthly data).  **Note**, this is *equivalent* to fitting a linear regression to the detrended data with an indicator for each week, and then making predictions for each week of the original series.
3. The **remainder**, or **error**, or **residual** series $R_t$ is $y_t - T_t - S_t$.
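The three steps above can be sketched by hand. This is only an illustration under stated assumptions: the monthly data are synthetic (a linear trend plus a fixed monthly pattern), and because $12$ is even the moving average gives the two end points half weight so that the window stays centered, which is a common convention rather than the only possible one.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (invented for illustration): linear trend plus a
# fixed monthly pattern that sums to zero over the year.
index = pd.date_range("2010-01-01", periods=8 * 12, freq="MS")
t = np.arange(len(index))
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4], dtype=float)
y = pd.Series(0.5 * t + pattern[index.month.to_numpy() - 1], index=index)

# Step 1: trend from a moving average covering one full year.  Twelve is even,
# so the two end points get half weight to keep the window centered.
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend_values = np.convolve(y.to_numpy(), weights, mode="valid")
pad = np.full(6, np.nan)
trend = pd.Series(np.r_[pad, trend_values, pad], index=index)
detrended = y - trend

# Step 2: seasonal component = mean of the detrended values in each calendar month.
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# Step 3: the residual is whatever is left over.
residual = y - trend - seasonal

print(residual.abs().max())   # essentially zero here: the toy data has no noise
```

On this noise-free toy series the recovered trend and seasonal components match the ones used to build the data, so the residual vanishes wherever the trend is defined.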

Statsmodels implements the classical decomposition as `seasonal_decompose`.

In [None]:
python_decomposition = sm.tsa.seasonal_decompose(google_trends['python'])

In [None]:
def plot_shared_yscales(axs, x, ys, titles):
    ymiddles =  [ (y.max()+y.min())/2 for y in ys ]
    yrange = max( (y.max()-y.min())/2 for y in ys )
    for ax, y, title, ymiddle in zip(axs, ys, titles, ymiddles):
        ax.plot(x, y)
        ax.set_title(title)
        ax.set_ylim((ymiddle-yrange, ymiddle+yrange))

In [None]:
def plot_seasonal_decomposition(axs, series, sd):
    plot_shared_yscales(axs,
                        series.index,
                        [series, sd.trend, sd.seasonal, sd.resid],
                        ["Raw Series", "Trend Component $T_t$", "Seasonal Component $S_t$", "Residual Component $R_t$"])

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8), sharex=True)
plot_seasonal_decomposition(axs, google_trends['python'], python_decomposition)
plt.tight_layout()

This decomposition validates our description of the series earlier as a general upwards trend, but with some seasonal behaviour (like a sharp drop off in queries around wintertime).

When the decomposition is working well, the residual component $R_t$ should show no seasonal or trend patterns.

In [None]:
baseball_decomposition = sm.tsa.seasonal_decompose(google_trends['baseball'])

fig, axs = plt.subplots(4, figsize=(14, 8), sharex=True)
plot_seasonal_decomposition(axs, google_trends['baseball'], baseball_decomposition)
plt.tight_layout()

Here almost all of the variation is covered by the seasonal pattern.

Sometimes the classical decomposition is not powerful enough to capture the trend or seasonal components of a time series.

The gmail series decomposition shows some interesting behaviour in the residuals:

In [None]:
gmail_decomposition = sm.tsa.seasonal_decompose(google_trends['gmail'])

fig, axs = plt.subplots(4, figsize=(14, 8), sharex=True)
plot_seasonal_decomposition(axs, google_trends['gmail'], gmail_decomposition)
plt.tight_layout()

Around October 2013 the series showed a clear and sudden change in average level, which the trend component could not adapt to quickly enough.  Consequently, this discontinuity appears in the residual series.

A more sophisticated procedure should be used to decompose this series, which allows for a discontinuity in the observed series, or fits a more sophisticated model for the trend component.

**Activity:** Decompose the data science series.  What patterns do you see?  Is the decomposition appropriate?

## Statistical Concepts

In the afternoon we will present some statistical models for time series.  For now, we will present some fundamental concepts.

### Random Processes

Up to now we have only considered a single manifestation of a single time series

$$ y_1, y_2, y_3, \ldots $$

Let's now impose a probability model on this data, i.e., now we consider series as **random objects that can be sampled**.

Formally, we consider a single time series as a *sample* from a sequence of random variables

$$ Y_1, Y_2, Y_3, \ldots $$

Note that we are **not** assuming that these random variables are independent, **it is very likely that in a time series the sample we draw at time $t$ influences the sample we draw at time $t+1$**.

When we need to distinguish the data from the statistical model that we are assuming generated the data, we call:

  - The data a **series**.
  - The data generating process a **random process**, or more specifically, a **time-series random process**.

### White Noise

The simplest possible random process from this perspective occurs when each $Y_i$ is independent from all the rest, and all the $Y$'s are identically distributed with a mean of zero.

In this case the series is called **white noise**.

In [None]:
def white_noise(size=250):
    return pd.Series(np.random.normal(size=size))

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))
for i in range(4):
    noise = white_noise()
    axs[i].plot(noise.index, noise)
    if i == 0:
        axs[i].set_title("White Noise Series")

White noise shows no trend, no seasonal patterns, no cyclic behaviour, and no stickiness.

### Stationarity

White noise has the simplest possible structure: everything is independent.

As mentioned, in most time series observations will **not** be independent.  Nonetheless, there is a useful concept that characterizes the type of dependence found in many time series.

A time series random process $Y_1, Y_2, Y_3, \ldots$ is said to be **strictly stationary** if the joint distribution of any tuple of observations is unchanged when the whole tuple is shifted in time.  That is, if

$$ (Y_{i_1}, Y_{i_2}, \ldots, Y_{i_k}) $$

and

$$ (Y_{i_1 + h}, Y_{i_2 + h}, \ldots, Y_{i_k + h}) $$

are identically distributed for every choice of indices $i_1, \ldots, i_k$ and every shift $h$.

**This means that** the properties of the series **do not** depend on when you begin observing it.

**This means that** if you start observing the series, watch for an hour, go cook a meal for an hour, then watch the series for another hour, **it does not matter if the meal you cooked was breakfast, lunch, or dinner**: the properties of the series you observe will be the same in any case.

**Discussion:** Can stationary series show trends?  Can stationary series show seasonality?  Can stationary series be sticky?

**Discussion:** Can the variance of a stationary series change over time?

**Discussion:** Is the following series stationary? No? Summarize all the reasons why not.

In [None]:
file_name = os.path.join('.', 'data', 'jj.txt')
df = pd.read_csv(file_name)
earnings_series = pd.Series(df.earnings)

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 4))
ax.plot(earnings_series.index, earnings_series)
ax.set_title("Company Earnings Over Time")
fig.tight_layout()

**Question:** How would you transform this series to be stationary?
  - How would you remove the trend?
  - How would you stabilize the variance?  [This may help](http://stats.stackexchange.com/questions/18930/transformation-for-stabilizing-variance-in-time-series).

### Differencing

Most series encountered in nature are **not** stationary, but there is often a simple way to transform one to be stationary.

Given a time series $y_1, y_2, y_3, \ldots$, the **first differences series** is

$$ y_2 - y_1, y_3 - y_2, y_4 - y_3, \ldots $$

The aim of differencing a series is to remove the trend component, which stationary series cannot have.
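As a quick check of this claim, the sketch below differences a synthetic series (a hypothetical linear trend with slope $0.3$ plus white noise) and compares the average level of the first and second halves before and after differencing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(200)

# Hypothetical series: linear trend with slope 0.3 plus white noise.
y = pd.Series(0.3 * t + rng.normal(size=len(t)))
diff = y.diff().dropna()      # y_2 - y_1, y_3 - y_2, ...

# Average level of the first and second half, before and after differencing.
print(y.iloc[:100].mean(), y.iloc[100:].mean())        # differ by roughly 0.3 * 100 = 30
print(diff.iloc[:100].mean(), diff.iloc[100:].mean())  # both close to the slope 0.3
```

The raw series drifts upward, while the differenced series fluctuates around a constant level equal to the slope.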

In [None]:
def plot_series_and_difference(axs, series, title):
    diff = series.diff()
    axs[0].plot(series.index, series)
    axs[0].set_title("Raw Series: {}".format(title))
    axs[1].plot(series.index, diff)
    axs[1].set_title("Series of First Differences: {}".format(title))

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, google_trends['python'], 'python')
fig.tight_layout()

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, google_trends['gmail'], 'gmail')
fig.tight_layout()

**Discussion:** Do these difference series look stationary?

**Activity:** Compute and plot the first differences series for data science.  Does it look stationary?

**Question:** If you are given the *differences* of a series, how can you recover the original series?

### Example: Random Walks

When a differenced series is *white noise* (**note**: *not all stationary series are white noise*, as we will demonstrate below) we have the following relationship

$$ y_{i+1} - y_i = e_i $$

or 

$$ y_{i+1} = y_i + e_i $$

This type of series is called a **random walk** and is a very common statistical model for time series.

In [None]:
def random_walk(size=250):
    noise = np.random.normal(size=(size + 1))
    walk = np.cumsum(noise)
    return pd.Series(walk)

In [None]:
def plot_random_walk(ax, size=250):
    walk = random_walk(size=size)
    ax.plot(walk.index, walk)

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))
for i in range(2):
    # Plot each walk directly above its own differenced series.
    walk = random_walk()
    diff = walk.diff()
    axs[2*i].plot(walk.index, walk)
    axs[2*i].set_title("Random Walk Series")
    axs[2*i + 1].plot(diff.index, diff)
    axs[2*i + 1].set_title("Differenced Random Walk Series")
plt.tight_layout()

Notice that random walks, although constructed from totally random components, can show very strong evidence of *trends*.  This is because random walks are sticky: once they randomly elevate to an extreme value, subsequent values of the series are only small adjustments to previous values, so the value of the series tends to stay elevated.
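One rough way to put a number on stickiness is the correlation between each value and the value just before it. The sketch below is an illustration on simulated data; `consecutive_correlation` is a hypothetical helper, not part of the lecture code.

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=1000)
walk = np.cumsum(noise)

def consecutive_correlation(x):
    """Correlation between each value and the value immediately before it."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]

print(consecutive_correlation(noise))  # near 0: no stickiness
print(consecutive_correlation(walk))   # near 1: strongly sticky
```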

**Question:** Is the differenced series from a random walk white noise?

### Example: Linear Combinations of White Noise

A simple way to construct a diverse set of stationary series is as **linear combinations of white noise**.

Suppose that 

$$ e_1, e_2, e_3, \ldots $$

is a white noise process (what was that again?).  Then any series created by linearly combining lagged values of $e$ is stationary:

$$ s_i = \mu + e_i + a_1 e_{i-1} + a_2 e_{i-2} + \cdots + a_k e_{i-k} $$

In [None]:
def linear_combination_of_white_noise(size, coef):
    coef = np.asarray(coef)
    n_coef = len(coef)
    noise_size = size + n_coef
    noise = np.random.normal(size=noise_size)
    # np.convolve computes sum_j coef[j] * noise[i - j], which is exactly
    # the MA form e_i + a_1 e_{i-1} + ..., so coef is passed as given.
    # We need to subset the result to remove edge effects.
    lc = np.convolve(noise, coef)[(n_coef-1):(size+n_coef-1)]
    return pd.Series(lc)

In [None]:
def plot_linear_combination_of_white_noise(ax, size, coef):
    series = linear_combination_of_white_noise(size, coef)
    ax.plot(series.index, series)
    ax.set_title("Linear Combination of White Noise with Coefficients {}".format(
            str(list(coef[1:]))))

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))
plot_linear_combination_of_white_noise(axs[0], 250, [1, 1])
plot_linear_combination_of_white_noise(axs[1], 250, [1, -1])
plot_linear_combination_of_white_noise(axs[2], 250, [1, 1, 1, 1, 1])
plot_linear_combination_of_white_noise(axs[3], 250, [1, 0.5, 0.25, 0.1])
plt.tight_layout()

**All the series above are sampled from a stationary process.**

A linear combination of white noise is traditionally called an **MA series**.

  - An MA(1) contains one lagged term: $s_i = \mu + e_i + a_1 e_{i-1}$
  - An MA(2) contains two lagged terms: $s_i = \mu + e_i + a_1 e_{i-1} + a_2 e_{i-2}$
  - And so on...
  
**Note:**  MA stands for *moving average*, even though this has *nothing to do with* a moving average as we discussed earlier.  Unfortunately, the name has stuck, and you'll have to get used to it.  My recommendation is to call them "em ay" series, and reserve the full words for true moving averages.

**Activity:** Take a few minutes to play around with generating some MA series.  Can you create any interesting behaviour out of pure randomness?

**Discussion:** What sort of real-life process would create an MA series?

### Tests for Stationarity

Stationarity can be hard to visualize, so there are some statistical hypothesis tests that you can use to indicate stationarity.

The **Augmented Dickey-Fuller test** is set up as follows:

  - $H_0$: The series is **non-stationary** (specifically, has a *unit root*).
  - $H_{\text{a}}$: The series is stationary.

It is available in statsmodels as `tsa.stattools.adfuller`.

White noise series are stationary:

In [None]:
for i in range(5):
    test = sm.tsa.stattools.adfuller(white_noise(size=250))
    print("ADF p-value: {0:.3f}".format(test[1]))

MA series are stationary:

In [None]:
for coef in [[1, 1], [1, -1], [1, 1, 1, 1], [1, 0.5, 0.25, 0.1]]:
    series = linear_combination_of_white_noise(size=250, coef=coef)
    test = sm.tsa.stattools.adfuller(series)
    print("ADF p-value: {0:.3f}".format(test[1]))

What about some of the series we found in nature?

The python series has an obvious trend, so it is clearly not stationary, and the test confirms this.

In [None]:
test = sm.tsa.stattools.adfuller(google_trends['python'])
print("ADF p-value for python series: {0:2.2f}".format(test[1]))

Maybe differencing helped?

In [None]:
test = sm.tsa.stattools.adfuller(google_trends['python'].diff()[1:])
print("ADF p-value for differenced python series: {0:2.2f}".format(test[1]))

### **It sure did!**

The baseball series is also stationary after differencing.

In [None]:
test = sm.tsa.stattools.adfuller(google_trends['baseball'].diff()[1:])
print("ADF p-value for differenced baseball series: {0:2.2f}".format(test[1]))

**Activity:** Test some of the other series we've been experimenting with for stationarity.  Does differencing improve the situation?

In [None]:
test = sm.tsa.stattools.adfuller(google_trends['gmail'].diff()[1:])
print("ADF p-value for differenced gmail series: {0:2.2f}".format(test[1]))

**Final Activity:** Can you transform the `earnings` series from earlier to be stationary?  There is a step you will have to take that we have **not** discussed.  We will begin the afternoon lecture by discussing this example.

In [None]:
# Work goes here.

# Time Series - Models for Stationary Series - Afternoon Lecture

For the rest of the day, we will be working with *stationary* series, and developing a statistical model for them.

## Autocorrelation

Recall the definition of strictly stationary.

A time series $Y_1, Y_2, Y_3, \ldots$ is said to be **strictly stationary** if the joint distribution of any tuple of observations is unchanged when the whole tuple is shifted in time.  That is, if

$$ (Y_{i_1}, Y_{i_2}, \ldots, Y_{i_k}) $$

and

$$ (Y_{i_1 + h}, Y_{i_2 + h}, \ldots, Y_{i_k + h}) $$

are identically distributed for every choice of indices $i_1, \ldots, i_k$ and every shift $h$.

One consequence of this definition is that the correlation between two $Y$'s only depends on the lag between them:

$$ Corr(Y_5, Y_2) = Corr(Y_4, Y_1) = Corr(Y_{144}, Y_{141}) = \cdots $$

These correlations between a time series and a lagged version of itself are called **autocorrelations**.

$$ \gamma_k(Y)= Corr(Y_1, Y_{1+k}) = Corr(Y_2, Y_{2+k}) = Corr(Y_3, Y_{3+k}) = \cdots $$

In [None]:
def series_and_lagged(series, lag=1):
    truncated = np.copy(series)[lag:]
    lagged = np.copy(series)[:(len(truncated))]
    return truncated, lagged

def compute_autocorrelation(series, lag=1):
    series, lagged = series_and_lagged(series, lag=lag)
    return np.corrcoef(series, lagged)[0, 1]

Recall from earlier that the differenced baseball series **is** stationary (according to the ADF test).

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, google_trends['baseball'], 'baseball')
fig.tight_layout()

We can draw some scatterplots to summarize the autocorrelation information in the baseball series.

In [None]:
baseball_diff = google_trends['baseball'].diff()[1:]

fig, axs = plt.subplots(3, 3, figsize=(8, 8))

for i, ax in enumerate(axs.flatten()):
    series, lagged = series_and_lagged(baseball_diff, lag=i)
    autocorr = compute_autocorrelation(baseball_diff, lag=i)
    ax.scatter(series, lagged, alpha=0.5)
    ax.set_title("Lag {0} AC: {1:2.2f}".format(i, autocorr))
    
plt.tight_layout()

**Question:** Why does the first scatterplot here lie along a straight line?

A more immediate way to view the autocorrelation in a series is with an **autocorrelation plot**.  

In [None]:
fig, ax = plt.subplots(1, figsize=(16, 4))

_ = sm.graphics.tsa.plot_acf(baseball_diff, lags=2*52, ax=ax)

Each spike in this plot is an autocorrelation for a single lag.

The *first* spike, at lag zero, is always at $1.0$ (because any series is perfectly correlated with itself).

The shaded region shows 95% confidence bounds.  If *all* of the autocorrelations were truly zero, we would still expect to see about 5% of the spikes exceeding these bounds.
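The 5% figure can be checked by simulation. The sketch below assumes the usual large-sample approximation that, for white noise of length $n$, 95% bounds are about $\pm 1.96 / \sqrt{n}$; the shaded region drawn by `plot_acf` may be computed somewhat differently, so treat this as an illustration only. `sample_autocorrelation` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
noise = rng.normal(size=n)

def sample_autocorrelation(x, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

lags = range(1, 201)
autocorrs = np.array([sample_autocorrelation(noise, lag) for lag in lags])

# Assumption: approximate 95% bounds for white noise are +/- 1.96 / sqrt(n).
bound = 1.96 / np.sqrt(n)
fraction_outside = (np.abs(autocorrs) > bound).mean()
print(fraction_outside)   # typically in the neighbourhood of 0.05
```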

For the baseball example, we observe the following:
  - There is a very striking spike at exactly 52 lags, which is the number of weeks in a year.  This is repeated, less significantly, at a lag of twice 52.
  - The first and 53rd autocorrelations are positive and significant.

**Discussion:** How do you interpret the large spike in autocorrelation at $52$ weeks?

**Discussion:** Could the spike at $104$ weeks be related to the spike at $52$ weeks?

**Discussion:** Does the spike at 52 weeks *prevent the series from being stationary*?

One potential issue with autocorrelations is the following:
    
If $y_i$ is correlated with $y_{i-1}$ **and** $y_{i-1}$ is correlated with $y_{i-2}$, then $y_{i}$ will generally also be correlated with $y_{i-2}$, even if $y_{i-2}$ has no direct influence on $y_i$.

## Partial Autocorrelation

A **partial autocorrelation** plot controls for this effect.

In [None]:
fig, ax = plt.subplots(1, figsize=(16, 4))

_ = sm.graphics.tsa.plot_pacf(baseball_diff, lags=2*52, ax=ax)

Here, the spike at $52$ weeks still appears, but the subsequent spike at $104$ weeks has disappeared.  This demonstrates that the spike at $104$ weeks was almost completely caused by the spike at $52$ weeks.

**Discussion:** How do you think the partial autocorrelation plot is constructed?  **Hint**: It uses linear regression in a fundamental way.

**Discussion:** What would you expect from an autocorrelation plot of white noise?

### Autocorrelation of MA series

Recall that MA series are linear combinations of white noise with a possible constant term

$$ s_i = \mu + e_i + a_1 e_{i-1} + \cdots + a_k e_{i-k} $$

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))

ma_coefs = [[1, 0.5], [1, -0.5], [1, 0.5, 0.5], [1, -0.5, 0.5, -0.25, 0.25]]
for i, (ax, coef) in enumerate(zip(axs, ma_coefs)):
    plot_linear_combination_of_white_noise(axs[i], 250, coef)

plt.tight_layout()

**Discussion:** What will the autocorrelation of an MA series look like?

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))

for i, (ax, coef) in enumerate(zip(axs, ma_coefs)):
    series = linear_combination_of_white_noise(size=250, coef=coef)
    _ = sm.graphics.tsa.plot_acf(series, lags=25, ax=ax)
    ax.set_title("Autocorrelation in MA With Coefficients {}".format(str(list(coef[1:]))))
    
plt.tight_layout()

For data generated from an MA model, the number of non-zero autocorrelations generally indicates the *order* of the model (the number of non-zero coefficients).

**Discussion:** Why?

### MA Models

Given stationary time series data, a reasonable question to ask is the following:
    
> Which MA series would be most likely to generate this data?

This is the concept behind MA models.

An MA (moving average) model of order $k$ (also known as an $MA(k)$ model) attempts to represent a stationary time series as a linear combination of white noise:

$$ s_i = \mu + e_i + a_1 e_{i-1} + \cdots + a_k e_{i-k} $$

**Discussion:** What technique do you think is used to determine the estimated coefficients when an MA model is fit?

In [None]:
def fit_ma_model(series, order):
    # ARIMA is a more general model, which we are going to build up
    # in pieces.
    model = ARIMA(series, order=(0, 0, order))
    return model.fit()

Let's generate some MA data, fit MA models, and compare the coefficients.

In [None]:
def format_list_of_floats(L):
    return ["{0:2.2f}".format(f) for f in L]

In [None]:
for coef in ma_coefs:
    ma_series = linear_combination_of_white_noise(size=250, coef=coef)
    ma_model = fit_ma_model(ma_series.values, order=(len(coef) - 1))
    print("True Coefficients: {0: <20} Estimated Coefficients: {1: <20}".format(
            str(format_list_of_floats(coef[1:])),
            str(format_list_of_floats(ma_model.params[1:]))))

So it looks like the MA models recover the true coefficients from *actual* MA processes.  So far, so good.

Before we apply the MA model to real data, we need to introduce AR processes, and generalize everything to the ARIMA model.

### Intermission: Recursive Sequences

A **recursive sequence** (of numbers) is one where the subsequent values in the sequence are defined as functions of the preceding values.  For example:

  - $y_i = 1$
  - $y_i = y_{i-1} + 1; \ y_1 = 0$
  - $y_i = - y_{i-1}; \ y_1 = 1$
  - $y_i = 0.5 y_{i-1} + 0.25 y_{i-2}; \ y_1 = 1; y_2 = 1$

**Activity:** Sketch pictures of these recursive sequences.

**Bonus:** Write a Python function to draw a plot of a recursive sequence.  How should you specify the arguments to this function?  How can you factor the task into multiple functions so that each has a single responsibility?
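
One possible factoring of the bonus exercise, as a sketch: a single function generates the terms, and the recurrence itself is passed in as an argument.  The name `recursive_sequence` and its signature are illustrative choices, not the only reasonable design.

```python
def recursive_sequence(step, init, size):
    """Return the first `size` terms of a sequence defined by `step`.

    step: function taking the list of terms so far and returning the next term.
    init: list of starting values (y_1, y_2, ...).
    """
    terms = list(init)
    while len(terms) < size:
        terms.append(step(terms))
    return terms[:size]

counting = recursive_sequence(lambda y: y[-1] + 1, [0], 5)
flipping = recursive_sequence(lambda y: -y[-1], [1], 4)
two_term = recursive_sequence(lambda y: 0.5 * y[-1] + 0.25 * y[-2], [1, 1], 4)
print(counting)
print(flipping)
print(two_term)
```

Plotting can then live in a separate function that takes an axis and the returned list, which keeps generating the terms and drawing them as separate responsibilities.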

### AR Processes

Another data generating process that results in a stationary process (provided the coefficients satisfy some restrictions) is an **autoregressive** process.

In an autoregressive process, subsequent values of the series are linear combinations of previous values of the series plus some noise term (much like in regression we get a linear combination of predictors plus a noise term).

$$ y_i = \mu + b_1 y_{i-1} + b_2 y_{i-2} + \cdots + b_k y_{i-k} + \epsilon_i $$

The number of coefficients in this equation is called the *order* of the model, and we often speak of $AR(1)$ processes, $AR(2)$ processes, etc...
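
The restriction on the coefficients can be checked numerically.  The sketch below uses the standard condition that every root of the polynomial $1 - b_1 z - \cdots - b_k z^k$ must lie strictly outside the unit circle; the function name `ar_is_stationary` and the tolerance are assumptions made for illustration.

```python
import numpy as np

def ar_is_stationary(coefs, tol=1e-8):
    """Check whether y_i = b_1 y_{i-1} + ... + b_k y_{i-k} + noise is stationary.

    The process is stationary when every root of
    1 - b_1 z - ... - b_k z^k lies strictly outside the unit circle.
    """
    coefs = np.asarray(coefs, dtype=float)
    # np.roots wants the highest-degree coefficient first.
    poly = np.r_[-coefs[::-1], 1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1 + tol))

for coefs in ([0.5], [1.0], [1.5, -0.5], [1.5, -1, 0.25]):
    print(coefs, ar_is_stationary(coefs))
```

The coefficient list `[1.0]` is a random walk, and `[1.5, -0.5]` also has a root at exactly $1$, so both fail the check: they behave like integrated series rather than stationary ones.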

In [None]:
def auto_regressive_process(size, coefs, init=None):
    """Generate an autoregressive process with Gaussian white noise.  The
    implementation is taken from here:
    
      http://numpy-discussion.10968.n7.nabble.com/simulate-AR-td8236.html
      
    Exactly how lfilter works here takes some pen and paper effort.
    """
    coefs = np.asarray(coefs)
    if init is None:
        # One zero starting value per coefficient.
        init = np.array([0]*len(coefs))
    else:
        init = np.asarray(init)
    init = np.append(init, np.random.normal(size=(size - len(init))))
    assert(len(init) == size)
    a = np.append(np.array([1]), -coefs)
    b = np.array([1])
    return pd.Series(signal.lfilter(b, a, init))
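
For readers who find `lfilter` opaque, here is a loop-based sketch of the same recursion.  It assumes no constant term and treats values before the start of the series as zero; the name `ar_from_noise` is hypothetical.  With the same input array it should agree with `signal.lfilter(b, a, ...)` as called above.

```python
import numpy as np

def ar_from_noise(noise, coefs):
    """Apply y_i = b_1 y_{i-1} + ... + b_k y_{i-k} + noise_i with an explicit loop.

    Values before the start of the series are treated as zero, and there is
    no constant term.
    """
    noise = np.asarray(noise, dtype=float)
    y = np.zeros(len(noise))
    for i in range(len(noise)):
        y[i] = noise[i]
        for j, b in enumerate(coefs, start=1):
            if i - j >= 0:
                y[i] += b * y[i - j]
    return y

# A single unit shock shows how its influence decays through the recursion.
shock_response = ar_from_noise([1, 0, 0, 0], [0.5])
print(shock_response)
```

Feeding in a single shock followed by zeros makes the lingering influence of one noise term visible directly.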

In [None]:
def plot_autoregressive_process(ax, size, coefs, init=None):
    ar = auto_regressive_process(size, coefs, init)
    ax.plot(ar.index, ar)

In [None]:
ar_coefs = [[0.5], [-0.5], [1.5, -0.5], [1.5, -1, 0.25]]

fig, axs = plt.subplots(4, figsize=(14, 8))

for i, (ax, coefs) in enumerate(zip(axs, ar_coefs)):
    plot_autoregressive_process(ax, 250, coefs)
    ax.set_title("Autoregressive Process with Coefficients {}".format(str(list(coefs))))

plt.tight_layout()

**Discussion:** What will the autocorrelation of an AR series look like?

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))

for i, (ax, coef) in enumerate(zip(axs, ar_coefs)):
    series = auto_regressive_process(size=250, coefs=coef)
    _ = sm.graphics.tsa.plot_acf(series, lags=25, ax=ax)
    ax.set_title("Autocorrelation in AR With Coefficients {}".format(str(list(coef))))
    
plt.tight_layout()

In an AR process, even though subsequent data points are only *directly* influenced by a few previous data points, the influence tends to linger for a long time.

We can see this lingering influence in:
  - The gradual decrease in autocorrelation when the initial coefficient is positive.
  - The sinusoidal behaviour when there are negative coefficients.

For AR processes, controlling for this lingering influence with a *partial* autocorrelation plot is important.

In [None]:
fig, axs = plt.subplots(4, figsize=(14, 8))

for i, (ax, coef) in enumerate(zip(axs, ar_coefs)):
    series = auto_regressive_process(size=250, coefs=coef)
    _ = sm.graphics.tsa.plot_pacf(series, lags=25, ax=ax)
    ax.set_title("Partial Autocorrelation in AR With Coefficients {}".format(str(list(coef))))
    
plt.tight_layout()

In the partial autocorrelation plot, we see very distinctly that the number of significant spikes is directly influenced by the number of non-zero coefficients.

**Discussion:** What sort of a process would generate an autoregressive series?

### AR Models

Given stationary time series data, a reasonable question to ask is the following:
    
> Which AR series would be most likely to generate this data?

This is the concept behind AR models.

In [None]:
def fit_ar_model(series, order):
    # ARIMA is a more general model, which we are going to build up
    # in pieces.
    model = ARIMA(series, order=(order, 0, 0))
    return model.fit()

Let's generate some AR data, fit some AR models, then compare the coefficients.

In [None]:
for coef in ar_coefs:
    ar_series = auto_regressive_process(size=250, coefs=coef)
    ar_model = fit_ar_model(ar_series.values, order=(len(coef)))
    print("True Coefficients: {0: <26} Estimated Coefficients: {1: <26}".format(
            str(format_list_of_floats(coef)),
            str(format_list_of_floats(ar_model.params[1:]))))

Just like for MA models, it looks like our AR models recover the true coefficients from an AR process.

### ARMA Models

Now that we have discussed AR and MA models, we can *combine* them into ARMA models, which have both AR and MA components.

ARMA models have both:
  - A **linear combination of white noise** component (the MA), where subsequent values of the series are (partially) a linear combination of white noise.
  - An **autoregressive component** (the AR), where subsequent values of the series are (partially) a linear combination of previous terms, plus noise.

  
$$ y_i = \mu + \overbrace{e_i + a_1 e_{i-1} + \cdots + a_q e_{i-q}}^{\text{MA component}} + \overbrace{b_1 y_{i-1} + b_2 y_{i-2} + \cdots + b_p y_{i-p} + \epsilon_i}^{\text{AR component}} $$

where $a_1...a_q$ are the MA coefficients, $b_1...b_p$ are the AR coefficients, $e_i$ is a white-noise series, and $\epsilon_i$ is an error term.

### ARIMA Models

The **I** in ARIMA stands for **integrated**.

The word *integration* is used here as the opposite of *difference*.

**Recall:** We often have to take the *first differences* of series we find in nature to make them *stationary*.


In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, google_trends['python'], 'python')
fig.tight_layout()

MA and AR (and, consequently ARMA) models *only apply to* stationary series.

So to model series in nature, we often have to model the *differenced* series as an ARMA process.

This leads us, finally, to **ARIMA**.

An **ARIMA(p, d, q)** model:
  - Applies an ARMA model to a series that has been differenced $d$ times.
  - The AR part of the ARMA model has order $p$.
  - The MA part of the ARMA model has order $q$.
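
To make the "differenced $d$ times" step concrete, the sketch below shows that a cumulative sum (integration) and a first difference undo each other.  The random input is only a stand-in for an ARMA series.

```python
import numpy as np

rng = np.random.default_rng(1)
stationary = rng.normal(size=10)      # stand-in for an ARMA series

# Integrate once (d = 1): each value is the running sum of the inputs.
integrated = np.cumsum(stationary)

# Differencing once undoes the integration, apart from the first value,
# which has no predecessor to subtract.
recovered = np.diff(integrated)
print(np.allclose(recovered, stationary[1:]))

# The same holds for d = 2: integrate twice, difference twice.
twice_integrated = np.cumsum(np.cumsum(stationary))
recovered_twice = np.diff(twice_integrated, n=2)
print(np.allclose(recovered_twice, stationary[2:]))
```

Each round of differencing shortens the series by one observation.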

Let's explore generating some data as an ARIMA process.

In [None]:
import six.moves
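def arima_process(size, ar_coefs, ma_coefs, d=0):
    """Simulate a series from an arima model."""
    # ArmaProcess expects lag-polynomial coefficients: a leading 1 for the
    # zero lag, with the AR coefficients negated.
    ar_poly = np.r_[1, -np.asarray(ar_coefs)]
    ma_poly = np.r_[1, np.asarray(ma_coefs)]
    arma = ArmaProcess(ar_poly, ma_poly)
    arma_series = arma.generate_sample(size + d)
    # Integrate d times.
    for _ in range(d):
        arma_series = np.cumsum(arma_series)
    return pd.Series(arma_series)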

In [None]:
def plot_arima_process(ax, size, ar_coefs, ma_coefs, d=0):
    series = arima_process(size, ar_coefs, ma_coefs, d)
    ax.plot(series.index, series)

In [None]:

ar_coefs = [[0.5], [-0.5]]
ma_coefs = [[0.5], [-0.5]]
ds = [0, 1]

fig, axs = plt.subplots(8, figsize=(14, 15))

for ax, (ar, ma, d) in six.moves.zip(axs, itertools.product(ar_coefs, ma_coefs, ds)):
    plot_arima_process(ax, 250, ar, ma, d)
    ax.set_title("ARIMA Process with AR={}, MA={}, d={}".format(
        str(list(ar)), str(list(ma)), d))
    
plt.tight_layout()

And, one final time, fitting an ARIMA model recovers the coefficients of a simulated ARIMA process.
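
As a rough check of this claim, the sketch below simulates an $ARIMA(1, 1, 0)$ series, differences it once, and estimates the AR coefficient by least squares.  The least-squares step is a simple stand-in for the model fitting used elsewhere in these notes, not the same estimator, and the seed and series length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
true_coef = 0.5
n = 5000

# Simulate an AR(1) series, then integrate once to get an ARIMA(1, 1, 0) series.
noise = rng.normal(size=n)
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = true_coef * ar[i - 1] + noise[i]
arima = np.cumsum(ar)

# Difference once (the "I" step), then regress each value on its predecessor.
diffed = np.diff(arima)
estimate = np.dot(diffed[:-1], diffed[1:]) / np.dot(diffed[:-1], diffed[:-1])
print("true: {:.2f}  estimated: {:.2f}".format(true_coef, estimate))
```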

### Case Study:  Electrical Equipment Orders Data

Let's attempt to fit an ARIMA model to a series of electrical orders, and use it to forecast future values.

In [None]:
file_name = os.path.join('.', 'data', 'elec-equip.csv')
df = pd.read_csv(file_name)
electric_series = pd.Series(df['x'].values, 
                            pd.date_range(start='01-1996', end='12-2011', freq='M'))

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, electric_series, "Electric Sales By Month")
plt.tight_layout()

The original series is clearly **not** stationary, so we differenced it.  The resulting series looks stationary, but let's use a test to check.

In [None]:
electric_differences = electric_series.diff()[1:]
test = sm.tsa.stattools.adfuller(electric_differences)
print("ADF p-value for differenced electric series: {0:2.2f}".format(test[1]))

So the differenced series is stationary, and we can attempt to model it as an ARMA process.

We need to determine the correct order for the AR and MA models, so let's look at the autocorrelation and partial autocorrelation plots.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
_ = sm.graphics.tsa.plot_acf(electric_differences, lags=25, ax=ax)

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
_ = sm.graphics.tsa.plot_pacf(electric_differences, lags=25, ax=ax)

We see the following evidence:
  - Oscillatory behaviour in the autocorrelation plot.
  - Three significant partial autocorrelations at the beginning of the plot.
  
This suggests that the differenced series may be well modeled as an $AR(3)$ series.

Since we applied *one* difference to achieve stationarity, this suggests that we should model the *original* series as an $ARIMA(3, 1, 0)$.

In [None]:
electric_model = ARIMA(electric_series, order=(3, 1, 0)).fit()

In [None]:
print("ARIMA(3, 1, 0) coefficients from Electric model:\n  Intercept {0:2.2f}\n  AR {1}".format(
    electric_model.params[0],
    format_list_of_floats(list(electric_model.params[1:]))))

Let's simulate some data using these coefficients and see if it looks like our differenced data.

In [None]:
fig, ax = plt.subplots(4, figsize=(14, 8))

ax[0].plot(electric_differences.index, electric_differences)
ax[0].set_title("First Differences of Electric Data")

for i in range(1, 4):
    simulated_data = auto_regressive_process(len(electric_differences), 
                                             list(electric_model.params)[1:])
    simulated_data.index = electric_differences.index
    ax[i].plot(simulated_data.index, simulated_data)
    ax[i].set_title("Simulated Data from Electric Model Fit")
    
plt.tight_layout()

We can also use the fit model to project electric sales into the future.

In [None]:
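# Note: reindex returns a new series and the result is not assigned, so
# electric_series itself is unchanged; plot_predict supplies the future dates.
electric_series.reindex(pd.date_range(start='01-1996', end='12-2012', freq='M'))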

fig, ax = plt.subplots(1, figsize=(14, 4))
ax.plot(electric_series.index, electric_series)
fig = electric_model.plot_predict('2011-11-30', '2013', 
                                  dynamic=True, ax=ax, plot_insample=False)

_ = ax.legend().get_texts()[1].set_text("95% Prediction Interval")
_ = ax.legend(loc="lower left")

_ = ax.set_title("Electric Series Forecasts from ARIMA Model")

Zooming in, we can see that we **have** done a bit better than the naive forecast of predicting the last observed value.

In [None]:
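# Note: reindex returns a new series and the result is not assigned, so
# electric_series itself is unchanged; plot_predict supplies the future dates.
electric_series.reindex(pd.date_range(start='01-1996', end='12-2012', freq='M'))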

fig, ax = plt.subplots(1, figsize=(14, 4))
ax.plot(electric_series['2010':].index, electric_series['2010':])
fig = electric_model.plot_predict('2011-11-30', '2013', 
                                  dynamic=True, ax=ax, plot_insample=False)

_ = ax.legend().get_texts()[1].set_text("95% Prediction Interval")
_ = ax.legend(loc="lower left")

_ = ax.set_title("Electric Series Forecasts from ARIMA Model")

If our model is good, the residuals should have no patterns whatsoever, i.e. be *white noise*.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
ax.plot(electric_model.resid.index, electric_model.resid)
ax.set_title("Residuals from Electric Model")
plt.tight_layout()

One way to diagnose this is to check if any autocorrelation remains in the residuals.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
_ = sm.graphics.tsa.plot_acf(electric_model.resid, lags=50, ax=ax)

All the autocorrelations are within the confidence bands, so it looks like our model fits the data well.
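
The visual check can also be done by counting.  The sketch below computes sample autocorrelations and counts how many fall outside $\pm 1.96 / \sqrt{n}$, the approximate $95\%$ band for white noise.  This band is an assumption of the sketch; the bands drawn by `plot_acf` may be computed differently, so the counts need not match the plot exactly.  Both function names are illustrative.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.dot(y, y)
    return np.array([np.dot(y[k:], y[:-k]) / denom for k in range(1, max_lag + 1)])

def count_outside_band(y, max_lag):
    """Count lags whose autocorrelation exceeds the white-noise 95% band."""
    band = 1.96 / np.sqrt(len(y))
    return int(np.sum(np.abs(sample_acf(y, max_lag)) > band))

rng = np.random.default_rng(3)
white = rng.normal(size=2000)     # what good residuals should resemble
walk = np.cumsum(white)           # a series with strong leftover structure
print(count_outside_band(white, 50), count_outside_band(walk, 50))
```

For white noise roughly one lag in twenty is expected to fall outside the band by chance, so a small non-zero count is not alarming.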

### Model Selection: AIC

We made our selection of model based on the autocorrelation and partial autocorrelation plot, which is a good first step.

The overall procedure we followed is called the **Box-Jenkins Method**.

Hyndman recommends also trying a few models that are "close by", varying the ARMA parameters slightly, and then selecting the best model from the results using an information criterion such as the AIC (lower is better).
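
To show what the comparison involves, here is a sketch that scores several AR orders on simulated data using $AIC = 2k - 2 \log L$ with Gaussian errors.  It fits by least squares and drops constants common to all models, so the numbers are not comparable with the `.aic` values reported by a fitted model; the helper names and the simulated $AR(2)$ data are assumptions for illustration.

```python
import numpy as np

def fit_ar_least_squares(y, order, max_order):
    """Least-squares AR fit; returns the residual sum of squares.

    The first max_order points are dropped for every order so that all
    candidate models are scored on the same observations.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[max_order:]
    lagged = np.column_stack(
        [y[max_order - j:n - j] for j in range(1, order + 1)])
    coefs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    resid = target - lagged @ coefs
    return np.dot(resid, resid)

def gaussian_aic(rss, n_obs, n_params):
    """AIC for Gaussian errors, dropping constants shared by all models."""
    return n_obs * np.log(rss / n_obs) + 2 * n_params

# Illustrative data from an AR(2) process with coefficients 0.5 and 0.3.
rng = np.random.default_rng(4)
n = 3000
y = np.zeros(n)
noise = rng.normal(size=n)
for i in range(2, n):
    y[i] = 0.5 * y[i - 1] + 0.3 * y[i - 2] + noise[i]

max_order = 4
aics = {}
for order in range(1, max_order + 1):
    rss = fit_ar_least_squares(y, order, max_order)
    aics[order] = gaussian_aic(rss, n - max_order, order)
    print("AR({}) AIC: {:.1f}".format(order, aics[order]))
```

The penalty term $2k$ is what stops the largest model from winning automatically: extra coefficients must reduce the residual error enough to pay for themselves.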

In [None]:
parameters = [
    {'AR': 3, 'MA': 0},
    {'AR': 2, 'MA': 0},
    {'AR': 4, 'MA': 0},
    {'AR': 3, 'MA': 1}
]

models = {}
for params in parameters:
    models[(params['AR'], params['MA'])] = ARIMA(electric_series, order=(params['AR'], 1, params['MA'])).fit()
    
for model_params in models:
    print("ARIMA({}, 1, {}) AIC: {}".format(model_params[0], model_params[1], models[model_params].aic))

The $ARIMA(3, 1, 1)$ model has a *slightly* better AIC, so following Hyndman, we would select this as our final model.

The overall methodology for producing an ARIMA model is summarized in the following flowchart, taken from Hyndman.

![Hyndman Modeling Process](img/hyndman-modeling-process.png)

### Case Study: Baseball Searches

Let's complete our day with one more case study, the baseball search series we have been referencing the entire day.

In [None]:
fig, axs = plt.subplots(2, figsize=(14, 4))
plot_series_and_difference(axs, google_trends['baseball'], 'baseball')
fig.tight_layout()

Recall that the differenced series **was** found to be stationary, but the partial autocorrelation plot had a curious feature.

In [None]:
fig, ax = plt.subplots(1, figsize=(16, 4))

_ = sm.graphics.tsa.plot_pacf(baseball_diff, lags=3*52, ax=ax)

That spike at $52$ weeks is something we don't know how to deal with yet!

There is a generalization of ARIMA that handles seasonality, which is included in the (development) version of `statsmodels` as SARIMAX.  The seasonal ARIMA model fits sub-ARIMA models to each seasonal sequence of data.

A seasonal ARIMA model has more parameters:

$$ SARIMA(p, d, q)(sp, sd, sq)_k$$

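The $p$, $d$, and $q$ indices mean exactly the same thing as before.  The $sp$, $sd$, and $sq$ indices play the same roles (AR order, number of differences, MA order), but at multiples of the seasonal lag $k$ rather than at consecutive lags.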

The $k$ is the length of a season, i.e., the observation window that exhibits periodic behaviour.

In our example we observe the following:
  - The seasonal behaviour in the partial autocorrelation plot happens at $52$ weeks.
  - There is only *one* significant seasonal partial autocorrelation.
  - There is only *one* significant initial partial autocorrelation.

Altogether, this means that an appropriate model would be a

$$ SARIMA(1, 1, 0)(1, 0, 0)_{52} $$

In [None]:
baseball_series = google_trends['baseball']

baseball_model = SARIMAX(baseball_series, order=(1, 1, 0), seasonal_order=(1, 0, 0, 52)).fit()

Let's check whether the residuals look like white noise.

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
ax.plot(baseball_model.resid.index, baseball_model.resid)
ax.set_title("Residuals from Baseball Model")
plt.tight_layout()

In [None]:
fig, ax = plt.subplots(1, figsize=(14, 3))
_ = sm.graphics.tsa.plot_acf(baseball_model.resid, lags=3*52, ax=ax)

There is one significant autocorrelation, but with $156$ lags and $95\%$ confidence bands we would expect a few to fall outside the bands by random chance alone, so this is satisfying.

To wrap up, let's see how the forecasts look.

In [None]:
# pandas is a bit fussy about reindexing here: we need to extend the dates, but
# the pandas weekly series does not align with Google's.
baseball_series.index = pd.date_range(start='10-7-2011', end='9-25-2016', freq='W')
baseball_series = baseball_series.reindex(pd.date_range(
        start='10-7-2011', end='2020', freq='W'))

In [None]:
fig, ax = plt.subplots(1, figsize=(16, 4))
ax.plot(baseball_series.index, baseball_series)

# predict only returns the forecasts, so the plotting arguments are dropped here.
preds = baseball_model.predict('2016-09-17', '2020', dynamic=True)

ax.plot(preds.index, preds)
ax.set_title("Forecasts for Baseball Search Popularity")