## 4.2 ACF Features
All the autocorrelations of a series can be considered features of that series. We can also summarise the autocorrelation to produce new features; for example, the sum of the first ten squared autocorrelation coefficients is a useful summary of how much autocorrelation there is in a series, regardless of lag.
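
As a minimal sketch of this summary, the snippet below computes the sample autocorrelations with the usual definition (deviations from the overall mean, scaled by the total sum of squares) and then sums the squares of the first ten. The helper name `acf` and the simulated random walk `y` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def acf(x, nlags=10):
    """Sample autocorrelations r_1..r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()                 # deviations from the overall mean
    denom = np.sum(d ** 2)           # total sum of squares (lag-0 term)
    return np.array([np.sum(d[k:] * d[:-k]) / denom
                     for k in range(1, nlags + 1)])

# Hypothetical example data: a random walk
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))

r = acf(y, nlags=10)
acf1 = r[0]                # first autocorrelation coefficient
acf10 = np.sum(r ** 2)     # sum of the first ten squared coefficients
print(acf1, acf10)
```

Because each coefficient lies between -1 and 1, `acf10` lies between 0 and 10, with larger values indicating more autocorrelation overall.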

We can also compute autocorrelations of the changes in the series between periods. That is, we “difference” the data and create a new time series consisting of the differences between consecutive observations. Then we can compute the autocorrelations of this new differenced series. Occasionally it is useful to apply the same differencing operation again, so we compute the differences of the differences. The autocorrelations of this double differenced series may provide useful information.
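
A small sketch of first and second differencing, assuming NumPy's `np.diff` as the differencing tool; the series `y` and the helper `acf1` are made up for illustration.

```python
import numpy as np

def acf1(x):
    """First sample autocorrelation coefficient of a 1-D series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

# Hypothetical series that trends upward while zig-zagging
y = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 11.0, 10.0, 14.0])

d1 = np.diff(y)         # differences between consecutive observations
d2 = np.diff(y, n=2)    # differences of the differences

print(d1)               # one observation shorter than y
print(d2)               # two observations shorter than y
print(acf1(d1))         # negative here: the changes alternate in sign
```

Each round of differencing shortens the series by one observation, so very short series leave few points for estimating autocorrelations.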

Another related approach is to compute seasonal differences of a series. If we had monthly data, for example, we would compute the difference between consecutive Januaries, consecutive Februaries, and so on. This enables us to look at how the series is changing between years, rather than between months. Again, the autocorrelations of the seasonally differenced series may provide useful information.
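
Seasonal differencing can be sketched the same way by subtracting the observation one full seasonal period earlier. The example below assumes monthly data with a period of `m = 12`; the synthetic series `y` (a linear trend plus a fixed monthly pattern) is an illustrative assumption.

```python
import numpy as np

m = 12                                   # seasonal period for monthly data
t = np.arange(36)                        # three years of monthly observations
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 5], dtype=float)
y = 2.0 * t + pattern[t % m]             # trend of 2 per month plus seasonality

# Seasonal difference: each month minus the same month one year earlier
sdiff = y[m:] - y[:-m]
print(sdiff)
```

The seasonal pattern cancels out and only the year-on-year change remains. With a pandas `Series`, the same operation is `x.diff(12)`.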


In [None]:
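import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the notebook output
import pandas as pd
import sys
sys.path.append('../')  # make the project-level modules importable
import numpy as np
from utils import summarize  # project-local helper used in the cells below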

The `feat_acf()` function in R, described in the book, computes a selection of the autocorrelation features discussed above:
- the first autocorrelation coefficient from the original data;
- the sum of squares of the first ten autocorrelation coefficients from the original data;
- the first autocorrelation coefficient from the differenced data;
- the sum of squares of the first ten autocorrelation coefficients from the differenced data;
- the first autocorrelation coefficient from the twice differenced data;
- the sum of squares of the first ten autocorrelation coefficients from the twice differenced data;
- for seasonal data, the autocorrelation coefficient at the first seasonal lag.

Reproducing it in Python takes some manual work; the cells below build several of these features with pandas.

In [35]:
tourism = (
    pd.read_csv('../data/tsibble/tourism.csv')
    .assign(date=lambda df: pd.to_datetime(df['Quarter'].str.replace(' ', '')))
    .set_index('date', drop=False)
    )

In [33]:
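features = (
    tourism
    .groupby('Region State Purpose'.split())
    .Trips
    # `summarize` is the project-local helper imported from utils
    .pipe(summarize, lambda x: dict(
        # Each feature is the Pearson correlation between a series and its
        # lagged copy. corr() already ignores the NaN pairs created by
        # shift() and diff(); filling them with 0 would add artificial
        # observations and bias the estimates.
        acf1 = x.shift().corr(x),
        acf10 = np.sum([x.shift(n).corr(x)**2 for n in range(1, 11)]),
        diff1_acf1 = x.diff().shift().corr(x.diff()),
        diff1_acf10 = np.sum([
            x.diff().shift(n).corr(x.diff())**2
            for n in range(1, 11)]),
        diff2_acf1 = x.diff().diff().shift().corr(x.diff().diff()),
        # etc.
    ))
    .unstack()
    .reset_index()
)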