# Time Series

In this notebook we'll both introduce new time series techniques and practice ones we learned in class.

In [None]:
# import the packages we'll use
## For data handling
import pandas as pd
import numpy as np
from numpy import meshgrid

## For plotting
import matplotlib.pyplot as plt
import seaborn as sns

## This sets the plot style
## to have a grid on a white background
sns.set_style("whitegrid")

### Handling Datetimes

Write a function that takes in someone's birthday and calculates their age. This may be trickier than you think: remember that leap years exist. Feel free to ignore time zones; you may assume everyone is in the same one.
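One possible sketch, using only the standard library: compare the (month, day) pairs rather than dividing a day count by 365. The function name `age_on` and the explicit `today` argument are our own choices for illustration. Under this convention (an assumption), someone born on February 29 turns a year older on March 1 in non-leap years.

```python
from datetime import date


def age_on(birthday: date, today: date) -> int:
    """Return the number of whole years between birthday and today."""
    # Has this year's birthday already happened (or is it today)?
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    # Subtract one year if the birthday is still to come this year.
    return today.year - birthday.year - (0 if had_birthday else 1)


# Example: a leap-day birthday checked on either side of March 1
leap_baby = date(2000, 2, 29)
age_before = age_on(leap_baby, date(2023, 2, 28))
age_after = age_on(leap_baby, date(2023, 3, 1))
```

Calling `age_on(birthday, date.today())` gives the current age.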

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












### Polar Plots

A type of plot that may help identify seasonality in your data is a <i>polar plot</i>. For example, from <a href="https://otexts.com/fpp2/seasonal-plots.html">Forecasting: Principles and Practice</a>:
<img src="PolarPlot.png" style="width:60%"></img>

Read through the `matplotlib` documentation at <a href="https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.polar.html">https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.polar.html</a> and <a href="https://matplotlib.org/3.2.1/gallery/pie_and_polar_charts/polar_scatter.html">https://matplotlib.org/3.2.1/gallery/pie_and_polar_charts/polar_scatter.html</a>. Then use Google to find out how to make one for any of the time series data sets in the repository.
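As a starting point, here is a minimal sketch of a seasonal polar plot. It uses a synthetic monthly series (made up for illustration, not one of the repository data sets), maps each month to an angle, and draws one line per year.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly data: a yearly cycle plus a slow upward trend (assumed data)
dates = pd.date_range("2015-01-01", periods=48, freq="MS")
months = dates.month.to_numpy()
values = 100 + 20 * np.sin(2 * np.pi * (months - 1) / 12) + np.arange(48)
series = pd.Series(values, index=dates)

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})

# One line per year; the angle encodes the month
for year, one_year in series.groupby(series.index.year):
    theta = 2 * np.pi * (one_year.index.month.to_numpy() - 1) / 12
    ax.plot(theta, one_year.to_numpy(), label=str(year))

# Put January at the top and run clockwise, like a clock face
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
ax.set_xticks(2 * np.pi * np.arange(12) / 12)
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.legend(loc="upper left", bbox_to_anchor=(1.05, 1.0))
```

Finish the cell with `plt.show()` to display the figure. For data with a different seasonal period, replace the 12 with the number of time steps in one season.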

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












### A Simple Method for Trends - The Drift Method

Another simple forecasting method we didn't mention in class allows you to build a forecast for trending data.

This variation on the na&iuml;ve method allows your forecast to increase or decrease over time, using the average change in the historical (training) data.

The forecast is given by:
$$
\hat{y}_{T+h|T} = y_T + \frac{h}{T-1} \sum_{t=2}^T (y_t - y_{t-1}) = y_T + h\left( \frac{y_T - y_1}{T-1} \right)
$$

Use this to build a forecast on the `elec` data. Use a maximum horizon of $2$ years. Plot the data with the forecast you've made.
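The formula above can be sketched in a few lines of `numpy`. The function name `drift_forecast` is hypothetical, and `max_horizon` is counted in time steps, so a two-year horizon means 24 steps for monthly data or 8 for quarterly data.

```python
import numpy as np


def drift_forecast(y, max_horizon):
    """Drift-method forecasts for horizons h = 1, ..., max_horizon."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if T < 2:
        raise ValueError("The drift method needs at least two observations.")
    # Average historical change, (y_T - y_1) / (T - 1)
    slope = (y[-1] - y[0]) / (T - 1)
    horizons = np.arange(1, max_horizon + 1)
    # y_T + h * slope for each horizon h
    return y[-1] + horizons * slope


# Toy series that rises by 2 each step
toy_forecast = drift_forecast([1, 3, 5, 7], max_horizon=3)
```

Geometrically, the forecast extends the straight line through the first and last training observations.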

In [None]:
## Code here












In [None]:
## Code here












### Transformation

Sometimes it is helpful to transform your time series data before fitting a model to it.

#### Calendar Transformation

These types of transformations can help remove variations in the data that are due to simple calendar effects. For example, it is possible that some months have a greater production simply due to the fact that they have more days than other months.

Read in the `milk.csv` data below. Then adjust the monthly values to be gallons per day as opposed to just gallons. Make a new plot of month vs. gallons per day. What do you notice?
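A minimal sketch of the per-day adjustment, on a made-up frame that mimics the `month` and `gallons` columns used in the plotting cell below. The values are invented so that production is exactly 10 gallons per day; the key tool is the `dt.days_in_month` accessor in `pandas`.

```python
import pandas as pd

# Made-up data: every day produces 10 gallons, so monthly totals differ
# only because months have different lengths
milk_demo = pd.DataFrame(
    {"month": pd.date_range("2021-01-01", periods=12, freq="MS")}
)
milk_demo["gallons"] = 10 * milk_demo["month"].dt.days_in_month

# Calendar adjustment: divide each monthly total by the days in that month
milk_demo["gallons_per_day"] = (
    milk_demo["gallons"] / milk_demo["month"].dt.days_in_month
)
```

In this toy frame the monthly totals wiggle between 280 and 310, while the per-day series is flat. `days_in_month` also accounts for leap years.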

In [None]:
milk = pd.read_csv("milk.csv", parse_dates=['month'])
milk.head()

In [None]:
plt.figure(figsize=(12,6))

plt.plot(milk.month,milk.gallons)

plt.show()

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












#### Box-Cox Transformations

Sometimes it may be useful to perform a mathematical transformation to the data. For example, it is possible to have a time series where the seasonal variation increases or decreases over time. It is more desirable to have a time series with seasonal variation that is constant over time.

One common and helpful transformation is the <i>Box-Cox transformation</i>. For a given value of $\lambda$, the Box-Cox transformation takes $y_t$ and produces a new time series $w_t$ like so:
$$
w_t = \left\lbrace \begin{array}{l l}
    \log(y_t) & \text{if } \lambda = 0, \\
    \left( y_t^\lambda - 1 \right)/\lambda & \text{else}.
\end{array} \right.
$$

Write a function that will take in a value of $\lambda$ and perform a Box-Cox transformation on a time series. Use this function to find a value of $\lambda$ in $[0,2]$ that produces a constant variation across time for the `elec.csv` data. Do this through visual inspection.
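The piecewise definition above translates almost directly into code. This is a sketch with a hypothetical function name, `box_cox`; it assumes strictly positive data, since the logarithm and fractional powers are not defined otherwise.

```python
import numpy as np


def box_cox(y, lam):
    """Apply the Box-Cox transformation with parameter lam to the series y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values.")
    if lam == 0:
        # The lambda = 0 case is the natural logarithm
        return np.log(y)
    return (y**lam - 1) / lam


# lambda = 0.5 is a shifted, scaled square root
w_sqrt = box_cox([1, 4, 9], 0.5)
```

To choose $\lambda$ by eye, one option is to loop over a few values such as `np.linspace(0, 2, 5)` and plot each transformed series in its own panel.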

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












### Linear Regression with Time Series Data

What we learned in the Regression materials does not go out the window when we work with time series data. However, we now have to be more careful in the modeling process.

Read in the `uschange.csv` data below. In addition to our standard regression exploratory data analysis, create side-by-side time series plots of the variables in the data set. Then build a multiple linear regression model to predict `Consumption` using the other variables.
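Here is a sketch of the model-fitting step on synthetic data. Only the `Consumption` column name comes from the exercise; the predictors `x1` and `x2` and all values are invented. We use `numpy` least squares here, but any regression tool from the Regression materials would do. With the real file, you may first need to drop non-numeric columns such as a date column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the data set (hypothetical predictors x1, x2)
rng = np.random.default_rng(440)
n = 100
demo = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
demo["Consumption"] = (
    0.5 + 1.0 * demo["x1"] - 2.0 * demo["x2"] + rng.normal(scale=0.1, size=n)
)

# Use every column except the target as a predictor
feature_cols = [col for col in demo.columns if col != "Consumption"]

# Design matrix with a leading column of ones for the intercept
design = np.column_stack([np.ones(n), demo[feature_cols].to_numpy()])
target = demo["Consumption"].to_numpy()

# Ordinary least squares fit: coefs = [intercept, beta_x1, beta_x2]
coefs, *_ = np.linalg.lstsq(design, target, rcond=None)

fitted = design @ coefs
residuals = target - fitted
```

Keep the `residuals` array around, since the assumption checks are all performed on it.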

In [None]:
uschange = pd.read_csv("uschange.csv")

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












Check your model residuals for the following time series version of our linear regression assumptions.

<q>First, we assume that the model is a reasonable approximation to reality; that is, the relationship between the forecast variable and the predictor variables satisfies this linear equation.

Second, we make the following assumptions about the errors $(\epsilon_1,\ldots,\epsilon_T)$:
<ul>
    <li>they have mean zero; otherwise the forecasts will be systematically biased.</li>
    <li>they are not autocorrelated; otherwise the forecasts will be inefficient, as there is more information in the data that can be exploited.</li>
    <li>they are unrelated to the predictor variables; otherwise there would be more information that should be included in the systematic part of the model.</li>
</ul>

It is also useful to have the errors being normally distributed with a constant variance $\sigma^2$ in order to easily produce prediction intervals.</q> From <a href="https://otexts.com/fpp2/regression-intro.html">Forecasting: Principles and Practice</a>.


Note: we'll return below to what to do if these assumptions are violated.
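The three bulleted assumptions can be turned into quick numeric checks. The helper below, `residual_checks`, is a hypothetical sketch: it reports the residual mean, the lag-1 autocorrelation, and the correlation of the residuals with each predictor. It complements the residual plots rather than replacing them.

```python
import numpy as np


def residual_checks(residuals, X):
    """Summarize residual mean, lag-1 autocorrelation, and predictor correlations."""
    residuals = np.asarray(residuals, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Correlation between each residual and the one before it
    lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    # Correlation between the residuals and each predictor column
    predictor_corr = [
        np.corrcoef(residuals, X[:, j])[0, 1] for j in range(X.shape[1])
    ]
    return {
        "mean": residuals.mean(),
        "lag1_autocorr": lag1,
        "predictor_corr": predictor_corr,
    }


# Toy residuals that alternate in sign: mean zero but strongly autocorrelated
demo_residuals = np.array([1.0, -1.0] * 5)
demo_checks = residual_checks(demo_residuals, np.arange(10.0))
```

For an in-sample OLS fit with an intercept, the residual mean and the predictor correlations are zero by construction, so it is still worth plotting the residuals against time and against each predictor to look for nonlinear patterns.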

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












### A Unit Root Test for Stationarity

Sometimes you may want a more statistically sound way to test for stationarity. One such way is the unit root test.

The unit root test we'll use from `statsmodels` is the augmented Dickey-Fuller test, <a href="https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test">https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test</a>. In this test the null hypothesis is that a unit root is present in the time series data, while the alternative is that there is no unit root. If your $p$-value is below a predetermined level (a common rule of thumb is $0.05$), you reject the null hypothesis, in which case there is evidence of stationarity.

You can perform this test in Python using the code below. We demonstrate with a time series that we've already judged to be stationary based on its plot: the differenced Google data. You then perform the test on the lynx data.

In [None]:
from statsmodels.tsa.api import adfuller

In [None]:
goog_diff = pd.read_csv("goog_diff.csv")
goog_diff.head()

In [None]:
# run the test like so
unit_test = adfuller(goog_diff.day_to_day_diff)

In [None]:
# print p-value
np.round(unit_test[1],8)

In [None]:
lynx = pd.read_csv("lynx.csv")

In [None]:
## Code here












In [None]:
## Code here












### Seasonal ARIMA Models

Seasonal extensions of the ARIMA framework exist and are formulated in a similar manner as standard ARIMA models.

For seasonal models we have ARIMA$(p,d,q),(P,D,Q)_m$ models, where the second set of parameters corresponds to the seasonal part of the model and $m$ is the number of time steps in one season.

Read the documentation here, <a href="https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima.model.ARIMA.html">https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima.model.ARIMA.html</a>, to learn how to implement seasonal ARIMA in python. Then use it to predict measles cases from 1959 to 1963.

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












### Dynamic Regression Models

We end the Time Series homework by presenting a way to combine linear regression methods with ARIMA to make forecasts that consider both other features and the temporal nature of the data.

If you'd like to read about it check out chapter 9 of <a href="https://otexts.com/fpp2/dynamic.html">Forecasting: Principles and Practice</a>.

The typical approach is to fit your linear regression model:
$$
y = X\beta + \epsilon.
$$

Then you form a new time series $\eta_t$ from the residuals:
$$
\eta_t = y_t - \hat{y}_t
$$

You then model the time dependence in the residuals using an ARIMA model.

Now, we could try to estimate such a model by estimating $\beta$ independently of our ARIMA model, but this is not statistically sound. Remember that OLS regression estimates assume that our errors are independent, but the point of an ARIMA model is that they are, in fact, not independent.

The preferred approach is to fit the entire model at once. This can be done by passing your regression features through the `exog` argument when you call `ARIMA` on the time series. Do this to predict `Consumption` from the `uschange` data set. Use a maximum horizon of $10$ time steps.

In [None]:
uschange = pd.read_csv("uschange.csv")

In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here












In [None]:
## Code here










