# Predicting with Time Series

Below, we review some basics of exploring and modeling time series data. Our data are daily stock prices for Amazon (ticker AMZN). Our target is the `Close` feature: the closing price of the stock on a given date.



In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [2]:
amzn = pd.read_csv('../resource/asnlib/publicdata/AMZN_2006-01-01_to_2018-01-01.csv')

In [3]:
amzn.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Name
0,2006-01-03,47.47,47.85,46.25,47.58,7582127,AMZN
1,2006-01-04,47.48,47.73,46.69,47.25,7440914,AMZN
2,2006-01-05,47.16,48.2,47.11,47.65,5417258,AMZN
3,2006-01-06,47.97,48.58,47.32,47.87,6154285,AMZN
4,2006-01-09,46.55,47.1,46.4,47.08,8945056,AMZN


### Question 1: `to_ts_format`

In [38]:
###GRADED
def to_ts_format(df):
    '''
    This function takes a DataFrame with a Date column.
    We return a copy of the DataFrame with the Date feature
    converted to datetime and set as the index.  The input
    DataFrame is left unmodified.
    '''
    df = df.copy()  # avoid mutating the caller's DataFrame
    df['Date'] = pd.to_datetime(df['Date'])
    return df.set_index('Date')
    ###
    ### YOUR CODE HERE
    ###
    

In [6]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
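
As a quick sanity check of the behavior described in the docstring above, the sketch below applies the same two steps (`pd.to_datetime`, then `set_index`) to a tiny hand-made frame. The frame `toy` and its two rows are made up for illustration and are not the assignment data.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the AMZN data.
toy = pd.DataFrame({"Date": ["2006-01-03", "2006-01-04"],
                    "Close": [47.58, 47.25]})

# Parse the date strings and move them into the index, leaving `toy` untouched.
toy_ts = toy.assign(Date=pd.to_datetime(toy["Date"])).set_index("Date")

print(type(toy_ts.index).__name__)                      # DatetimeIndex
print(toy_ts.loc[pd.Timestamp("2006-01-04"), "Close"])  # look up a row by date
```

With a `DatetimeIndex` in place, date-based operations such as `.resample()` and date-string slicing become available.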


### Question 2: Closing Price by Quarter



In [7]:
amzn = pd.read_csv('../resource/asnlib/publicdata/amzn_prepared.csv', index_col = 0, parse_dates = True)
amzn.head()

Date,Open,High,Low,Close,Volume,Name
2006-01-03,47.47,47.85,46.25,47.58,7582127,AMZN
2006-01-04,47.48,47.73,46.69,47.25,7440914,AMZN
2006-01-05,47.16,48.2,47.11,47.65,5417258,AMZN
2006-01-06,47.97,48.58,47.32,47.87,6154285,AMZN
2006-01-09,46.55,47.1,46.4,47.08,8945056,AMZN


In [8]:
###GRADED
###QUESTION 2
###Compute the average Closing price by quarter.
###Save your results to the variable amzn_mean_close_by_quarter.
###HINT: Use the .resample() method.
amzn_mean_close_by_quarter = amzn.resample('Q')['Close'].mean()
###
### YOUR CODE HERE
###


In [8]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
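
The quarterly average can be checked on a series whose answer is known in advance. The sketch below is an illustration on made-up data: it groups by calendar quarter with `to_period("Q")` instead of `.resample()`, because the spelling of the quarterly resample alias differs across pandas versions (newer releases prefer `'QE'`). The names `daily` and `quarterly_mean` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 1.0 throughout Q1 2020, 3.0 throughout Q2 2020.
days = pd.date_range("2020-01-01", "2020-06-30", freq="D")
values = np.where(days.quarter == 1, 1.0, 3.0)
daily = pd.Series(values, index=days)

# Group the observations by calendar quarter and average within each group.
quarterly_mean = daily.groupby(daily.index.to_period("Q")).mean()
print(quarterly_mean)
```

The values match what a quarterly resample would give; only the index differs, since this version is labeled by quarterly periods rather than quarter-end timestamps.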


### Question 3: Moving Average

In [9]:
###GRADED
amzn_20day_mvg_avg = amzn['Close'].rolling(20).mean()
###
### YOUR CODE HERE
###


In [10]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
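
To see what `.rolling(n).mean()` produces, it helps to try a window small enough to verify by hand. The sketch below uses a made-up five-value series and a window of 3; the 20-day case above works the same way, with the first 19 entries undefined.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A 3-period window needs 3 observations, so the first two results are NaN.
# Each later entry is the mean of the current value and the two before it.
mvg_avg = prices.rolling(3).mean()
print(mvg_avg)
```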


In [11]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

### Question 4:  Stationarity

![](amzn_autocorr.png)

In [11]:
###GRADED
###QUESTION 4
###Based on the autocorrelation plot of AMZN's closing
###prices, do you believe the time series is stationary?
###Assign a boolean answer to ans_4 below.
ans_4 = False
###
### YOUR CODE HERE
###


In [12]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
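
An autocorrelation that stays high across many lags is a typical sign of a non-stationary series. The sketch below illustrates the contrast on simulated data (not the AMZN prices): white noise, which is stationary, against its cumulative sum, a random walk. It uses `Series.autocorr` to compute individual lags instead of drawing the full plot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(size=1000))   # stationary white noise
walk = noise.cumsum()                      # random walk, non-stationary

lags = (1, 10, 50)
walk_acf = [walk.autocorr(lag) for lag in lags]
noise_acf = [noise.autocorr(lag) for lag in lags]

print("random walk:", np.round(walk_acf, 3))   # stays far from zero
print("white noise:", np.round(noise_acf, 3))  # hovers near zero
```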


### Question 5: Augmented Dickey Fuller test



In [13]:
from statsmodels.tsa.stattools import adfuller

In [14]:
adfuller(amzn.Close)[1]

1.0

In [15]:
adfuller(amzn.Close.diff().dropna())[1]

1.0983139936571346e-17

In [16]:
###GRADED
###QUESTION 5
###Interpret the results of the augmented Dickey-Fuller test
###by selecting all valid statements below.  Assign your solutions
###as strings in the list ans_5 (e.g. ans_5 = ['a', 'c']):
###a) We fail to reject the null hypothesis for the original series
###b) We reject the null hypothesis for the first difference of the original series
###c) Neither the original nor the diff'd version of the series is stationary
###d) We fail to reject the null hypothesis that the first difference is stationary.
ans_5 = ['a','b']
###
### YOUR CODE HERE
###


In [17]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### Question 6: Train/Test Split

Below, we create training and testing sets for our models. As discussed in the lectures, we split the data in chronological order rather than by randomly selecting observations. We use data starting in **2016** for training and hold out the final ten trading days as the testing data.

In [24]:
###GRADED
###Training Data starts in 2016
###Testing Data is last 10 observations in dataset.
###Both should be pandas series type indexed by date.
###The last date in train should be '2017-12-14'.
amzn_2016 = amzn[amzn.index.year >= 2016]
train = amzn_2016[:'2017-12-14'].Close
test = amzn_2016[-10:].Close
###
### YOUR CODE HERE
###


In [22]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
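
The key property of a time series split is that every training observation precedes every testing observation. The sketch below shows a positional version of the split on a made-up 30-day series; the names `series`, `train_part`, and `test_part` are hypothetical. Note that slicing by date label, as in `amzn_2016[:'2017-12-14']`, includes the end date, whereas positional slicing with `iloc` excludes the end position.

```python
import numpy as np
import pandas as pd

# Hypothetical series of 30 business days.
idx = pd.date_range("2021-01-01", periods=30, freq="B")
series = pd.Series(np.arange(30, dtype=float), index=idx)

n_test = 10
train_part = series.iloc[:-n_test]   # everything except the last 10 rows
test_part = series.iloc[-n_test:]    # the last 10 rows, in order

# The two pieces must not overlap, and train must end before test begins.
print(train_part.index.max(), "<", test_part.index.min())
```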


### Question 7: Building the Autoregressive Model

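Adjust the `ARIMA` instance below to build an autoregressive model using your training data from **Question 6**. The model should be fit to the first difference of the closing price (`d = 1`) and use one lagged term (`p = 1`), with no moving average terms (`q = 0`). In other words, it is an order 1 `AR` model on the differenced series, specified as `order = (1, 1, 0)`.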

In [23]:
# Legacy API: recent statsmodels releases replace it with statsmodels.tsa.arima.model.ARIMA
from statsmodels.tsa.arima_model import ARIMA

In [26]:
ar = ARIMA(train, order = (1, 1, 0))
model = ar.fit() #fit the model here
###
### YOUR CODE HERE
###


In [27]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### Question 8: Making Predictions

In [29]:
###GRADED
###QUESTION 8
###Use your fit model instance to generate a forecast for the 
###next 10 days closing prices.  (HINT: Use the .forecast() method!)
###Assign your solution as a series of predictions with appropriate index to ar_predictions below.
# The legacy .forecast() returns (forecast, stderr, conf_int); [0] keeps the point forecasts.
pred = model.forecast(steps=10)[0]
ar_predictions = pd.Series(pred, index = test.index)
###
### YOUR CODE HERE
###


In [25]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Plotting Predictions**

![](predictions.png)

### Question 9: MSE and RMSE

In [32]:
###GRADED
###Question 9
###Compute the Mean Squared Error and 
###Root Mean Squared Error on the testing data.

from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test, ar_predictions)
rmse = np.sqrt(mse)
###
### YOUR CODE HERE
###


In [27]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
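
The mean squared error is the average of the squared differences between actual and predicted values, and the root mean squared error is its square root, which puts the error back in the units of the target (here, dollars). The sketch below works both out by hand with NumPy on three made-up values so that the arithmetic can be checked directly.

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.0, 2.0, 5.0])

errors = actual - predicted            # differences: 0, 0, -2
toy_mse = np.mean(errors ** 2)         # (0 + 0 + 4) / 3
toy_rmse = np.sqrt(toy_mse)            # square root of the MSE

print(toy_mse, toy_rmse)
```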


### Question 10: ARIMA Model

In [33]:
###GRADED
###Question 10
###Adjust the ARIMA Model below to fit a model with 1 AR term
###on the first difference of the training data with 1 moving average term.
ar = ARIMA(train, order = (1, 1, 1))
model = ar.fit() #fit the model here
###
### YOUR CODE HERE
###


In [29]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### Question 11: Evaluating Results MSE and RMSE

In [34]:
###GRADED
###QUESTION 11
###Use your fit ARIMA model to evaluate the Mean Squared Error
###and Root Mean Squared Error on the test dataset.

pred = model.forecast(steps=10)[0]
new_pred = pd.Series(pred, index = test.index)

mse = mean_squared_error(test, new_pred)
rmse = np.sqrt(mse)

###
### YOUR CODE HERE
###


In [31]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### What's Next

As we saw in the lectures, the values chosen for the parameters `p`, `d`, and `q` might not be the best ones. We would also like to consider values near those already tested. You may want to run a grid search over several parameter combinations, comparing how well each one predicts your test data.