## RNN for Stock Price Prediction

This week we are going to build an RNN model to predict the price of Apple stock given its history over the previous 60 days. First we use Pandas Datareader to fetch the Apple stock price from 1980 to 2018 from the internet. This will be our training data.

[Learn more about Pandas Datareader](https://pandas-datareader.readthedocs.io/en/latest/remote_data.html)

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

from pandas_datareader import data
tickers = 'AAPL'

start_date = '1980-12-01'
end_date = '2018-12-31'

stock_data = data.get_data_yahoo(tickers, start_date, end_date)
stock_data.head(10)

In the table above, "High" and "Low" represent the highest and lowest prices of the day, "Open" represents the price of the first transaction, and "Close" represents the price at the end of the day. For this exercise, we'll use the Close price as our target.

In the next cell, please find out the number of training data points (number of rows of the data) and store that number as a variable named stock_data_len.

In [None]:
stock_data_len = len(stock_data)
print(stock_data_len)

In the next cell, we select the Close price and save it as a 2D NumPy array with a single column. Then we use MinMaxScaler from scikit-learn to rescale the data to the range 0 to 1 (lowest price = 0, highest price = 1).

In [None]:
# Select the Close price by column name (double brackets keep the 2D shape)
close_prices = stock_data[['Close']].values

# Rescale to 0-1
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(close_prices)
print(training_set_scaled.shape)
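To make the rescaling concrete, here is a minimal sketch of the same min-max formula written by hand with NumPy. The array `toy_prices` and the names `lo`, `hi`, `toy_scaled`, and `toy_restored` are made up for illustration; they are not part of the exercise.

```python
import numpy as np

# Hypothetical toy prices (not real AAPL data), one column like close_prices
toy_prices = np.array([[10.0], [12.0], [15.0], [20.0]])

# Min-max scaling: subtract the column minimum, divide by the column range
lo = toy_prices.min(axis=0)
hi = toy_prices.max(axis=0)
toy_scaled = (toy_prices - lo) / (hi - lo)

# The inverse mapping brings scaled values back to the original prices
toy_restored = toy_scaled * (hi - lo) + lo

print(toy_scaled.ravel())    # scaled values: 0, 0.2, 0.5, 1
print(toy_restored.ravel())  # the original toy prices again
```

The lowest price maps to 0 and the highest to 1, and the mapping can be undone as long as we keep the minimum and maximum.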

Now we need to prepare the inputs for the RNN. The RNN takes a 3D tensor with shape (batch, timesteps, number of features). Here the first dimension counts the training samples: each sample needs the 60 preceding days, so the first 60 rows cannot be used as labels and we get stock_data_len - 60 samples. We use the prices of the previous 60 days, so timesteps = 60. The number of features is 1 (the price).

In [None]:
# Prepare the 3D input tensor for RNN
# Each set contains prices of the previous 60 days (features), 
# label = true (actual) price 'today' given the previous 60 days
features = []
labels = []
for i in range(60, stock_data_len):
    features.append(training_set_scaled[i-60:i, 0])
    labels.append(training_set_scaled[i, 0])

features = np.array(features)
labels = np.array(labels)

print('shape of features is:', features.shape)
print('shape of labels is:', labels.shape)

# Reshape feature into 3D
features = features[:,:,np.newaxis]
print('shape of features is now:', features.shape)
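The windowing is easier to check on a tiny sequence. The sketch below repeats the same loop on made-up data with a window of 3 instead of 60; `toy_series`, `window`, `toy_features`, and `toy_labels` are illustrative names only.

```python
import numpy as np

# Hypothetical scaled series with 10 rows and 1 column: 0, 1, ..., 9
toy_series = np.arange(10, dtype=float).reshape(-1, 1)
window = 3  # stand-in for the 60-day window

toy_features = []
toy_labels = []
for i in range(window, len(toy_series)):
    toy_features.append(toy_series[i - window:i, 0])  # the previous `window` values
    toy_labels.append(toy_series[i, 0])               # the value to predict

# Add the trailing feature axis to get (samples, timesteps, features)
toy_features = np.array(toy_features)[:, :, np.newaxis]
toy_labels = np.array(toy_labels)

print(toy_features.shape)         # (7, 3, 1): 10 - 3 samples
print(toy_features[0].ravel())    # first window: values 0, 1, 2
print(toy_labels[0])              # its label: 3.0
```

With 10 rows and a window of 3 we get 7 samples, which mirrors the stock_data_len - 60 samples in the real data.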

Now we build the RNN model. Please read the comments so that you understand what the code does.

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential,Model
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import Adam

model = Sequential()
# LSTM(50) means using 50 units (cells) in the layer.
# return_sequences=True returns the hidden state for every input time step.
# It must be set when stacking LSTM layers so that the next LSTM layer
# receives a three-dimensional sequence as its input.
# The default is False, which we keep for the last LSTM layer.
model.add(LSTM(50, return_sequences=True, input_shape = (features.shape[1], 1)))
model.add(Dropout(0.2))          
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.2))          
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.2))          
model.add(LSTM(50))     
model.add(Dropout(0.2))          
model.add(Dense(1))          

model.summary()
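The effect of return_sequences on the output shape can be illustrated without Keras. The sketch below is a plain tanh recurrent layer written in NumPy, not an LSTM and not the Keras implementation; the function `simple_rnn`, the weights `W_x` and `W_h`, and the batch size of 4 are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, n_features, units = 4, 60, 1, 50

# Hypothetical input and randomly initialized weights
x = rng.normal(size=(batch, timesteps, n_features))
W_x = rng.normal(scale=0.1, size=(n_features, units))
W_h = rng.normal(scale=0.1, size=(units, units))

def simple_rnn(x, return_sequences):
    """Plain tanh recurrence; returns every hidden state or only the last one."""
    h = np.zeros((x.shape[0], units))
    states = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

seq_out = simple_rnn(x, return_sequences=True)
last_out = simple_rnn(x, return_sequences=False)

print(seq_out.shape)   # (4, 60, 50): one hidden state per time step
print(last_out.shape)  # (4, 50): only the final hidden state
```

The 3D output is what a following recurrent layer expects as input, while the 2D output is what the final Dense layer expects.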

Let's fit the model!

In [None]:
# This is a regression problem, so we track mean absolute error rather than accuracy
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error'])
history = model.fit(features, labels, epochs = 20, batch_size = 32)

Now we want to apply the model to a set of test data. Please take the data of the Apple stock price from 2019-01-01 to 2019-06-30, select the Close prices, and normalize them as before.

In [None]:
# Prepare test data

start_date = '2019-01-01'
end_date = '2019-06-30'

test_stock_data = data.get_data_yahoo(tickers, start_date, end_date)

In [None]:
# Select the Close price by column name (double brackets keep the 2D shape)
test_prices = test_stock_data[['Close']].values
# Check the length
test_data_len = len(test_stock_data)
print(test_data_len)

In [None]:
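# Scale the test data with the scaler already fitted on the training data.
# Use transform, not fit_transform, so the test prices share the training scale.
test_prices_scaled = sc.transform(test_prices)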
print(test_prices_scaled.shape)
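A common convention when scaling held-out data is to reuse the minimum and maximum learned from the training data, so that a given price maps to the same scaled value in both sets. The sketch below shows this by hand on made-up numbers; `train`, `test`, `scale`, and `unscale` are hypothetical names, not part of the exercise.

```python
import numpy as np

# Hypothetical prices, one column each
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [35.0]])

# Statistics come from the training data only
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def scale(prices):
    """Map prices to the training scale (training min -> 0, training max -> 1)."""
    return (prices - train_min) / (train_max - train_min)

def unscale(scaled):
    """Undo scale() using the same training statistics."""
    return scaled * (train_max - train_min) + train_min

test_scaled = scale(test)
print(test_scaled.ravel())           # 0.75 and 1.25
print(unscale(test_scaled).ravel())  # back to 25 and 35
```

Values above the training maximum scale to more than 1, which is expected when the test prices exceed anything seen in training.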

Prepare the input (features) for the model. (No need for labels this time.)

In [None]:
# Input for model prediction

test_features = []

for i in range(60, test_data_len):
    test_features.append(test_prices_scaled[i-60:i, 0])

test_features = np.array(test_features)

print('shape of features is:', test_features.shape)

# Reshape feature into 3D
test_features = test_features[:,:,np.newaxis]
print('shape of test_features is now:', test_features.shape)

In the next cell, make a plot to compare the predicted price to the actual price in the test data.
Remember to use the [inverse transform](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler.inverse_transform)
method of MinMaxScaler to transform the prediction from (0,1) back to its original value.

Also note that since we use the previous 60 days for prediction, the comparison should start from
the 61st day in the test sample.

In [None]:

predicted_stock_price = model.predict(test_features)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

actual_stock_price = test_prices[60:test_data_len,0]

plt.figure(figsize=(10,6))  
plt.plot(actual_stock_price, color='blue', label='Actual Apple Stock Price')  
plt.plot(predicted_stock_price , color='red', label='Predicted Apple Stock Price')  
plt.title('Apple Stock Price Prediction')  
plt.xlabel('Trading day (counted from day 61 of the test period)')
plt.ylabel('Apple Stock Price')  
plt.legend()  
plt.show()  
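If we want a number to go with the plot, one option is the root mean squared error in the original price units. The sketch below uses made-up arrays with the same shapes as above: predictions come back with shape (n, 1) while the actual prices have shape (n,), so we flatten the predictions first. The names `toy_actual`, `toy_predicted`, `errors`, and `rmse` are illustrative only.

```python
import numpy as np

# Hypothetical values with the same shapes as the arrays in the plot
toy_actual = np.array([100.0, 102.0, 101.0])          # shape (3,)
toy_predicted = np.array([[99.0], [103.0], [103.0]])  # shape (3, 1)

# Subtracting without flattening would broadcast to a (3, 3) matrix
broadcast_shape = (toy_predicted - toy_actual).shape

# Flatten the predictions so the errors line up element by element
errors = toy_predicted.ravel() - toy_actual
rmse = np.sqrt(np.mean(errors ** 2))

print(broadcast_shape)  # (3, 3)
print(errors)           # errors of -1, 1 and 2
print(rmse)             # square root of 2, about 1.414
```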