# Using Recurrent Neural Networks (RNN) for time series forecasting

In this activity we will use Recurrent Neural Networks (RNNs) for time series forecasting.
Remember that RNNs are suitable when we want to model data with sequential or temporal structures. 

A "time series" is generally the observation of a variable over time, see https://en.wikipedia.org/wiki/Time_series for more details. 
Time series forecasting is about making predictions about the future based on past data. 
We will work on a relatively simple example, trying to predict Google stock prices (univariate time series), without getting into details on the statistics and math behind it. 

For an overview of non-deep-learning models for this task, see [here](https://towardsdatascience.com/the-complete-guide-to-time-series-analysis-and-forecasting-70d476bfe775).

Note that a rigorous time series analysis would require additional analyses beyond what is presented in this notebook.

## Data Preprocessing

Upload the "Google_test.csv" and "Google_train.csv" datasets to your session before proceeding (both are posted on elearn). 

You can easily download similar datasets as csv files for any stock ticker from https://finance.yahoo.com/.

In [None]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Training data
Our training data consists of daily observations of the Google stock price from *January 2010* to *January 2016*: 1529 observations in total.

We will use only the **Opening** price values. Observations of a single variable over time are also called a univariate time series.
We try to predict future Opening prices based on its past values. 

In [None]:
dataset_train = pd.read_csv('Google_train.csv')
training_set = dataset_train.iloc[:, 1:2].values # use only Opening price as input data
dataset_train.head(2), dataset_train.tail(2)

In [None]:
training_set

### OPTIONAL alternative training data
Alternatively, you can use the "yfinance" library to directly pull stock data from yahoo finance using a company's ticker. 
Uncomment the code and use it instead of the provided datasets.

If you use this section, also make sure to pull corresponding testing data (see **OPTIONAL alternative testing data** in TOC). 

As is, it pulls the same dates for Google stock as in the provided dataset. You can experiment using other timeframes.

In [None]:
#!pip install yfinance

In [None]:
#import yfinance as yf
#dataset_train=yf.download("GOOGL", start="2010-01-01", end="2016-01-30")
#training_set=dataset_train[['Open']].values #using Opening prices, consistent with the rest of the notebook
#training_set.shape

### Scaling our data

In [None]:
# As we know, we need to scale our dataset
# Feature Scaling using MinMax scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
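
`MinMaxScaler(feature_range=(0, 1))` learns the minimum and maximum of each column during `fit` and maps every value to `(x - min) / (max - min)`; `inverse_transform` undoes this, which is why we call it on the predictions later. The plain-NumPy sketch below reproduces that arithmetic on a made-up column of prices (the helper names `minmax_transform` and `minmax_inverse` are hypothetical). Note that values outside the training range, such as a test price above the training maximum, are not clipped and land outside [0, 1].

```python
import numpy as np

# Made-up column of prices, shaped (n_samples, 1) like training_set in this notebook
prices = np.array([[100.0], [150.0], [200.0]])

# "fit": learn the per-column min and max from the training data only
col_min = prices.min(axis=0)
col_max = prices.max(axis=0)

def minmax_transform(x, lo, hi):
    """Map x to [0, 1] using the training min/max (no clipping)."""
    return (x - lo) / (hi - lo)

def minmax_inverse(x_scaled, lo, hi):
    """Undo the scaling, e.g. to turn model predictions back into prices."""
    return x_scaled * (hi - lo) + lo

scaled = minmax_transform(prices, col_min, col_max)
print(scaled.ravel().tolist())                                   # [0.0, 0.5, 1.0]
print(minmax_inverse(scaled, col_min, col_max).ravel().tolist())  # [100.0, 150.0, 200.0]
# A "test" price above the training maximum maps above 1
print(minmax_transform(np.array([[250.0]]), col_min, col_max).item())  # 1.5
```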

In [None]:
len(training_set)

### Specifying time window (time steps)
We will use a time window of the past 60 days (60 time steps) to make a prediction for today.
Our model learns to predict the current value from the values in the past window.

This value can be tuned to potentially improve model performance depending on the available data. 


In [None]:
# Creating a data structure with 60 timesteps and 1 output
window=60 #change this value to try another window size

In [None]:

# i.e., at each time t we look at the 60 time steps before, t-1, ...,t-60 (3 previous financial months)
X_train = [] #each observation contains the 60 prices on the 60 time points before t
y_train = [] #each observation is the stock price on time t
for i in range(window, len(training_set)): #(60,1529)
    X_train.append(training_set_scaled[i-window:i, 0])
    y_train.append(training_set_scaled[i, 0])
    
X_train, y_train = np.array(X_train), np.array(y_train)

Our data has to be reshaped into a tensor (a 3D data structure) before we can feed it to an RNN.
The tensor fed to an RNN has the following dimensions:
* number of observations = length of training data - window = 1529 - 60 = 1469
* number of time steps = size of the time window
* number of variables/features = 1 for a univariate time series
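
The windowing loop and the reshape can be wrapped in one function, which makes it easy to check the resulting shapes on a toy series before using the real data. This is a minimal sketch; `make_windows` is a hypothetical helper that mirrors the training-data cells above.

```python
import numpy as np

def make_windows(series_scaled, window):
    """Build (X, y): X[k] holds `window` consecutive values, y[k] is the value right after them.

    series_scaled: array of shape (n_time_points, 1), like training_set_scaled.
    Returns X of shape (n_time_points - window, window, 1) and y of shape (n_time_points - window,).
    """
    if len(series_scaled) <= window:
        raise ValueError("the series must be longer than the window")
    X, y = [], []
    for i in range(window, len(series_scaled)):
        X.append(series_scaled[i - window:i, 0])  # values at t-window, ..., t-1
        y.append(series_scaled[i, 0])             # value at t
    X, y = np.array(X), np.array(y)
    # add the trailing "number of features" axis expected by Keras recurrent layers
    return X.reshape(X.shape[0], X.shape[1], 1), y

# Toy series 0, 1, ..., 9 with a window of 3
toy = np.arange(10, dtype=float).reshape(-1, 1)
X_toy, y_toy = make_windows(toy, window=3)
print(X_toy.shape, y_toy.shape)                     # (7, 3, 1) (7,)
print(X_toy[0].ravel().tolist(), float(y_toy[0]))   # [0.0, 1.0, 2.0] 3.0

# Same length as our training series (1529) with window=60
X_big, _ = make_windows(np.zeros((1529, 1)), window=60)
print(X_big.shape)                                  # (1469, 60, 1)
```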

In [None]:
# Reshaping the data so that it fits the format required for the RNN input layer
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
# 3D data structure (tensor) allowing for more than 1 variable, (number of observations,number of time steps,1)
X_train.shape

### Testing Data
Our test data consists of daily observations of the Google stock price from *February 2016* to *April 2016*: 42 observations.


In [None]:
dataset_test = pd.read_csv('Google_test.csv')
dataset_test.head(2), dataset_test.tail(2), len(dataset_test)

### OPTIONAL alternative testing data
Use this to pull testing data if you have pulled your training data using the alternative option.

As is, it pulls the same dates for Google as in the provided dataset.

In [None]:
#import yfinance as yf
#dataset_test=yf.download("GOOGL", start="2016-02-01", end="2016-04-01")
#test_set=dataset_test[['Open']].values #using Opening prices, consistent with the rest of the notebook
#test_set.shape

We process our test data similarly to the training data above.

In [None]:
# window=60
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - window:].values  
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs) 

X_test = [] #each observation contains the 60 prices on the 60 time points before t
y_test = [] #each observation is the stock price on time t
for i in range(window, len(inputs)): 
    X_test.append(inputs[i-window:i, 0])
    y_test.append(inputs[i, 0])
    
X_test, y_test = np.array(X_test), np.array(y_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
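
The cell above keeps the last `window` training values in front of the test values so that even the first test day has a full window of history, which yields exactly one window per test observation. The sketch below shows this on made-up numbers (scaling is skipped for readability; in the notebook `sc.transform` reuses the scaler fitted on the training data).

```python
import numpy as np

window = 3
train_open = np.array([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up training prices
test_open = np.array([15.0, 16.0])                     # made-up test prices

# Same idea as dataset_total / inputs above: keep the last `window` training values
total = np.concatenate([train_open, test_open])
inputs = total[len(total) - len(test_open) - window:]  # [12, 13, 14, 15, 16]

X_test_toy = np.array([inputs[i - window:i] for i in range(window, len(inputs))])
y_test_toy = inputs[window:]

print(X_test_toy.tolist())  # [[12.0, 13.0, 14.0], [13.0, 14.0, 15.0]]
print(y_test_toy.tolist())  # [15.0, 16.0]
```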

In [None]:
#let's make sure our datasets are shaped correctly before we proceed...
print(X_train.shape, y_train.shape,X_test.shape, y_test.shape)

## Building the first RNN model with LSTM

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import Model
print(tf.__version__)

We will build a network with 
* an input layer 
* two LSTM layers (each with 50 units) 
* each followed by a dropout layer to prevent overfitting on the training data (see notebook 7 for more details on Dropout)
* an output layer with 1 node
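
A Keras `Dropout(rate)` layer randomly sets a fraction `rate` of the previous layer's outputs (units) to zero during training and scales the remaining ones by `1 / (1 - rate)`; it is inactive when predicting. It does not drop observations. The sketch below is a plain-NumPy illustration of this "inverted dropout" scheme; the `dropout` function is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: in training, zero each entry with probability `rate`
    and scale the survivors by 1 / (1 - rate); at inference return x unchanged."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True for entries that are kept
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((4, 50))  # e.g. a batch of 4 outputs from an LSTM layer with 50 units
out = dropout(activations, rate=0.2, training=True, rng=rng)
print(out.shape)                       # (4, 50): the batch size (observations) is unchanged
print(int((out == 0).sum()), "of", out.size, "units zeroed")
print(dropout(activations, 0.2, training=False, rng=rng) is activations)  # True
```

The rescaling keeps the expected value of each unit the same in training and inference, which is why nothing needs to change at prediction time.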



In [None]:
# Initialising the RNN
ts= keras.Sequential()

# Adding the first LSTM layer and some Dropout regularisation
# Notice we specify the input layer here as well, using input_shape=(time steps, features)
ts.add(layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
ts.add(layers.Dropout(0.2))  # randomly sets 20% of the previous layer's outputs to zero during training to reduce overfitting

# Adding a second LSTM layer and some Dropout regularisation
# Note that "return_sequences=False" (the default) for the last RNN layer and True for the previous layers
ts.add(layers.LSTM(units = 50))
ts.add(layers.Dropout(0.2))

# Adding the output layer
ts.add(layers.Dense(units = 1))


In [None]:
ts.summary()
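
In the summary, the first recurrent layer's output keeps the time axis, `(None, 60, units)`, because of `return_sequences=True`, while the last one returns only its final hidden state, `(None, units)`, which is what the `Dense` layer needs. The sketch below illustrates this shape behaviour with a toy tanh RNN in NumPy; it is a simplified stand-in for LSTM/GRU cells (no gates), and `simple_rnn` is a hypothetical helper.

```python
import numpy as np

def simple_rnn(x, units, return_sequences, rng):
    """Toy tanh RNN (no gating). x has shape (batch, time_steps, features)."""
    batch, steps, features = x.shape
    W = rng.normal(scale=0.1, size=(features, units))  # input weights
    U = rng.normal(scale=0.1, size=(units, units))     # recurrent weights
    b = np.zeros(units)
    h = np.zeros((batch, units))                        # initial hidden state
    states = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)         # update the state one time step at a time
        states.append(h)
    # all hidden states (batch, steps, units) or only the last one (batch, units)
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 60, 1))                        # 8 windows of 60 steps, 1 feature
h1 = simple_rnn(x, 50, return_sequences=True, rng=rng)
h2 = simple_rnn(h1, 50, return_sequences=False, rng=rng)
print(h1.shape, h2.shape)  # (8, 60, 50) (8, 50)
```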

In [None]:
# we use "Mean Squared Error" as our loss function 
ts.compile(optimizer = 'adam', #can also try  optimizer='rmsprop'
           loss = 'mean_squared_error', # mean_absolute_error
           metrics=['MeanSquaredError', 'MeanAbsoluteError', 'MeanAbsolutePercentageError']) #MSE, MAE, MAPE

In [None]:
# Fitting the RNN to the Training set
history_LSTM=ts.fit(X_train, y_train, 
               epochs = 10, batch_size = 30,verbose=1,
               validation_data=(X_test,y_test))


### Evaluating the LSTM network and making predictions
Visualizing the loss (MSE) as the model trains over several epochs

In [None]:
import matplotlib.pyplot as plt
plt.plot(history_LSTM.history['loss'],label="training")
plt.plot(history_LSTM.history['val_loss'],label="validation")
plt.xlabel("epoch")
plt.legend()
plt.show()

### Question 1
Would increasing the number of epochs (for training the model) improve the model fit? How do you know?

...

Let's predict the future stock prices and see how they compare with the actual prices.

In [None]:
real_stock_price = dataset_test[['Open']].values #outcome variable = Opening price
len(real_stock_price)
#this is the number of future time units we will make predictions for

In [None]:
predicted_stock_price = ts.predict(X_test)
# note that we need to inverse_transform the predicted values 
# because the predictions are on the scaled [0,1] range (we MinMax scaled our data before training)
predicted_stock_price_LSTM = sc.inverse_transform(predicted_stock_price)

In [None]:
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price_LSTM, color = 'blue', label = 'Predicted Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()


### RMSE, MAE metrics for LSTM network 
RMSE and MAE are both reasonable metrics for comparing the performance of forecasting models on the same time series data.
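
Both metrics are computed here on the inverse-transformed predictions, so they are in price units; the metrics Keras reports during `fit` are computed on the scaled [0, 1] values instead. The plain-NumPy sketch below (hypothetical helpers `rmse`, `mae`, `mape`, made-up numbers) shows the formulas and why RMSE is never smaller than MAE: squaring weights a single large error more heavily. Keras' MAPE additionally guards against division by zero, which this sketch omits.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the prices."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, also in price units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error (assumes no true value is 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Made-up prices: errors of 0, 0 and 6
y_true = [100.0, 100.0, 100.0]
y_pred = [100.0, 100.0, 106.0]
print(mae(y_true, y_pred))   # 6 / 3 = 2.0
print(rmse(y_true, y_pred))  # sqrt(36 / 3) = sqrt(12), about 3.46
print(mape(y_true, y_pred))  # about 2.0 (percent)
```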

In [None]:
from sklearn import metrics
import math
print("Root Mean squared error: %.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_LSTM))) #RMSE
print("Mean absolute error: %.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_LSTM)) #MAE

## Building another RNN model with GRUs

Let's build the same network using GRU cells instead of LSTM. 
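
A GRU cell has 3 gates instead of the LSTM's 4, so a GRU layer has fewer trainable parameters for the same number of units. The sketch below counts them with the standard formulas, assuming Keras' default `reset_after=True` for GRU (which adds a second bias vector per gate); the helper names are hypothetical. The totals can be compared with the `Param #` column of `ts2.summary()` and of a two-layer LSTM with 50 units per layer, as described for the first model. Dropout layers have no parameters.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def gru_params(units, input_dim, reset_after=True):
    # 3 gates; with reset_after=True each gate has two bias vectors
    n_bias = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + n_bias * units)

def dense_params(units, input_dim):
    # weights plus one bias per output unit
    return units * input_dim + units

lstm_total = lstm_params(50, 1) + lstm_params(50, 50) + dense_params(1, 50)
gru_total = gru_params(50, 1) + gru_params(50, 50) + dense_params(1, 50)
print(lstm_total, gru_total)  # 30651 23301
```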

In [None]:
# Initialising the RNN
ts2= keras.Sequential()

# Adding the first GRU layer and some Dropout regularisation
# Notice we specify the input layer here as well, using input_shape=(time steps, features)
ts2.add(layers.GRU(units = 50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
ts2.add(layers.Dropout(0.2))  # randomly sets 20% of the previous layer's outputs to zero during training to reduce overfitting

# Adding a second GRU layer and some Dropout regularisation
# Note that the "return_sequences=False (default value)" for the last RNN layer and True for previous layers
ts2.add(layers.GRU(units = 50)) 
ts2.add(layers.Dropout(0.2))

# Adding the output layer
ts2.add(layers.Dense(units = 1))

ts2.summary()

In [None]:
# we use "Mean Squared Error" as our loss function 
ts2.compile(optimizer = 'adam', #can also try  optimizer='rmsprop'
           loss = 'mean_squared_error', # can also try mean_absolute_error  
           metrics=['MeanSquaredError', 'MeanAbsoluteError', 'MeanAbsolutePercentageError']) #MSE, MAE, MAPE

In [None]:
# Fitting the RNN to the Training set
history_GRU=ts2.fit(X_train, y_train, 
               epochs = 10, batch_size = 30,verbose=1,
               validation_data=(X_test,y_test))

### Evaluating the GRU network and making predictions



In [None]:
import matplotlib.pyplot as plt
plt.plot(history_GRU.history['loss'],label="training")
plt.plot(history_GRU.history['val_loss'],label="validation")
plt.xlabel("epoch")
plt.legend()
plt.show()

### Question 2
Is this network overfitting the training data? How do you know?

In [None]:
predicted_stock_price = ts2.predict(X_test)
# note that we need to inverse_transform the predicted values 
# because the predictions are on the scaled [0,1] range (we MinMax scaled our data before training)
predicted_stock_price_GRU = sc.inverse_transform(predicted_stock_price)

In [None]:
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price_GRU, color = 'blue', label = 'Predicted Stock Price (GRU)')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

### RMSE, MAE metrics for GRU network 

In [None]:
print("Root Mean squared error: %.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_GRU))) #RMSE
print("Mean absolute error: %.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_GRU)) #MAE

## Comparing the LSTM and GRU models
Let's plot both models' predictions and the actual values

In [None]:
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price_GRU, color = 'blue', label = 'Predicted Stock Price (GRU)')
plt.plot(predicted_stock_price_LSTM, color = 'green', label = 'Predicted Stock Price (LSTM)')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

### Question 3
Based on the above plot, which model seems to better forecast the future Opening prices? 

In [None]:
print("RMSE (LSTM): %.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_LSTM))) #RMSE
print("MAE (LSTM): %.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_LSTM)) #MAE

print("RMSE (GRU): %.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_GRU))) #RMSE
print("MAE (GRU): %.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_GRU)) #MAE

### Question 4
Which model has better predictive performance? Which metric did you use?

## Can we improve the model?

There are several approaches we can try to potentially improve forecasting accuracy, including:

* Trying a deeper model with more layers and/or more units in each layer.
* We can specify a longer time window for the network to learn from (for example 100 instead of 60). 
* Adjusting the Dropout rate. Pay attention to overfitting when using smaller dropout rates (the 20% we used is a common middle ground)

We will start with the first approach.

## Third model with 4 hidden layers

Let's try a model with 4 LSTM layers (same number of units in each layer), and same Dropout rate after each layer. 

In [None]:
# Initialising the RNN
ts3= keras.Sequential()

# Adding the first LSTM layer and some Dropout regularisation
ts3.add(layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
ts3.add(layers.Dropout(0.2))  # randomly sets 20% of the previous layer's outputs to zero during training to reduce overfitting

# Adding a second LSTM layer and some Dropout regularisation
ts3.add(layers.LSTM(units = 50, return_sequences = True))
ts3.add(layers.Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
ts3.add(layers.LSTM(units = 50, return_sequences = True))
ts3.add(layers.Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
ts3.add(layers.LSTM(units = 50))
ts3.add(layers.Dropout(0.2))

# Adding the output layer
ts3.add(layers.Dense(units = 1))

ts3.summary()

In [None]:
# we use "Mean Squared Error" as our loss function 
ts3.compile(optimizer = 'adam', #can also try  optimizer='rmsprop'
           loss = 'mean_squared_error', # can also try mean_absolute_error  
           metrics=['MeanSquaredError', 'MeanAbsoluteError', 'MeanAbsolutePercentageError'])

In [None]:
# Fitting the model to the Training set
history_3=ts3.fit(X_train, y_train, 
               epochs = 15, batch_size = 30,verbose=1,
               validation_data=(X_test,y_test));

### Evaluating the 3rd model 

In [None]:
import matplotlib.pyplot as plt
plt.plot(history_3.history['loss'],label="training")
plt.plot(history_3.history['val_loss'],label="validation")
plt.xlabel("epoch")
plt.legend()
plt.show()

In [None]:
predicted_stock_price = ts3.predict(X_test)
predicted_stock_price_3 = sc.inverse_transform(predicted_stock_price)

# Visualising the results
# you can add the plots of previous models' predictions by uncommenting the corresponding lines below
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price_3, color = 'blue', label = 'Predicted Stock Price (4-layer LSTM)')
#plt.plot(predicted_stock_price_GRU, color = 'yellow', label = 'Predicted Stock Price (GRU)')
plt.plot(predicted_stock_price_LSTM, color = 'green', label = 'Predicted Stock Price (LSTM)')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

In [None]:
print("Root Mean squared error: %.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_3))) #RMSE
print("Mean absolute error: %.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_3)) #MAE


### Question 5
Does the new model with 4 LSTM layers have better performance?

By adding more layers we build a more complex model, which can improve performance if the additional complexity helps capture the underlying complexity in our data. In the current example, forecasting a univariate time series, this does not seem to be the case.

## Changing window size (number of time steps)

Let's try increasing the window size to 120 (number of time steps). This means we will use the opening price from the past 120 days to predict the value of today's opening price. 
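
A longer window gives each sample more history but leaves fewer training samples, because the first `window` observations can only serve as inputs. The sketch below applies the shape rule from the data preprocessing section (observations = 1529 - window) to a few window sizes. The number of test windows stays at 42 for any window, since the test inputs are built by prepending exactly `window` training values.

```python
n_obs = 1529  # length of the training series used in this notebook

# X_train shape = (number of observations, number of time steps, number of features)
shapes = {w: (n_obs - w, w, 1) for w in (30, 60, 120)}
for w, shape in shapes.items():
    print(f"window={w:>3}: X_train shape {shape}")
# window= 30: X_train shape (1499, 30, 1)
# window= 60: X_train shape (1469, 60, 1)
# window=120: X_train shape (1409, 120, 1)
```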

To do that go to section **Specifying time window (time steps)** and change the value for *window* (first cell in the section) to **window=120** (from 60).
Rerun all the subsequent cells in the **Data Preprocessing** section to reshape the data with the new time window. 

Then, rerun the first (LSTM network with 2 layers) and second (GRU network with 2 layers) models. *Before changing the window size and retraining the models, run the following cell to record the performance metrics of the two models (based on window=60)*.

In [None]:
print("RMSE (LSTM):%.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_LSTM)) +";",
      "MAE (LSTM):%.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_LSTM)) #MAE

print("RMSE (GRU):%.2f" % math.sqrt(metrics.mean_squared_error(real_stock_price, predicted_stock_price_GRU))+";",
      "MAE (GRU):%.2f" % metrics.mean_absolute_error(real_stock_price, predicted_stock_price_GRU)) #MAE

### Question 6
Does increasing the window size improve the predictive performance of either of the two models?

Which one(s)?

### Question 7 (Bonus question)
Do you think reducing the window size (e.g., from 60 to 30) could improve the models' forecasting performance (in terms of RMSE or MAE)? Why?
