# Exercise on Recurrent Neural Networks

In this exercise we want to use an RNN to predict whether an ice cream store has ice cream in stock.
We can only use the past weather to make our predictions, and we hope that the stock depends on the weather of the past couple of days. 
To assess whether the RNN's predictions are useful, we compare its performance to a simple baseline prediction model. As the baseline we use a random forest.

 
The weather is described by 3 states: 0=sunny, 1=cloudy and 2=rainy. People only buy ice cream when it's sunny, and the ice cream 
stand has an unknown stock of ice cream and reorders sometimes (the policy is unknown, but we hope it depends on the weather).  
Unfortunately, we are quite busy with work, so we can only remember the weather of the last 2 days; for that reason our lookback is only 2 days.
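
To make the lookback concrete, here is a minimal sketch of how a weather record can be cut into non-overlapping windows of 2 days, each paired with the day that directly follows it. The 7-day record and the names `weather`, `windows` and `target_days` are made up for illustration; the notebook builds its windows with its own loop further down.

```python
import numpy as np

# Hypothetical 7-day weather record: 0=sunny, 1=cloudy, 2=rainy
weather = np.array([0, 1, 2, 2, 0, 1, 0])
lookback = 2

# Number of complete windows that still have a following day to predict
n_windows = (len(weather) - 1) // lookback

# Each row holds the weather of `lookback` consecutive days
windows = weather[:n_windows * lookback].reshape(n_windows, lookback)

# Index of the day whose stock we would predict from each window
target_days = np.arange(1, n_windows + 1) * lookback

print(windows)
print(target_days)
```

With a lookback of 2, the first window covers days 0 and 1 and is used to predict day 2, the second window covers days 2 and 3 and is used to predict day 4.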


a) Go to the beginning of paragraph *Prepare data* and look at the real data-generating process in cell 2. Have a look at the code and try to understand the process (this is not necessary to continue).


b) Go to the beginning of paragraph *Train and evaluate the baseline Random Forest model*. How large is the accuracy of the random forest model?


c) Go to the beginning of paragraph *Train and evaluate the RNN model* and look at the RNN model definition. Draw the corresponding computational graph of the unrolled model.


d) Here is the model summary of the RNN. Explain the Param # for the SimpleRNN layer and the Dense layer.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_1 (SimpleRNN)     (None, 4)                 32        
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 10        
_________________________________________________________________
activation_1 (Activation)    (None, 2)                 0         
=================================================================
Total params: 42
Trainable params: 42
Non-trainable params: 0
_________________________________________________________________
```
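
One way to check your answer is to count the parameters directly. The sketch below assumes a standard SimpleRNN layer with a bias and a one-hot input of size 3, followed by a fully connected layer with a bias; the helper names `simple_rnn_params` and `dense_params` are hypothetical and not part of Keras.

```python
def simple_rnn_params(input_dim, units):
    # input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return input_dim * units + units * units + units


def dense_params(input_dim, units):
    # weight matrix (input_dim x units) + bias (units)
    return input_dim * units + units


rnn_count = simple_rnn_params(input_dim=3, units=4)
dense_count = dense_params(input_dim=4, units=2)
print(rnn_count, dense_count, rnn_count + dense_count)
```

Note that the lookback does not appear in either formula, because the same weights are reused at every time step.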


e) What is the size of the hidden state in our model, and where do we define it?


f) Do you expect that the ice cream store has ice cream in stock on day 2 and day 4?  
Hint: Take the trained model and predict the first two examples of the test set.


g) To understand what exactly is done by the RNN model, check the prediction 
by "hand"/"numpy" with the learned weights from the model.  

Hint: Use model.get_weights() to get the learned weights!

h) Compare the performance of the random forest and the RNN. How good are the models?  
What could be the reason for the observed performance?  
Hint: Keep in mind the data-generating process.


i) What do you expect if you increase the lookback? Play around with this parameter and check whether your expectation was right.



## Load packages

In [None]:
import sys
import numpy as np
import pandas as pd
import tensorflow as tf

import keras
%matplotlib inline
import matplotlib.pyplot as plt

np.random.seed(42)
# last expression of the cell, so the versions are displayed
tf.__version__, sys.version_info

## Prepare data

In [None]:
def gen_data(size=1000000):
    Xs = np.array(np.random.choice(3, size=(size,))) #Random Weather
    Y = []
    ice = 2 # stock of icecream at start
    for t,x in enumerate(Xs):
        # (t-3) >= 0 the first ice cream could be delivered on day 3
        # Xs[t - 3] cloudy three days before today => we ordered ice cream
        # ice < 2 not full
        if (t - 3) >= 0 and Xs[t - 3] == 1 and ice < 2: 
            ice += 1
        if x == 0: # It is sunny we therefore sell ice, if we have
            if ice > 0: # We have ice cream
                ice -= 1
        if ice > 0: #We are not out of stock
            Y.append(1)
        else:
            Y.append(0)
    return Xs, np.array(Y)
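
To follow the process by hand, the same stock rule can be traced on a short, fixed weather sequence. This is only a sketch: `stock_trace`, its parameter names and the 8-day sequence are made up for illustration, and the rule is copied from `gen_data` above (start with 2 units, at most 2 units in stock, delivery 3 days after a cloudy day, one unit sold on a sunny day).

```python
import numpy as np


def stock_trace(weather, start_stock=2, max_stock=2, delay=3):
    """Return 1 for each day that ends with ice cream in stock, else 0."""
    stock = start_stock
    in_stock = []
    for t, w in enumerate(weather):
        # a cloudy day `delay` days ago triggers a delivery if the stock is not full
        if t - delay >= 0 and weather[t - delay] == 1 and stock < max_stock:
            stock += 1
        # on a sunny day one unit is sold, if there is any left
        if w == 0 and stock > 0:
            stock -= 1
        in_stock.append(int(stock > 0))
    return np.array(in_stock)


# hypothetical weather: 0=sunny, 1=cloudy, 2=rainy
trace = stock_trace([0, 1, 0, 2, 1, 0, 2, 2])
print(trace)
```

The trace shows that the label on a given day can depend on weather from three or more days back, which is worth keeping in mind when interpreting the results with a lookback of 2.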

### generate the data and split it into a train, validation and test set

In [None]:
X, Y = gen_data(40000) 

lookback=2  # how many days of weather info we use

X_tr = X[0:20000]  # number of days with weather in train data
Y_tr = Y[0:20000]
idx = np.arange(0, len(X_tr), lookback)  # first day of each non-overlapping window
# the last start index has no complete window with a following label day
# inside the split, so we only allocate and fill len(idx)-1 rows
X_train = np.zeros((len(idx) - 1, lookback))
Y_train = np.zeros((len(idx) - 1, 1))
for i in range(len(idx) - 1):
    X_train[i] = X_tr[idx[i]:idx[i+1]]
    Y_train[i] = Y_tr[idx[i] + lookback]  # label: stock on the day after the window

X_va = X[20000:30000]
Y_va = Y[20000:30000]
idx = np.arange(0, len(X_va), lookback)
X_valid = np.zeros((len(idx) - 1, lookback))
Y_valid = np.zeros((len(idx) - 1, 1))
for i in range(len(idx) - 1):
    X_valid[i] = X_va[idx[i]:idx[i+1]]
    Y_valid[i] = Y_va[idx[i] + lookback]

X_te = X[30000:40000]
Y_te = Y[30000:40000]
idx = np.arange(0, len(X_te), lookback)
X_test = np.zeros((len(idx) - 1, lookback))
Y_test = np.zeros((len(idx) - 1, 1))
for i in range(len(idx) - 1):
    X_test[i] = X_te[idx[i]:idx[i+1]]
    Y_test[i] = Y_te[idx[i] + lookback]

In [None]:
print(X_train.shape)
print(Y_train.shape)

print(X_valid.shape)
print(Y_valid.shape)

print(X_test.shape)
print(Y_test.shape)


### prepare the data for the Random Forest

In [None]:
X_train_RF=pd.DataFrame(X_train)
for i in range(0,lookback):
    X_train_RF[i]=X_train_RF[i].astype('category')
#X_train_RF.dtypes

Y_train_RF=pd.DataFrame(Y_train)
Y_train_RF[0]=Y_train_RF[0].astype('category')
#Y_train_RF.dtypes

X_test_RF=pd.DataFrame(X_test)
for i in range(0,lookback):
    X_test_RF[i]=X_test_RF[i].astype('category')
#X_test_RF.dtypes

Y_test_RF=pd.DataFrame(Y_test)
Y_test_RF[0]=Y_test_RF[0].astype('category')
#Y_test_RF.dtypes

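### convert to one-hot encoding for Keras

In [None]:
# to_categorical is exposed directly in keras.utils;
# the keras.utils.np_utils module is not available in recent Keras releases
from keras.utils import to_categorical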

X_train=to_categorical(X_train,3)
Y_train=to_categorical(Y_train,2)

X_valid=to_categorical(X_valid,3)
Y_valid=to_categorical(Y_valid,2)

X_test=to_categorical(X_test,3)
Y_test=to_categorical(Y_test,2)
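
The effect of the one-hot encoding on the array shapes can be sketched in plain NumPy. The two weather windows below are made up, and indexing into `np.eye(3)` is only a stand-in that, for this kind of input, is assumed to give the same result as `to_categorical(..., 3)`.

```python
import numpy as np

# two hypothetical weather windows with lookback=2 (0=sunny, 1=cloudy, 2=rainy)
windows = np.array([[0., 1.],
                    [2., 2.]])

# row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(3)[windows.astype(int)]

print(one_hot.shape)   # (samples, lookback, number of weather states)
print(one_hot[0])      # sunny, then cloudy
```

The resulting 3-dimensional array has the layout (samples, time steps, features) that the SimpleRNN layer expects as input.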


## Train and evaluate the baseline Random Forest model

In [None]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=22)
clf.fit(X_train_RF,np.ravel(Y_train_RF))

In [None]:
from sklearn.metrics import confusion_matrix
pred=clf.predict(X_test_RF)
print(confusion_matrix(Y_test_RF, pred))

# your code here to determine the accuracy:
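
As a hint, the accuracy can be read off a confusion matrix as the sum of the diagonal (correct predictions) divided by the sum of all entries. The sketch below uses a made-up 2x2 matrix, not the output of the random forest above.

```python
import numpy as np

# made-up confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50, 10],
               [5, 35]])

# correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

Applied to the matrix printed above, the same two lines give the accuracy of the baseline model.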


## Train and evaluate the RNN model

In [None]:
print(X_train.shape)
print(Y_train.shape)

print(X_valid.shape)
print(Y_valid.shape)

print(X_test.shape)
print(Y_test.shape)


In [None]:
from keras.layers import Activation, Dense, SimpleRNN, TimeDistributed

### define and train RNN model

In [None]:
model = keras.models.Sequential()

name = 'RNN'

# 4 hidden units; each sample has `lookback` time steps with 3 one-hot features
model.add(SimpleRNN(4, input_shape=(lookback, 3)))
model.add(Dense(2))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
model.evaluate(X_train,Y_train)

In [None]:
print(model.predict(X_train[0:5]))
print(Y_train[0:5])

In [None]:
history = model.fit(X_train, Y_train, 
                    batch_size=32,
                    epochs=20, 
                    verbose=2,
                    validation_data=(X_valid,Y_valid))

In [None]:
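# summarize history
# older Keras versions log the metric as 'acc', newer ones as 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])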
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.show()

In [None]:
# your code here to extract the weights

In [None]:
# let's prepare the matrices and biases to do the prediction by hand:
# stack the input kernel (3x4) on top of the recurrent kernel (4x4)
W1=np.vstack(model.get_weights()[0:2])
b1=model.get_weights()[2]
W2=model.get_weights()[3]
b2=model.get_weights()[4]
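
Why is it valid to stack the two weight matrices? Multiplying the concatenated vector of input and hidden state with the stacked matrix gives the same result as multiplying each part with its own matrix and adding the results. The sketch below checks this with random numbers; `Wx`, `Wh`, `b`, `x` and `h` are stand-ins of the right shape, not the learned weights of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(3, 4))   # stand-in for the input kernel
Wh = rng.normal(size=(4, 4))   # stand-in for the recurrent kernel
b = rng.normal(size=4)         # stand-in for the bias
x = np.array([0., 1., 0.])     # one-hot encoded "cloudy"
h = rng.normal(size=4)         # some previous hidden state

# update with two separate matrix products
separate = np.tanh(x @ Wx + h @ Wh + b)

# the same update with one stacked matrix and one concatenated vector
stacked = np.tanh(np.concatenate((x, h)) @ np.vstack((Wx, Wh)) + b)

print(np.allclose(separate, stacked))
```

This is the form of the hidden state update that the hints for `h1` and `h2` refer to when they mention `W1` and `b1`.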

## Use the trained model for prediction

In [None]:
# your code here to make a prediction of the first observation in the test set:
y_pred1=
y_pred1

### forward pass in NumPy by "hand"

In [None]:
# initialize the hidden state with zeros
h0=np.array((0,0,0,0),dtype="float32")

In [None]:
# your code here to determine the activation of the hidden state of the first of the two days
# hint: use W1 and b1
h1=

In [None]:
# your code here to determine the activation of the hidden state of the second of the two days
# hint: use W1 and b1
h2=

In [None]:
# your code here to determine the predicted probabilities for ice (yes or no)
# hint: use W2 and b2
Z=
np.exp(Z)/np.sum(np.exp(Z))

In [None]:
# do the same in a loop:
ht=np.array((0,0,0,0),dtype="float32")  # first hidden state = all zeros
for i in range(0,lookback):
    ht=np.tanh(np.matmul(np.concatenate((X_test[0][i],ht)),W1)+b1)
Z=np.matmul(ht,W2)+b2
np.exp(Z)/np.sum(np.exp(Z))

In [None]:
pred=model.predict(X_test)
print(confusion_matrix(np.argmax(Y_test,axis=1), np.argmax(pred,axis=1)))
np.sum(np.argmax(pred,axis=1)==np.argmax(Y_test,axis=1))/len(Y_test)