# Exercise on Stateful Recurrent Neural Networks

**In this exercise we use an improved RNN model to predict whether an ice cream store has ice cream in stock today. We can only use the past weather for our predictions and hope that today's stock depends on the weather of the past couple of days.**

**The weather is described by 3 states: 0=sunny, 1=cloudy and 2=rainy. People only buy ice cream when it's sunny. The ice cream stand has an unknown stock and reorders from time to time (the policy is unknown, but we hope it depends on the weather).
Unfortunately, we are quite busy with work, so we can only remember the weather of the last 2 days. For that reason our lookback is only 2 days.**

**To improve on the simple RNN model we will use a stateful RNN model. This means we pass the hidden state reached at the end of one mini-batch into the next mini-batch, which contains the continuation of the sequence (the state is not reset to zero!). For prediction with this stateful RNN we need to process the test data with the same mini-batch size that we used for training. To work with a stateful RNN model we need to prepare our mini-batches in a special way: the first example of the first batch has to be continued by the first example of the second batch, and so on (see lecture slides).**  
**The idea of passing the current hidden state into the next mini-batch is that we can learn something from the part of the sequence that lies further back than two steps (the past is summarized in the current hidden state).**
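
The effect of carrying the hidden state over can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights `Wx`, `Wh` and `b` are random stand-ins (not a trained model), and `rnn_steps` is a hypothetical helper that applies a tanh RNN cell step by step. Processing four days in two chunks of two days gives the same final state as processing all four days at once, provided the state of the first chunk is passed into the second.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                        # 3 weather states, 4 hidden units
Wx = rng.normal(size=(n_in, n_hidden))       # input-to-hidden weights (random stand-in)
Wh = rng.normal(size=(n_hidden, n_hidden))   # hidden-to-hidden weights (random stand-in)
b = rng.normal(size=n_hidden)                # bias (random stand-in)

def rnn_steps(x_seq, h):
    """Run a tanh RNN cell over x_seq, starting from hidden state h."""
    for x in x_seq:
        h = np.tanh(x @ Wx + h @ Wh + b)
    return h

weather = np.eye(n_in)[[0, 1, 2, 0]]         # four days, one-hot encoded
h_zero = np.zeros(n_hidden)

h_full = rnn_steps(weather, h_zero)          # all four days in one go
h_first = rnn_steps(weather[:2], h_zero)     # days 1-2 ("mini-batch 1")
h_stateful = rnn_steps(weather[2:], h_first) # days 3-4, state carried over
h_stateless = rnn_steps(weather[2:], h_zero) # days 3-4, state reset to zero

print(np.allclose(h_full, h_stateful))       # carried state reproduces the full run
print(np.allclose(h_full, h_stateless))      # a reset state generally does not
```

With a lookback of 2, the stateless variant only ever sees two days, while the stateful variant implicitly sees the whole history through the hidden state.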


**a) Look at the RNN model definition, the data preparation, and the model training. What is different compared to the simple RNN?**

**b) Take the trained model and predict the first two examples of the test set. What are the probabilities for ice/no-ice for these two examples?**

**c) Complete the code to do the prediction by "hand/numpy" using the extracted weight matrices. (We use model.get_weights() to get the learned weights.) Which values does the incoming hidden state need to have for example 1 and for example 2 of the test data? Do we get the same probability vectors as with model.predict?**

**d) Assess the performance of the stateful RNN model on the test data set. How does the achieved accuracy compare to the accuracy you achieved with a simple RNN model?**

**e) Explain why the stateful RNN model outperforms the simple RNN model in our example. (Hint: remember the data generating process.) How could you improve the performance of the simple RNN model? Play around with the code to check your ideas.**



### Import packages

In [None]:
import sys

import numpy as np
import pandas as pd
import tensorflow as tf
import keras
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(42)
# show the library and interpreter versions (must be the last expression of the cell)
tf.__version__, sys.version_info

## Prepare data

In [None]:
def gen_data(size=1000000):
    Xs = np.array(np.random.choice(3, size=(size,))) #Random Weather
    Y = []
    ice = 2 # stock of icecream at start
    for t,x in enumerate(Xs):
        # (t - 3) >= 0: the first ice cream can be delivered on day 3
        # Xs[t - 3] == 1: it was cloudy three days ago => we ordered ice cream
        # ice < 2: the stock is not full
        if (t - 3) >= 0 and Xs[t - 3] == 1 and ice < 2: 
            ice += 1
        if x == 0: # It is sunny, so we sell ice cream if we have any
            if ice > 0: # We have ice cream
                ice -= 1
        if ice > 0: #We are not out of stock
            Y.append(1)
        else:
            Y.append(0)
    return Xs, np.array(Y)
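
To see how the label depends on weather that lies more than two days back, the rule can be traced on a short, hand-picked weather sequence. The sketch below is a compact restatement of the rule in `gen_data`; the function name `ice_in_stock` and its parameters are hypothetical and only serve this illustration.

```python
import numpy as np

def ice_in_stock(weather, start_stock=2, max_stock=2, delay=3):
    """Hypothetical restatement of the stock rule used in gen_data."""
    stock, labels = start_stock, []
    for t, w in enumerate(weather):
        # a delivery arrives if it was cloudy `delay` days ago and the stock is not full
        if t - delay >= 0 and weather[t - delay] == 1 and stock < max_stock:
            stock += 1
        # one ice cream is sold on a sunny day, if any is left
        if w == 0 and stock > 0:
            stock -= 1
        labels.append(1 if stock > 0 else 0)
    return np.array(labels)

weather = np.array([0, 0, 1, 0, 2, 2, 0])  # 0=sunny, 1=cloudy, 2=rainy
labels = ice_in_stock(weather)
print(labels)  # [1 0 0 0 0 1 0]
```

On day 5 the store is restocked because day 2 was cloudy, which is three days back and therefore outside a lookback window of two days.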

### Generate the data and split it into training, validation and test sets

In [None]:
X, Y = gen_data(40000) 

lookback=2

X_tr = X[0:20000]
Y_tr = Y[0:20000]
idx=np.arange(0, len(X_tr),lookback)
X_train=np.zeros((len(idx),lookback))
Y_train=np.zeros((len(idx),1))
for i in range(0,len(idx)-1):
    X_train[i]=X_tr[idx[i]:idx[i+1]]
    Y_train[i]=Y_tr[idx[i]+lookback]

X_va = X[20000:30000]
Y_va = Y[20000:30000]
idx=np.arange(0, len(X_va),lookback)
X_valid=np.zeros((len(idx),lookback))
Y_valid=np.zeros((len(idx),1))
for i in range(0,len(idx)-1):
    X_valid[i]=X_va[idx[i]:idx[i+1]]
    Y_valid[i]=Y_va[idx[i]+lookback]

X_te = X[30000:40000]
Y_te = Y[30000:40000]
idx=np.arange(0, len(X_te),lookback)
X_test=np.zeros((len(idx),lookback))
Y_test=np.zeros((len(idx),1))
for i in range(0,len(idx)-1):
    X_test[i]=X_te[idx[i]:idx[i+1]]
    Y_test[i]=Y_te[idx[i]+lookback]    
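
The loops above cut the sequence into non-overlapping windows of length `lookback` and use the value right after each window as the label. Because the loop stops at `len(idx)-1`, the last row of each array stays zero. A minimal sketch on toy data (the arrays `x` and `y` are made-up stand-ins) shows the loop next to an equivalent reshape-based version that drops this padding row:

```python
import numpy as np

lookback = 2
x = np.arange(10)          # toy stand-in for the weather sequence
y = np.arange(100, 110)    # toy stand-in for the labels

# loop version, mirroring the notebook (the last row stays zero)
idx = np.arange(0, len(x), lookback)
X_loop = np.zeros((len(idx), lookback))
Y_loop = np.zeros((len(idx), 1))
for i in range(len(idx) - 1):
    X_loop[i] = x[idx[i]:idx[i + 1]]
    Y_loop[i] = y[idx[i] + lookback]

# vectorized version without the zero padding row
n = len(x) // lookback - 1                      # number of complete (window, label) pairs
X_vec = x[:n * lookback].reshape(n, lookback)   # windows [0,1], [2,3], ...
Y_vec = y[lookback:(n + 1) * lookback:lookback].reshape(n, 1)  # label right after each window

print(np.array_equal(X_loop[:-1], X_vec), np.array_equal(Y_loop[:-1], Y_vec))
```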

In [None]:
print(X_train.shape)
print(Y_train.shape)

print(X_valid.shape)
print(Y_valid.shape)

print(X_test.shape)
print(Y_test.shape)


### Convert to one-hot encoding for Keras

In [None]:
from keras.utils import to_categorical  # keras.utils.np_utils is not available in newer Keras versions

X_train=to_categorical(X_train,3)
Y_train=to_categorical(Y_train,2)

X_valid=to_categorical(X_valid,3)
Y_valid=to_categorical(Y_valid,2)

X_test=to_categorical(X_test,3)
Y_test=to_categorical(Y_test,2)
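
What the one-hot encoding does to the shapes can be sketched with plain NumPy. The helper `one_hot` below is a hypothetical stand-in for illustration, not the Keras function: it indexes the rows of an identity matrix with the integer class labels.

```python
import numpy as np

def one_hot(values, num_classes):
    """Hypothetical NumPy stand-in: one-hot encode an integer-valued array."""
    return np.eye(num_classes, dtype="float32")[np.asarray(values, dtype=int)]

X_toy = np.array([[0, 2], [1, 1]])   # two windows with lookback 2
X_onehot = one_hot(X_toy, 3)
print(X_onehot.shape)                # (2, 2, 3): (examples, lookback, weather states)
print(X_onehot[0])                   # sunny, then rainy
```

Unlike this sketch, `to_categorical` is expected to drop a trailing dimension of size 1, which is why the labels of shape `(n, 1)` end up with shape `(n, 2)` in the notebook; the printed shapes in the next cell confirm this.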


In [None]:
print(X_train.shape)
print(Y_train.shape)

print(X_valid.shape)
print(Y_valid.shape)

print(X_test.shape)
print(Y_test.shape)


### prepare stateful batches


In [None]:
batch_s=50
#first create stateful mini-batches from the training data
batches=len(X_train)//batch_s  # number of mini-batches (np.int was removed from NumPy)
idx=np.arange(0, batches*batch_s,batches)
for i in range(1,batches):
    idx=np.append(idx,np.arange(0, batches*batch_s,batches)+i)
print(idx[0:100])
X_train_stateful=np.zeros((len(X_train),lookback,3))
for i in range(0,len(idx)):
    X_train_stateful[i]=X_train[idx[i]]
Y_train_stateful=np.zeros((len(Y_train),2))
for i in range(0,len(idx)):
    Y_train_stateful[i]=Y_train[idx[i]]

In [None]:
#now create stateful mini-batches from the validation data
batches=len(X_valid)//batch_s  # number of mini-batches (np.int was removed from NumPy)
idx=np.arange(0, batches*batch_s,batches)
for i in range(1,batches):
    idx=np.append(idx,np.arange(0, batches*batch_s,batches)+i)
X_valid_stateful=np.zeros((len(X_valid),lookback,3))
for i in range(0,len(idx)):
    X_valid_stateful[i]=X_valid[idx[i]]
Y_valid_stateful=np.zeros((len(Y_valid),2))
for i in range(0,len(idx)):
    Y_valid_stateful[i]=Y_valid[idx[i]]
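
The index bookkeeping above is easier to follow on a toy example. The sketch below uses a made-up mini-batch size of 3 and 12 consecutive examples; it builds `idx` exactly as in the cells above and compares it to an equivalent reshape/transpose formulation (the name `idx_vec` is introduced here for illustration).

```python
import numpy as np

batch_s = 3              # toy mini-batch size (the notebook uses 50)
n = 12                   # toy number of consecutive examples
batches = n // batch_s   # number of mini-batches

# loop version as in the notebook
idx = np.arange(0, batches * batch_s, batches)
for i in range(1, batches):
    idx = np.append(idx, np.arange(0, batches * batch_s, batches) + i)

# equivalent formulation: split the sequence into batch_s contiguous chunks
# (one per row), then read the rows column by column
idx_vec = np.arange(batches * batch_s).reshape(batch_s, batches).T.ravel()

print(idx.reshape(batches, batch_s))  # one row per mini-batch
# [[ 0  4  8]
#  [ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]]
```

Reading down a column gives consecutive examples (0, 1, 2, 3), so slot `j` of every mini-batch continues the sequence of slot `j` in the previous mini-batch. With such an index array the reordering could also be written as `X_train[idx_vec]`.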

## Setting up the stateful RNN model

In [None]:
from keras.layers import Activation, Dense, SimpleRNN

In [None]:
model = keras.models.Sequential()

name = 'RNN_stateful'

model.add(SimpleRNN(4, batch_input_shape=(50,lookback, 3),stateful=True))
model.add(Dense(2))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
model.evaluate(X_train_stateful[0:50],Y_train_stateful[0:50],batch_size=50)

In [None]:
print(model.predict(X_train_stateful[0:50],batch_size=50)[0:5])
print(Y_train_stateful[0:5])

### train the stateful RNN model

In [None]:
for i in range(30):
    history1 = model.fit(X_train_stateful, Y_train_stateful, 
                        epochs=1, 
                        batch_size=50, 
                        verbose=2, 
                        validation_data=(X_valid_stateful,Y_valid_stateful),
                        shuffle=False) 
    model.reset_states()  
 

### After the training is completed we extract the learned weights

In [None]:
model.get_weights()

In [None]:
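# get_weights() returns [input kernel, recurrent kernel, RNN bias, dense kernel, dense bias]
weights=model.get_weights()
W1=np.vstack(weights[0:2]) # stack the input kernel (3x4) on top of the recurrent kernel (4x4)
b1=weights[2]
W2=weights[3]
b2=weights[4]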

In [None]:
W1 # stacked matrices: input weights (first 3 rows) on top of hidden weights (last 4 rows)
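
Stacking the two weight matrices is only a notational shortcut: multiplying the concatenated vector `[x, h]` with the stacked matrix gives the same result as multiplying `x` and `h` with their own matrices and adding the results. A minimal sketch with random stand-in weights (not the trained ones; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Wx = rng.normal(size=(3, 4))   # stand-in for the input kernel, shape (inputs, units)
Wh = rng.normal(size=(4, 4))   # stand-in for the recurrent kernel, shape (units, units)
b = rng.normal(size=4)         # stand-in for the bias

W = np.vstack([Wx, Wh])        # shape (7, 4): input rows first, then hidden rows

x = np.array([0.0, 1.0, 0.0])  # one-hot encoded "cloudy"
h = rng.normal(size=4)         # some previous hidden state

h_stacked = np.tanh(np.concatenate((x, h)) @ W + b)
h_separate = np.tanh(x @ Wx + h @ Wh + b)
print(np.allclose(h_stacked, h_separate))  # True
```

The order matters: because the input rows come first in the stacked matrix, the input has to come first in the concatenation as well.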

### Prepare the test data for a stateful RNN model

In [None]:
# prepare the test data for a stateful RNN model
batch_s=50
batches=len(X_test)//batch_s  # number of mini-batches (np.int was removed from NumPy)
idx=np.arange(0, batches*batch_s,batches)
for i in range(1,batches):
    idx=np.append(idx,np.arange(0, batches*batch_s,batches)+i)

X_test_stateful=np.zeros((len(X_test),lookback,3))
for i in range(0,len(idx)):
    X_test_stateful[i]=X_test[idx[i]]
Y_test_stateful=np.zeros((len(Y_test),2))
for i in range(0,len(idx)):
    Y_test_stateful[i]=Y_test[idx[i]]

### Do the prediction on the first two examples of the test data

In [None]:
# reset the hidden state to zero
model.reset_states()
# predict the first two mini-batches (each has size 50)
y_pred1=model.predict(X_test_stateful[0:100],batch_size=50)
print(y_pred1.shape) # one 2-dim probability vector per example
# check the prediction of the first instance in mini-batch 1 and in mini-batch 2 (each mini-batch has size 50);
# below we will do this by hand and check if we get the same predictions
print(y_pred1[0])
print(y_pred1[50])

## One forward pass of a stateful RNN in numpy by "hand"

### first determine the prediction of the first instance of mini-batch 1

In [None]:
# prepare the ingoing hidden state for the first example of the test data:
h0=np.array((0,0,0,0),dtype="float32")

In [None]:
h1=np.tanh(np.matmul(np.concatenate((X_test_stateful[0][0],h0)),W1)+b1)

In [None]:
h2=np.tanh(np.matmul(np.concatenate((X_test_stateful[0][1],h1)),W1)+b1)

In [None]:
Z=np.matmul(h2,W2)+b2
np.exp(Z)/np.sum(np.exp(Z))
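
The last line of the cell above is the softmax function written out directly. For the small logits in this exercise that is fine; a slightly more robust variant subtracts the maximum before exponentiating, which leaves the result unchanged but avoids overflow for large values. The helper name `softmax` and the toy logits are assumptions for this sketch.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax for a 1-D vector of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))   # shifting by the maximum does not change the result
    return e / e.sum()

Z_toy = np.array([1.5, -0.3])   # toy logits for the two classes
p = softmax(Z_toy)
print(p, p.sum())               # two probabilities that sum to 1
```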

In [None]:
# do the same again but this time with a for loop to go over the elements of a sequence
# initialize hidden state of first(!) mini-batch with zeros
ht_m1=np.array((0,0,0,0),dtype="float32") 

for i in range(0,lookback):
    ht_m1=np.tanh(np.matmul(np.concatenate((X_test_stateful[0][i],ht_m1)),W1)+b1)
Z=np.matmul(ht_m1,W2)+b2
np.exp(Z)/np.sum(np.exp(Z))

#### now determine the prediction of the first instance of mini-batch 2 (stateful connected to first instance of mini-batch 1)

In [None]:
# your code here to define the incoming hidden state for the second example of the test data

ht_m2 = 

for i in range(0,lookback):
    ht_m2=np.tanh(np.matmul(np.concatenate((X_test_stateful[50][i],ht_m2)),W1)+b1)
Z=np.matmul(ht_m2,W2)+b2
np.exp(Z)/np.sum(np.exp(Z))

## Check if the performance of the stateful RNN is better than that of the simple RNN model

In [None]:
from sklearn.metrics import confusion_matrix

# reset the hidden state before predicting the whole test set
model.reset_states()
pred=model.predict(X_test_stateful,batch_size=50)
print(confusion_matrix(np.argmax(Y_test_stateful,axis=1), np.argmax(pred,axis=1)))
# accuracy: fraction of correctly classified test examples
np.sum(np.argmax(pred,axis=1)==np.argmax(Y_test_stateful,axis=1))/len(Y_test)
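
How the accuracy relates to the confusion matrix can be checked on a toy example: the correctly classified examples sit on the diagonal, so the accuracy is the trace divided by the total count. The labels and predictions below are made up for illustration, and the matrix is built with plain NumPy using the same convention as above (rows: true class, columns: predicted class).

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1])   # toy labels (0 = no ice, 1 = ice)
y_pred = np.array([0, 1, 0, 0, 1, 1])   # toy predictions

cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)      # count each (true, predicted) pair

accuracy = np.trace(cm) / cm.sum()      # correct predictions / all predictions
print(cm)
# [[2 0]
#  [1 3]]
print(accuracy)                         # 5 of 6 correct
```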