## Sequence Prediction using RNN

In this exercise, we try to predict values of a curve given earlier data points.

We formulate the problem as follows: given a sequence of 50 numbers belonging to a sine wave, predict the 51st number in the series.

In [None]:
import keras
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential, load_model      # model type
from keras.layers import LSTM, Dense                 # recurrent and fully connected layers
from keras.optimizers import Adam
from keras import layers
from sklearn.metrics import mean_squared_error

%pylab inline
import math

What does our network model expect the data to be like? It would accept a single sequence of length 50 as input. So the shape of the input data will be:

(number_of_records x length_of_sequence x types_of_sequences)

Here, types_of_sequences is 1, because we have only one type of sequence – the sine wave.

On the other hand, the output has only one value for each record: the value that follows the 50-value input sequence, that is, the 51st value. So its shape would be:

(number_of_records x types_of_sequences) # where types_of_sequences is 1
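
As a rough illustration of the two shapes described above, the following sketch builds placeholder arrays with NumPy. The names `X_demo`, `Y_demo` and `n_series` are hypothetical, and the sizes (100 records of length 50) are taken from the training set used later in this exercise.

```python
import numpy as np

num_records, seq_len, n_series = 100, 50, 1   # illustrative sizes from the text

# Placeholder arrays laid out as (records, sequence length, types of sequences)
X_demo = np.zeros((num_records, seq_len, n_series))
# One target value per record
Y_demo = np.zeros((num_records, n_series))

print(X_demo.shape)  # (100, 50, 1)
print(Y_demo.shape)  # (100, 1)
```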

-----------------------------------

Fix the random seed so that the results are reproducible (hint: `random`)

In [None]:
# fix NumPy's random seed (with %pylab, `random` is numpy.random); Keras may need its own seed as well
random.seed(42)

Create the sine wave data and visualize it (hint: `math.sin(x)` for values ranging from 0 to 200)

Data should be stored in a numpy array of 200 values.

In [None]:
sin_wave = 

plt.plot()
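
One possible way to build the array, sketched under the assumption that the wave is sampled at the integers 0 to 199 (the hint does not fix the step size, so this is only one reading of it):

```python
import math
import numpy as np

# One sample per integer x in [0, 200): 200 values in total
sin_wave = np.array([math.sin(x) for x in range(200)])

print(sin_wave.shape)  # (200,)
```

Passing `sin_wave` to `plt.plot` then draws the whole curve.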

Visualize a sequence of only 50, for example the 50 first values of the calculated sine wave.

In [None]:
plt.plot()

We will first set up the training data:

X should be an array of 100 sequences, that is, an array of arrays.

Y should be an array of 100 outputs, that is, for each input sequence, the value that follows the sequence.

hint: `X.append`, `Y.append`

Note that we loop over `num_records - 50` records because we want to set aside the last 50 records as our validation data.

In [None]:
X = []
Y = []

seq_len = 50
num_records = len(sin_wave) - seq_len

for i in range(num_records - 50):
    # ...
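
The windowing described above could be sketched as follows. The helper `make_windows` is a hypothetical name introduced for illustration, not part of the exercise, and the sine wave is again assumed to be sampled at the integers 0 to 199. The same helper also produces the last 50 windows that are set aside for validation.

```python
import math
import numpy as np

sin_wave = np.array([math.sin(x) for x in range(200)])
seq_len = 50
num_records = len(sin_wave) - seq_len   # 150 possible windows


def make_windows(series, start, stop, seq_len):
    """Return (inputs, targets) for window start indices in [start, stop)."""
    inputs, targets = [], []
    for i in range(start, stop):
        inputs.append(series[i:i + seq_len])   # 50 consecutive values
        targets.append(series[i + seq_len])    # the value right after them
    return np.array(inputs), np.array(targets)


# First 100 windows for training, last 50 for validation
X, Y = make_windows(sin_wave, 0, num_records - 50, seq_len)
X_val, Y_val = make_windows(sin_wave, num_records - 50, num_records, seq_len)

print(X.shape, Y.shape)          # (100, 50) (100,)
print(X_val.shape, Y_val.shape)  # (50, 50) (50,)
```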


If we plot the data, we can check that the last value of each input sequence X(t) equals the previous target Y(t-1)

In [None]:
plt.plot( , 'r', linewidth=3)
plt.plot( , 'b')
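
The same relation can also be checked numerically instead of by eye. This is a minimal sketch that rebuilds the training windows with list comprehensions (an assumption about how the loop above is filled in) and compares the last value of each window with the previous target.

```python
import math
import numpy as np

sin_wave = np.array([math.sin(x) for x in range(200)])
seq_len = 50

X = np.array([sin_wave[i:i + seq_len] for i in range(100)])
Y = np.array([sin_wave[i + seq_len] for i in range(100)])

# The last value of each input window is the previous window's target
last_inputs = X[1:, -1]    # X[t][-1] for t = 1..99
prev_targets = Y[:-1]      # Y[t-1]  for t = 1..99
print(np.array_equal(last_inputs, prev_targets))  # True
```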

Print the shape of the data: 

X should be an array of 100 sequences of 50.

Y should be an array of 100 values.

hint: `shape`

We reshape the data because the model needs one more dimension: (number_of_records, length_of_sequence, types_of_sequences)

In [None]:
X = np.array(X).reshape(100, seq_len, 1)   # 100 records, 50 time steps, 1 type of sequence
Y = np.array(Y).reshape(100, 1)            # one target value per record

Now, in the same way as for the training data, we set up the validation data:

In [None]:
X_val = []
Y_val = []

for i in range(num_records - 50, num_records):
    # ...

T = 50                   # length of sequence

output_dim = 1
X_val = np.array(X_val).reshape(-1, seq_len, 1)
Y_val = np.array(Y_val).reshape(-1, 1)
X_val.shape, Y_val.shape

Now we can instantiate a Keras model:

In [None]:
# definition of the model
model = Sequential()
model.add(LSTM(units=3, input_shape=(seq_len, 1)))   # 50 time steps, 1 value per step
model.add(Dense(1))                                  # one predicted value per record
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()


Then we can train the model for only 20 epochs:

In [None]:
model.fit(X, Y, epochs=20, batch_size=1)

Get the predictions and plot them.

Hint: `model.predict`

In [None]:
prediction = 
print(shape(prediction))
plt.plot(, 'g')
plt.plot( , 'r')
plt.show()
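
For reference, a minimal end-to-end sketch of the fit-then-predict step is given below. It assumes the (number_of_records, length_of_sequence, types_of_sequences) layout described at the start of the exercise, a sine wave sampled at the integers 0 to 199, and a very short training run (2 epochs, batch size 10) chosen only to keep the example fast. The layer sizes are illustrative, not tuned.

```python
import math
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense

seq_len = 50
sin_wave = np.array([math.sin(x) for x in range(200)])

# 100 training windows in (records, time steps, features) layout
X = np.array([sin_wave[i:i + seq_len] for i in range(100)]).reshape(100, seq_len, 1)
Y = np.array([sin_wave[i + seq_len] for i in range(100)]).reshape(100, 1)

model = Sequential()
model.add(keras.Input(shape=(seq_len, 1)))   # 50 time steps, 1 feature each
model.add(LSTM(units=3))                     # small recurrent layer (size is arbitrary)
model.add(Dense(1))                          # one predicted value per record
model.compile(loss='mean_squared_error', optimizer='adam')

# Short run, just to exercise the API; a real fit needs many more epochs
model.fit(X, Y, epochs=2, batch_size=10, verbose=0)

prediction = model.predict(X, verbose=0)
print(prediction.shape)  # (100, 1)
```

Plotting `prediction` in green and `Y` in red on the same axes then gives the comparison asked for above.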

We can see that the model is learning, but it needs more epochs for a better fit.

Train the model for 500 epochs.

Plot the new predictions; the fit should be better now:

In [None]:
trainPredict = 
plt.plot(  , 'g')
plt.plot(  , 'r')
plt.show()


Now, instead of working with training data, we can make predictions on the validation data:

In [None]:
# make predictions
testPredict = 
print(shape(testPredict))


Calculate root mean squared error for the training and the validation data.

We expect the error to be very low on the training data, and slightly higher on the validation data.

Hint: `sklearn.metrics.mean_squared_error`

In [None]:
# calculate root mean squared error
trainScore = 
print('Train Score: %.2f RMSE' % (trainScore))
testScore = 
print('Test Score: %.2f RMSE' % (testScore))
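
As a reminder of what the score measures, the sketch below computes the root mean squared error directly with NumPy; it is the square root of the quantity that `sklearn.metrics.mean_squared_error` returns. The helper name `rmse` and the toy values are hypothetical and are not model output.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values: a single error of 2 over three points
y_true = np.array([0.0, 1.0, 2.0])
y_pred = np.array([0.0, 1.0, 4.0])
print('Toy score: %.2f RMSE' % rmse(y_true, y_pred))  # Toy score: 1.15 RMSE
```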

We plot the predictions against the validation data to check whether there is overfitting

In [None]:
plt.plot(  , 'g')
plt.plot(  , 'r')
plt.show()

Great, that looks pretty good. We can also check with a clean model and fewer epochs and compare the results.

We could also try a different series, for example a square wave.
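
A square wave for such an experiment could be generated, for example, from the sign of the sine. This is only one possible construction, sampled at the same integer points as assumed for the sine wave; the name `square_wave` is hypothetical.

```python
import math
import numpy as np

# Square wave derived from the sign of the sine: values are -1, 0 or +1
square_wave = np.array([np.sign(math.sin(x)) for x in range(200)])

print(square_wave.shape)  # (200,)
print(square_wave[:8])
```

The array can be fed through the same windowing, training and evaluation steps as the sine wave.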