# Recurrent Neural Networks and Long Short-Term Memory

## 1. What is meant by Recurrent Neural Networks?

A recurrent neural network (RNN) is a class of artificial neural network in which connections between nodes form a directed graph along a temporal sequence, so that the hidden state computed at one time step is fed back in at the next.
This feedback allows the network to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs.
This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
In theory, recurrent neural networks are Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.
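
To make the idea of an internal state concrete, here is a minimal sketch of a one-unit RNN in plain Python. The function name `rnn_forward` and the weight values are illustrative assumptions, not part of any library; the point is only that the same weights are reused at every step and that the state `h` carries information forward.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state (the network's memory)
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same weights handle sequences of different lengths.
shorter = rnn_forward([1.0, 0.0])
longer = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0])
print(len(shorter), len(longer))
print(shorter[-1], longer[-1])  # the trace of the first input fades over time
```

The first input still influences the last state of both sequences, but its influence is weaker in the longer one.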

## 2. What is meant by vanishing and exploding gradient and why is that a problem in RNN?

RNNs suffer from the vanishing gradient problem, which hampers learning over long sequences.
Gradients carry the information used to update the RNN's parameters; as a gradient shrinks toward zero, the parameter updates become insignificant and the network effectively stops learning.

Vanishing:
As the backpropagation algorithm moves backward from the output layer toward the input layer, the gradients often get smaller and smaller and approach zero, which leaves the weights of the initial (lower) layers nearly unchanged. In an RNN the same happens across time steps: the contribution of early inputs to the gradient fades, so long-range dependencies are not learned. As a result, gradient descent makes little progress and may fail to converge to a good solution. This is known as the vanishing gradient problem.

Exploding:
The exploding gradient is the opposite problem. It occurs when large error gradients accumulate, resulting in extremely large updates to the network's weights during training. As a result, the model becomes unstable and is unable to learn from the training data.
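
Both effects come from the chain rule multiplying one factor per time step. The sketch below assumes a deliberately simplified linear recurrence `h_t = w * h_{t-1}` (no activation function, a single scalar weight), so the gradient of the last state with respect to the first is just `w` multiplied by itself once per step. The helper name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

With `w = 0.5` the gradient after 50 steps is vanishingly small, with `w = 1.5` it is enormous, and only `w = 1.0` keeps it stable.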

## 3. What is meant by Long Short-Term Memory?

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.
This behavior is required in complex problem domains such as machine translation and speech recognition. An LSTM cell maintains a cell state regulated by input, forget, and output gates, which lets it retain information over long sequences and mitigates the vanishing gradient problem.
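
As a rough illustration of the gating idea, the following sketch implements a single LSTM step with scalar input, hidden state, and cell state. The parameter names in the dictionary (`wf`, `uf`, `bf`, and so on) and their values are assumptions chosen for the demonstration: the forget gate is held open and the input gate is held shut, so the cell state should survive many steps almost unchanged.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar input, hidden state, and cell state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g  # keep part of the old memory, add new content
    h = o * math.tanh(c)    # expose a gated view of the memory
    return h, c

# Hypothetical weights: forget gate held open, input gate held shut.
params = {"wf": 0.0, "uf": 0.0, "bf": 10.0,
          "wi": 0.0, "ui": 0.0, "bi": -10.0,
          "wo": 1.0, "uo": 1.0, "bo": 0.0,
          "wg": 1.0, "ug": 1.0, "bg": 0.0}

h, c = 0.0, 1.0  # start with a value stored in the cell state
for _ in range(100):
    h, c = lstm_step(0.0, h, c, params)
print(c)  # stays close to the initial 1.0 after 100 steps
```

Because the cell state is updated additively rather than being squashed through an activation at every step, the stored value does not decay the way the hidden state of a plain RNN does.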

## 4. What is meant by Gated Recurrent Unit?

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. 
The GRU is similar to an LSTM with a forget gate, but it has fewer parameters because it lacks an output gate.
GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM.
GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.
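
For comparison with the LSTM, here is a sketch of one GRU step with scalar state. It uses three groups of weights (update gate, reset gate, candidate) where an LSTM uses four, which is where the parameter saving comes from. The code assumes the convention `h = z * h_prev + (1 - z) * h_cand`; some references swap the roles of `z` and `1 - z`. All parameter names and values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU time step with scalar input and hidden state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate decides how much of the old state feeds the candidate.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the old state with the candidate.
    return z * h_prev + (1.0 - z) * h_cand

base = {"wz": 0.0, "uz": 0.0, "wr": 1.0, "ur": 1.0, "br": 0.0,
        "wh": 1.0, "uh": 1.0, "bh": 0.0}
keep = gru_step(1.0, 0.5, {**base, "bz": 10.0})        # update gate near 1
overwrite = gru_step(1.0, 0.5, {**base, "bz": -10.0})  # update gate near 0
print(keep, overwrite)
```

With the update gate near 1 the previous state (0.5) is kept almost unchanged; with it near 0 the state is replaced by the candidate value.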

## 5. Train a bidirectional LSTM on the IMDB movie sentiment dataset from Keras

This section follows the tutorial available on the Keras website: https://keras.io/examples/nlp/bidirectional_lstm_imdb/

In [2]:
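import numpy as np
from sklearn.model_selection import train_test_split
# Keras 2.x import paths; keras.preprocessing.text is deprecated in newer releases.
from keras.datasets import imdb
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, Bidirectional, Dropout
# Imports used by the tutorial's functional-API model.
from tensorflow import keras
from tensorflow.keras import layers

# Tutorial hyperparameters; the cells below use their own vocabulary cap (5000)
# and sequence length (500) instead.
max_features = 20000  # only consider the top 20k words
maxlen = 200  # only consider the first 200 words of each review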

In [3]:
# Load every review (no vocabulary cap) and merge the predefined splits.
(train_data, train_label), (test_data, test_label) = imdb.load_data()
data = np.concatenate((train_data, test_data), axis=0)
sentiments = np.concatenate((train_label, test_label), axis=0)
# Map integer indices back to words.
word_index = imdb.get_word_index()
inverted_word_index = {index: word for word, index in word_index.items()}
# load_data() shifts word indices by 3 (0 = padding, 1 = start, 2 = unknown),
# so subtract 3 before the lookup; reserved indices decode to "#".
reviews = np.array(
    [" ".join(inverted_word_index.get(i - 3, "#") for i in d) for d in data]
)
reviews[0]

"# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should b

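The leading `#` in the decoded review comes from the index offset used by `imdb.load_data()`, whose defaults reserve index 1 for the start-of-sequence marker and shift real word indices by 3. The toy vocabulary below is made up for illustration; it only mimics that encoding so the decoding step can be checked by hand.

```python
# Hypothetical miniature word index (word -> rank), in the style of get_word_index().
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}
inverted = {index: word for word, index in toy_word_index.items()}

# An encoded review: start marker (1), then each word's rank shifted by 3.
encoded = [1, 2 + 3, 3 + 3, 4 + 3]

decoded = " ".join(inverted.get(i - 3, "#") for i in encoded)
print(decoded)  # "# film was brilliant"
```

The start marker maps to `1 - 3 = -2`, which is not in the dictionary, so it falls back to `#`.
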
In [None]:
train_reviews, test_reviews, train_sentiments, test_sentiments = train_test_split(reviews, sentiments, test_size=0.2, random_state=42)

In [6]:
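# Restrict the generated sequences to the most frequent words (token indices below 5000).
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(train_reviews)
train_sequences = tokenizer.texts_to_sequences(train_reviews)
test_sequences = tokenizer.texts_to_sequences(test_reviews)
# Pad or truncate every review to exactly 500 tokens.
X_train = pad_sequences(train_sequences, maxlen=500)
X_test = pad_sequences(test_sequences, maxlen=500)
# word_index holds every word seen during fitting, not just the capped set,
# so this embedding vocabulary is larger than the sequences require.
vocab_size = len(tokenizer.word_index) + 1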
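
By default, `pad_sequences` pads and truncates at the front of each sequence, so long reviews keep their last tokens. The helper `pad_pre` below is a hypothetical plain-Python imitation of that default behavior (assuming a positive `maxlen`), meant only to show what the resulting arrays look like.

```python
def pad_pre(sequences, maxlen, value=0):
    """Imitate pad_sequences defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # truncate from the front, keep the tail
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded

print(pad_pre([[5, 6, 7, 8], [9]], maxlen=3))
# [[6, 7, 8], [0, 0, 9]]
```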

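The following summary appears to come from the Keras tutorial's reference model (two stacked bidirectional LSTM layers on a 20,000-word, 128-dimensional embedding); the cell that builds it is not included here.

Model: "model"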
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding (Embedding)       (None, None, 128)         2560000   
                                                                 
 bidirectional (Bidirectiona  (None, None, 128)        98816     
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 128)              98816     
 nal)                                                            
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 2,757,761
Trainable params: 2,757,761
Non-train

In [14]:
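model = Sequential()
# Learn a 50-dimensional embedding for each token of the 500-token input.
model.add(Embedding(vocab_size, 50, input_length=500))
# One LSTM reads the review forward and one backward; their outputs are concatenated.
model.add(Bidirectional(LSTM(128)))
model.add(Dropout(0.5))
# Sigmoid output: probability that the review is positive.
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()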

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 500, 50)           4098050   
                                                                 
 bidirectional_4 (Bidirectio  (None, 256)              183296    
 nal)                                                            
                                                                 
 dropout_2 (Dropout)         (None, 256)               0         
                                                                 
 dense_3 (Dense)             (None, 1)                 257       
                                                                 
Total params: 4,281,603
Trainable params: 4,281,603
Non-trainable params: 0
_________________________________________________________________
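
The parameter counts in this summary can be reproduced with a few lines of arithmetic. The sketch below assumes the standard LSTM layout (four gate blocks, each with input weights, recurrent weights, and a bias) and a vocabulary size of 81,961, which is inferred from the embedding's 4,098,050 parameters divided by the embedding dimension of 50. The helper names are illustrative.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gate blocks, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units

vocab_size, embed_dim, units = 81961, 50, 128  # vocab_size inferred from the summary

emb = embedding_params(vocab_size, embed_dim)
bi_lstm = 2 * lstm_params(embed_dim, units)  # forward and backward LSTM
out = dense_params(2 * units, 1)             # input is the 256-wide concatenation
print(emb, bi_lstm, out, emb + bi_lstm + out)
# 4098050 183296 257 4281603
```

Almost all of the parameters sit in the embedding layer, which is a direct consequence of sizing it from the full `word_index` rather than from the 5000-word cap.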


In [None]:
model.fit(X_train,
          train_sentiments,
          epochs=3,
          batch_size=128,
          validation_data=(X_test, test_sentiments))

In [None]:
loss, accuracy = model.evaluate(X_test, test_sentiments)
print(f'Accuracy: {round(accuracy * 100, 2)}%')
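
The two numbers returned by `model.evaluate` are the binary cross-entropy loss and the accuracy obtained by thresholding the sigmoid output at 0.5. The sketch below computes both by hand on a handful of made-up labels and probabilities; the function names and the sample values are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]  # hypothetical model outputs
print(binary_cross_entropy(labels, probs))
print(binary_accuracy(labels, probs))  # 0.5
```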