# <center> <font size=24 color= 'steelblue'> **Sequence Models**
# <center><img src = "https://drive.google.com/uc?export=view&id=1_KIQA0fg8V1f4VygalHlBBQ2orYJZUq2">

# <a id= 'r0'>
<font size = 4>
    
**Table of Contents:**<br>
[1. Recurrent neural network ](#r1)<br>
> [1.1 RNN refresher](#r1.1)<br>
> [1.2 RNN limitations](#r1.2)<br>

[2. LSTM ](#r2)<br>
[3. Bi-directional recurrent neural network](#r3)<br>
[4. Building the models](#r4)<br>
[5. Takeaways](#r5)<br>

###### <a id = 'r1'>
<font size = 10 color = 'midnightblue'>**Recurrent Neural Networks (RNN)**

###### <a id = 'r1.1'>
<font size = 6 color = pwdrblue> <b>RNN: Refresher

<div class="alert alert-block alert-success">   
<font size=4 >
    
**An RNN processes a stream of data sequentially, one element at a time. This is the way we humans are used to dealing with sequential inputs like text.**

- The hidden layer of this network exhibits self-connection, earning it the term "recurrent." Essentially, the activation of a subsequent word depends not only on its own input but also on the activation of the preceding word. <br>
- This mechanism enables the network to encapsulate the influence of prior words on subsequent ones, enhancing the comprehension of meaning, akin to how human understanding unfolds.

<center> <img src = https://www.researchgate.net/profile/Subburam-Rajaram/publication/324680970/figure/fig1/AS:617962817986561@1524345225985/A-recurrent-neural-network-and-the-unfolding-in-time-of-the-computation-involved-in-its.png/Time_unfoldment.png>

<center> <font size = 4 > <b>The illustration above, depicting the temporal unfolding of an RNN, clarifies this concept.</b>
    

<div class="alert alert-block alert-success">  
<font size=4 >
    
- The output may or may not be available at every time step, but the other units remain unaffected.
- In each recurrent layer, neurons establish complete connections with themselves.
- For instance, with three neurons in a hidden layer, there would be a total of 9 connections within the layer.
- A temporal shift occurs, meaning that in the initial pass, we have a straightforward hidden layer.
- However, in subsequent passes, the output activations from the previous pass influence the outputs, creating a cascading effect.
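
The points above can be sketched in plain Python. This is a minimal illustration rather than the Keras implementation: the layer sizes, the random weights, and the names `W_xh`, `W_hh`, and `rnn_step` are assumptions made for the example. It shows that a hidden layer of three neurons carries a 3 x 3 block of recurrent weights (9 connections) and that each state depends on the previous one.

```python
import math
import random

random.seed(0)

n_inputs, n_hidden = 2, 3  # hypothetical sizes; 3 hidden neurons as in the text

# Input-to-hidden weights (n_hidden x n_inputs) and recurrent weights (n_hidden x n_hidden)
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
bias = [0.0] * n_hidden


def rnn_step(x_t, h_prev):
    """One time step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + bias)."""
    h_t = []
    for i in range(n_hidden):
        total = bias[i]
        total += sum(W_xh[i][j] * x_t[j] for j in range(n_inputs))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(n_hidden))
        h_t.append(math.tanh(total))
    return h_t


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "word" vectors
h = [0.0] * n_hidden  # first pass: there is no previous activation yet
states = []
for x_t in sequence:
    h = rnn_step(x_t, h)  # later passes reuse the previous hidden state
    states.append(h)

n_recurrent_connections = sum(len(row) for row in W_hh)
print(n_recurrent_connections)  # 9
```

On the first step `h` is all zeros, so the layer behaves like an ordinary hidden layer; from the second step onward the recurrent weights feed the previous activations back in.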

###### <a id = 'r1.2'>
<font size = 6 color = pwdrblue> **Limitations - RNN**

<div class="alert alert-block alert-success">   
<font size=4 >

- BPTT (Backpropagation Through Time) gives more weight to the later steps of a sequence.
- As the sequence gets longer, the influence of an earlier step on the output diminishes, because its activation is repeatedly squashed by the activation function and the effect compounds at every step.
    - The same attenuation that occurs in forward propagation also affects error back-propagation.
    - Consequently, weight updates are influenced far less by words early in the sequence than by those appearing later.
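
A small numeric sketch can make this attenuation concrete. It assumes a scalar RNN `h_t = tanh(w * h_{t-1} + x)` with hypothetical values `w = 0.9` and `x = 0.5`; the function name is illustrative. By the chain rule, the derivative of the last state with respect to the first is a product of per-step factors `w * (1 - h_t^2)`, each smaller than 1 in this setting.

```python
import math


def gradient_wrt_first_state(n_steps, w=0.9, x=0.5):
    """Return d h_n / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    for _ in range(n_steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: local derivative of this step
    return grad


# The gradient shrinks rapidly as the number of time steps grows
for n in (1, 5, 20, 50):
    print(n, gradient_wrt_first_state(n))
```

With a recurrent weight larger than 1 and activations that stay unsaturated, the same product can grow instead of shrink, which is the exploding-gradient counterpart.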






[top](#r0)

###### <a id = 'r2'>
<font size = 10 color = 'midnightblue'>**LSTM**

<div class="alert alert-block alert-success">   
<font size=4 >
    
- LSTM, short for Long Short-Term Memory, tackles the problem discussed above: the fading influence of earlier steps in a sequence.
- It suggests incorporating connections from past layers to future ones in a systematic manner.
- The neural network learns the strength of these connections, retaining the robust ones and discarding others.
- This mechanism closely mimics our sequential processing of textual information.
- When reading a passage, we progress through it word by word, akin to this approach.
- Likewise, when faced with a particular task, our memory adjusts to meet the specific requirements.

<div class="alert alert-block alert-success">   
<font size=4 >

**Consider a scenario**
- A passage is provided detailing the impacts of global warming over the past four decades.
- An inquiry is now generated as: <br>
</div>

<center><font size = 4 color = seagreen> <b><i>`What were the consequences of global warming on marine ecosystems in 1984? Identify the pivotal events in the unfolding tragedy.`<b>
    

<div class="alert alert-block alert-success">   
<font size=4 >   

**To answer this query,** <br>

- It will be required to systematically read the passage, focusing on the mention of 1984 and relevant events from that timeframe.
- Progressing through the sentences, information related to 1984 is naturally retained and the details unrelated to the specified year are forgotten.
- In LSTM, such an architecture is implemented using gates.

# <center> <img src="https://drive.google.com/uc?export=view&id=19cqa917fL3b1XfN9nUy1e3INa9VFKEDO" style="border: 3px solid  gray;" >

|||
|-|-|
|<font size = 5 color = 'midnightblue'> **Input Gate (Write Gate)** |<font size = 4.7> Decides whether the incoming input should be considered or not.|
|<font size = 5 color = 'midnightblue'> **Forget Gate (Memory Gate)** |<font size = 4.7> Decides whether to retain the memory in the subsequent step.|
|<font size = 5 color = 'midnightblue'> **Output Gate (Read Gate)** |<font size = 4.7> Controls the flow of information from one cell to another.|

<div class="alert alert-block alert-success">   
<font size=4 >
    
- The neural network learns the values of these gates; because they are sigmoid-activated, each value lies in the real-valued range between 0 and 1.
- This implies that their states are not strictly binary: a gate can be partially open.
- An alternative that provides similar functionality to the LSTM is the Gated Recurrent Unit (GRU).
- It is a simpler model than the LSTM, with fewer gates and parameters, and it is often used in its place.
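
The gate mechanics described above can be sketched in plain Python. This is a simplified single LSTM step under stated assumptions: biases are omitted, weights are random, and the helper names (`make_weights`, `affine`, `lstm_step`) are hypothetical. It is meant to show where each gate acts, not to reproduce the Keras layer.

```python
import math
import random

random.seed(1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_weights(n_hidden, n_inputs):
    """Random weights for one gate acting on the concatenated [h_prev, x_t]."""
    return [[random.uniform(-1, 1) for _ in range(n_hidden + n_inputs)]
            for _ in range(n_hidden)]


def affine(weights, vec):
    """Matrix-vector product using plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


n_hidden, n_inputs = 2, 3  # hypothetical toy sizes
W_i, W_f, W_o, W_c = (make_weights(n_hidden, n_inputs) for _ in range(4))


def lstm_step(x_t, h_prev, c_prev):
    concat = h_prev + x_t  # list concatenation: [h_prev, x_t]
    i_gate = [sigmoid(z) for z in affine(W_i, concat)]  # input (write) gate
    f_gate = [sigmoid(z) for z in affine(W_f, concat)]  # forget (memory) gate
    o_gate = [sigmoid(z) for z in affine(W_o, concat)]  # output (read) gate
    candidate = [math.tanh(z) for z in affine(W_c, concat)]
    # Keep part of the old memory and write part of the new candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_gate, c_prev, i_gate, candidate)]
    # Expose a gated view of the memory as the hidden state
    h_t = [o * math.tanh(c) for o, c in zip(o_gate, c_t)]
    return h_t, c_t, (i_gate, f_gate, o_gate)


h, c = [0.0] * n_hidden, [0.0] * n_hidden
h, c, gates = lstm_step([0.2, -0.4, 0.9], h, c)
print(gates)  # every gate value lies strictly between 0 and 1
```

Because the forget gate multiplies the previous cell state directly, a value close to 1 lets information pass through many steps largely unchanged.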

[top](#r0)

###### <a id = 'r3'>
<font size = 10 color = 'midnightblue'>**Bi-directional recurrent neural network**

<div class="alert alert-block alert-success">   
<font size=4 >

- While humans typically read sentences sequentially, they possess the ability to mentally backtrack to earlier portions as new information surfaces. <br>
- Humans can adeptly handle information presented in a less-than-optimal order. Enabling your model to similarly backtrack across the input is facilitated by bidirectional recurrent neural networks.
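
As a rough sketch of the idea, a bidirectional layer can be thought of as two independent recurrent passes, one over the sequence and one over its reverse, whose states are joined at every time step. The scalar toy RNN and its weights below are assumptions chosen only for illustration.

```python
import math


def run_rnn(sequence, w_in=0.5, w_rec=0.8):
    """Scalar toy RNN: returns the hidden state after each time step."""
    h, states = 0.0, []
    for x_t in sequence:
        h = math.tanh(w_in * x_t + w_rec * h)
        states.append(h)
    return states


def run_bidirectional(sequence):
    forward = run_rnn(sequence)
    # Process the reversed sequence, then flip the states back into input order
    backward = run_rnn(sequence[::-1], w_in=0.3, w_rec=0.6)[::-1]
    # Concatenate both directions at every time step
    return [(f, b) for f, b in zip(forward, backward)]


outputs = run_bidirectional([1.0, -1.0, 0.5, 2.0])
for step, pair in enumerate(outputs):
    print(step, pair)
```

At any position, the forward value summarizes everything before it and the backward value summarizes everything after it, so the combined output has access to context from both sides.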

# <center> <img src = "https://upload.wikimedia.org/wikipedia/commons/3/35/Structural_diagrams_of_unidirectional_and_bidirectional_recurrent_neural_networks.png" width = 800>

# <center> <img src="https://drive.google.com/uc?export=view&id=1FkmDT01c86RzYbfKsLKvOWhY4Tq7LOx-" width = 800>

[top](#r0)

###### <a id = 'r4'>
<font size = 10 color = 'midnightblue'>**Build the models**

<font size = 5 color = 'pwdrblue'>
<b>Build an RNN model to perform movie review analysis using Keras

<font size = 5 color = seagreen> <b> Download the original dataset from the Stanford AI department [🔗](https://ai.stanford.edu/%7eamaas/data/sentiment/).<br>
<div class="alert alert-block alert-success">   
<font size=4 >
    
1. <font size = 4> This is a dataset compiled for the 2011 paper *Learning Word Vectors for Sentiment Analysis*.<br>
2. <font size = 4> It has a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

<font size = 5 color = seagreen> <b>Import required modules

In [49]:
import os
import re
import tarfile
import glob
import numpy as np
from random import shuffle
from nltk.tokenize import TreebankWordTokenizer
from gensim.models import KeyedVectors

<font size = 5 color = seagreen> <b>Prepare the data for model input

In [50]:
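def pre_process_data(filepath):
    """Load the pos/neg review files under filepath as shuffled (label, text) tuples."""
    positive_path = os.path.join(filepath, 'pos')
    negative_path = os.path.join(filepath, 'neg')

    pos_label = 1
    neg_label = 0

    dataset = []

    # Positive reviews are labelled 1
    for filename in glob.glob(os.path.join(positive_path, '*.txt')):
        with open(filename, 'r', encoding='utf-8') as f:
            dataset.append((pos_label, f.read()))

    # Negative reviews are labelled 0
    for filename in glob.glob(os.path.join(negative_path, '*.txt')):
        with open(filename, 'r', encoding='utf-8') as f:
            dataset.append((neg_label, f.read()))

    # Shuffle so that positive and negative samples are interleaved
    shuffle(dataset)

    return dataset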

<font size = 5 color = seagreen> <b>Checking working directory

<font size = 5 color = seagreen> <b>Loading pre-trained word vectors from the `Google News dataset` using the Word2Vec format.

In [51]:
word_vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True, limit=200000)

<font size = 5 color = seagreen> <b>Combine the tokenizer and vectorizer into a single function

In [52]:
def tokenize_and_vectorize(dataset):
    """Turn each (label, text) sample into a list of word vectors."""
    tokenizer = TreebankWordTokenizer()
    vectorized_data = []
    for sample in dataset:
        tokens = tokenizer.tokenize(sample[1])
        sample_vecs = []
        for token in tokens:
            try:
                sample_vecs.append(word_vectors[token])

            except KeyError:
                pass  # No matching token in the Google w2v vocab

        vectorized_data.append(sample_vecs)

    return vectorized_data
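
To see what the lookup does without loading the full word2vec file, here is a minimal sketch with a hypothetical three-word vocabulary of 2-dimensional vectors. A plain whitespace split stands in for `TreebankWordTokenizer`, so this is an approximation of the helper above, not a replacement for it.

```python
# Hypothetical 2-d embeddings; the real vectors have 300 dimensions
toy_vectors = {
    "great": [0.9, 0.1],
    "movie": [0.2, 0.7],
    "bad": [-0.8, 0.1],
}


def vectorize(text, vectors):
    """Look up each token and silently skip tokens missing from the vocabulary."""
    sample_vecs = []
    for token in text.lower().split():  # assumption: whitespace tokenization
        try:
            sample_vecs.append(vectors[token])
        except KeyError:
            pass  # out-of-vocabulary token is dropped
    return sample_vecs


vecs = vectorize("Great movie overall", toy_vectors)
print(len(vecs))  # 2: "overall" is not in the toy vocabulary
```

Because unknown tokens are dropped, the vectorized samples end up with different lengths, which is why they are padded or truncated to a uniform length before training.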

<font size = 5 color = seagreen> <b>Get the word vectors for a given word

In [53]:
word_vectors["dog"]

array([ 5.12695312e-02, -2.23388672e-02, -1.72851562e-01,  1.61132812e-01,
       -8.44726562e-02,  5.73730469e-02,  5.85937500e-02, -8.25195312e-02,
       -1.53808594e-02, -6.34765625e-02,  1.79687500e-01, -4.23828125e-01,
       -2.25830078e-02, -1.66015625e-01, -2.51464844e-02,  1.07421875e-01,
       -1.99218750e-01,  1.59179688e-01, -1.87500000e-01, -1.20117188e-01,
        1.55273438e-01, -9.91210938e-02,  1.42578125e-01, -1.64062500e-01,
       -8.93554688e-02,  2.00195312e-01, -1.49414062e-01,  3.20312500e-01,
        3.28125000e-01,  2.44140625e-02, -9.71679688e-02, -8.20312500e-02,
       -3.63769531e-02, -8.59375000e-02, -9.86328125e-02,  7.78198242e-03,
       -1.34277344e-02,  5.27343750e-02,  1.48437500e-01,  3.33984375e-01,
        1.66015625e-02, -2.12890625e-01, -1.50756836e-02,  5.24902344e-02,
       -1.07421875e-01, -8.88671875e-02,  2.49023438e-01, -7.03125000e-02,
       -1.59912109e-02,  7.56835938e-02, -7.03125000e-02,  1.19140625e-01,
        2.29492188e-01,  

<font size = 5 color = seagreen> <b>Unzip the target variable into separate (but corresponding) samples.

In [54]:
def collect_expected(dataset):
    expected = []
    for sample in dataset:
        expected.append(sample[0])
    return expected

<font size = 5 color = seagreen> <b>Decompress the downloaded data

<div class="alert alert-block alert-success">   
<font size=4 >
    
- Run the code below to decompress the downloaded archive.<br>
- Replace the paths with the corresponding locations on your machine/lab.<br>


In [55]:
import tarfile

# Open the archive and extract it; the context manager closes the file afterwards
with tarfile.open('aclImdb_v1.tar.gz') as archive:
    archive.extractall('/content/sample_data')

<font size = 5 color = seagreen> <b> Data preprocessing - using the `pre_process_data()`, created earlier

In [56]:
dataset = pre_process_data('/content/sample_data/aclImdb/train')
vectorized_data = tokenize_and_vectorize(dataset)
expected = collect_expected(dataset)

In [57]:
split_point = int(len(vectorized_data)*.8)

x_train = vectorized_data[:split_point]
y_train = expected[:split_point]
x_test = vectorized_data[split_point:]
y_test = expected[split_point:]

In [58]:
len(x_train), len(x_test), len(y_train), len(y_test)

(20000, 5000, 20000, 5000)


<div class="alert alert-block alert-info">
<font size = 4>
    
**Note:**
This step is optional: it takes a smaller subset of the data to shorten training and can be skipped if the required compute resources are available.


In [59]:
x_train_s = x_train[:1000]
x_test_s = x_test[:200]
y_train_s = y_train[:1000]
y_test_s = y_test[:200]

<font size = 5 color = seagreen> <b>Declare the hyperparameters for training

In [60]:
maxlen = 400            # arbitrary sequence length
batch_size = 32         # Number of sample sequences to pass through (and aggregate the error) before backpropagating
embedding_dims = 300    # from pre-trained word2vec model
epochs = 2
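
A quick back-of-the-envelope check of what these hyperparameters imply for the 1,000-sample training subset. The 4-byte `float32` storage is an assumption; the actual array dtype may differ.

```python
import math

n_train_samples = 1000   # size of the x_train_s subset
batch_size = 32
maxlen = 400
embedding_dims = 300

# Number of weight updates per epoch (the last batch is smaller)
steps_per_epoch = math.ceil(n_train_samples / batch_size)

# Each padded sample is a maxlen x embedding_dims grid of numbers
values_per_sample = maxlen * embedding_dims

# Assumption: 4 bytes per value (float32)
approx_megabytes = n_train_samples * values_per_sample * 4 / 1e6

print(steps_per_epoch)    # 32
print(values_per_sample)  # 120000
print(approx_megabytes)   # 480.0
```

The 32 steps per epoch correspond to the `32/32` counter in the training logs further down.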

<font size = 5 color = seagreen> <b>Prepare the data by making each point of uniform length

In [61]:
def pad_trunc(data, maxlen):
    """For a given dataset, pad with zero vectors or truncate to maxlen."""
    new_data = []

    # Create a vector of 0's the length of our word vectors
    zero_vector = [0.0] * len(data[0][0])

    for sample in data:
        if len(sample) > maxlen:
            temp = sample[:maxlen]
        elif len(sample) < maxlen:
            # Build a new list so the caller's sample is not mutated while padding
            additional_elems = maxlen - len(sample)
            temp = list(sample) + [zero_vector] * additional_elems
        else:
            temp = sample
        new_data.append(temp)
    return new_data

In [62]:
x_train_s = pad_trunc(x_train_s, maxlen)
x_test_s = pad_trunc(x_test_s, maxlen)
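
To check the padding and truncation behaviour on something small, here is a compact, self-contained variant run on toy data. Unlike the helper above, it takes the vector size as an explicit argument (`vector_size` is an assumption introduced for this example).

```python
def pad_trunc(data, maxlen, vector_size):
    """Compact variant of the helper above; vector_size is passed explicitly."""
    zero_vector = [0.0] * vector_size
    padded = []
    for sample in data:
        clipped = sample[:maxlen]  # truncate long samples
        padded.append(clipped + [zero_vector] * (maxlen - len(clipped)))  # pad short ones
    return padded


toy = [
    [[1.0, 1.0]],                                      # 1 token  -> padded
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]],  # 4 tokens -> truncated
]
result = pad_trunc(toy, maxlen=3, vector_size=2)
print([len(sample) for sample in result])  # [3, 3]
print(result[0])  # [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
```

Padding is added at the end of the sequence and truncation drops the tail, which matches the behaviour of the helper used in this notebook.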

<font size = 5 color = seagreen> <b>Reshape into a numpy data structure for compatibility

In [63]:
x_train_s = np.reshape(x_train_s, (len(x_train_s), maxlen, embedding_dims))
y_train_s = np.array(y_train_s)
x_test_s = np.reshape(x_test_s, (len(x_test_s), maxlen, embedding_dims))
y_test_s = np.array(y_test_s)

<font size = 5 color = seagreen> <b>Build a model: Start with a standard `Sequential()`, that is the layered Keras model.

In [64]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, SimpleRNN

num_neurons = 50


model = Sequential()  # Initialize an empty Keras network

model.add(SimpleRNN(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims))) # add recurrent layer
model.add(Dropout(.2)) # adding a drop-out layer

model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile('rmsprop', 'binary_crossentropy',  metrics=['accuracy']) # compile recurrent network
model.summary()  # prints the summary itself; wrapping it in print() only adds "None"
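
The layer sizes of this model can be worked out by hand. The sketch below uses the standard parameter count for a simple recurrent layer (input weights, recurrent weights, and one bias per neuron); the variable names are illustrative.

```python
maxlen, embedding_dims, num_neurons = 400, 300, 50

# SimpleRNN: input weights + recurrent weights + one bias per neuron
rnn_params = num_neurons * (embedding_dims + num_neurons + 1)

# return_sequences=True keeps one 50-d output per time step; Flatten joins them
flattened_size = maxlen * num_neurons

# Dense(1): one weight per flattened value plus a bias
dense_params = flattened_size + 1

print(rnn_params, flattened_size, dense_params, rnn_params + dense_params)
# 17550 20000 20001 37551
```

Note that the dense layer holds more parameters than the recurrent layer, because flattening the full sequence output produces 20,000 inputs.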


In [65]:
model.fit(x_train_s, y_train_s,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test_s, y_test_s))
model_structure = model.to_json()
with open("simplernn_model1.json", "w") as json_file:
    json_file.write(model_structure)

model.save_weights("simplernn_1.weights.h5")
print('Model saved.')

Epoch 1/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 9s 212ms/step - accuracy: 0.4883 - loss: 0.8840 - val_accuracy: 0.5450 - val_loss: 0.6814
Epoch 2/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 6s 179ms/step - accuracy: 0.7888 - loss: 0.4845 - val_accuracy: 0.6200 - val_loss: 0.6788
Model saved.


In [66]:
from keras.models import model_from_json
with open("simplernn_model1.json", "r") as json_file:
    json_string = json_file.read()
model = model_from_json(json_string)

model.load_weights('simplernn_1.weights.h5')

In [67]:
sample_1 = "I hate that the dismal weather that had me down for so long, when will it break! Ugh, when does happiness return?  The sun is blinding and the puffy clouds are too thin.  I can't wait for the weekend."

In [68]:
# We pass a dummy value in the first element of the tuple because our helper expects it from the way we processed the initial data. That value never reaches the network, so it can be anything.
vec_list = tokenize_and_vectorize([(1, sample_1)])

# Tokenize returns a list of the data (length 1 here)
test_vec_list = pad_trunc(vec_list, maxlen)

test_vec = np.reshape(test_vec_list, (len(test_vec_list), maxlen, embedding_dims))

In [69]:
model.predict(test_vec)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 158ms/step


array([[0.30947044]], dtype=float32)
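
The network outputs a sigmoid probability rather than a class. A minimal way to turn it into a label, assuming the conventional 0.5 threshold (the helper name `to_label` is hypothetical):

```python
def to_label(probability, threshold=0.5):
    """Map a sigmoid output to 1 (positive) or 0 (negative)."""
    return 1 if probability >= threshold else 0


label = to_label(0.30947044)  # value taken from the prediction above
print(label)  # 0, i.e. negative sentiment
```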

In [70]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, SimpleRNN

num_neurons = 100


model = Sequential()

model.add(SimpleRNN(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims)))
model.add(Dropout(.2))

model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile('rmsprop', 'binary_crossentropy',  metrics=['accuracy'])
model.summary()  # prints the summary itself; wrapping it in print() only adds "None"


In [71]:
model.fit(x_train_s, y_train_s,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test_s, y_test_s))
model_structure = model.to_json()
with open("simplernn_model2.json", "w") as json_file:
    json_file.write(model_structure)

model.save_weights("simplernn_2.weights.h5")
print('Model saved.')

Epoch 1/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 9s 246ms/step - accuracy: 0.5036 - loss: 1.5717 - val_accuracy: 0.5000 - val_loss: 0.7512
Epoch 2/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 8s 165ms/step - accuracy: 0.7781 - loss: 0.4780 - val_accuracy: 0.5350 - val_loss: 0.9040
Model saved.


[top](#r0)

<font size = 6 color = 'pwdrblue'>**Build the bi-directional recurrent neural network**


In [72]:
from keras.models import Sequential
from keras.layers import Bidirectional, SimpleRNN

num_neurons = 10
maxlen = 100            # note: this rebinds the maxlen = 400 set earlier
embedding_dims = 300

model = Sequential()
# Wrap the recurrent layer so the sequence is processed in both directions
model.add(Bidirectional(
    SimpleRNN(num_neurons, return_sequences=True),
    input_shape=(maxlen, embedding_dims)))

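
The effect of the wrapper on layer sizes can be estimated by hand. The sketch below assumes the default merge mode, which concatenates the forward and backward outputs at each time step; the variable names are illustrative.

```python
maxlen, embedding_dims, num_neurons = 100, 300, 10

# Parameters of one SimpleRNN direction: input weights + recurrent weights + biases
params_one_direction = num_neurons * (embedding_dims + num_neurons + 1)

# Bidirectional holds two independent copies (forward and backward)
total_params = 2 * params_one_direction

# With concatenation, both outputs are joined per time step
output_shape = (maxlen, 2 * num_neurons)

print(params_one_direction, total_params, output_shape)  # 3110 6220 (100, 20)
```

Any layer stacked on top therefore sees 20 features per time step rather than 10.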


[top](#r0)

<font size = 6 color = 'pwdrblue'> **Enhancing memory retention using long short-term memory networks**

# <center> <img src = "https://upload.wikimedia.org/wikipedia/commons/9/93/LSTM_Cell.svg" width = 700>

In [73]:

from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, GlobalMaxPooling1D

# Assuming maxlen and embedding_dims are defined
num_neurons = 50

# Initialize the Sequential model
model = Sequential()

# Add LSTM layer with return_sequences=True
model.add(LSTM(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims)))

# Add Dropout layer for regularization
model.add(Dropout(0.2))

# Use GlobalMaxPooling1D to reduce the sequence output to a single vector
model.add(GlobalMaxPooling1D())

# Add Dense layer for binary classification
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Print the model summary to verify the layer shapes
model.summary()  # prints the summary itself; wrapping it in print() only adds "None"
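
Two details of this model can be checked by hand: the LSTM parameter count (four weight sets, one per gate plus the candidate memory) and what global max pooling does to the sequence output. The function `global_max_pool` is a plain-Python stand-in written for illustration, not the Keras layer itself.

```python
embedding_dims, num_neurons = 300, 50

# An LSTM has four weight sets, each shaped like a simple recurrent layer
lstm_params = 4 * num_neurons * (embedding_dims + num_neurons + 1)

# After pooling, Dense(1) sees one value per LSTM unit, plus a bias
dense_params = num_neurons + 1

print(lstm_params, dense_params, lstm_params + dense_params)  # 70200 51 70251


def global_max_pool(sequence_outputs):
    """Max over the time axis for each feature."""
    return [max(feature_values) for feature_values in zip(*sequence_outputs)]


toy_outputs = [[0.1, -0.3], [0.7, 0.2], [0.4, 0.9]]  # 3 time steps, 2 features
pooled = global_max_pool(toy_outputs)
print(pooled)  # [0.7, 0.9]
```

Because pooling collapses the time axis, the dense layer needs only 51 parameters here, compared with the 20,001 needed after `Flatten()` in the SimpleRNN model.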


In [74]:
# Fit the model
model.fit(
    x_train_s, y_train_s,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_test_s, y_test_s)
)

# Save the model architecture to JSON
model_structure = model.to_json()
with open("lstm_model1.json", "w") as json_file:
    json_file.write(model_structure)

# Save the model weights
model.save_weights("lstm_1.weights.h5")
print('Model saved.')

Epoch 1/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 15s 429ms/step - accuracy: 0.5032 - loss: 0.6966 - val_accuracy: 0.5500 - val_loss: 0.6630
Epoch 2/2
32/32 ━━━━━━━━━━━━━━━━━━━━ 19s 384ms/step - accuracy: 0.6528 - loss: 0.6387 - val_accuracy: 0.7550 - val_loss: 0.5553
Model saved.


In [75]:
from keras.models import model_from_json
with open("lstm_model1.json", "r") as json_file:
    json_string = json_file.read()
model = model_from_json(json_string)

model.load_weights('lstm_1.weights.h5')

In [76]:
sample_1 = "I hate that the dismal weather that had me down for so long, when will it break! Ugh, when does happiness return?  The sun is blinding and the puffy clouds are too thin.  I can't wait for the weekend."

# Pass a dummy value in the first element of the tuple because our helper expects it from the way we processed the initial data. That value never reaches the network, so it can be anything.
vec_list = tokenize_and_vectorize([(1, sample_1)])

# tokenize_and_vectorize returns a list of samples (length 1 here)
test_vec_list = pad_trunc(vec_list, maxlen)

test_vec = np.reshape(test_vec_list, (len(test_vec_list), maxlen, embedding_dims))

# Predict once, then threshold the sigmoid output at 0.5 to get the class label
prediction = model.predict(test_vec)
print("Sample's sentiment, 1 - pos, 0 - neg : {}".format(int(prediction[0][0] > 0.5)))
print("Raw output of sigmoid function: {}".format(prediction))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 179ms/step
Sample's sentiment, 1 - pos, 0 - neg : 0
Raw output of sigmoid function: [[0.31449306]]


[top](#r0)

##### <a id = 'r5'>
<font size = 10 color = 'midnightblue'>**Takeaways**

<div class="alert alert-block alert-success">
<font size = 4>
    
- **Temporal Importance in Natural Language Sequences:**

> Recognizing the relevance of preceding elements, whether words or characters, is vital for the model's understanding of natural language sequences.

- **Temporal Dimension Splitting:**

> Dividing a natural language statement along the temporal dimension of tokens allows the machine to deepen its comprehension of language.

- **Challenges in RNN Gradients:**

> Because an RNN unrolled over a long sequence behaves like a very deep network, it is prone to vanishing or exploding gradients, which makes training particularly sensitive.

- **Efficient Modeling with RNNs:**
    
> Efficiently modeling natural language character sequences became feasible with the application of recurrent neural networks to the task.
    
- **Aggregate Weight Adjustment:**
    
> In an RNN, weights are adjusted collectively across time for a given sample, contributing to the network's understanding of temporal dependencies.
    
- **Methods for Analyzing RNN Outputs:**

> Various methods, including simultaneous backward and forward processing through an RNN, can be employed to examine the output of recurrent neural nets.