# **Week 9: RNN GRU Seq2seq**

## **1.  RNN and GRU.**
1-1. RNN : Recurrent Neural Network <br>
1-2. GRU : Gated Recurrent Units <br>
1-3. Padding


## **2.  Seq2seq model with attention.**
2-1. Encoder <br>
2-2. Decoder with Attention <br>
2-3. Training <br>
2-4. Evaluation <br>


reference  
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html \
https://towardsdatascience.com/sentiment-analysis-using-lstm-step-by-step-50d074f09948

## **1. Implement RNN and GRU**


### **0) Import required packages**

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

### **1) Recurrent Neural Network (RNN)**

Before we start, let's implement the simple RNN.

![picture](https://miro.medium.com/max/558/1*sx2s7DD8OuVqC2TUP46Whw.png)

In [2]:
import torch.nn as nn
import torch.nn.functional as F

class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNCell, self).__init__()

        self.hidden_size = hidden_size
        self.W = nn.Linear(hidden_size, hidden_size, bias=True)
        self.U = nn.Linear(input_size, hidden_size, bias=True)
        self.V = nn.Linear(hidden_size, output_size, bias=True)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, h):
        h = F.tanh(self.W(h)+self.U(x))
        o = self.V(h)
        y = self.softmax(o)

        return y, h

In [3]:
## Single RNN Cell
# input_size = 3
# hidden_size = 4

x = torch.randn(1, 3)   # batch_size, feature
h0 = torch.randn(1, 4)  # batch_size, hidden_size
rnn1 = RNNCell(input_size = 3, hidden_size = 4, output_size = 4)
output, hidden = rnn1(x, h0)

print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)

output shape: torch.Size([1, 4])
output: tensor([[-1.3883, -1.1979, -1.2751, -1.7765]])
hidden shape: torch.Size([1, 4])
hidden: tensor([[-0.4688,  0.6666, -0.4333,  0.6679]])


In the following sections, we will learn basic usage of pytorch RNN layer with some example codes and practices.

**PARAMETERS**

*   **input_size**: the number of expected features in the input x
*   **hidden_size**: the number of features in the hidden state h
*   **num_layers**: number of recurrent layers
*   **bidirectional**: if `True`, becomes bidirectional RNN
*   **batch_first**: if `True`, the input and output tensors are provided as `(batch, seq, feature)`


**INPUT**
*    **input**: data of shape (batch_size, seq_len, input_size) if **batch_first** = `True`
*    **h0**: tensor containing the initial hidden state of shape (num_layers * num_directions, batch_size, hidden_size)


**OUTPUT**
*    **output**: tensor containing the output features from the last layer of the RNN, for each t provided as `(batch, seq, hidden_size)` if **batch_first** = `True`
*    **hidden**: tensor containing the hidden state for the last time step, provided as `(batch, num_layers * num_directionals, hidden_size)` if **batch_first** = `True`



Detailed explanation and default values are available on the official site
(https://pytorch.org/docs/stable/generated/torch.nn.RNN.html)

In [4]:
## RNN with 2 layers
# input_size = 3
# seq_len = 5
# hidden_size = 4

x = torch.randn(1, 5, 3)   # batch_size, seq_len, feature
h0 = torch.randn(2, 1, 4) # num_layers * num_directions, batch_size, hidden_size
rnn2 = nn.RNN(input_size=3, hidden_size=4, num_layers=2, bidirectional = False, batch_first=True)
output, hidden = rnn2(x, h0)
print('RNN with 2 layers')
print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)

RNN with 2 layers
output shape: torch.Size([1, 5, 4])
output: tensor([[[ 0.6947, -0.7187,  0.1474,  0.8146],
         [ 0.2425,  0.2043,  0.3520,  0.9608],
         [-0.1329,  0.3523,  0.4403,  0.9448],
         [-0.4013,  0.5772,  0.0994,  0.8798],
         [-0.4624,  0.8396, -0.3480,  0.5773]]])
hidden shape: torch.Size([2, 1, 4])
hidden: tensor([[[ 0.6594, -0.5729,  0.7091,  0.5817]],

        [[-0.4624,  0.8396, -0.3480,  0.5773]]])


In [5]:
## Bidirectional RNN
# input_size = 3
# seq_len = 5
# hidden_size = 4

x = torch.randn(1, 5, 3)   # batch_size, seq_len, feature
h0 = torch.randn(2, 1, 4) # num_layers * num_directions, batch_size, hidden_size
rnn3 = nn.RNN(input_size=3, hidden_size=4, num_layers=1, bidirectional=True, batch_first=True)
output, hidden = rnn3(x, h0)
print('Bidirectional RNN')
print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)

Bidirectional RNN
output shape: torch.Size([1, 5, 8])
output: tensor([[[ 0.7568, -0.5379, -0.8527, -0.9606,  0.0326,  0.2968, -0.0707,
           0.4100],
         [ 0.1555, -0.4317, -0.7704,  0.1452, -0.3424,  0.3487, -0.1918,
           0.2409],
         [ 0.5456, -0.4131, -0.8978, -0.0057, -0.6481, -0.3502,  0.0561,
          -0.3682],
         [ 0.5072, -0.5802, -0.7824, -0.3172, -0.0979,  0.3126, -0.1089,
           0.6428],
         [ 0.1224, -0.8063, -0.4432, -0.8384, -0.4404,  0.2281,  0.8903,
           0.0561]]])
hidden shape: torch.Size([2, 1, 4])
hidden: tensor([[[ 0.1224, -0.8063, -0.4432, -0.8384]],

        [[ 0.0326,  0.2968, -0.0707,  0.4100]]])


In [6]:
###########################################################################################################
###TO-DO: Make your own RNN!
###Here, you can make your own RNN changing the dimensions, number of layers, bidirectional etc.


class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, num_layers, bidirectional):
    super().__init__()
    self.input_size, self.hidden_size, self.output_size = input_size, hidden_size, output_size
    self.num_layers = num_layers
    self.num_directions = 2 if bidirectional else 1

    self.rnn = nn.RNN(input_size=self.input_size, hidden_size = self.hidden_size, num_layers=self.num_layers, bidirectional=bidirectional, batch_first=True)
    self.output_fc = nn.Linear(self.hidden_size * self.num_directions, output_size)

  def forward(self, x):
    self.batch_size = x.size(0)
    h0 = self.init_hidden()
    output, hidden = self.rnn(x, h0)
    output = self.output_fc(output[:, -1, :])
    return output

  def init_hidden(self):
    return torch.zeros(self.num_layers * self.num_directions, self.batch_size, self.hidden_size)

In [7]:
## Run your own RNN!
input_size = 3 ; hidden_size = 4; output_size = 2
num_layers = 1
bidirectional = True

x = torch.randn(1, 4, input_size)   # batch_size, seq_len, input_size

rnn = RNN(input_size, hidden_size, output_size, num_layers, bidirectional)
output = rnn(x)

print('output shape:', output.shape)
print('output: ', output.data)

output shape: torch.Size([1, 2])
output:  tensor([[-0.6006, -0.1572]])


### **2) Gated Recurrent Unit (GRU)**

In PyTorch, GRU layer is given with the pytorch `torch.nn.GRU` package. In this section, we will learn basic usage of pytorch GRU layer with some example codes and practices.

**PARAMETERS**

*   **input_size**: the number of expected features in the input x
*   **hidden_size**: the number of features in the hidden state h
*   **num_layers**: number of recurrent layers
*   **bidirectional**: if `True`, becomes bidirectional RNN
*   **batch_first**: if `True`, then the input and output tensors are provided as `(batch, seq, feature)`

**INPUT**
*    **input**: data of shape (batch_size, seq_len, input_size) if **batch_first** = `True`
*    **h0**: tensor containing the initial hidden state of shape (num_layers * num_directions, batch_size, hidden_size)


**OUTPUT**
*    **output**: tensor containing the output features from the last layer of the RNN, for each t provided as `(batch, seq, hidden_size)` if **batch_first** = `True`
*    **hidden**: tensor containing the hidden state for the last time step, provided as `(batch_size, num_layers * num_directionals, hidden_size)` if **batch_first** = `True`



Detailed explanation and default values are available on the official site
(https://pytorch.org/docs/stable/generated/torch.nn.GRU.html)

In [8]:
# GRU with 1 layer
# input_size = 3
# seq_len = 5
# hidden_size = 4

x = torch.randn(1, 5, 3)   # batch_size, seq_len, feature
h0 = torch.randn(1, 1, 4)  # num_layers * num_directions, batch_size, hidden_size
gru1 = nn.GRU(input_size = 3, hidden_size = 4, num_layers = 1, bidirectional = False, batch_first = True)
output, hidden = gru1(x, h0)

print('GRU with 1 layer')
print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)

GRU with 1 layer
output shape: torch.Size([1, 5, 4])
output: tensor([[[ 1.0315, -0.5076, -0.0039, -0.4675],
         [ 0.3735, -0.2203,  0.1027, -0.3931],
         [-0.0317, -0.1871, -0.6386,  0.2288],
         [-0.0837, -0.0909, -0.1136,  0.2738],
         [-0.1474,  0.0662, -0.0428,  0.2876]]])
hidden shape: torch.Size([1, 1, 4])
hidden: tensor([[[-0.1474,  0.0662, -0.0428,  0.2876]]])


In [9]:
# GRU with 2 layer
# input_size = 3
# seq_len = 5
# hidden_size = 4

x = torch.randn(1, 5, 3)   # batch_size, seq_len, feature
h0 = torch.randn(2, 1, 4) # num_layers * num_directions, batch_size, hidden_size
gru2 = nn.GRU(input_size=3, hidden_size=4, num_layers=2, bidirectional = False, batch_first=True)
output, hidden = gru2(x, h0)
print('GRU with 2 layers')
print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)
print()

GRU with 2 layers
output shape: torch.Size([1, 5, 4])
output: tensor([[[-0.0597,  0.0814, -0.5560,  0.2593],
         [ 0.2085, -0.1170, -0.3164,  0.0947],
         [ 0.0894, -0.3699, -0.0379,  0.0216],
         [ 0.2878, -0.3556,  0.0108,  0.0086],
         [ 0.3910, -0.2485,  0.0363, -0.0519]]])
hidden shape: torch.Size([2, 1, 4])
hidden: tensor([[[ 0.3435,  0.4623,  0.1109,  0.2069]],

        [[ 0.3910, -0.2485,  0.0363, -0.0519]]])



In [10]:
# bidirectional GRU
# input_size = 3
# seq_len = 5
# hidden_size = 4

x = torch.randn(1, 5, 3)   # batch_size, seq_len, feature
h0 = torch.randn(2, 1, 4) # num_layers * num_directions, batch_size, hidden_size
gru3 = nn.GRU(input_size=3, hidden_size=4, num_layers=1, bidirectional=True, batch_first=True)
output, hidden = gru3(x, h0)
print('Bidirectional GRU')
print('output shape:', output.shape)
print('output:', output.data)
print('hidden shape:', hidden.shape)
print('hidden:', hidden.data)

Bidirectional GRU
output shape: torch.Size([1, 5, 8])
output: tensor([[[ 0.3211, -0.0923, -0.6631, -0.2137, -0.4275, -0.0317,  0.5445,
           0.0427],
         [-0.1161, -0.2310, -0.0242, -0.1823, -0.0792, -0.2338,  0.3013,
           0.3305],
         [ 0.4185, -0.0058,  0.1077, -0.2110, -0.2793, -0.0235,  0.5988,
           0.1269],
         [ 0.1764, -0.0087,  0.4716, -0.1515,  0.3714, -0.6782,  0.2772,
           0.2965],
         [ 0.0130, -0.0178,  0.2857, -0.1817,  0.1973, -0.6664,  0.1710,
          -0.0857]]])
hidden shape: torch.Size([2, 1, 4])
hidden: tensor([[[ 0.0130, -0.0178,  0.2857, -0.1817]],

        [[-0.4275, -0.0317,  0.5445,  0.0427]]])


In [11]:
################################################################
### TODO: Make your own GRU!
### Here, you can make your own GRU changing the dimensions, number of layers, bidirectional etc.

class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, num_layers, bidirectional):
    super().__init__()
    self.input_size, self.hidden_size, self.output_size = input_size, hidden_size, output_size
    self.num_layers = num_layers
    self.num_directions = 2 if bidirectional else 1

    self.gru = nn.GRU(input_size=self.input_size, hidden_size = self.hidden_size, num_layers=self.num_layers, bidirectional=bidirectional, batch_first=True)
    self.output_fc = nn.Linear(self.hidden_size * self.num_directions, output_size)

  def forward(self, x):
    self.batch_size = x.size(0)
    h0 = self.init_hidden()
    output, hidden = self.gru(x, h0)
    output = self.output_fc(output[:, -1, :])
    return output

  def init_hidden(self):
    return torch.zeros(self.num_layers * self.num_directions, self.batch_size, self.hidden_size)

In [12]:
## Run your own GRU!
input_size = 3 ; hidden_size = 4; output_size = 2
num_layers = 1
bidirectional = True

x = torch.randn(1, 4, input_size)   # batch_size, seq_len, input_size

gru = GRU(input_size, hidden_size, output_size, num_layers, bidirectional)
output = gru(x)

print('output shape:', output.shape)
print('output: ', output.data)

output shape: torch.Size([1, 2])
output:  tensor([[-0.4308,  0.2253]])


### **3) Padding**

We can make better use of the GPU by training on batches of many sequences at once, but doing so brings up the question of **how to deal with sequences of variable lengths**. The simple solution is to **"pad"** the shorter sentences with some padding symbol (in this case 0).

In [13]:
from string import punctuation
from collections import Counter
import numpy as np

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import random
import pandas as pd

In [29]:
!wget -O imdb.csv https://www.dropbox.com/scl/fi/mregn7xd8n8wi03b7918k/imdb.csv?rlkey=cfv5lry44tcwtdy4tethopiym&dl=0

--2023-11-01 12:24:40--  https://www.dropbox.com/scl/fi/mregn7xd8n8wi03b7918k/imdb.csv?rlkey=cfv5lry44tcwtdy4tethopiym
Resolving www.dropbox.com (www.dropbox.com)... 162.125.81.18, 2620:100:6031:18::a27d:5112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.81.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ucefa52d61e91d31582f88fe07e1.dl.dropboxusercontent.com/cd/0/inline/CGtv2fpnWIgSJYvyGOc9-WHygbejg-7GDNPfrcoq_DmZlbNKFAPpwyVow56v0IU-CtJbxJRaoR8VVvg3Yslb5zlnp21bSP5ovFBNLNnQ9ajQnOjrMmn3JOFoTnZN4pQglXkE_3Breo9ZyKHsrl88hef3/file# [following]
--2023-11-01 12:24:41--  https://ucefa52d61e91d31582f88fe07e1.dl.dropboxusercontent.com/cd/0/inline/CGtv2fpnWIgSJYvyGOc9-WHygbejg-7GDNPfrcoq_DmZlbNKFAPpwyVow56v0IU-CtJbxJRaoR8VVvg3Yslb5zlnp21bSP5ovFBNLNnQ9ajQnOjrMmn3JOFoTnZN4pQglXkE_3Breo9ZyKHsrl88hef3/file
Resolving ucefa52d61e91d31582f88fe07e1.dl.dropboxusercontent.com (ucefa52d61e91d31582f88fe07e1.dl.dropboxusercontent.com)... 162.125.81.15, 2620:1

In [30]:
imdb = pd.read_csv('imdb.csv')[:500]

In [31]:
print(imdb)

                                                review sentiment
0    One of the other reviewers has mentioned that ...  positive
1    A wonderful little production. <br /><br />The...  positive
2    I thought this was a wonderful way to spend ti...  positive
3    Basically there's a family where a little boy ...  negative
4    Petter Mattei's "Love in the Time of Money" is...  positive
..                                                 ...       ...
495  "American Nightmare" is officially tied, in my...  negative
496  First off, I have to say that I loved the book...  negative
497  This movie was extremely boring. I only laughe...  negative
498  I was disgusted by this movie. No it wasn't be...  negative
499  Such a joyous world has been created for us in...  positive

[500 rows x 2 columns]


### **0) Preprocessing**

In [32]:
# lower case
imdb['review'] = imdb['review'].str.lower()
imdb['review'] = imdb['review'].apply(lambda x: ''.join([c for c in x if c not in punctuation]))

# create list of reviews
reviews_split = imdb.review.values.tolist()
labels_split = imdb.sentiment.values.tolist()
print ('Number of reviews :', len(reviews_split))

## tokenize - create vocab to int mapping dictionary
all_text2 = ' '.join(reviews_split)
# create a list of words
words = all_text2.split()
# Count all the words using Counter Method
count_words = Counter(words)

total_words = len(words)
sorted_words = count_words.most_common(total_words)
print("count_words")
print(count_words)
print()

# make vocab2int
#There is a small trick here, in this mapping index will start from 0
#i.e. mapping of ‘the’ will be 0. But later on we are going to do padding for shorter reviews and conventional choice for padding is 0. So we need to start this indexing from 1
vocab_to_int = {w:i+1 for i, (w,c) in enumerate(sorted_words)}
print("vocab_to_int")
print(vocab_to_int)

Number of reviews : 500
count_words

vocab_to_int


In [33]:
encoded_labels = [1 if label =='positive' else 0 for label in labels_split]
encoded_labels = np.array(encoded_labels)

In [34]:
# Change input to index
reviews_int = []
for review in reviews_split:
    r = [vocab_to_int[w] for w in review.split()]
    reviews_int.append(r)

print (reviews_int[0:3])

[[31, 4, 1, 81, 3156, 45, 757, 11, 105, 162, 40, 490, 1759, 535, 491, 29, 1760, 37, 24, 226, 13, 8, 6, 637, 42, 921, 15, 2058, 12, 1, 91, 148, 11, 1249, 65, 41, 1759, 14, 30, 6378, 2, 4232, 124, 4, 413, 53, 281, 7, 226, 34, 1, 460, 125, 1381, 65, 8, 6, 23, 3, 185, 16, 1, 6379, 4233, 38, 6380, 8, 185, 2488, 50, 4234, 15, 3157, 5, 758, 376, 38, 413, 30, 6, 3158, 7, 1, 390, 301, 4, 1, 6381, 12, 10, 6, 476, 1759, 13, 11, 6, 1, 4235, 302, 5, 1, 4236, 3159, 3160, 1250, 6382, 10, 4237, 1536, 20, 6383, 613, 35, 3161, 4238, 4, 1, 638, 102, 28, 1, 3162, 26, 2489, 3163, 2, 377, 6384, 36, 6385, 6, 23, 402, 20, 1, 3164, 2490, 613, 6, 282, 5, 6386, 3165, 6387, 6388, 4239, 6389, 2491, 2, 6390, 6391, 434, 6392, 3166, 6393, 2, 6394, 6395, 24, 107, 216, 6396, 12, 9, 57, 149, 1, 256, 922, 4, 1, 185, 6, 477, 5, 1, 231, 11, 10, 275, 102, 81, 391, 614, 1761, 759, 150, 1155, 6397, 16, 2059, 1156, 759, 1537, 759, 6398, 172, 852, 198, 1, 91, 535, 9, 128, 236, 1249, 65, 13, 36, 760, 10, 14, 6399, 9, 378, 149, 9

In [35]:
## Return features of review_ints, where each review is padded with 0's or truncated to the input seq_length.
def pad_features(reviews_int, seq_length):

    features = np.zeros((len(reviews_int), seq_length), dtype = int)

    for i, review in enumerate(reviews_int):
        review_len = len(review)

        if review_len <= seq_length:
            zeroes = list(np.zeros(seq_length-review_len))
            new = zeroes+review
        elif review_len > seq_length:
            new = review[0:seq_length]

        features[i,:] = np.array(new)

    return features

In [36]:
from torch.nn.utils.rnn import pad_sequence

example_reviews = []
for i in range(len(reviews_int)):
  example_reviews.append(torch.Tensor(reviews_int[i]))

example_padded = pad_sequence(example_reviews, batch_first=True)

In [37]:
seq_length = 200
indexed_reviews = pad_features(reviews_int, seq_length)

In [38]:
# Dataloader
# create Tensor Dataset

split_frac = 0.8
train_x = indexed_reviews[0:int(split_frac * len(indexed_reviews))]
train_y = encoded_labels[0:int(split_frac * len(indexed_reviews))]

test_x = indexed_reviews[int(split_frac * len(indexed_reviews)):]
test_y = encoded_labels[int(split_frac * len(indexed_reviews)):]

train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))

batch_size = 50
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [40]:
# Check the data
samp_dataiter = iter(train_loader)
sample_x, sample_y = next(samp_dataiter)

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

Sample input size:  torch.Size([50, 200])
Sample input: 
 tensor([[    0,     0,     0,  ...,    28,     6,  5779],
        [  230,    78,  2649,  ...,    25,   594,   117],
        [  677,     9,  1557,  ...,    39,     1, 10248],
        ...,
        [   35,  5412,   778,  ...,    54,   559,     2],
        [  134,   294,  1704,  ...,    59,  1838,    20],
        [    0,     0,     0,  ...,    34,     1,   357]])

Sample label size:  torch.Size([50])
Sample label: 
 tensor([0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0,
        1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
        0, 1])


In [41]:
# Sentiment LSTM
class SentimentLSTM(nn.Module):
  def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers):
    super().__init__()

    self.output_size = output_size
    self.n_layers = n_layers
    self.hidden_dim = hidden_dim

    self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = 0)
    self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, batch_first = True)

    self.output_fc = nn.Linear(hidden_dim, output_size)
    self.sig = nn.Sigmoid()

  def forward(self, x):
    self.batch_size = x.size(0)
    embeds = self.embedding(x)
    h0, c0 = self.init_hidden()

    output, hidden = self.lstm(embeds, (h0, c0))
    output = self.output_fc(output[:, -1, :])
    output = self.sig(output)
    return output

  def init_hidden(self):
    h0 = torch.zeros(self.n_layers, self.batch_size, self.hidden_dim)
    c0 = torch.zeros(self.n_layers, self.batch_size, self.hidden_dim)
    return h0, c0

In [42]:
# Let's train our model
vocab_size = len(vocab_to_int) + 1   # +1 for the 0 padding
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 1

sentiment_net = SentimentLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

lr = 0.01
n_epochs = 20
counter = 0

criterion = nn.BCELoss()
opt = torch.optim.Adam(sentiment_net.parameters(), lr=lr)

for epoch in range(n_epochs + 1):
  for inputs, labels in train_loader:
    counter += 1
    output = sentiment_net(inputs)
    pred = (output.squeeze() > 0.5).float()
    acc = torch.mean((pred == labels).float())

    loss = criterion(output.squeeze(), labels.float())

    opt.zero_grad()
    loss.backward()
    opt.step()

  sentiment_net.eval()
  test_losses = []
  test_acc = []

  for inputs, labels in test_loader:
    output = sentiment_net(inputs)
    test_loss = criterion(output.squeeze(), labels.float())
    test_losses.append(test_loss.item())

    pred = (output.squeeze() > 0.5).float()
    acc = torch.mean((pred == labels).float())
    test_acc.append(acc.item())

  sentiment_net.train()
  print("Epoch: {}/{} ".format(epoch, n_epochs),
        "Step: {} ".format(counter),
        "Loss: {:.6f} ".format(loss.item()),
        "Acc: {:.6f} ".format(acc.item()),
        "Test Loss: {:.6f} ".format(np.mean(test_losses)),
        "Test Acc: {:.6f} ".format(np.mean(test_acc)))

Epoch: 0/20  Step: 8  Loss: 0.680561  Acc: 0.480000  Test Loss: 0.711375  Test Acc: 0.500000 
Epoch: 1/20  Step: 16  Loss: 0.290083  Acc: 0.600000  Test Loss: 0.860036  Test Acc: 0.600000 
Epoch: 2/20  Step: 24  Loss: 0.052536  Acc: 0.580000  Test Loss: 1.146805  Test Acc: 0.570000 
Epoch: 3/20  Step: 32  Loss: 0.067580  Acc: 0.520000  Test Loss: 1.668201  Test Acc: 0.540000 
Epoch: 4/20  Step: 40  Loss: 0.000962  Acc: 0.540000  Test Loss: 1.850822  Test Acc: 0.530000 
Epoch: 5/20  Step: 48  Loss: 0.000289  Acc: 0.440000  Test Loss: 1.842130  Test Acc: 0.540000 
Epoch: 6/20  Step: 56  Loss: 0.000402  Acc: 0.500000  Test Loss: 1.854905  Test Acc: 0.540000 
Epoch: 7/20  Step: 64  Loss: 0.000202  Acc: 0.420000  Test Loss: 1.872337  Test Acc: 0.540000 
Epoch: 8/20  Step: 72  Loss: 0.000297  Acc: 0.620000  Test Loss: 1.888909  Test Acc: 0.550000 
Epoch: 9/20  Step: 80  Loss: 0.000197  Acc: 0.520000  Test Loss: 1.906363  Test Acc: 0.540000 
Epoch: 10/20  Step: 88  Loss: 0.000120  Acc: 0.6000

In [43]:
# Evaluation
sentiment_net.eval()

random.seed(1234)

def get_key(val, my_dict):
  for key, value in my_dict.items():
    if val == value:
      return key

review, label = random.choice(train_data)
review_sentence = [get_key(rev, vocab_to_int) for rev in review]
print("input sentence: ", list(filter(None, review_sentence)))

output = sentiment_net(review.unsqueeze(0))   # additional dimension for batch
sentiment = 'negative' if output <= 0.5 else 'positive'
print('sentiment:', sentiment)

print('============================================================================')

random.seed(7532)

review, label = random.choice(train_data)
review_sentence = [get_key(rev, vocab_to_int) for rev in review]
print("input sentence: ", list(filter(None, review_sentence)))

output = sentiment_net(review.unsqueeze(0))   # additional dimension for batch
sentiment = 'negative' if output <= 0.5 else 'positive'
print('sentiment:', sentiment)

input sentence:  ['after', 'a', 'snowstorm', 'the', 'roads', 'are', 'blocked', 'and', 'the', 'highway', 'patrolman', 'jason', 'adam', 'beach', 'comes', 'to', 'the', 'diner', 'of', 'his', 'friend', 'fritz', 'jurgen', 'prochnow', 'and', 'advises', 'his', 'clients', 'that', 'they', 'will', 'only', 'be', 'able', 'to', 'follow', 'their', 'trips', 'on', 'the', 'next', 'day', 'among', 'the', 'weird', 'strangers', 'jason', 'meets', 'his', 'former', 'sweetheart', 'nancy', 'rose', 'mcgowan', 'who', 'has', 'just', 'left', 'her', 'husband', 'in', 'los', 'angeles', 'along', 'the', 'night', 'without', 'any', 'communication', 'with', 'his', 'base', 'jason', 'faces', 'distressful', 'and', 'suspicious', 'situations', 'with', 'the', 'clients', 'and', 'finds', 'some', 'corpses', 'indicating', 'that', 'among', 'them', 'there', 'is', 'a', 'killerbr', 'br', 'the', 'last', 'stop', 'could', 'be', 'an', 'average', 'thriller', 'but', 'the', 'screenplay', 'is', 'simply', 'awful', 'most', 'of', 'the', 'characters

### **2. Seq2seq model with attention**
Here, you are going to translate French to English using seq2seq model.

*    Encoder : GRU
*    Decoder : GRU with attention

### **0) Preprocessing**

In [44]:
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random
import numpy as np

from nltk.translate.bleu_score import sentence_bleu

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

import matplotlib.pyplot as plt
plt.switch_backend('agg')
import matplotlib.ticker as ticker
%matplotlib inline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

import warnings
warnings.filterwarnings('ignore')

In [46]:
!wget https://raw.githubusercontent.com/L1aoXingyu/seq2seq-translation/master/data/eng-fra.txt

--2023-11-01 12:35:11--  https://raw.githubusercontent.com/L1aoXingyu/seq2seq-translation/master/data/eng-fra.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9541158 (9.1M) [text/plain]
Saving to: ‘eng-fra.txt’


2023-11-01 12:35:12 (342 MB/s) - ‘eng-fra.txt’ saved [9541158/9541158]



We will need a unique index per word to use as the inputs and targets of the networks. To keep track of all this, we will use a class called `Lang` which has word -> index (`word2index`) and index -> word (`index2word`) dictionaries, as well as a count of each word word2count to use to later replace rare words.  

In [47]:
SOS_token = 0
EOS_token = 1

# Since there is a lot of example sentences and we want to train quickly, we will trim the data set to only relatively short and simple sentences.
# Here, the maximum length is 10 words (including ending punctuation)
# We are filtering to sentences that translate to the form "I am" or "He is" etc
MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s ",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)

# The full process for preparing the data is:
#     - Read text file and split into lines, split lines into pairs
#     - Normalize text, filter by length and content
#     - Make word lists from sentences in pairs

# Tokenizer
# We’ll need a unique index per word to use as the inputs and targets of the networks later.
class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1



# Turn a Unicode string to plain ASCII
# The file are all in Unicode, to simplify we will turn Unicode characters to ASCII, make everything lowercase, and trim most punctuation.
# For more information, refer https://stackoverflow.com/a/518232/2809427

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


# Lowercase, trim, and remove non-letter characters

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s



# Read file
# If you want to translate Other Language -> English, then set `reverse = True`
def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    lines = open('./%s-%s.txt' % (lang1, lang2), encoding='utf-8').\
        read().strip().split('\n')

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs


def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)


def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]


def prepareData(lang1, lang2, reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


input_lang, output_lang, pairs = prepareData('eng', 'fra', True)
print(random.choice(pairs))

Reading lines...
Read 135842 sentence pairs
Trimmed to 10599 sentence pairs
Counting words...
Counted words:
fra 4345
eng 2803
['nous ne sommes pas a la maison .', 'we re not home .']


### **1) Encoder**

The encoder of a seq2seq network is a GRU that outpus some value for **every word** from the input sentence. For every input word, the encoder outputs a vector and a hidden state, and uses the hidden state for the next input word.

In [48]:
class Encoder(nn.Module):
  def __init__(self, input_size, hidden_size):
    super().__init__()
    self.hidden_size = hidden_size

    self.embedding = nn.Embedding(input_size, hidden_size)
    self.gru = nn.GRU(hidden_size, hidden_size)

  def forward(self, x, hidden):
    embedded = self.embedding(x).view(1, 1, -1)
    output, hidden = self.gru(embedded, hidden)
    return output, hidden

  def initHidden(self):
    return torch.zeros(1, 1, self.hidden_size, device = device)

### **2) Decoder**

The decoder is another GRU that takes the output vector(s) and outputs a sequence of words to create the translation. Here, we are going to use **Attention Decoder**

In [49]:
class AttnDecoder(nn.Module):
  def __init__(self, hidden_size, output_size, max_length=MAX_LENGTH):
    super().__init__()
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.max_length = max_length

    self.embedding = nn.Embedding(self.output_size, self.hidden_size)
    self.attn_matrix = nn.Parameter(data = torch.ones((self.hidden_size, self.hidden_size)), requires_grad = True)
    self.gru = nn.GRU(self.hidden_size * 2, self.hidden_size)
    self.out = nn.Linear(self.hidden_size, self.output_size)

  def forward(self, x, hidden, encoder_outputs):
    embedded = self.embedding(x).view(1, 1, -1)

    attn_weights = torch.matmul(torch.matmul(hidden[0], self.attn_matrix), torch.transpose(encoder_outputs, 0, 1))  # 1 x max_length
    attn_weights = F.softmax(attn_weights, dim=1)

    attn_applied = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs.view(1, -1, self.hidden_size))

    input_gru = torch.cat((attn_applied[0], embedded[0]), dim=1)

    output, hidden = self.gru(input_gru.unsqueeze(0), hidden)
    output = F.log_softmax(self.out(output[0]), dim=1)

    return output, hidden, attn_weights

  def initHidden(self):
    return torch.zeros(1, 1, self.hidden_size, device=device)

### **3) Train our model**

#### Preparing Training Data

To train, for each pair we will need an input tensor (indexes of the words in the input sentence) and target tensor (indexes of the words in the target sentence). While creating these vectors we will append the EOS token to both sequences.

In [50]:
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)

#### **BLEU(Bilingual Evaluation Understudy) score**

Like FID score you are already familiar with, there is a metric called **BLEU score** in NLP. BLEU score is a metric for evaluating a generated sentence to a reference sentence. </br>

BLEU score is calculated by comparing n-gram matches between each candidate translation to the reference translations. BLEU compares the n-gram of the candidate translation with n-gram of the reference translation to count the number of matches. These matches are independent of the positions where they occur.


NLTK provides the `sentence_bleu()` function for evaluating a candidate sentence against one or more reference sentences.


For more information about BLEU, please refer these websites:  
https://donghwa-kim.github.io/BLEU.html  
https://wikidocs.net/31695  
https://machinelearningmastery.com/calculate-bleu-score-for-text-python/  
https://towardsdatascience.com/bleu-bilingual-evaluation-understudy-2b4eab9bcfd1


In [None]:
hidden_size = 256
teacher_forcing_ratio = 0.5
max_length = MAX_LENGTH
n_iters = 55000   # Change this! (maybe 75000?)
n_epochs = 1
lr = 0.001

encoder = Encoder(input_size = input_lang.n_words, hidden_size = hidden_size).to(device)
decoder = AttnDecoder(hidden_size=hidden_size, output_size=output_lang.n_words, max_length=max_length).to(device)

encoder_opt = optim.Adam(encoder.parameters(), lr=lr)
decoder_opt = optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.NLLLoss()

training_pairs = [tensorsFromPair(random.choice(pairs)) for i in range(n_iters)]

for epoch in range(n_epochs + 1):
  for iter in range(1, n_iters + 1):

    training_pair = training_pairs[iter - 1]
    input_tensor = training_pair[0]
    target_tensor = training_pair[1]

    # Encoder
    encoder_hidden = encoder.initHidden()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
      encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
      encoder_outputs[ei] = encoder_output[0, 0]

    # Decoder
    output_sentence = [output_lang.index2word[t.item()] for t in target_tensor]

    decoder_input = torch.tensor([[SOS_token]], device=device)
    decoder_hidden = encoder_hidden

    # Teacher forcing: Feed the target as the next input
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    decoded_sentence = []
    if use_teacher_forcing:
      for di in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_outputs)
        topv, topi = decoder_output.topk(1)
        decoded_sentence.append(output_lang.index2word[topi.item()])

        loss += criterion(decoder_output, target_tensor[di])
        decoder_input = target_tensor[di]  # Teacher forcing

    else:
      for di in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_outputs)
        topv, topi = decoder_output.topk(1)   # value, indice
        decoded_sentence.append(output_lang.index2word[topi.item()])
        decoder_input = topi.squeeze().detach()   # detach from history as input

        loss += criterion(decoder_output, target_tensor[di])
        if decoder_input.item() == EOS_token:
          break


    encoder_opt.zero_grad()
    decoder_opt.zero_grad()

    loss.backward()

    encoder_opt.step()
    decoder_opt.step()

    # BLEU score
    bleu_score = sentence_bleu([output_sentence[:-1]], decoded_sentence[:-1])


    if iter % 5000 == 0:
      print('epoch: {}, iter: {}, loss: {:.6f},  bleu_score: {:.6f}'.format(epoch, iter, loss.item() / target_length, bleu_score))

epoch: 0, iter: 5000, loss: 2.919885,  bleu_score: 0.000000
epoch: 0, iter: 10000, loss: 1.924574,  bleu_score: 0.000000
epoch: 0, iter: 15000, loss: 1.634818,  bleu_score: 0.000000
epoch: 0, iter: 20000, loss: 0.831388,  bleu_score: 0.000000
epoch: 0, iter: 25000, loss: 0.836614,  bleu_score: 0.000000
epoch: 0, iter: 30000, loss: 4.054732,  bleu_score: 0.000000
epoch: 0, iter: 35000, loss: 0.439615,  bleu_score: 0.488923
epoch: 0, iter: 40000, loss: 2.425239,  bleu_score: 0.000000
epoch: 0, iter: 45000, loss: 0.739899,  bleu_score: 0.381417
epoch: 0, iter: 50000, loss: 0.707865,  bleu_score: 0.000000
epoch: 0, iter: 55000, loss: 0.413419,  bleu_score: 0.000000
epoch: 1, iter: 5000, loss: 2.120848,  bleu_score: 0.000000
epoch: 1, iter: 10000, loss: 0.073630,  bleu_score: 1.000000
epoch: 1, iter: 15000, loss: 0.395682,  bleu_score: 1.000000
epoch: 1, iter: 20000, loss: 0.057244,  bleu_score: 1.000000


### **4) Evaluate our model**

Evaluation is mostly the same as training, but there are no targets so we simply feed the decoder's predictions back to itself for each step (**NO TEACHER FORCING**) Every time it predicts a word we add it to the output string, and if it predicts the EOS token we stop there. We also store the decoder’s attention outputs for display later.

In [None]:
## > input sentence
## = true output sentence
## < predicted sentence

def evaluate(encoder, decoder, sentence, output_sentence, max_length=max_length):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]

        output_tensor = tensorFromSentence(output_lang, output_sentence)
        output_sent = [output_lang.index2word[t.item()] for t in output_tensor]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        bleu_score = sentence_bleu([output_sent[:-1]], decoded_words[:-1])
        return decoded_words, decoder_attentions[:di + 1], bleu_score

def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions, bleu_score = evaluate(encoder, decoder, pair[0], pair[1])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('BLEU score: {:.6f}'.format(bleu_score))
        print('')

evaluateRandomly(encoder, decoder)

In [None]:
def evaluate_attn(encoder, decoder, sentence, max_length=max_length):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_sentence = [input_lang.index2word[t.item()] for t in input_tensor]
        input_length = input_tensor.size()[0]

        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words, decoder_attentions[:di + 1]

output_words, attentions = evaluate_attn(encoder, decoder, "tu as parfaitement raison .")
print(output_words)
plt.matshow(attentions.numpy())
plt.show()

In [None]:
def showAttention(input_sentence, output_words, attentions):
    # Set up figure with colorbar
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(attentions.numpy(), cmap='bone')
    fig.colorbar(cax)

    # Set up axes
    ax.set_xticklabels([''] + input_sentence.split(' ') +
                       ['<EOS>'], rotation=90)
    ax.set_yticklabels([''] + output_words)

    # Show label at every tick
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

    plt.show()


def evaluateAndShowAttention(input_sentence):
    output_words, attentions = evaluate_attn(
        encoder, decoder, input_sentence)
    print('input =', input_sentence)
    print('output =', ' '.join(output_words))
    showAttention(input_sentence, output_words, attentions)


evaluateAndShowAttention("elle a cinq ans de moins que moi .")

print()

evaluateAndShowAttention("elle est trop petit .")

print()

evaluateAndShowAttention("je ne crains pas de mourir .")

print()

evaluateAndShowAttention("c est un jeune directeur plein de talent .")



If you train your model more changing hyper parameters, you will get a decent result like below.

![picture](https://drive.google.com/uc?id=1cfGnVRdHeCHtLgvUwKo3xMxU5GJ-5fmx)