# CS 584 Assignment 4 -- Sequence to Sequence Models

#### Name: Varun Reddy Doddipalli

## In this assignment, you are required to follow the steps below:
1. Review the lecture slides.
2. Implement a Seq2Seq model.

*** Please note that there are many online resources for Seq2Seq models. You are allowed to consult them, but DO NOT COPY DIRECTLY from them; otherwise, you will get ZERO for this assignment. ***

In [1]:
import urllib.request
urllib.request.urlretrieve('https://www.statmt.org/europarl/v7/es-en.tgz','es-en.tgz')

('es-en.tgz', <http.client.HTTPMessage at 0x7f459524ad10>)

In [2]:
import tarfile
with tarfile.open('es-en.tgz') as my_tar:
    my_tar.extractall('./data')

In [3]:
%pip install numpy scikit-learn tqdm matplotlib
%pip install -U spacy
!python -m spacy download en_core_web_sm
!python -m spacy download es_core_news_sm


# 1. Data Processing
In this section, you are required to 
1. Divide the data into train, validation, and test.
2. Preprocess the text data

**Note:** The default dataset is Spanish-English. Feel free to switch to another language pair; the available pairs are listed [here](http://www.statmt.org/europarl/).

In [1]:
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, SimpleRNN, GRU, Layer, Bidirectional, AdditiveAttention, Dot, Softmax, Concatenate
from tensorflow.keras.layers import Conv1D, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import plot_model, to_categorical, Sequence
from tensorflow.keras.callbacks import ModelCheckpoint

tf.config.run_functions_eagerly(True)
# tf.data.experimental.enable_debug_mode()

In [2]:
import numpy as np
from tqdm.notebook import tqdm

def load_data(filepath):
    texts = []
    with open(filepath, 'r', encoding='utf-8') as f:
        count = len(f.readlines())
        
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in tqdm(f, total=count):
            texts.append(line.strip())
            
    return np.array(texts[:20000])

sources = load_data('data/europarl-v7.es-en.en')
targets = load_data('data/europarl-v7.es-en.es')

print('English sentences size:', sources.shape)
print('Spanish sentences size:', targets.shape)

  0%|          | 0/1965734 [00:00<?, ?it/s]

  0%|          | 0/1965734 [00:00<?, ?it/s]

English sentences size: (20000,)
Spanish sentences size: (20000,)


## 1.1 Divide the data into train, validation, and test

In [3]:
from sklearn.model_selection import train_test_split

source_train, source_test, target_train, target_test = train_test_split(sources, targets, test_size=0.2)
source_train, source_valid, target_train, target_valid = train_test_split(source_train, target_train, test_size=0.2)

train_texts = [(sent_source, sent_target) for sent_source, sent_target in zip(source_train, target_train)]
valid_texts = [(sent_source, sent_target) for sent_source, sent_target in zip(source_valid, target_valid)]
test_texts = [(sent_source, sent_target) for sent_source, sent_target in zip(source_test, target_test)]

print('The size of training set:', len(train_texts))
print('The size of valid set:', len(valid_texts))
print('The size of test set:', len(test_texts))

The size of training set: 12800
The size of valid set: 3200
The size of test set: 4000


## 1.2 Data preprocessing
1. Lowercase the text
2. Tokenize the text
3. Create vocabulary for the source language and target language, respectively.
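
The three steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `tokenize` is a whitespace stand-in for the spaCy tokenizers, and the helper names `build_vocab` and `encode` are hypothetical. It also reserves a separate `<pad>` token at index 0, which is a design choice of this sketch; the notebook code below places `<unk>` at index 0 instead.

```python
from collections import Counter

PAD, UNK = '<pad>', '<unk>'  # assumed special tokens; <pad> is not used by the notebook


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the spaCy tokenizers)."""
    return text.lower().split()


def build_vocab(tokenized_texts, specials=(PAD, UNK), min_count=1):
    """Return (vocab list, token -> index dict) with the special tokens first."""
    counts = Counter(tok for sent in tokenized_texts for tok in sent)
    # Sorting gives a deterministic index assignment; set ordering can vary between runs.
    words = sorted(tok for tok, c in counts.items()
                   if c >= min_count and tok not in specials)
    vocab = list(specials) + words
    return vocab, {tok: idx for idx, tok in enumerate(vocab)}


def encode(tokens, token2idx):
    """Map tokens to ids, sending out-of-vocabulary tokens to <unk>."""
    unk = token2idx[UNK]
    return [token2idx.get(tok, unk) for tok in tokens]


corpus = [tokenize(s) for s in ('The house is small', 'The house is big')]
vocab, token2idx = build_vocab(corpus)
ids = encode(tokenize('The garden is big'), token2idx)
print(vocab)
print(ids)
```

Keeping padding and unknown words on different indices matters when an embedding layer masks index 0, because otherwise every unknown word would be masked out together with the padding.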

In [4]:
import re
import string
import spacy
from itertools import chain
from unicodedata import normalize, category

nlp_en = spacy.load('en_core_web_sm')
nlp_es = spacy.load('es_core_news_sm')


def tokenize_en(text):
    # text = re.sub(r'’', '\'', text) # some of texts contain invalid apostrophes, such as ’
    # text = normalize('NFD', text).encode('ascii', 'ignore').decode('UTF-8')
    # text = re.sub(r'([.!.?])', r' \1', text)
    # text = re.sub(r'[^a-zA-Z-]', r' ', text)
    # tokens = text.lower().strip().split()
    doc = nlp_en(text.lower())
    return [token.text for token in doc]
    # return tokens

def tokenize_es(text):
    doc = nlp_es(text.lower())
    return [token.text for token in doc]

def preprocess(texts, type='en'):
    if type == 'en':
        return [tokenize_en(text) for text in tqdm(texts, desc='en')]
    elif type == 'es':
        return [tokenize_es(text) for text in tqdm(texts, desc='es')]

def generate_vocab(tokenized_texts):
    # Flatten the input to 1d array, Thanks for: 
    # https://stackoverflow.com/questions/29244286/how-to-flatten-a-2d-list-to-1d-without-using-numpy
    vocab = list(chain.from_iterable(tokenized_texts))
    return list(set(vocab))

source_tokens = preprocess(source_train, 'en')
target_tokens = preprocess(target_train, 'es')

source_vocab = generate_vocab(source_tokens)
source_vocab.insert(0, '<unk>')
source_token2idx = {token: idx for idx, token in enumerate(source_vocab)}

target_vocab = generate_vocab(target_tokens)
target_vocab.insert(0, '<end>')
target_vocab.insert(0, '<start>')
target_vocab.insert(0, '<unk>')
target_token2idx = {token: idx for idx, token in enumerate(target_vocab)}

print('English vocab size:', len(source_vocab))
print('Spanish vocab size:', len(target_vocab))

en:   0%|          | 0/12800 [00:00<?, ?it/s]

es:   0%|          | 0/12800 [00:00<?, ?it/s]

English vocab size: 12473
Spanish vocab size: 19753


In [5]:
def to_ids(tokens, token2idx):
    # Out-of-vocabulary tokens fall back to index 0 ('<unk>')
    return [token2idx.get(t, 0) for t in tokens]

def encode_pairs(texts, desc):
    # Each example is (source ids, target ids wrapped in <start> ... <end>)
    return [(to_ids(tokenize_en(source), source_token2idx),
             to_ids(['<start>'] + tokenize_es(target) + ['<end>'], target_token2idx))
            for source, target in tqdm(texts, total=len(texts), desc=desc)]

train_set = encode_pairs(train_texts, 'Generating training data\t')
valid_set = encode_pairs(valid_texts, 'Generating validation data\t')
test_set = encode_pairs(test_texts, 'Generating testing data\t')

Generating training data	:   0%|          | 0/12800 [00:00<?, ?it/s]

Generating validation data	:   0%|          | 0/3200 [00:00<?, ?it/s]

Generating testing data	:   0%|          | 0/4000 [00:00<?, ?it/s]

In [19]:
from pickle import dump

with open('train_set.pkl', 'wb') as train_file:
    dump(train_set, train_file)

with open('valid_set.pkl', 'wb') as valid_file:
    dump(valid_set, valid_file)

with open('test_set.pkl', 'wb') as test_file:
    dump(test_set, test_file)

In [22]:
from pickle import load

with open('train_set.pkl', 'rb') as train_file:
    train_set = load(train_file)

with open('valid_set.pkl', 'rb') as valid_file:
    valid_set = load(valid_file)

with open('test_set.pkl', 'rb') as test_file:
    test_set = load(test_file)

In [23]:
max_source_len = 0
max_target_len = 0

for i in range(len(train_set)):    
    max_source_len = max(max_source_len, len(train_set[i][0]))
    max_target_len = max(max_target_len, len(train_set[i][1]))

print("Maximum length of input sentence is:", max_source_len)
print("Maximum length of target sentence is:", max_target_len)

Maximum length of input sentence is: 173
Maximum length of target sentence is: 173


In [24]:
source_sentence_window = 200
target_sentence_window = 200
embedding_dim = 128
cell_units = 50
batch_size = 32
source_vocab_size = len(source_vocab)
target_vocab_size = len(target_vocab)
epochs = 10

In [25]:
class DataGenerator(Sequence):
    def __init__(self, data, batch_size, source_window, target_window, target_vocab_size):
        self.data = data
        self.batch_size = batch_size
        self.source_window = source_window
        self.target_window = target_window
        self.vocab_size = target_vocab_size

    def __len__(self):
        return int(np.floor(len(self.data)/self.batch_size))
        
    def __getitem__(self, index):
        batch_data = self.data[index*self.batch_size: (index+1)*self.batch_size]

        source_input = pad_sequences([x[0] for x in batch_data], maxlen=self.source_window, padding='post' ,dtype='int32')
        target_input = pad_sequences([x[1] for x in batch_data], maxlen=self.target_window, padding='post' ,dtype='int32')
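        # Labels are the decoder inputs shifted left by one step, with a trailing 0 as padding.
        # Use the actual batch length rather than the global batch_size.
        y_batch = np.hstack((target_input[:, 1:], np.zeros((len(batch_data), 1), dtype='int32')))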
        # y_batch = np.zeros((len(batch_data), self.target_window, self.vocab_size), dtype='float16')

        # for i in range(len(batch_data)):
        #     y_batch[i] = pad_sequences(
        #         to_categorical(batch_data[i][1][1:], self.vocab_size, dtype='float16').T
        #         , self.target_window, padding='post', dtype='float16').T
                
        return (source_input, target_input), y_batch

In [26]:
# Argument order is (data, batch_size, source_window, target_window, target_vocab_size)
train_gen = DataGenerator(train_set, batch_size, source_sentence_window, target_sentence_window, len(target_vocab))
valid_gen = DataGenerator(valid_set, batch_size, source_sentence_window, target_sentence_window, len(target_vocab))
test_gen = DataGenerator(test_set, batch_size, source_sentence_window, target_sentence_window, len(target_vocab))

In [27]:
(sample_source_input, sample_target_input), sample_decoder_output = train_gen[0]
print(f"Sample encoder input shape (batch_size, source_padded_sentence): {sample_source_input.shape}")
print(f"Sample decoder input shape (batch_size, target_padded_sentence): {sample_target_input.shape}")
print(f"Sample decoder output shape (batch_size, target_padded_sentence): {sample_decoder_output.shape}")

Sample encoder input shape (batch_size, source_padded_sentence): (32, 200)
Sample decoder input shape (batch_size, target_padded_sentence): (32, 200)
Sample decoder output shape (batch_size, target_padded_sentence): (32, 200)


## 2. Seq2Seq Model (50 points)
In this section, you are required to 
1. Implement a seq2seq model which includes a CNN encoder and an RNN based decoder.  
2. For each RNN cell, try a simple RNN, LSTM, and GRU.  
3. Use the validation set to compute the BLEU score and report the best choice.

**Note:** you could use any package you are comfortable with, such as PyTorch and TensorFlow.



### 2.1 Encoder (25 points)

Try a simple RNN, LSTM, and GRU

In [28]:
class Encoder(Layer):
    def __init__(self, vocab_size, embedding_dim, rnn_units, cell_type='LSTM', return_sequences = True, bidirectional=True, **kwargs):
        super(Encoder, self).__init__(**kwargs)
        if cell_type not in ['RNN', 'LSTM', 'GRU']: raise ValueError('There is no cell type %s'%cell_type) 
        else: self.cell_type = cell_type
        self.bidirectional = bidirectional

        self.embedding_layer = Embedding(vocab_size, embedding_dim, mask_zero=True, name='EncoderEmbedding')
        if cell_type == 'LSTM': self.rnn_layer1 = LSTM(rnn_units, return_state=True, return_sequences=return_sequences, name='EncoderLSTM1')
        elif cell_type == 'RNN': self.rnn_layer1 = SimpleRNN(rnn_units, return_state=True, return_sequences=return_sequences, name='EncoderRNN1')
        else: self.rnn_layer1 = GRU(rnn_units, return_state=True, return_sequences=return_sequences, name='EncoderGRU1')
        if bidirectional: self.rnn_layer1 = Bidirectional(self.rnn_layer1, name='EncoderBidirectional')
    
    def call(self, inputs, state=None):
        return self.get_stacked_layers(inputs, state)

    def get_stacked_layers(self, inputs, state=None):
        x = self.embedding_layer(inputs)
        if not self.bidirectional:
            if self.cell_type=='LSTM': 
                x, state_h, state_c = self.rnn_layer1(x, initial_state=state)
                hidden_state1 = [state_h, state_c]
            else:
                x, hidden_state1 = self.rnn_layer1(x, initial_state=state)
        else:
            if self.cell_type == 'LSTM':
                x, h_forward, c_forward, h_backward, c_backward = self.rnn_layer1(x, initial_state=state)
                hidden_state1 = [Concatenate()([h_forward, h_backward]), Concatenate()([c_forward, c_backward])]
            else:
                x, h_forward, h_backward = self.rnn_layer1(x, initial_state=state)
                hidden_state1 = Concatenate()([h_forward, h_backward])
        return x, hidden_state1

In [29]:
encoder = Encoder(source_vocab_size, embedding_dim, 10, return_sequences=True, bidirectional=False)
sample_encoder_output, sample_encoder_hidden_state = encoder(sample_source_input)

print(f"Sample Encoder output shape {sample_encoder_output.shape}")
print(f"Shape of hidden state (h) {sample_encoder_hidden_state[0].shape}")
print(f"Shape of cell state (c) {sample_encoder_hidden_state[1].shape}")


Sample Encoder output shape (32, 200, 10)
Shape of hidden state (h) (32, 10)
Shape of cell state (c) (32, 10)


### 2.2 CNN Encoder (Extra Credit)

A traditional encoder of Seq2Seq models is an RNN-based model, such as RNN, LSTM, or GRU. But CNNs can also be encoders. In this sub-section, you need to implement a CNN encoder for your Seq2Seq model.

**Hint:** We already learned how CNN can be applied to sentence classification tasks. You can simply drop the last layer of TextCNN and output the context vector. Please check this [paper](https://arxiv.org/abs/1510.03820) for details.
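
The idea in the hint, running several convolution widths over the word embeddings and pooling each filter's response into one fixed-size context vector, can be sketched in plain Python. This is a toy illustration under assumptions: the weights are random, the helper names `conv1d_relu` and `text_cnn_context` are hypothetical, and it uses max-over-time pooling as in TextCNN, whereas the `CNNEncoder` cell below uses average pooling.

```python
import random


def conv1d_relu(embeddings, kernel, bias=0.0):
    """Valid 1D convolution of one filter over a (time, dim) sequence, followed by ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], embeddings[start + k]))
        outputs.append(max(0.0, total))
    return outputs


def text_cnn_context(embeddings, filters_by_width):
    """Concatenate the max-over-time response of every filter into one vector."""
    context = []
    for width in sorted(filters_by_width):
        for kernel in filters_by_width[width]:
            context.append(max(conv1d_relu(embeddings, kernel)))
    return context


rng = random.Random(0)
dim, time_steps = 4, 10
# A fake embedded sentence: 10 tokens, 4-dimensional embeddings
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(time_steps)]
# Two filters for each of the widths 3, 4 and 5
filters = {w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
               for _ in range(2)]
           for w in (3, 4, 5)}
context = text_cnn_context(sentence, filters)
print(len(context))  # one entry per filter, independent of sentence length
```

Because the pooled vector has a fixed length regardless of the sentence length, it can be projected to the decoder's state size and used as the initial decoder state.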

In [30]:
class CNNEncoder(Layer):
    def __init__(self, vocab_size, embedding_dim, filter, decoder_units, decoder_cell, **kwargs):
        super(CNNEncoder, self).__init__(**kwargs)
        self.decoder_cell = decoder_cell

        self.embedding_layer = Embedding(vocab_size, embedding_dim, mask_zero=True, name='EncoderEmbedding')
        self.conv_layer1 = Conv1D(filters=filter, kernel_size=3, strides=1, activation='relu')
        self.conv_layer2 = Conv1D(filters=filter, kernel_size=4, strides=1, activation='relu')
        self.conv_layer3 = Conv1D(filters=filter, kernel_size=5, strides=2, activation='relu')
        self.conv_layer4 = Conv1D(filters=filter, kernel_size=6, strides=2, activation='relu')
        self.h_layer = Dense(decoder_units, activation='tanh')
        if decoder_cell == 'LSTM': self.c_layer = Dense(decoder_units, activation='tanh')
    
    def call(self, inputs:Input):
        return self.get_stacked_layers(inputs)

    def get_stacked_layers(self, inputs: Input):
        embeddings = self.embedding_layer(inputs)
        conv_output1 = self.conv_layer1(embeddings)
        conv_output1 = GlobalAveragePooling1D()(conv_output1)
        conv_output2 = self.conv_layer2(embeddings)
        conv_output2 = GlobalAveragePooling1D()(conv_output2)
        conv_output3 = self.conv_layer3(embeddings)
        conv_output3 = GlobalAveragePooling1D()(conv_output3)
        conv_output4 = self.conv_layer4(embeddings)
        conv_output4 = GlobalAveragePooling1D()(conv_output4)

        concat_output = Concatenate()([conv_output1, conv_output2, conv_output3, conv_output4])
        h_state = self.h_layer(concat_output)
        if self.decoder_cell == 'LSTM': h_state = [h_state, self.c_layer(concat_output)]
        return None, h_state


In [31]:
cnn_encoder = CNNEncoder(source_vocab_size, embedding_dim, 24, 10, 'LSTM')
_, hidden_state = cnn_encoder(sample_source_input)
print(hidden_state[0].shape, hidden_state[1].shape)

(32, 10) (32, 10)


### 2.3 Decoder (25 points)
Try a simple RNN, LSTM, and GRU

In [32]:
class Attention(Layer):
    def __init__(self, units, **kwargs):
        super(Attention, self).__init__(**kwargs)
        
        self.query_dense_layer = Dense(units, use_bias=False, name='QueryDense')
        self.value_dense_layer = Dense(units, use_bias=False, name='ValueDense')

        self.attention_layer = AdditiveAttention(name='AdditiveAttention')
        
    def call(self, query, value, encoder_mask):
        return self.get_stacked_layers(query, value, encoder_mask)
    
    def get_stacked_layers(self, query, value, mask):
        query_dense = self.query_dense_layer(query)
        value_dense = self.value_dense_layer(value)

        context_vector, attention_weights = self.attention_layer(
            inputs = [query_dense, value, value_dense],
            mask = mask,
            return_attention_scores = True
        )
        
        return context_vector, attention_weights

In [33]:
class Decoder(Layer):
    def __init__ (self,vocab_size, embedding_dim, units, cell_type='LSTM', attention_bool=True, **kwargs):
        super(Decoder, self).__init__(**kwargs)
        if cell_type not in ['RNN', 'LSTM', 'GRU']: raise ValueError('There is no cell type %s'%cell_type) 
        else: self.cell_type = cell_type
        self.attention_bool = attention_bool
        
        self.embedding_layer = Embedding(vocab_size, embedding_dim, mask_zero=True, name='DecoderEmbedding')
        if cell_type == 'LSTM': self.rnn_layer1 = LSTM(units, return_state=True, return_sequences=True, name='DecoderLSTM1')
        elif cell_type == 'RNN': self.rnn_layer1 = SimpleRNN(units, return_state=True, return_sequences=True, name='DecoderRNN1')
        else: self.rnn_layer1 = GRU(units, return_state=True, return_sequences=True, name='DecoderGRU1')
        if attention_bool: self.attention_layer = Attention(units)
        self.Wc = Dense(units, activation='tanh', use_bias=False)

        self.concatenate_layer = Concatenate(axis=-1)
        self.dense_layer = Dense(vocab_size, activation='softmax')
        
    def call(self, inputs, state, encoded_output=None, encoder_mask=None):
        return self.get_stacked_layers(inputs, state, encoded_output, encoder_mask)

    def get_stacked_layers(self, inputs, state, encoded_output=None, encoder_mask=None):
        x = self.embedding_layer(inputs)
        if self.cell_type == 'LSTM':
            x, state_h, state_c = self.rnn_layer1(x, initial_state=state)
            hidden_state1 = [state_h, state_c]
        else: x,hidden_state1 = self.rnn_layer1(x, initial_state=state)

        score = None
        if self.attention_bool:
            encoded_output._keras_mask = inputs!=0 
            context, score = self.attention_layer(x, encoded_output, [inputs!=0,encoder_mask])
            x = self.concatenate_layer([x,context])
            x = self.Wc(x)
        x = self.dense_layer(x)
        
        return x, hidden_state1, score

In [34]:
decoder = Decoder(target_vocab_size, embedding_dim, 10)

sample_one_hot, sample_hidden, sample_score = decoder(sample_target_input, sample_encoder_hidden_state, sample_encoder_output)

print(f"Sample decoder output shape {sample_one_hot.shape}")
print(f"Sample score shape: {sample_score.shape}")
print(f"Sample hidden state (h) shape: {sample_hidden[0].shape}")
print(f"Sample cell state (c) shape: {sample_hidden[1].shape}")

Sample decoder output shape (32, 200, 19753)
Sample score shape: (32, 200, 200)
Sample hidden state (h) shape: (32, 10)
Sample cell state (c) shape: (32, 10)


## 3. Training your Seq2Seq (30 points)
1. Implement training process
2. Use validation set to calculate BLEU score
3. Plot your training loss and validation loss (You could use the code in assignment 1)
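
Two ideas behind the training step, teacher forcing and ignoring padded positions in the loss, can be illustrated without TensorFlow. This is a minimal sketch under assumptions: `PAD_ID`, `shift_targets`, and `masked_cross_entropy` are hypothetical names, id 0 is assumed to mean padding only, and the loss is averaged over real tokens, which may differ from the reduction used in the cells below.

```python
import math

PAD_ID = 0  # assumption: id 0 is used only for padding


def shift_targets(decoder_input):
    """Teacher forcing: the label at step t is the decoder input at step t + 1."""
    return decoder_input[1:] + [PAD_ID]


def masked_cross_entropy(labels, probs):
    """Mean negative log-likelihood over the non-padded positions.

    labels: list of token ids; probs: one probability distribution per step.
    """
    total, count = 0.0, 0
    for label, dist in zip(labels, probs):
        if label == PAD_ID:
            continue  # a padded step contributes nothing to the loss
        total -= math.log(dist[label])
        count += 1
    return total / max(count, 1)


decoder_input = [1, 5, 7, 2, 0, 0]     # <start> w5 w7 <end> pad pad
labels = shift_targets(decoder_input)  # what the decoder should predict at each step
uniform = [[0.1] * 10 for _ in labels]  # a model that is uniform over 10 tokens
loss = masked_cross_entropy(labels, uniform)
print(labels, loss)
```

With a uniform prediction over 10 tokens, the masked loss equals log(10) no matter how much padding the sentence has, which is the property the mask is meant to guarantee.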

In [35]:
def build_seq2seq(name:str, source_window:int, target_window:int, source_vocab_size:int, target_vocab_size:int, embedding_dim:int, encoder_type:str, cell_type:str, cell_units:int, bidirectional=True, attention=True, filter_size=None, plot_model_to_img=True):

    # Building Encoder Model
    encoder_input = Input(shape=(source_window, ))
    if encoder_type == 'RNN':
        encoder_layer = Encoder(source_vocab_size, embedding_dim, cell_units, cell_type, return_sequences=attention, bidirectional = bidirectional)
        
    elif encoder_type == 'CNN':
        encoder_layer = CNNEncoder(source_vocab_size, embedding_dim, filter_size, cell_units, cell_type)
    else:
        raise ValueError(f'No Encoder type of {encoder_type}. Choose from RNN or CNN.')
    
    encoder_output, hidden_state = encoder_layer(encoder_input)
    
    
    
    # Building Decoder Model
    decoder_input = Input(shape=(target_window, ))
    if encoder_type == 'CNN':
        decoder_layer = Decoder(target_vocab_size, embedding_dim, cell_units, cell_type, attention_bool=False)  
    else:
        decoder_layer = Decoder(target_vocab_size, embedding_dim, (bidirectional+1)*cell_units, cell_type, attention)

    if not attention: encoder_output, encoder_mask = None, None
    else: encoder_mask = encoder_input!=0

    decoder_output, decoder_hidden_state, _ = decoder_layer(decoder_input, hidden_state, encoder_output, encoder_mask)
    
    def get_model_plot():
        pic_encoder_output, pic_hidden_state = encoder_layer.get_stacked_layers(encoder_input)
        if not attention: pic_encoder_output = None
        pic_decoder_output, _, _ = decoder_layer.get_stacked_layers(decoder_input, pic_hidden_state, pic_encoder_output)
        plot_model(Model([encoder_input, decoder_input], pic_decoder_output, name=name), '%s.png'%name, show_shapes=True, show_layer_names=True,)
    if plot_model_to_img: get_model_plot()

    return Model([encoder_input, decoder_input], decoder_output, name=name)

In [36]:
GRUseq2seq = build_seq2seq("GRUSeq2Seq", source_sentence_window, target_sentence_window, source_vocab_size, target_vocab_size, embedding_dim, 'RNN', 'GRU', cell_units, bidirectional=True, attention=False)

In [37]:
CNNseq2seq = build_seq2seq("CNNSeq2Seq", source_sentence_window, target_sentence_window, source_vocab_size, target_vocab_size, embedding_dim, 'CNN', 'LSTM', cell_units, bidirectional=False, attention=False, filter_size=24)

In [36]:
LSTMseq2seqAtt = build_seq2seq("LSTMSeq2SeqAtt", source_sentence_window, target_sentence_window, source_vocab_size, target_vocab_size, embedding_dim, 'RNN', 'LSTM', cell_units, bidirectional=True, attention=True)

In [38]:
class MaskedLoss(tf.keras.losses.Loss):
    """Sparse categorical cross-entropy that zeroes out padded target positions (id 0)."""
    def __init__(self, name='masked_loss'):
        super(MaskedLoss, self).__init__(name=name)
        # The decoder's last Dense layer already applies softmax, so from_logits=False.
        self.loss = tf.keras.losses.SparseCategoricalCrossentropy(reduction='none', from_logits=False)

    # Override call(), not __call__(): Keras invokes __call__(y_true, y_pred, sample_weight=...)
    def call(self, y_true, y_pred):
        loss = self.loss(y_true, y_pred)
        mask = tf.cast(y_true != 0, loss.dtype)
        return loss * mask  # Keras averages the per-position losses
loss_fn = MaskedLoss()

scc_loss = tf.keras.losses.SparseCategoricalCrossentropy(reduction='none', from_logits=False)
def calculate_loss(y_true, y_pred):
    mask = tf.cast(y_true != 0, tf.float32)
    loss_ = scc_loss(y_true, y_pred)
    loss_ *= mask
    return tf.reduce_sum(loss_)

optimizer = tf.keras.optimizers.Adam()

In [40]:
LSTMseq2seqAtt.compile(optimizer=optimizer, loss=MaskedLoss())
LSTMseq2seqAtt.fit(train_gen, epochs=10, sample_weight=None, verbose=1, validation_data=valid_gen, callbacks=[ModelCheckpoint('seq2seqAttention', save_best_only=True)])

In [None]:
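class MaskedLoss(tf.keras.losses.Loss):
  def __init__(self, name='masked_loss'):
    super().__init__(name=name)
    # reduction='none' keeps one loss value per target position so the mask can be applied
    self.loss = tf.keras.losses.SparseCategoricalCrossentropy(reduction='none')

  # Keras calls __call__ with a sample_weight argument, so the logic lives in call()
  def call(self, y_true, y_pred):
    loss = self.loss(y_true, y_pred)
    mask = tf.cast(y_true != 0, tf.float32)
    loss *= mask
    return tf.reduce_sum(loss)
loss_fn = MaskedLoss()
optimizer = tf.keras.optimizers.Adam()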

In [None]:
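# encoder_input, decoder_input and decoder_output are local to build_seq2seq, so reuse a model built above
sample_model = LSTMseq2seqAtt
# The targets are integer token ids, so use the sparse masked loss rather than CategoricalCrossentropy
sample_model.compile(optimizer=optimizer, loss=loss_fn)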

In [None]:
history = sample_model.fit(train_gen, epochs=epochs, verbose=1, validation_data=valid_gen, callbacks=[ModelCheckpoint('checkpoint', save_best_only=True)])



Epoch 1/1000

KeyboardInterrupt: 

## 4. Evaluation (20 points)
1. Calculate the BLEU score of the best seq2seq model obtained from Sections 2 and 3 on the test set.
2. Select 20 test examples. For each example, print the translation produced by your model along with the ground truth.

### 4.1 Calculate the BLEU score on test set. (10 points)
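
As a reference point for this step, corpus-level BLEU can be computed in plain Python. The sketch below is a simplified version under assumptions: one reference per hypothesis, n-grams up to length 4, uniform weights, and no smoothing. The function names `ngrams` and `corpus_bleu` are local to this example; library implementations such as NLTK's `corpus_bleu` additionally support multiple references and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = 'the cat sat on the mat'.split()
hypothesis = 'the cat sat on a mat'.split()
score = corpus_bleu([reference], [hypothesis])
print(round(score, 4))
```

Counts are pooled over the whole corpus before the precisions are computed, so the function should be called once with all test sentences rather than averaged over per-sentence scores.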

### 4.2 Translate 20 test examples. (10 points)
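
Translating an example at inference time means feeding the decoder its own previous prediction until it emits the end token. A minimal greedy-decoding sketch is shown below. It rests on an assumption: `step_fn(prev_id, state)` is a hypothetical wrapper around one decoder step that returns a probability list and the new state; no such function is defined in this notebook, and the toy `toy_step` only replays a fixed script so the loop can be run in isolation.

```python
def greedy_decode(step_fn, initial_state, start_id, end_id, max_len=50):
    """Generate token ids one step at a time, always picking the most probable token."""
    output, prev_id, state = [], start_id, initial_state
    for _ in range(max_len):
        probs, state = step_fn(prev_id, state)
        prev_id = max(range(len(probs)), key=probs.__getitem__)  # argmax over the vocabulary
        if prev_id == end_id:
            break  # stop at <end> and do not include it in the output
        output.append(prev_id)
    return output


def ids_to_text(ids, vocab):
    """Join the tokens for a list of ids into a printable sentence."""
    return ' '.join(vocab[i] for i in ids)


# Toy decoder: the state is a position in a fixed script of token ids.
toy_vocab = ['<unk>', '<start>', '<end>', 'hola', 'mundo']
script = [3, 4, 2]  # hola, mundo, <end>


def toy_step(prev_id, state):
    probs = [0.0] * len(toy_vocab)
    probs[script[state]] = 1.0
    return probs, state + 1


translation = ids_to_text(greedy_decode(toy_step, 0, start_id=1, end_id=2), toy_vocab)
print(translation)
```

The `max_len` cap matters in practice, since an undertrained model may never produce the end token. Printing each decoded sentence next to its ground-truth target gives the side-by-side comparison this step asks for.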