# Seq2Seq models (Sequence-to-Sequence)

Sequence-to-sequence (seq2seq) models are a family of deep learning models that consist of an encoder and a decoder. They are used for problems that map an arbitrarily long sequence to another arbitrarily long sequence. For example, in machine translation, you convert a sequence of words in a source language to a sequence of words in a target language. Here we will use a seq2seq model to solve a machine translation task: translating English to German.


<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/thushv89/manning_tf2_in_action/blob/master/Ch11/11.1_Seq2seq_machine_translation.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
    </td>
</table>



In [1]:
import random
import tensorflow as tf
print(tf.__version__)
import numpy as np
import time

def fix_random_seed(seed):
    """ Setting the random seed of various libraries """
    try:
        np.random.seed(seed)
    except NameError:
        print("Warning: Numpy is not imported. Setting the seed for Numpy failed.")
    try:
        tf.random.set_seed(seed)
    except NameError:
        print("Warning: TensorFlow is not imported. Setting the seed for TensorFlow failed.")
    try:
        random.seed(seed)
    except NameError:
        print("Warning: random module is not imported. Setting the seed for random failed.")
 
# Fixing the random seed
random_seed=4321
fix_random_seed(random_seed)


2.4.1


Data source: http://www.manythings.org/anki/ (German-English sentence pairs)

In [2]:
# Not setting this led to the following error
# _Derived_]RecvAsync is cancelled.   
# [[{{node gradient_tape/model_1/embedding_1/embedding_lookup/Reshape/_172}}]] [Op:__inference_train_function_31985]

%env TF_FORCE_GPU_ALLOW_GROWTH=true

env: TF_FORCE_GPU_ALLOW_GROWTH=true


## Loading the data (Requires manual download)

Unfortunately, this dataset **must be manually downloaded** by clicking [this link](http://www.manythings.org/anki/deu-eng.zip). Then place the downloaded `deu-eng.zip` file in the `Ch11/data` folder before running the cell below.


In [3]:
import os
import zipfile

# Make sure the zip file has been downloaded
if not os.path.exists(os.path.join('data','deu-eng.zip')):
    raise FileNotFoundError(
        "Uh oh! Did you download the deu-eng.zip from http://www.manythings.org/anki/deu-eng.zip manually and place it in the Ch11/data folder?"
    )

else:
    if not os.path.exists(os.path.join('data', 'deu.txt')):
        with zipfile.ZipFile(os.path.join('data','deu-eng.zip'), 'r') as zip_ref:
            zip_ref.extractall('data')
    else:
        print("The extracted data already exists")

The extracted data already exists


## Reading the data

The data is in a single `.txt` file. It is a parallel corpus, meaning each English sentence/phrase/paragraph appears side by side with its German translation. In the file, the source input and the translation are separated by a tab (i.e. it is a tab-separated file).

In [4]:
import pandas as pd

# Read the tab-separated file
df = pd.read_csv(os.path.join('data', 'deu.txt'), delimiter='\t', header=None)
# Set column names
df.columns = ["EN", "DE", "Attribution"]
df = df[["EN", "DE"]]
print('df.shape = {}'.format(df.shape))

df.shape = (227080, 2)
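
To make the file layout concrete, here is a minimal sketch that parses two tab-separated lines with the standard `csv` module instead of pandas. The sample lines and the attribution text are made up for illustration; only the three-column layout (English, German, attribution) is taken from the cell above.

```python
import csv
import io

# Two made-up lines in the same layout as deu.txt:
# English <TAB> German <TAB> attribution (placeholder text)
sample = "Go.\tGeh.\tattribution-1\nHi.\tHallo!\tattribution-2\n"

# QUOTE_NONE keeps any quotation marks in a sentence as literal characters
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)

# Keep only the English and German columns, as the notebook does
pairs = [(row[0], row[1]) for row in reader]
print(pairs)
```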


In [5]:
df.head()

|   | EN   | DE         |
|---|------|------------|
| 0 | Go.  | Geh.       |
| 1 | Hi.  | Hallo!     |
| 2 | Hi.  | Grüß Gott! |
| 3 | Run! | Lauf!      |
| 4 | Run. | Lauf!      |


## Use a smaller sample for computational speed

In [6]:
df = df.sample(n=50000, random_state=random_seed)

## Introducing the `[SOS]` and `[EOS]` tokens

We will add these special tokens to the translated targets. `[SOS]` indicates the start of the sentence and `[EOS]` marks the end of the sentence.

E.g. `Grüß Gott!` becomes `[SOS] Grüß Gott! [EOS]`

In [7]:
df["DE"] = '[SOS] ' + df["DE"] + ' [EOS]'
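
The special tokens matter because the decoder is trained to predict the next word at every position. The sketch below, which uses a hypothetical helper `make_decoder_pair`, shows how one wrapped sentence yields a decoder input (everything except the final `[EOS]`) and a decoder label (everything except the leading `[SOS]`). The training code later in this notebook derives the same two views with pandas string splits.

```python
def make_decoder_pair(german_sentence):
    """Wrap a sentence in [SOS]/[EOS] and derive the decoder input and label."""
    tokens = ["[SOS]"] + german_sentence.split() + ["[EOS]"]
    decoder_input = tokens[:-1]  # everything except the final [EOS]
    decoder_label = tokens[1:]   # everything except the leading [SOS]
    return decoder_input, decoder_label

dec_in, dec_label = make_decoder_pair("Grüß Gott!")
print(dec_in)     # ['[SOS]', 'Grüß', 'Gott!']
print(dec_label)  # ['Grüß', 'Gott!', '[EOS]']
```

At every position, the label is the token that follows the corresponding input token.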

## Splitting training/validation/testing data

In [8]:
# Randomly sample 5000 examples from the total 50000 for testing
test_df = df.sample(n=5000, random_state=random_seed)
# Randomly sample 5000 of the remaining 45000 examples for validation
valid_df = df.loc[~df.index.isin(test_df.index)].sample(n=5000, random_state=random_seed)
# Assign the rest to training data
train_df = df.loc[~(df.index.isin(test_df.index) | df.index.isin(valid_df.index))]

print('test_df.shape = {}'.format(test_df.shape))
print('valid_df.shape = {}'.format(valid_df.shape))
print('train_df.shape = {}'.format(train_df.shape))

test_df.shape = (5000, 2)
valid_df.shape = (5000, 2)
train_df.shape = (40000, 2)
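
The split logic is easier to check on a small scale. The following sketch repeats the same three steps (sample the test set, sample the validation set from what remains, keep the rest for training) on 50 integer indices using only the standard library. The function name `three_way_split` and the sizes are illustrative assumptions, not part of the notebook.

```python
import random

def three_way_split(indices, n_test, n_valid, seed):
    """Split indices into disjoint train/valid/test subsets."""
    rng = random.Random(seed)
    # 1. Sample the test set from all indices
    test = set(rng.sample(indices, n_test))
    # 2. Sample the validation set from what is left
    remaining = [i for i in indices if i not in test]
    valid = set(rng.sample(remaining, n_valid))
    # 3. Everything else is training data
    train = [i for i in remaining if i not in valid]
    return train, sorted(valid), sorted(test)

train_idx, valid_idx, test_idx = three_way_split(list(range(50)), 5, 5, seed=4321)
print(len(train_idx), len(valid_idx), len(test_idx))  # 40 5 5
```

Because each subset is drawn from what the previous step left over, no example can appear in more than one split.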


## Analysing the vocabulary sizes (English and German)

In [9]:
from collections import Counter

# Create a flattened list from English words
en_words = train_df["EN"].str.split().sum()
# Create a flattened list of German words
de_words = train_df["DE"].str.split().sum()

# Get the vocabulary size of words appearing more than or equal to 10 times
n=10

def get_vocabulary_size_greater_than(words, n, verbose=True):
    
    """ Get the vocabulary size above a certain threshold """
    
    # Generate a counter object i.e. dict word -> frequency
    counter = Counter(words)
    
    # Create a pandas series from the counter, then sort most frequent to least
    freq_df = pd.Series(list(counter.values()), index=list(counter.keys())).sort_values(ascending=False)
    
    if verbose:
        # Print most common words
        print(freq_df.head(n=10))

    # Count of words >= n frequent    
    n_vocab = (freq_df>=n).sum()
    
    if verbose:
        print("\nVocabulary size (>={} frequent): {}".format(n, n_vocab))
        
    return n_vocab

print("English corpus")
print('='*50)
en_vocab = get_vocabulary_size_greater_than(en_words, n)

print("\nGerman corpus")
print('='*50)
de_vocab = get_vocabulary_size_greater_than(de_words, n)

English corpus
Tom    9427
to     8673
I      8436
the    6999
you    6125
a      5680
is     4374
in     2664
of     2613
was    2298
dtype: int64

Vocabulary size (>=10 frequent): 2238

German corpus
[SOS]    40000
[EOS]    40000
Tom       9928
Ich       7749
ist       4753
nicht     4414
zu        3583
Sie       3465
du        3112
das       2909
dtype: int64

Vocabulary size (>=10 frequent): 2497


## Analysing the sequence length (English and German)

In [38]:
def print_sequence_length(str_ser):
    
    """ Print the summary stats of the sequence length """
    
    # Create a pd.Series, which contains the sequence length for each sentence
    seq_length_ser = str_ser.str.split(' ').str.len()

    # Get the median as well as summary statistics of the sequence length
    print("\nSome summary statistics")
    print("Median length: {}\n".format(seq_length_ser.median()))
    print(seq_length_ser.describe())
    
    # Get the quantiles at given marks
    print("\nComputing the statistics between the 1% and 99% quantiles (to ignore outliers)")
    p_01 = seq_length_ser.quantile(0.01)
    p_99 = seq_length_ser.quantile(0.99)
    
    # Print the summary stats of the data between the defined quantiles
    print(seq_length_ser[(seq_length_ser >= p_01) & (seq_length_ser < p_99)].describe())

print("English corpus")
print('='*50)
print_sequence_length(train_df["EN"])

print("\nGerman corpus")
print('='*50)
print_sequence_length(train_df["DE"])

English corpus

Some summary statistics
Median length: 6.0

count    40000.000000
mean         6.360650
std          2.667726
min          1.000000
25%          5.000000
50%          6.000000
75%          8.000000
max        101.000000
Name: EN, dtype: float64

Computing the statistics between the 1% and 99% quantiles (to ignore outliers)
count    39504.000000
mean         6.228002
std          2.328172
min          2.000000
25%          5.000000
50%          6.000000
75%          8.000000
max         14.000000
Name: EN, dtype: float64

German corpus

Some summary statistics
Median length: 8.0

count    40000.000000
mean         8.397875
std          2.652027
min          3.000000
25%          7.000000
50%          8.000000
75%         10.000000
max         77.000000
Name: DE, dtype: float64

Computing the statistics between the 1% and 99% quantiles (to ignore outliers)
count    39166.000000
mean         8.299035
std          2.291474
min          5.000000
25%          7.000000
50%    

## Printing the vocabulary size and sequence length

In [40]:
print("EN vocabulary size: {}".format(en_vocab))
print("DE vocabulary size: {}".format(de_vocab))

# Define sequence lengths with some extra space for longer sequences
en_seq_length = 19
de_seq_length = 21

print("EN max sequence length: {}".format(en_seq_length))
print("DE max sequence length: {}".format(de_seq_length))

EN vocabulary size: 2238
DE vocabulary size: 2497
EN max sequence length: 19
DE max sequence length: 21
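
A fixed sequence length means that shorter sequences are padded and longer ones are truncated. The sketch below illustrates that behaviour on plain Python lists. `pad_or_truncate` is a hypothetical helper written for this illustration; in the notebook itself this job is done by the `output_sequence_length` argument of the `TextVectorization` layer, which pads with ID 0.

```python
def pad_or_truncate(token_ids, length, pad_id=0):
    """Return token_ids clipped or right-padded to exactly `length` items."""
    clipped = token_ids[:length]
    return clipped + [pad_id] * (length - len(clipped))

# A short sequence is padded with zeros on the right
print(pad_or_truncate([5, 9, 2], 6))            # [5, 9, 2, 0, 0, 0]
# A long sequence loses everything after the sixth token
print(pad_or_truncate(list(range(1, 30)), 6))   # [1, 2, 3, 4, 5, 6]
```

Choosing a length a little above the 99% quantile keeps truncation rare without padding every batch out to the longest outlier.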


## TensorFlow `TextVectorization` layer

In [12]:
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

print("Defined the vectorization layer for English")

# Create the text vectorization layer (English)
en_vectorize_layer = TextVectorization(
    max_tokens=en_vocab,
    output_mode='int',
    output_sequence_length=None
)

print("Fitting the EN vectorization layer on data")
# Here we are calling adapt to fit the vectorization layer with text
# so that it learns the vocabulary
en_vectorize_layer.adapt(np.array(train_df["EN"].tolist()).astype('str'))
print("\tDone")

print("\nDefined the vectorization layer for German")
# Create the text vectorization layer (German)
de_vectorize_layer = TextVectorization(
    max_tokens=de_vocab,
    output_mode='int',
    output_sequence_length=de_seq_length,
    pad_to_max_tokens=False
)

print("Fitting the DE vectorization layer on data")
de_vectorize_layer.adapt(np.array(train_df["DE"].tolist()).astype('str'))
print("\tDone")

Defined the vectorization layer for English
Fitting the EN vectorization layer on data
	Done

Defined the vectorization layer for German
Fitting the DE vectorization layer on data
	Done


## `TextVectorization` layer in action

### How to use the layer

In [42]:
import tensorflow.keras.backend as K
K.clear_session()

# Create the model that uses the vectorize text layer
toy_model = tf.keras.models.Sequential()

# Start by creating an explicit input layer. It needs to have a shape of
# (1,) (because we need to guarantee that there is exactly one string
# input per batch), and the dtype needs to be 'string'.
toy_model.add(tf.keras.Input(shape=(1,), dtype=tf.string))

# The first layer in our model is the vectorization layer. After this
# layer, we have a tensor of shape (batch_size, max_len) containing vocab
# indices.
toy_model.add(en_vectorize_layer)

# Now the model can map strings to integer token IDs
input_data = [["run"], ["how are you"],["ectoplasmic residue"]]
pred = toy_model.predict(input_data)

print("Input data: \n{}\n".format(input_data))
print("\nToken IDs: \n{}".format(pred))

Input data: 
[['run'], ['how are you'], ['ectoplasmic residue']]


Token IDs: 
[[427   0   0]
 [ 40  23   4]
 [  1   1   0]]


### Sample of the vocabulary

In [14]:
# Print first few words in the vocabulary
print(en_vectorize_layer.get_vocabulary()[:10])
# Print the size of the vocabulary
print(len(en_vectorize_layer.get_vocabulary()))

['', '[UNK]', 'tom', 'to', 'you', 'the', 'i', 'a', 'is', 'that']
2238
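
The mapping from text to IDs can be imitated in a few lines of plain Python. This is a simplified sketch, not the Keras implementation: it lowercases, strips punctuation, splits on whitespace, maps unknown words to ID 1 (`[UNK]`) and pads with ID 0. The ten-word vocabulary is the sample printed above; `encode_batch` is a hypothetical helper.

```python
import re
import string

# First ten vocabulary entries printed above; the list position is the token ID
vocabulary = ['', '[UNK]', 'tom', 'to', 'you', 'the', 'i', 'a', 'is', 'that']
token_to_id = {tok: i for i, tok in enumerate(vocabulary)}

def encode_batch(sentences, pad_id=0, unk_id=1):
    """Lowercase, strip punctuation, split, look up IDs, pad to the longest row."""
    punctuation = "[%s]" % re.escape(string.punctuation)
    tokenized = [re.sub(punctuation, "", s.lower()).split() for s in sentences]
    max_len = max(len(toks) for toks in tokenized)
    return [
        [token_to_id.get(tok, unk_id) for tok in toks] + [pad_id] * (max_len - len(toks))
        for toks in tokenized
    ]

ids = encode_batch(["Tom is that", "you", "ectoplasmic residue"])
print(ids)  # [[2, 8, 9], [4, 0, 0], [1, 1, 0]]
```

The same lowercasing and punctuation stripping explains why `[SOS]` and `[EOS]` show up as `sos` and `eos` in the German vocabulary.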


In [48]:
# Print first few words in the vocabulary
print(de_vectorize_layer.get_vocabulary()[:10])
# Print the size of the vocabulary
print(len(de_vectorize_layer.get_vocabulary()))

['', '[UNK]', 'sos', 'eos', 'ich', 'tom', 'nicht', 'ist', 'das', 'du']
2497


## Defining the Seq2Seq model

The model is an encoder-decoder. Both the encoder and the decoder contain their own text vectorizer, and the encoder uses a bidirectional GRU.

In [43]:
import tensorflow.keras.backend as K
K.clear_session()

def get_vectorizer(list_of_strings, n_vocab, max_length=None, return_vocabulary=True, name=None):
    
    """ Return a text vectorization layer or a model """
    
    # Define an input layer that takes a list of strings (or an array of strings)
    inp = tf.keras.Input(shape=(1,), dtype=tf.string, name='encoder_input')
    
    # Add two to the vocabulary size for the special tokens '' (padding) and '[UNK]' (out-of-vocabulary)
    vectorize_layer = tf.keras.layers.experimental.preprocessing.TextVectorization(
        max_tokens=n_vocab+2,
        output_mode='int',
        output_sequence_length=max_length,        
        name=name
    )
    
    # Fit the vectorizer layer on the data
    vectorize_layer.adapt(list_of_strings)
        
    # Get the token IDs
    vectorized_out = vectorize_layer(inp)
        
    if not return_vocabulary: 
        return tf.keras.models.Model(inputs=inp, outputs=vectorized_out)    
    else:
        # Returns the vocabulary in addition to the model
        return tf.keras.models.Model(inputs=inp, outputs=vectorized_out), vectorize_layer.get_vocabulary()        
    
        
def get_encoder_and_state(n_vocab, vectorizer):
    """ Define the encoder of the seq2seq model"""
    
    # The input is (None,1) shaped and accepts an array of strings
    inp = tf.keras.Input(shape=(1,), dtype=tf.string, name='e_input')

    # Vectorize the data (assign token IDs)
    vectorized_out = vectorizer(inp)
    
    # Define an embedding layer to convert IDs to word vectors
    emb_layer = tf.keras.layers.Embedding(n_vocab+2, 128, mask_zero=True, name='e_embedding')
    # Get the embeddings of the token IDs
    emb_out = emb_layer(vectorized_out)
    
    # Define a bidirectional GRU layer
    # The encoder reads the English text (i.e. the input) both forward and backward,
    # which typically leads to better performance
    gru_layer = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128))
    
    # Get the output of the gru layer
    gru_out = gru_layer(emb_out)
    
    # Define the encoder model
    encoder = tf.keras.models.Model(inputs=inp, outputs=gru_out)
        
    # We are also returning the final state of the encoder as that
    # is passed as the initial state to the decoder
    return encoder, gru_out


def get_final_seq2seq_model(n_vocab, encoder, init_state, vectorizer):
    """ Define the final encoder-decoder model """
    
    # The input is (None,1) shaped and accepts an array of strings
    # This input layer is used to train the seq2seq model with teacher-forcing
    # we feed the German sequence as the input and ask the model to predict 
    # it with the words offset by 1 (i.e. next word)
    inp = tf.keras.Input(shape=(1,), dtype=tf.string, name='d_input')
    
    # Vectorize the data (assign token IDs)
    vectorized_out = vectorizer(inp)
    
    # Define an embedding layer to convert IDs to word vectors
    # Note that this is a different embedding layer to the encoder's embedding layer
    emb_layer = tf.keras.layers.Embedding(n_vocab+2, 128, mask_zero=True, name='d_embedding')
    
    # Get the embeddings of the token IDs
    emb_out = emb_layer(vectorized_out)
    
    # Define a GRU layer
    # Unlike the encoder, the decoder cannot be bidirectional: at inference time it
    # generates the translation one token at a time, so future tokens are not available
    gru_layer = tf.keras.layers.GRU(256, return_sequences=True)
    
    # Get the output of the gru layer
    gru_out = gru_layer(emb_out, initial_state=init_state)
    
    # Define an intermediate dense layer
    dense_layer_1 = tf.keras.layers.Dense(512, activation='relu')
    dense1_out = dense_layer_1(gru_out)
    
    # The final prediction layer with softmax
    dense_layer_final = tf.keras.layers.Dense(n_vocab+2, activation='softmax')
    dense_final_out = dense_layer_final(dense1_out)
    
    # Define the full model
    seq2seq = tf.keras.models.Model(inputs=[encoder.input, inp], outputs=dense_final_out)
    
    return seq2seq

# Get the English vectorizer/vocabulary
en_vectorizer, en_vocabulary = get_vectorizer(np.array(train_df["EN"].tolist()), en_vocab, max_length=en_seq_length, name='en_vectorizer')
# Get the German vectorizer/vocabulary
de_vectorizer, de_vocabulary = get_vectorizer(np.array(train_df["DE"].tolist()), de_vocab, max_length=de_seq_length-1, name='de_vectorizer')

# Define the final model
encoder, enc_final_state = get_encoder_and_state(en_vocab, en_vectorizer)
final_model = get_final_seq2seq_model(de_vocab, encoder, enc_final_state, de_vectorizer)


## Compile the model

In [44]:
from tensorflow.keras.metrics import SparseCategoricalAccuracy

# Compile the model
final_model.compile(
    loss='sparse_categorical_crossentropy', 
    optimizer='adam', 
    metrics=['accuracy']
)
final_model.summary()

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
e_input (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
d_input (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
model (Functional)              (None, 19)           0           e_input[0][0]                    
__________________________________________________________________________________________________
model_1 (Functional)            (None, 20)           0           d_input[0][0]                    
____________________________________________________________________________________________
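
To see what the `sparse_categorical_crossentropy` loss and the `accuracy` metric measure at a single decoder position, here is a small worked example in plain Python. The probabilities are made up, and the sketch ignores the numerical-stability clipping a real implementation would apply.

```python
import math

def sparse_categorical_crossentropy(label_id, probabilities):
    """Negative log-probability assigned to the true token ID."""
    return -math.log(probabilities[label_id])

# Made-up softmax output over a four-word vocabulary; the true token has ID 2
probs = [0.1, 0.2, 0.6, 0.1]
label = 2

loss = sparse_categorical_crossentropy(label, probs)
# Accuracy at this position: is the most probable ID the true one?
predicted_id = max(range(len(probs)), key=probs.__getitem__)

print(round(loss, 4))         # 0.5108
print(predicted_id == label)  # True
```

The label is a single integer ID rather than a one-hot vector, which is why the "sparse" variant of the loss is used.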

## Evaluating MT models - BLEU metric

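The `compute_bleu` function used here comes from the `bleu.py` script in the TensorFlow NMT repository: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py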

### Defining the BLEU metric

In [49]:
from tensorflow.keras.layers.experimental.preprocessing import StringLookup
from bleu import compute_bleu

class BLEUMetric(object):
    
    def __init__(self, vocabulary, name='bleu', **kwargs):
        """ Computes the BLEU score (Metric for machine translation) """
        super().__init__()
        self.vocab = vocabulary
        self.id_to_token_layer = StringLookup(vocabulary=self.vocab, invert=True)
    
    def calculate_bleu_from_predictions(self, real, pred):
        """ Calculate the BLEU score for targets and predictions """
        
        # Get the predicted token IDs
        pred_argmax = tf.argmax(pred, axis=-1)  
        
        # Convert token IDs to words using the vocabulary and the StringLookup
        pred_tokens = self.id_to_token_layer(pred_argmax)
        real_tokens = self.id_to_token_layer(real)
        
        def clean_text(tokens):
            
            """ Clean padding and [SOS]/[EOS] tokens to only keep meaningful words """
            
            # 3. Strip the string of any extra white spaces
            t = tf.strings.strip(
                        # 2. Replace everything after the eos token with blank
                        tf.strings.regex_replace(
                            # 1. Join all the tokens to one string in each sequence
                            tf.strings.join(
                                tf.transpose(tokens), separator=' '
                            ),
                        "eos.*", ""),
                   )
            
            # Decode the byte stream to a string
            t = np.char.decode(t.numpy().astype(np.bytes_), encoding='utf-8')
            
            # If the string is empty, add a [UNK] token
            # Otherwise get a Division by zero error
            t = [doc if len(doc)>0 else '[UNK]' for doc in t ]
            
            # Split the sequences to individual tokens 
            t = np.char.split(t).tolist()
            
            return t
        
        # Get the clean versions of the predictions and real sequences
        pred_tokens = clean_text(pred_tokens)
        # We have to wrap each real sequence in a list to make use of a function to compute bleu
        real_tokens = [[r] for r in clean_text(real_tokens)]
        
        # The compute_bleu method accepts the translations and references in the following format
        # translation - list of list of tokens
        # references - list of list of list of tokens
        bleu, precisions, bp, ratio, translation_length, reference_length = compute_bleu(real_tokens, pred_tokens, smooth=False)

        return bleu
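
The cleaning step inside `clean_text` can be tried out without TensorFlow. The sketch below applies the same three operations (join, cut at `eos`, strip) to one sequence using Python's `re` module. `clean_sequence` is a hypothetical stand-in written for illustration, and the token lists are made up.

```python
import re

def clean_sequence(tokens):
    """Join tokens, drop everything from 'eos' onwards, and split again."""
    joined = " ".join(tokens)
    trimmed = re.sub(r"eos.*", "", joined).strip()
    # An empty sequence would later cause a division by zero, so use [UNK]
    if not trimmed:
        trimmed = "[UNK]"
    return trimmed.split()

print(clean_sequence(["ich", "bin", "tom", "eos", "", ""]))  # ['ich', 'bin', 'tom']
print(clean_sequence(["eos", "", ""]))                       # ['[UNK]']
```

Empty strings stand for padding positions here, since padding ID 0 maps to `''` in the vocabulary.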

### Using the BLEU metric

In [50]:
translation = [['[UNK]', 'einmal', 'müssen', 'wir', 'in', 'erfahrung', 'bringen', 'wo', 'sie', 'wohnen']]
reference = [[['als', 'müssen', 'müssen', 'wir', 'in', 'erfahrung', 'bringen', 'wo', 'sie', 'wohnen']]]

compute_bleu(reference, translation)

(0.7598356856515925,
 [0.8, 0.7777777777777778, 0.75, 0.7142857142857143],
 1.0,
 1.0,
 10,
 10)
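
The numbers in this output can be reproduced by hand. The following sketch is a simplified single-sentence, single-reference BLEU without smoothing, written with the standard library only. It is meant to show where the four precisions and the final score come from, not to replace `compute_bleu`. The name `sentence_bleu` is an assumption for this illustration.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, translation, max_order=4):
    """Unsmoothed BLEU for one translation against one reference."""
    precisions = []
    for n in range(1, max_order + 1):
        trans_counts = ngram_counts(translation, n)
        ref_counts = ngram_counts(reference, n)
        # Clipped matches: an n-gram counts at most as often as in the reference
        overlap = sum((trans_counts & ref_counts).values())
        total = sum(trans_counts.values())
        precisions.append(overlap / total if total > 0 else 0.0)
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    else:
        geo_mean = 0.0
    # Brevity penalty: penalise translations shorter than the reference
    ratio = len(translation) / len(reference)
    if ratio > 1.0:
        bp = 1.0
    elif ratio > 0.0:
        bp = math.exp(1 - 1.0 / ratio)
    else:
        bp = 0.0
    return geo_mean * bp, precisions

translation = ['[UNK]', 'einmal', 'müssen', 'wir', 'in', 'erfahrung', 'bringen', 'wo', 'sie', 'wohnen']
reference = ['als', 'müssen', 'müssen', 'wir', 'in', 'erfahrung', 'bringen', 'wo', 'sie', 'wohnen']

bleu, precisions = sentence_bleu(reference, translation)
print(precisions)  # 8/10, 7/9, 6/8 and 5/7
print(bleu)        # approximately 0.7598
```

The product of the four precisions is exactly 1/3, so the score is the fourth root of 1/3. Both sequences have ten tokens, so the brevity penalty is 1.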

## Training the model with a custom loop

In [51]:
epochs = 5
batch_size = 128


def prepare_data(train_df, valid_df, test_df):
    """ Create a data dictionary from the dataframes containing data """
    
    data_dict = {}
    for label, df in zip(['train', 'valid', 'test'], [train_df, valid_df, test_df]):
        en_inputs = np.array(df["EN"].tolist())
        de_inputs = np.array(df["DE"].str.rsplit(n=1, expand=True).iloc[:,0].tolist())
        de_labels = np.array(df["DE"].str.split(n=1, expand=True).iloc[:,1].tolist())
        data_dict[label] = {'encoder_inputs': en_inputs, 'decoder_inputs': de_inputs, 'decoder_labels': de_labels}
    
    return data_dict


def shuffle_data(en_inputs, de_inputs, de_labels, shuffle_inds=None): 
    """ Shuffle the data randomly (but all of inputs and labels at once)"""
        
    if shuffle_inds is None:
        # If shuffle_inds are not passed create a shuffling automatically
        shuffle_inds = np.random.permutation(np.arange(en_inputs.shape[0]))
    else:
        # Shuffle the provided shuffle_inds
        shuffle_inds = np.random.permutation(shuffle_inds)
    
    # Return shuffled data
    return (en_inputs[shuffle_inds], de_inputs[shuffle_inds], de_labels[shuffle_inds]), shuffle_inds

# Define the metric
bleu_metric = BLEUMetric(de_vocabulary)

# Define the data
data_dict = prepare_data(train_df, valid_df, test_df)

shuffle_inds = None

for epoch in range(epochs):
    
    # Reset metric logs every epoch
    bleu_log, val_bleu_log = [],[]
    accuracy_log, val_accuracy_log = [],[]
    loss_log, val_loss_log = [], []
        
    # =================================================================== #
    #                         Train Phase                                 #
    # =================================================================== #
    
    # Shuffle data at the beginning of every epoch
    (en_inputs_raw,de_inputs_raw,de_labels_raw), shuffle_inds  = shuffle_data(
        data_dict['train']['encoder_inputs'],
        data_dict['train']['decoder_inputs'],
        data_dict['train']['decoder_labels'],
        shuffle_inds
    )
    
    # Get the number of training batches
    n_train_batches = en_inputs_raw.shape[0]//batch_size
    
    # Train one batch at a time
    for i in range(n_train_batches):
        # Status update
        print("Training batch {}/{}".format(i+1, n_train_batches), end='\r')
        
        # Get a batch of inputs (english and german sequences)
        x = [en_inputs_raw[i*batch_size:(i+1)*batch_size], de_inputs_raw[i*batch_size:(i+1)*batch_size]]
        # Get a batch of targets (german sequences offset by 1)
        y = de_vectorizer(de_labels_raw[i*batch_size:(i+1)*batch_size])
        
        # Train for a single step
        final_model.train_on_batch(x, y)        
        # Evaluate the model to get the metrics
        loss, accuracy = final_model.evaluate(x, y, verbose=0)
        # Get the final prediction to compute BLEU
        pred_y = final_model.predict(x)
        
        # Update the epoch's log records of the metrics
        loss_log.append(loss)
        accuracy_log.append(accuracy)
        bleu_log.append(bleu_metric.calculate_bleu_from_predictions(y, pred_y))
    
    # =================================================================== #
    #                      Validation Phase                               #
    # =================================================================== #
    
    # Get the validation data
    val_en_inputs_raw = data_dict['valid']['encoder_inputs']
    val_de_inputs_raw = data_dict['valid']['decoder_inputs']
    val_de_labels_raw = data_dict['valid']['decoder_labels']
    
    # Get the number of validation batches
    n_valid_batches = val_en_inputs_raw.shape[0]//batch_size
    print(" ", end='\r')
    
    # Evaluate one validation batch at a time
    for i in range(n_valid_batches):
        # Status update
        print("Evaluating batch {}/{}".format(i+1, n_valid_batches), end='\r')
        
        # Get the inputs and targets
        x = [val_en_inputs_raw[i*batch_size:(i+1)*batch_size], val_de_inputs_raw[i*batch_size:(i+1)*batch_size]]
        y = de_vectorizer(val_de_labels_raw[i*batch_size:(i+1)*batch_size])
        
        # Get the evaluation metrics
        loss, accuracy = final_model.evaluate(x, y, verbose=0)
        # Get the predictions to compute BLEU
        pred_y = final_model.predict(x)
        
        # Update validation logs
        val_loss_log.append(loss)
        val_accuracy_log.append(accuracy)
        val_bleu_log.append(bleu_metric.calculate_bleu_from_predictions(y, pred_y))
    
    # Print the evaluation metrics of each epoch
    print("\nEpoch {}/{}".format(epoch+1, epochs))
    print("\t(train) loss: {} - accuracy: {} - bleu: {}".format(np.mean(loss_log), np.mean(accuracy_log), np.mean(bleu_log)))
    print("\t(valid) loss: {} - accuracy: {} - bleu: {}".format(np.mean(val_loss_log), np.mean(val_accuracy_log), np.mean(val_bleu_log)))

Evaluating batch 39/39
Epoch 1/10
	(train) loss: 0.41324986469669217 - accuracy: 0.7189025989709756 - bleu: 0.35723207206008084
	(valid) loss: 0.811834665445181 - accuracy: 0.5698876319787441 - bleu: 0.18294922356038132
Evaluating batch 39/39
Epoch 2/10
	(train) loss: 0.37136018696503764 - accuracy: 0.7441589690935917 - bleu: 0.39565518355736073
	(valid) loss: 0.8277023021991436 - accuracy: 0.5713850213931158 - bleu: 0.1789816527107478
Evaluating batch 39/39
Epoch 3/10
	(train) loss: 0.33318772687552833 - accuracy: 0.768034258332008 - bleu: 0.43357104161468396
	(valid) loss: 0.8534994400464572 - accuracy: 0.5705839701187916 - bleu: 0.1800040226641332
Evaluating batch 39/39
Epoch 4/10
	(train) loss: 0.29957190447319776 - accuracy: 0.7894737848486656 - bleu: 0.47271358934189067
	(valid) loss: 0.8748689416127328 - accuracy: 0.5686705769636692 - bleu: 0.18438219690140717
Evaluating batch 39/39
Epoch 5/10
	(train) loss: 0.2685514767773641 - accuracy: 0.8104559049392358 - bleu: 0.51227594913

KeyboardInterrupt: 
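
Two details of the loop above are worth isolating: one permutation is applied to all arrays so that inputs and labels stay aligned, and the integer division `//` drops the final partial batch. The sketch below demonstrates both on toy data with the standard library. `shuffled_batches` and the `en-`/`de-` strings are made up for illustration.

```python
import random

def shuffled_batches(inputs, labels, batch_size, seed=None):
    """Yield aligned (inputs, labels) batches in a random order."""
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)
    # Integer division: a final partial batch is silently dropped
    n_batches = len(order) // batch_size
    for b in range(n_batches):
        idx = order[b * batch_size:(b + 1) * batch_size]
        yield [inputs[i] for i in idx], [labels[i] for i in idx]

inputs = ["en-%d" % i for i in range(10)]
labels = ["de-%d" % i for i in range(10)]
batches = list(shuffled_batches(inputs, labels, batch_size=4, seed=0))

print(len(batches))                # 2 (10 // 4; two examples are dropped)
print(40000 // 128, 5000 // 128)   # 312 39
```

The second line matches the `39/39` validation batches shown in the log: with a batch size of 128, the last 8 of the 5000 validation examples are never evaluated.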

## Save the trained model

In [54]:
## Save the model
os.makedirs('models', exist_ok=True)
tf.keras.models.save_model(final_model, os.path.join('models', 'seq2seq'))



INFO:tensorflow:Assets written to: models/seq2seq/assets




In [47]:
i=345
batch_size = 1
x = [en_inputs_raw[i*batch_size:(i+1)*batch_size], de_inputs_raw[i*batch_size:(i+1)*batch_size]]
y = de_vectorizer(de_labels_raw[i*batch_size:(i+1)*batch_size])

pred_y = final_model.predict(x)
print(y)
print(tf.constant(np.argmax(pred_y, axis=-1)))

tf.Tensor(
[[  1  41   4 104   1 274   3   0   0   0   0   0   0   0   0   0   0   0
    0   0]], shape=(1, 20), dtype=int64)
tf.Tensor(
[[  1   7   4 505   1 274   3   1   1   1   1   1   1   1   1   1   1   1
    1   1]], shape=(1, 20), dtype=int64)
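
In the output above, the label is padded with ID 0 after the `eos` token (ID 3), while the prediction contains ID 1 at those positions. One way to compare only the real tokens is to skip the padded positions, as in the sketch below. `masked_accuracy` is a hypothetical helper, and the two lists are copied from the printed tensors.

```python
def masked_accuracy(labels, predictions, pad_id=0):
    """Fraction of non-padding label positions that are predicted correctly."""
    pairs = [(l, p) for l, p in zip(labels, predictions) if l != pad_id]
    return sum(l == p for l, p in pairs) / len(pairs)

# Token IDs copied from the two tensors printed above (20 positions each)
y_true = [1, 41, 4, 104, 1, 274, 3] + [0] * 13
y_pred = [1, 7, 4, 505, 1, 274, 3] + [1] * 13

print(masked_accuracy(y_true, y_pred))                         # 5 of 7 real tokens match
print(sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))  # 0.25 over all 20 positions
```

Five of the seven real tokens match, including the final `eos`. Counting the 13 padded positions as errors would lower the figure to 0.25.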
