# TRANSFORMER for Text Translation

**Implementation of a _Transformer_ model for text translation.**


## Introduction

Transformers are designed to handle sequential data, such as natural language, for tasks like translation and text summarization. Unlike RNNs, however, Transformers do not need to process sequential data in order. For example, if the input is a natural-language sentence, the Transformer does not have to process the beginning before the end. Because of this, the Transformer allows far more parallelization than RNNs and therefore reduces training times.

Transformers have become the model of choice for many NLP problems, replacing older recurrent neural network models such as long short-term memory (LSTM). Because the Transformer allows more parallelization during training, it has made it possible to train on larger datasets than was feasible before its introduction.

For more information about Transformers: https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)


## Import the dependencies

In [None]:
import warnings

warnings.filterwarnings("ignore")

In [None]:
!pip install tensorflow tensorflow-datasets nltk



In [None]:
import tensorflow as tf

# Check that a GPU is available
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


In [None]:
import numpy as np
import pandas as pd

# import tensorflow as tf
from tensorflow.keras.layers import Dense, LayerNormalization, Embedding, Dropout
from tensorflow.keras.models import Model

## Mount the Google Drive directory

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## Data Loading and Preprocessing
We will use the ***ted_hrlr_translate/es_to_pt*** dataset from TensorFlow Datasets as an example. This dataset contains Spanish-Portuguese translation pairs.

In [None]:
# pip install tensorflow-datasets
import tensorflow_datasets as tfds
import tensorflow as tf



# Load the wmt13_translate/es-en dataset
#examples, metadata = tfds.load('wmt13_translate/es-en', with_info=True, as_supervised=True)
#train_examples, val_examples = examples['train'], examples['validation']


# Load the ted_hrlr_translate/es_to_pt dataset
examples, metadata = tfds.load('ted_hrlr_translate/es_to_pt', with_info=True, as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

Downloading and preparing dataset 124.94 MiB (download: 124.94 MiB, generated: Unknown size, total: 124.94 MiB) to /root/tensorflow_datasets/ted_hrlr_translate/es_to_pt/1.0.0...


Generating splits (3): train (44938 examples), validation (1016 examples), test (1763 examples)

Dataset ted_hrlr_translate downloaded and prepared to /root/tensorflow_datasets/ted_hrlr_translate/es_to_pt/1.0.0. Subsequent calls will reuse this data.


### TOKENIZATION

We will use ***SubwordTextEncoder*** for tokenization, which is well suited to text translation.

In [None]:
tokenizer_es = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    (es.numpy() for es, pt in train_examples), target_vocab_size=2**13)
tokenizer_pt = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    (pt.numpy() for es, pt in train_examples), target_vocab_size=2**13)

In [None]:
sample_string = 'El transformer es increíble.'

tokenized_string = tokenizer_es.encode(sample_string)
print('The tokenized string is: {}'.format(tokenized_string))

original_string = tokenizer_es.decode(tokenized_string)
print('The original string is: {}'.format(original_string))


The tokenized string is: [7940, 231, 4175, 4632, 32, 10, 799, 7917]
The original string is: El transformer es increíble.
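
To give an intuition for why one sentence becomes several ids, here is a minimal sketch of greedy longest-match subword splitting in plain Python. It is not the actual `SubwordTextEncoder` algorithm, and `toy_vocab` is a hypothetical hand-written vocabulary rather than one learned by `build_from_corpus`.

```python
def greedy_subword_split(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece matched: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"trans", "form", "er", "in", "cre"}
print(greedy_subword_split("transformer", toy_vocab))
```

Rare words are broken into smaller known pieces instead of being mapped to a single unknown token, which is one reason subword vocabularies are popular for translation.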


### Preprocessing
We define the preprocessing functions that prepare the training and validation data.

In [None]:
BUFFER_SIZE = 20000
BATCH_SIZE = 64
MAX_LENGTH = 40

def encode(lang1, lang2):
    lang1 = [tokenizer_es.vocab_size] + tokenizer_es.encode(lang1.numpy()) + [tokenizer_es.vocab_size+1]
    lang2 = [tokenizer_pt.vocab_size] + tokenizer_pt.encode(lang2.numpy()) + [tokenizer_pt.vocab_size+1]
    return lang1, lang2


def tf_encode(es, pt):
    result_es, result_pt = tf.py_function(encode, [es, pt], [tf.int64, tf.int64])
    result_es.set_shape([None])
    result_pt.set_shape([None])
    return result_es, result_pt

def filter_max_length(x, y, max_length=MAX_LENGTH):
    return tf.logical_and(tf.size(x) <= max_length,
                          tf.size(y) <= max_length)

train_dataset = train_examples.map(tf_encode)
train_dataset = train_dataset.filter(filter_max_length)
train_dataset = train_dataset.cache()
train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(BATCH_SIZE, padded_shapes=([None], [None]))
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

val_dataset = val_examples.map(tf_encode)
val_dataset = val_dataset.filter(filter_max_length).padded_batch(BATCH_SIZE, padded_shapes=([None], [None]))
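
The three preprocessing steps above (adding start/end ids, dropping long pairs, padding each batch with zeros) can be traced on plain lists. This is a simplified sketch, not the `tf.data` pipeline itself; `vocab_size` and `max_length` are small hypothetical values chosen so the effect is visible.

```python
def add_special_tokens(token_ids, vocab_size):
    # Same convention as encode(): start id = vocab_size, end id = vocab_size + 1.
    return [vocab_size] + token_ids + [vocab_size + 1]


def pad_batch(sequences, pad_id=0):
    # Pad every sequence on the right up to the longest one in the batch.
    longest = max(len(s) for s in sequences)
    return [s + [pad_id] * (longest - len(s)) for s in sequences]


vocab_size = 10  # hypothetical toy value
max_length = 6   # hypothetical toy value (the notebook uses MAX_LENGTH = 40)

raw = [[3, 4], [5, 6, 7, 8, 9], [1]]
encoded = [add_special_tokens(s, vocab_size) for s in raw]
kept = [s for s in encoded if len(s) <= max_length]  # drops the 7-token example
batch = pad_batch(kept)
print(batch)
```

The padding id 0 matters later: both the padding mask and the loss function treat 0 as "no token here".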


### Transformer Definition
We now define the Transformer model.

## TRANSFORMER Implementation

### Utility Functions
#### Scaled Dot-Product Attention Function

This function computes scaled dot-product attention, the core of the attention mechanism in Transformers. The attention matrix is obtained by multiplying the query by the transposed key and scaling the result by the square root of the key depth. If a mask is given, it is applied so that certain tokens cannot be attended to. Finally, softmax is applied to obtain the attention weights, which are multiplied by the values.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, LayerNormalization, Embedding, Dropout

def scaled_dot_product_attention(q, k, v, mask):
    # Raw attention scores: dot product of every query with every key
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale matmul_qk by sqrt(d_k), the depth of the keys
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Add the mask to the scaled dot product (masked positions get a large negative logit)
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    # Softmax over the last dimension (seq_len_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)

    return output, attention_weights
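
As a worked illustration of the same computation, the sketch below reimplements it with plain Python lists for a single head and no batch dimension. The inputs are made-up toy values; the mask convention (1 = blocked) follows the function above.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    d_k = len(k[0])
    outputs, weights = [], []
    for i, q_row in enumerate(q):
        # Scaled dot product of this query with every key.
        logits = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        if mask is not None:
            logits = [l + m * -1e9 for l, m in zip(logits, mask[i])]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
mask = [[0.0, 1.0], [0.0, 0.0]]  # block the second key for the first query

out, weights = attention(q, k, v, mask)
print(weights)
print(out)
```

The first query puts all of its weight on the only key it is allowed to see, while the second query spreads its weight over both keys and favors the one it matches.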


### MULTI-HEAD ATTENTION Layer

The `MultiHeadAttention` class implements multi-head attention. It splits the model dimension across several heads, runs attention in parallel on each head, and then concatenates the results. This lets the model attend to different parts of the sequence at the same time.

In [None]:
class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model

        assert d_model % self.num_heads == 0

        self.depth = d_model // self.num_heads

        self.wq = Dense(d_model)
        self.wk = Dense(d_model)
        self.wv = Dense(d_model)

        self.dense = Dense(d_model)

    def split_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, v, k, q, mask):
        batch_size = tf.shape(q)[0]

        q = self.wq(q)
        k = self.wk(k)
        v = self.wv(v)

        q = self.split_heads(q, batch_size)
        k = self.split_heads(k, batch_size)
        v = self.split_heads(v, batch_size)

        scaled_attention, attention_weights = scaled_dot_product_attention(q, k, v, mask)

        scaled_attention = tf.transpose(scaled_attention, perm=[0, 2, 1, 3])

        concat_attention = tf.reshape(scaled_attention, (batch_size, -1, self.d_model))

        output = self.dense(concat_attention)

        return output, attention_weights
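
The reshape and transpose in `split_heads`, and the inverse step before the final dense layer, can be traced on a tiny example. The sketch below works on one sequence with plain lists (no batch dimension); the helper `merge_heads` is a hypothetical name for the transpose-plus-reshape done inside `call`.

```python
def split_heads(x, num_heads):
    """x: [seq_len][d_model] -> [num_heads][seq_len][depth]."""
    d_model = len(x[0])
    assert d_model % num_heads == 0
    depth = d_model // num_heads
    # Each head receives a contiguous slice of the feature dimension.
    return [[row[h * depth:(h + 1) * depth] for row in x]
            for h in range(num_heads)]


def merge_heads(heads):
    """[num_heads][seq_len][depth] -> [seq_len][d_model]."""
    seq_len = len(heads[0])
    return [[val for head in heads for val in head[t]]
            for t in range(seq_len)]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]  # seq_len = 2, d_model = 4
heads = split_heads(x, num_heads=2)
print(heads)
print(merge_heads(heads) == x)
```

Splitting and merging are lossless rearrangements; the per-head attention happens between the two steps.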



### Encoder Layer

The **EncoderLayer** class implements one layer of the Transformer encoder. The layer contains a multi-head attention block and a feed-forward network. Layer normalization and dropout are applied to regularize and stabilize training.

In [None]:
class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(d_model, num_heads)
        self.ffn = tf.keras.Sequential([
            Dense(dff, activation='relu'),
            Dense(d_model)
        ])

        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)

        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, x, training, mask):
        attn_output, _ = self.mha(x, x, x, mask)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(x + attn_output)

        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)

        return out2
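
Each sub-layer above follows the same "add, then normalize" pattern: the sub-layer output is added to its input and the sum is layer-normalized. A minimal sketch for a single feature vector, assuming the learnable scale and offset of `LayerNormalization` are left at their initial values (1 and 0) and omitting dropout:

```python
import math


def layer_norm(vec, epsilon=1e-6):
    # Normalize one feature vector to zero mean and (almost) unit variance.
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + epsilon) for v in vec]


def add_and_norm(x, sublayer_out):
    # Residual connection followed by layer normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])


x = [1.0, 2.0, 3.0, 4.0]
sublayer_out = [0.5, -0.5, 0.5, -0.5]  # made-up sub-layer output
y = add_and_norm(x, sublayer_out)
print(y)
```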


### Decoder Layer

The **DecoderLayer** class implements one layer of the Transformer decoder. The layer contains two multi-head attention blocks: one for self-attention within the decoder and one for attention over the encoder output. It also includes a feed-forward network, layer normalization, and dropout.

In [None]:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(DecoderLayer, self).__init__()

        self.mha1 = MultiHeadAttention(d_model, num_heads)
        self.mha2 = MultiHeadAttention(d_model, num_heads)

        self.ffn = tf.keras.Sequential([
            Dense(dff, activation='relu'),
            Dense(d_model)
        ])

        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.layernorm3 = LayerNormalization(epsilon=1e-6)

        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)
        self.dropout3 = Dropout(rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        attn1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask)
        attn1 = self.dropout1(attn1, training=training)
        out1 = self.layernorm1(x + attn1)

        attn2, attn_weights_block2 = self.mha2(enc_output, enc_output, out1, padding_mask)
        attn2 = self.dropout2(attn2, training=training)
        out2 = self.layernorm2(out1 + attn2)

        ffn_output = self.ffn(out2)
        ffn_output = self.dropout3(ffn_output, training=training)
        out3 = self.layernorm3(out2 + ffn_output)

        return out3, attn_weights_block1, attn_weights_block2



## Complete Transformer Model

### ENCODER

The `Encoder` class implements the full Transformer encoder. It combines embeddings with positional encodings, applies a stack of encoder layers, and uses dropout to regularize the model. The `positional_encoding` function generates the positional encodings the model needs in order to take each token's position in the sequence into account.

In [None]:
class Encoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, maximum_position_encoding, rate=0.1):
        super(Encoder, self).__init__()

        self.d_model = d_model
        self.num_layers = num_layers

        self.embedding = Embedding(input_vocab_size, d_model)
        self.pos_encoding = self.positional_encoding(maximum_position_encoding, self.d_model)

        self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]
        self.dropout = Dropout(rate)

    def positional_encoding(self, position, d_model):
        angle_rads = self.get_angles(np.arange(position)[:, np.newaxis], np.arange(d_model)[np.newaxis, :], d_model)
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
        pos_encoding = angle_rads[np.newaxis, ...]
        return tf.cast(pos_encoding, dtype=tf.float32)

    def get_angles(self, pos, i, d_model):
        angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
        return pos * angle_rates

    def call(self, x, training, mask):
        seq_len = tf.shape(x)[1]

        x = self.embedding(x)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            x = self.enc_layers[i](x, training=training, mask=mask)

        return x
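
The sinusoidal table built by `positional_encoding` and `get_angles` can be reproduced with the standard library alone. The sketch below mirrors the same formula (sine on even feature indices, cosine on odd ones) for a tiny table; the sizes are toy values, not the ones used by the model.

```python
import math


def positional_encoding(num_positions, d_model):
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Same rate as get_angles: 1 / 10000^(2 * (i // 2) / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(num_positions=3, d_model=4)
for row in pe:
    print([round(v, 4) for v in row])
```

Position 0 always encodes to alternating 0 and 1, and higher feature indices oscillate more slowly, so each position receives a distinct pattern.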


### DECODER

The `Decoder` class implements the full Transformer decoder. It combines embeddings with positional encodings, applies a stack of decoder layers, and uses dropout to regularize the model. The `positional_encoding` function generates the positional encodings the model needs in order to take each token's position in the sequence into account.

In [None]:
class Decoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, target_vocab_size, maximum_position_encoding, rate=0.1):
        super(Decoder, self).__init__()

        self.d_model = d_model
        self.num_layers = num_layers

        self.embedding = Embedding(target_vocab_size, d_model)
        self.pos_encoding = self.positional_encoding(maximum_position_encoding, d_model)

        self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]
        self.dropout = Dropout(rate)

    def positional_encoding(self, position, d_model):
        angle_rads = self.get_angles(np.arange(position)[:, np.newaxis], np.arange(d_model)[np.newaxis, :], d_model)
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
        pos_encoding = angle_rads[np.newaxis, ...]
        return tf.cast(pos_encoding, dtype=tf.float32)

    def get_angles(self, pos, i, d_model):
        angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
        return pos * angle_rates

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        seq_len = tf.shape(x)[1]
        attention_weights = {}

        x = self.embedding(x)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            x, block1, block2 = self.dec_layers[i](x, enc_output, training=training, look_ahead_mask=look_ahead_mask, padding_mask=padding_mask)

            attention_weights[f'decoder_layer{i+1}_block1'] = block1
            attention_weights[f'decoder_layer{i+1}_block2'] = block2

        return x, attention_weights


### COMPLETE TRANSFORMER

The `Transformer` class defines the complete model, combining the encoder and the decoder. The required attention masks are applied, and the final output is produced by a dense layer.

In [None]:
class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, pe_input, pe_target, rate=0.1):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers, d_model, num_heads, dff, input_vocab_size, pe_input, rate)

        self.decoder = Decoder(num_layers, d_model, num_heads, dff, target_vocab_size, pe_target, rate)

        self.final_layer = Dense(target_vocab_size)

    def call(self, inp, tar, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
        enc_output = self.encoder(inp, training=training, mask=enc_padding_mask)

        dec_output, attention_weights = self.decoder(tar, enc_output, training=training, look_ahead_mask=look_ahead_mask, padding_mask=dec_padding_mask)

        final_output = self.final_layer(dec_output)

        return final_output, attention_weights


In [None]:
# Model hyperparameters
num_layers = 4
d_model = 128
dff = 512
num_heads = 8
input_vocab_size = tokenizer_es.vocab_size + 2
target_vocab_size = tokenizer_pt.vocab_size + 2
# input_vocab_size = tokenizer_es.vocab_size
# target_vocab_size = tokenizer_en.vocab_size
dropout_rate = 0.1

transformer = Transformer(
    num_layers,
    d_model,
    num_heads,
    dff,
    input_vocab_size,
    target_vocab_size,
    pe_input=10000,
    pe_target=10000,
    # pe_input=input_vocab_size,
    # pe_target=target_vocab_size,
    rate=dropout_rate)

In [None]:
# CHECK THE VOCABULARY SIZES
print(f"Input vocab size: {input_vocab_size}")
print(f"Target vocab size: {target_vocab_size}")

Input vocab size: 8129
Target vocab size: 8173


### Defining the Loss Function and the Optimizer

Here we define the loss function, which ignores padding tokens, and the optimizer, which uses a custom learning rate schedule.

In [None]:
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()

        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)  # Make sure step is float32
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps**-1.5)

        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)


learning_rate = CustomSchedule(d_model)
optimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9)
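
To see what this schedule does, the same formula can be evaluated in plain Python: the rate grows linearly during warmup and then decays with the inverse square root of the step. The explicit `step == 0` branch is an assumption of this sketch; it reproduces the value the TensorFlow version yields there (the minimum of infinity and zero).

```python
def transformer_lr(step, d_model, warmup_steps=4000):
    if step == 0:
        return 0.0  # min(inf, 0) in the TensorFlow version
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 1000, 4000, 16000):
    print(step, transformer_lr(step, d_model=128))
```

With `d_model = 128` and 4000 warmup steps, the rate peaks at step 4000 and has halved again by step 16000.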


In [None]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_sum(loss_)/tf.reduce_sum(mask)

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
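
The effect of the mask in `loss_function` is easiest to see on a toy sequence. The sketch below assumes we already know the probability the model assigned to each correct token (made-up numbers) and averages the cross-entropy over non-padding positions only.

```python
import math


def masked_mean_loss(real, true_token_probs):
    """real: target ids; true_token_probs: probability given to each true id."""
    losses = [-math.log(p) for p in true_token_probs]
    # Padding positions (id 0) contribute nothing to the sum or the count.
    mask = [0.0 if token == 0 else 1.0 for token in real]
    return sum(l * m for l, m in zip(losses, mask)) / sum(mask)


real = [5, 7, 0, 0]             # two real tokens followed by padding
probs = [0.5, 0.25, 0.9, 0.1]   # made-up probabilities
print(masked_mean_loss(real, probs))
```

Whatever the model predicts at the padded positions, the result stays the same, so the model is not rewarded or penalized for padding.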


### MODEL TRAINING

This section trains the Transformer model. It iterates over the training dataset, computes the predictions and the loss, and applies the gradients to update the model weights.


#### Mask creation

In [None]:
def create_padding_mask(seq):
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    mask = 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)
    return mask

def create_masks(inp, tar):
    enc_padding_mask = create_padding_mask(inp)
    dec_padding_mask = create_padding_mask(inp)

    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    combined_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return enc_padding_mask, combined_mask, dec_padding_mask
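
The combined decoder mask is the element-wise maximum of the padding mask and the look-ahead mask. A small sketch with plain lists for one sequence, without the extra broadcast axes that `create_padding_mask` adds, makes the resulting pattern visible (1.0 means "may not attend"):

```python
def padding_mask(seq, pad_id=0):
    return [1.0 if token == pad_id else 0.0 for token in seq]


def look_ahead_mask(size):
    # 1.0 above the diagonal: position i may not attend to positions j > i.
    return [[1.0 if j > i else 0.0 for j in range(size)] for i in range(size)]


def combined_mask(seq, pad_id=0):
    pad = padding_mask(seq, pad_id)
    ahead = look_ahead_mask(len(seq))
    n = len(seq)
    return [[max(pad[j], ahead[i][j]) for j in range(n)] for i in range(n)]


for row in combined_mask([7, 3, 0]):  # last position is padding
    print(row)
```

Each row blocks every future position, and the padded last column is blocked for all rows.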


#### Checkpoints

In [None]:
checkpoint_path = "/content/drive/My Drive/checkpoints/train"

ckpt = tf.train.Checkpoint(transformer=transformer, optimizer=optimizer)

ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)

if ckpt_manager.latest_checkpoint:
    ckpt.restore(ckpt_manager.latest_checkpoint)
    print('Latest checkpoint restored!')

Latest checkpoint restored!


#### Training loop

In [None]:
import time

EPOCHS = 20

for epoch in range(EPOCHS):
    start = time.time()

    train_loss.reset_state()
    train_accuracy.reset_state()

    for (batch, (inp, tar)) in enumerate(train_dataset):
        tar_inp = tar[:, :-1]
        tar_real = tar[:, 1:]

        enc_padding_mask, combined_mask, dec_padding_mask = create_masks(inp, tar_inp)

        with tf.GradientTape() as tape:
            predictions, _ = transformer(
                inp, tar_inp, training=True,
                enc_padding_mask=enc_padding_mask,
                look_ahead_mask=combined_mask,
                dec_padding_mask=dec_padding_mask)
            loss = loss_function(tar_real, predictions)

        gradients = tape.gradient(loss, transformer.trainable_variables)
        optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))

        train_loss(loss)
        train_accuracy(tar_real, predictions)

        if batch % 50 == 0:
            print(f'Epoch {epoch+1} Batch {batch} Loss {train_loss.result():.4f} Accuracy {train_accuracy.result():.4f}')

    if (epoch + 1) % 4 == 0:
        ckpt_save_path = ckpt_manager.save()
        print(f'Saving checkpoint for epoch {epoch+1} at {ckpt_save_path}')

    print(f'Epoch {epoch+1} Loss {train_loss.result():.4f} Accuracy {train_accuracy.result():.4f}')
    print(f'Time taken for 1 epoch: {time.time() - start:.2f} secs\n')




Epoch 1 Batch 0 Loss 9.0105 Accuracy 0.0000
Epoch 1 Batch 50 Loss 8.9603 Accuracy 0.0046
Epoch 1 Batch 100 Loss 8.8754 Accuracy 0.0157
Epoch 1 Batch 150 Loss 8.7840 Accuracy 0.0196
Epoch 1 Batch 200 Loss 8.6738 Accuracy 0.0216
Epoch 1 Batch 250 Loss 8.5381 Accuracy 0.0250
Epoch 1 Batch 300 Loss 8.3832 Accuracy 0.0284
Epoch 1 Batch 350 Loss 8.2186 Accuracy 0.0322
Epoch 1 Batch 400 Loss 8.0547 Accuracy 0.0356
Epoch 1 Batch 450 Loss 7.9047 Accuracy 0.0384
Epoch 1 Batch 500 Loss 7.7736 Accuracy 0.0407
Epoch 1 Batch 550 Loss 7.6563 Accuracy 0.0425
Epoch 1 Batch 600 Loss 7.5495 Accuracy 0.0444
Epoch 1 Loss 7.5334 Accuracy 0.0447
Time taken for 1 epoch: 490.54 secs

Epoch 2 Batch 0 Loss 6.1153 Accuracy 0.0658
Epoch 2 Batch 50 Loss 6.2273 Accuracy 0.0735
Epoch 2 Batch 100 Loss 6.1462 Accuracy 0.0762
Epoch 2 Batch 150 Loss 6.0837 Accuracy 0.0781
Epoch 2 Batch 200 Loss 6.0264 Accuracy 0.0802
Epoch 2 Batch 250 Loss 5.9704 Accuracy 0.0820
Epoch 2 Batch 300 Loss 5.9241 Accuracy 0.0837
Epoch 2 Batch
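
One detail of the loop that is easy to miss is the one-position shift between `tar_inp` and `tar_real` (teacher forcing): the decoder is fed the target without its last token and is trained to predict the target without its first token. A minimal sketch on one sequence, with hypothetical start and end ids:

```python
def shift_targets(target_ids):
    decoder_input = target_ids[:-1]  # corresponds to tar[:, :-1]
    decoder_target = target_ids[1:]  # corresponds to tar[:, 1:]
    return decoder_input, decoder_target


START, END = 100, 101  # hypothetical special-token ids
dec_in, dec_target = shift_targets([START, 11, 12, 13, END])
print(dec_in)
print(dec_target)
```

At every position the decoder sees the tokens up to that point and must predict the next one, which is why the look-ahead mask is needed during training.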

### MODEL EVALUATION

The `evaluate` function generates a translation from the trained model one token at a time, while `translate` takes an input sentence and prints its translation. The BLEU score can then be used to assess the quality of the generated translations.


In [None]:
def evaluate(inp_sentence):
    start_token = [tokenizer_es.vocab_size]
    end_token = [tokenizer_es.vocab_size + 1]

    inp_sentence = start_token + tokenizer_es.encode(inp_sentence) + end_token
    encoder_input = tf.expand_dims(inp_sentence, 0)

    decoder_input = [tokenizer_pt.vocab_size]
    output = tf.expand_dims(decoder_input, 0)

    for i in range(MAX_LENGTH):
        enc_padding_mask, combined_mask, dec_padding_mask = create_masks(encoder_input, output)

        predictions, attention_weights = transformer(
            encoder_input, output, training=False,
            enc_padding_mask=enc_padding_mask,
            look_ahead_mask=combined_mask,
            dec_padding_mask=dec_padding_mask)

        predictions = predictions[:, -1:, :]
        predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)

        if predicted_id == tokenizer_pt.vocab_size + 1:
            return tf.squeeze(output, axis=0), attention_weights

        output = tf.concat([output, predicted_id], axis=-1)

    return tf.squeeze(output, axis=0), attention_weights

def translate(sentence):
    result, attention_weights = evaluate(sentence)

    predicted_sentence = tokenizer_pt.decode([i for i in result if i < tokenizer_pt.vocab_size])

    print('Input: {}'.format(sentence))
    print('Predicted translation: {}'.format(predicted_sentence))
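
The loop inside `evaluate` is greedy decoding: at each step the most likely next token is appended until the end token appears or the length limit is reached. The sketch below isolates that control flow; `toy_next_token` is a hypothetical stand-in for the Transformer call plus `argmax`.

```python
def greedy_decode(next_token_fn, start_id, end_id, max_length):
    output = [start_id]
    for _ in range(max_length):
        predicted_id = next_token_fn(output)
        if predicted_id == end_id:
            break  # stop without appending the end token
        output.append(predicted_id)
    return output


def toy_next_token(prefix, end_id=99):
    # Hypothetical "model": counts upward and emits the end token after 3.
    return prefix[-1] + 1 if prefix[-1] < 3 else end_id


print(greedy_decode(toy_next_token, start_id=0, end_id=99, max_length=40))
```

Greedy decoding is simple but keeps only one hypothesis at a time; it is the strategy used in this notebook.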


In [None]:
input_sentence = "Hoy ha salido el sol"
translate(input_sentence)

Input: Hoy ha salido el sol
Predicted translation: `` `` '' é preciso e saída para o sol . ''
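
As a pointer toward scoring outputs like the one above, here is a sketch of the clipped n-gram precision that BLEU is built on. It is only one ingredient: full BLEU also combines several n-gram orders and applies a brevity penalty, which this sketch omits. The sentences are made-up English examples, not model outputs.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


reference = "the cat is on the mat".split()
candidate = "the the the the the the the".split()
print(modified_precision(reference, candidate, 1))
```

Clipping is what keeps a degenerate candidate that repeats one common word from getting a perfect unigram score.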
