<a href="https://colab.research.google.com/github/youngmook/cheminfo-python/blob/main/BERT_mlm_tutorial_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BERT (masked language model) tutorial

[Reference URL](https://keras.io/examples/nlp/masked_language_modeling/)

Review of hands-on code for BERT pretraining and classification fine-tuning

220516

고우영

# End-to-end Masked Language Modeling with BERT

**Author:** [Ankur Singh](https://twitter.com/ankur310794)<br>
**Date created:** 2020/09/18<br>
**Last modified:** 2020/09/18<br>
**Description:** Implement a Masked Language Model (MLM) with BERT and fine-tune it on the IMDB Reviews dataset.

## Introduction

Masked Language Modeling is a fill-in-the-blank task,
where a model uses the context words surrounding a mask token to try to predict what the
masked word should be.

For an input that contains one or more mask tokens,
the model will generate the most likely substitution for each.

Example:

- Input: "I have watched this [MASK] and it was awesome."
- Output: "I have watched this movie and it was awesome."
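
To make the fill-in-the-blank idea concrete, here is a toy sketch that fills a mask by counting which word was seen most often between the same two neighbours. It is only an illustration of the task, not how BERT predicts tokens; the tiny `corpus` and the `fill_mask` helper are hypothetical, and the sketch assumes the mask is not the first or last word and that its context appears in the corpus.

```python
from collections import Counter, defaultdict

corpus = [
    "i have watched this movie and it was awesome",
    "i have watched this movie and it was boring",
    "i have watched this show and it was fine",
]

# Count which word appears between each (left neighbour, right neighbour) pair.
context_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for left, middle, right in zip(words, words[1:], words[2:]):
        context_counts[(left, right)][middle] += 1


def fill_mask(sentence, mask="[MASK]"):
    """Replace the first mask with the most frequent word seen in that context."""
    words = sentence.split()
    i = words.index(mask)
    candidates = context_counts[(words[i - 1], words[i + 1])]
    words[i] = candidates.most_common(1)[0][0]
    return " ".join(words)


filled = fill_mask("i have watched this [MASK] and it was awesome")
print(filled)  # i have watched this movie and it was awesome
```

A trained masked language model replaces the neighbour counts with a learned probability distribution over the whole vocabulary, conditioned on the full sequence.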

Masked language modeling is a great way to train a language
model in a self-supervised setting (without human-annotated labels).
Such a model can then be fine-tuned to accomplish various supervised
NLP tasks.

This example teaches you how to build a BERT model from scratch,
train it with the masked language modeling task,
and then fine-tune this model on a sentiment classification task.

We will use the Keras `TextVectorization` and `MultiHeadAttention` layers
to create a BERT Transformer-Encoder network architecture.

## Setup

Install `tensorflow 2.8.0` via `pip install tensorflow==2.8.0`.

In [None]:
## Check versions
import tensorflow as tf
print('TF version : %s\n'%tf.__version__)

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

TF version : 2.8.0

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5760430669341043960
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11320098816
locality {
  bus_id: 1
  links {
  }
}
incarnation: 10069362004369244149
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
xla_global_id: 416903419
]


In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import TextVectorization
from dataclasses import dataclass
import pandas as pd
import numpy as np
import glob
import re
from pprint import pprint

## Set-up Configuration

In [None]:
@dataclass
class Config:
    MAX_LEN = 256
    BATCH_SIZE = 32
    LR = 0.001
    VOCAB_SIZE = 30000
    EMBED_DIM = 128
    NUM_HEAD = 8  # used in bert model
    FF_DIM = 128  # used in bert model
    NUM_LAYERS = 1

config = Config()
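
Because the values in `Config` carry no type annotations, `@dataclass` leaves them as plain class attributes rather than dataclass fields, which is fine for read-only use. A minimal sketch, assuming you would like real fields so that single values can be overridden per instance (the name `AnnotatedConfig` is hypothetical and not part of the tutorial):

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AnnotatedConfig:
    """Hypothetical variant of Config whose values are real dataclass fields."""
    MAX_LEN: int = 256
    BATCH_SIZE: int = 32
    LR: float = 0.001
    VOCAB_SIZE: int = 30000
    EMBED_DIM: int = 128
    NUM_HEAD: int = 8
    FF_DIM: int = 128
    NUM_LAYERS: int = 1


base = AnnotatedConfig()
# Override a single value without touching the defaults.
small = replace(base, MAX_LEN=64)

# Each attention head works on EMBED_DIM // NUM_HEAD dimensions.
head_dim = base.EMBED_DIM // base.NUM_HEAD
field_names = [f.name for f in fields(base)]
print(head_dim, small.MAX_LEN, field_names)
```

With these values each of the 8 heads operates on 16 dimensions, which is the `key_dim` the model code passes to `MultiHeadAttention`.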

## Load the data

We will first download the IMDB data and load it into a pandas DataFrame.

In [None]:
!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -xf aclImdb_v1.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 80.2M  100 80.2M    0     0  26.0M      0  0:00:03  0:00:03 --:--:-- 26.0M


In [None]:

def get_text_list_from_files(files):
    # Each IMDB review file holds a single review, so every line read is one sample.
    text_list = []
    for name in files:
        with open(name, encoding='UTF-8') as f:
            for line in f:
                text_list.append(line)
    return text_list


def get_data_from_text_files(folder_name):

    pos_files = glob.glob("aclImdb/" + folder_name + "/pos/*.txt")
    pos_texts = get_text_list_from_files(pos_files)
    neg_files = glob.glob("aclImdb/" + folder_name + "/neg/*.txt")
    neg_texts = get_text_list_from_files(neg_files)
    df = pd.DataFrame(
        {
            "review": pos_texts + neg_texts,
            "sentiment": [0] * len(pos_texts) + [1] * len(neg_texts),
        }
    )
    df = df.sample(len(df)).reset_index(drop=True)
    return df


train_df = get_data_from_text_files("train")
test_df = get_data_from_text_files("test")

all_data = pd.concat([train_df, test_df])  # DataFrame.append is deprecated in recent pandas
all_data.head()

| | review | sentiment |
|---|---|---|
| 0 | When I was very young,on a local tv station,th... | 0 |
| 1 | Romance is in the air and love is in bloom in ... | 0 |
| 2 | Having loved 'Paris, Je T'aime', I highly anti... | 1 |
| 3 | To call a film about a crippled ghost taking r... | 1 |
| 4 | Twelve years ago, production stopped on the sl... | 0 |


## Dataset preparation

We will use the `TextVectorization` layer to vectorize the text into integer token ids.
It transforms a batch of strings into either
a sequence of token indices (one sample = 1D array of integer token indices, in order)
or a dense representation (one sample = 1D array of float values encoding an unordered set of tokens).

Below, we define 3 preprocessing functions.

1.  The `get_vectorize_layer` function builds the `TextVectorization` layer.
2.  The `encode` function encodes raw text into integer token ids.
3.  The `get_masked_input_and_labels` function masks input token ids.
    It selects 15% of the input tokens in each sequence at random as prediction targets.
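
What the integer output mode does can be sketched in plain Python: standardize the text, split on whitespace, look each word up in a vocabulary, and pad or truncate to a fixed length. This is a simplified illustration, not the `TextVectorization` implementation; the helper names are hypothetical, and the alphabetical tie-breaking for equally frequent words is an assumption made only to keep the sketch deterministic.

```python
import re

PUNCTUATION = "!#$%&'()*+,-./:;<=>?@\\^_`{|}~"


def standardize(text):
    """Lowercase, replace <br /> tags with spaces, and drop punctuation."""
    text = text.lower().replace("<br />", " ")
    return re.sub("[%s]" % re.escape(PUNCTUATION), "", text)


def build_vocab(texts, max_tokens):
    """Index 0 is padding, 1 is the unknown token, then words by frequency."""
    counts = {}
    for text in texts:
        for word in standardize(text).split():
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ["", "[UNK]"] + ranked[: max_tokens - 2]


def encode_text(text, vocab, max_seq):
    """Map words to ids, then truncate or zero-pad to max_seq."""
    token_to_id = {tok: i for i, tok in enumerate(vocab)}
    ids = [token_to_id.get(w, 1) for w in standardize(text).split()]
    ids = ids[:max_seq]
    return ids + [0] * (max_seq - len(ids))


toy_vocab = build_vocab(["I have a dream.", "I have a plan<br />too"], max_tokens=10)
toy_ids = encode_text("I have a secret!", toy_vocab, max_seq=6)
print(toy_vocab)
print(toy_ids)  # [4, 3, 2, 1, 0, 0]
```

The word "secret" is not in the toy vocabulary, so it maps to id 1, and the two trailing zeros are padding.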

In [None]:

def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)  # lowercase the text
    stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")  # replace HTML line breaks with spaces
    # remove punctuation and other special characters
    return tf.strings.regex_replace(stripped_html, "[%s]" % re.escape(r"!#$%&'()*+,-./:;<=>?@\^_`{|}~"), "")


def get_vectorize_layer(texts, vocab_size, max_seq, special_tokens=["[MASK]"]):
    """Build Text vectorization layer

    Args:
      texts (list): List of string i.e input texts
      vocab_size (int): vocab size
      max_seq (int): Maximum sequence length.
      special_tokens (list, optional): List of special tokens. Defaults to ['[MASK]'].

    Returns:
        layers.Layer: Return TextVectorization Keras Layer
    """
    # Layer that turns text into a vector of token ids
    # e.g. ['i have a dream'] -> [10, 25, 4, 1040]
    vectorize_layer = TextVectorization(
        max_tokens=vocab_size,
        output_mode="int",
        standardize=custom_standardization,  # the custom cleaning function defined above
        output_sequence_length=max_seq,
    )
    vectorize_layer.adapt(texts)  # build the vocabulary from texts

    # Insert mask token in vocabulary
    vocab = vectorize_layer.get_vocabulary()
    vocab = vocab[2 : vocab_size - len(special_tokens)] + ["[mask]"]  # append the [mask] token
    vectorize_layer.set_vocabulary(vocab)
    return vectorize_layer


vectorize_layer = get_vectorize_layer(all_data.review.values.tolist(),
                                      config.VOCAB_SIZE,
                                      config.MAX_LEN,
                                      special_tokens=["[mask]"])

# Get mask token id for masked language model
mask_token_id = vectorize_layer(["[mask]"]).numpy()[0][0]
print('[mask] token id : %s'%mask_token_id)
print(vectorize_layer(['i have a dream']))

[mask] token id : 29999
tf.Tensor(
[[  10   25    4 1040    0    0    0    0    0    0    0    0    0    0
     0    0    0 ...  (output truncated; the remaining positions are zero padding)

In [None]:
# Vectorize texts into integer token ids
# ['i have a dream'] -> array([[10, 25, 4, 1040, 0, ...]])
def encode(texts):
    encoded_texts = vectorize_layer(texts)
    return encoded_texts.numpy()


def get_masked_input_and_labels(encoded_texts):
    # Input (encoded_texts): integer array of shape (N, seq_len), e.g.
    #   [[   5  259   10   57   25]
    #    [  56   11  277   28    5]
    #    [   1 1596    3    2  188]]
    # Output 1 (encoded_texts_masked): the model input after corruption
    #   [[    5 29999    10 22697    25]   # 29999 = [mask], 22697 = random token
    #    [   56    11   277    28 29999]   # 56 was selected but left unchanged
    #    [    1  1596     3     2   188]]
    # Output 2 (y_labels): the original, uncorrupted token ids (same as the input)
    # Output 3 (sample_weights): 1 where the loss is computed, 0 elsewhere
    #   [[0. 1. 0. 1. 0.]
    #    [1. 0. 0. 0. 1.]
    #    [0. 0. 0. 0. 0.]]
    #
    # About 15% of the tokens are selected for prediction. Of those:
    #   1. 80% are replaced with [mask]
    #   2. 10% are replaced with a random token
    #   3. 10% are left unchanged

    # Select ~15% of positions: draw uniform samples in [0, 1) of shape (N, seq_len)
    # and mark a position True where the sample is below 0.15.
    inp_mask = np.random.rand(*encoded_texts.shape) < 0.15
    # Do not mask special tokens. After this step, for the example above:
    # inp_mask = [[False, True,  False, True,  False]
    #             [True,  False, False, False, True]
    #             [False, False, False, False, False]]
    inp_mask[encoded_texts <= 2] = False
    # Set targets to -1 by default, which means "ignore"
    labels = -1 * np.ones(encoded_texts.shape, dtype=int)
    # Set labels for the selected tokens to their original ids
    # labels = [[-1  259   -1   57   -1]
    #           [56   -1   -1   -1    5]
    #           [-1   -1   -1   -1   -1]]
    labels[inp_mask] = encoded_texts[inp_mask]

    # Prepare the input: start from a copy of the original ids
    encoded_texts_masked = np.copy(encoded_texts)
    # Set the input to [mask] for 90% of the selected tokens.
    # This leaves the remaining 10% unchanged.
    # inp_mask_2mask = [[False  True False  True False]
    #                   [False False False False  True]
    #                   [False False False False False]]
    inp_mask_2mask = inp_mask & (np.random.rand(*encoded_texts.shape) < 0.90)
    # encoded_texts_masked = [[    5 29999    10 29999    25]
    #                         [   56    11   277    28 29999]
    #                         [    1  1596     3     2   188]]
    encoded_texts_masked[inp_mask_2mask] = mask_token_id  # [mask] is the last token in the vocabulary

    # Of the positions just set to [mask], overwrite 1/9 with a random token,
    # which amounts to 10% of all selected tokens.
    # inp_mask_2random = [[False False False  True False]
    #                     [False False False False False]
    #                     [False False False False False]]
    inp_mask_2random = inp_mask_2mask & (np.random.rand(*encoded_texts.shape) < 1 / 9)
    # encoded_texts_masked = [[    5 29999    10 22697    25]
    #                         [   56    11   277    28 29999]
    #                         [    1  1596     3     2   188]]
    encoded_texts_masked[inp_mask_2random] = np.random.randint(3, mask_token_id, inp_mask_2random.sum())

    # Prepare sample_weights to pass to the .fit() method:
    # 1 at every selected position (masked, randomized, or unchanged), 0 elsewhere
    # [[0. 1. 0. 1. 0.]
    #  [1. 0. 0. 0. 1.]
    #  [0. 0. 0. 0. 0.]]
    sample_weights = np.ones(labels.shape)
    sample_weights[labels == -1] = 0

    # y_labels is the same as encoded_texts, i.e. the original input tokens
    y_labels = np.copy(encoded_texts)

    return encoded_texts_masked, y_labels, sample_weights
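
The function reaches the 80% / 10% / 10% split through nested thresholds: 90% of the selected tokens are set to `[mask]`, and 1/9 of those are then overwritten with a random token, so 0.9 × 8/9 = 0.8 stay masked, 0.9 × 1/9 = 0.1 become random, and the remaining 0.1 are unchanged. A small simulation, assuming NumPy and an arbitrary number of positions, checks the arithmetic empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 1_000_000

# Stage 1: select about 15% of positions for prediction.
selected = rng.random(n_tokens) < 0.15
# Stage 2: of the selected positions, about 90% are overwritten with [mask].
to_mask = selected & (rng.random(n_tokens) < 0.90)
# Stage 3: of those, about 1/9 are overwritten again with a random token.
to_random = to_mask & (rng.random(n_tokens) < 1 / 9)

n_selected = selected.sum()
frac_mask = (to_mask & ~to_random).sum() / n_selected      # expected 0.9 * 8/9 = 0.8
frac_random = to_random.sum() / n_selected                 # expected 0.9 * 1/9 = 0.1
frac_unchanged = (selected & ~to_mask).sum() / n_selected  # expected 0.1

print(f"selected: {selected.mean():.3f}")
print(f"[mask]: {frac_mask:.3f}, random: {frac_random:.3f}, unchanged: {frac_unchanged:.3f}")
```

The three fractions sum to one by construction, since every selected position falls into exactly one of the three groups.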

In [None]:
## Prepare fine-tuning data
# We have 25000 examples for training
x_train = encode(train_df.review.values)  # encode reviews with vectorizer
print('x_train_before : %s\n'%train_df.review.values[0])
print('x_train_after  : %s\n'%x_train[0])

# Labels of the training data (positive or negative sentiment)
y_train = train_df.sentiment.values
train_classifier_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(1000)
    .batch(config.BATCH_SIZE)
)
print('\ny_train plot : %s'%y_train[:5])

# We have 25000 examples for testing
x_test = encode(test_df.review.values)
y_test = test_df.sentiment.values  # labels of the test data (positive or negative sentiment)
test_classifier_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(config.BATCH_SIZE)

# Build dataset for end to end model input (will be used at the end)
test_raw_classifier_ds = tf.data.Dataset.from_tensor_slices((test_df.review.values, y_test)).batch(config.BATCH_SIZE)

#######################################################
## Prepare data for masked language model
x_all_review = encode(all_data.review.values)
x_masked_train, y_masked_labels, sample_weights = get_masked_input_and_labels(x_all_review)

mlm_ds = tf.data.Dataset.from_tensor_slices((x_masked_train, y_masked_labels, sample_weights))
mlm_ds = mlm_ds.shuffle(1000).batch(config.BATCH_SIZE)
print('data preparation done')

x_train_before : When I was very young,on a local tv station,they would show kung fu movies of all kinds on Saturdays.I saw lots of Kung Fu movies on weekends.I remember lots of them.I saw great flicks like Crippled Masters,Blind Fist of Bruce,Kung Fu Zombie,Shaolin Drunken Monk,Rage of the Master,Tattoe Dragon,and...Five Deadly Venoms.I remember the day clearly.Me and my dad had just gotten lunch at Burger King.We were racing home to see what movie it would be this saturday.We ran in the house and jumped onto the couch,turned on the set and flicked it onto 56.The usual intro of many kung fu movie clips in the background with the words Kung Fu Saturday over it.Then under that was the Title of the film.It said Five Deadly Venoms.Then the movie began.I bit into my burger amused with the pre-credit sequence.I loved this movie the minute it came on.My favorite character was the Toad Venom.The plot was hard to follow at that age but that wasn't what lured me...it was the fighting.The fights

## Create BERT model (Pretraining Model) for masked language modeling

We will create a BERT-like pretraining model architecture
using the `MultiHeadAttention` layer.
It will take token ids as inputs (including masked tokens)
and it will predict the correct ids for the masked input tokens.
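
The core computation inside an attention layer is scaled dot-product attention. The following NumPy sketch shows a single head only; it leaves out the biases, the multiple heads, the output projection, and the dropout of the Keras layer, and the random weights and helper names are placeholders rather than anything taken from the tutorial.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def single_head_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, embed_dim); w_q, w_k, w_v: (embed_dim, head_dim)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, embed_dim, head_dim = 5, 128, 16     # head_dim = EMBED_DIM // NUM_HEAD
x = rng.normal(size=(seq_len, embed_dim))
w_q, w_k, w_v = (rng.normal(size=(embed_dim, head_dim)) * 0.1 for _ in range(3))

context, attn_weights = single_head_self_attention(x, w_q, w_k, w_v)
print(context.shape, attn_weights.shape)  # (5, 16) (5, 5)
```

Because queries, keys, and values all come from the same sequence, every position can attend to every other position, which is what lets the model use context on both sides of a masked token.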

In [None]:

def Transformer_block(query, key, value, i):
    # Multi headed self-attention
    attention_output = layers.MultiHeadAttention(num_heads=config.NUM_HEAD,
                                                 key_dim=config.EMBED_DIM // config.NUM_HEAD,
                                                 name="encoder_{}/multiheadattention".format(i),
                                                 )(query, key, value)
    attention_output = layers.Dropout(0.1, name="encoder_{}/att_dropout".format(i))(
        attention_output)
    attention_output = layers.LayerNormalization(epsilon=1e-6, name="encoder_{}/att_layernormalization".format(i))(
        query + attention_output)

    # Feed-forward layer
    ffn = keras.Sequential([layers.Dense(config.FF_DIM, activation="relu"),
                            layers.Dense(config.EMBED_DIM)],
                            name="encoder_{}/ffn".format(i))
    ffn_output = ffn(attention_output)
    ffn_output = layers.Dropout(0.1, name="encoder_{}/ffn_dropout".format(i))(ffn_output)
    sequence_output = layers.LayerNormalization(epsilon=1e-6, name="encoder_{}/ffn_layernormalization".format(i))(
        attention_output + ffn_output)
    return sequence_output


def get_pos_encoding_matrix(max_len, d_emb):
    pos_enc = np.array(
        [
            [pos / np.power(10000, 2 * (j // 2) / d_emb) for j in range(d_emb)]
            if pos != 0
            else np.zeros(d_emb)
            for pos in range(max_len)
        ]
    )
    pos_enc[1:, 0::2] = np.sin(pos_enc[1:, 0::2])  # dim 2i
    pos_enc[1:, 1::2] = np.cos(pos_enc[1:, 1::2])  # dim 2i+1
    return pos_enc


loss_fn = keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE
)
loss_tracker = tf.keras.metrics.Mean(name="loss")

class MaskedLanguageModel(tf.keras.Model):
    def train_step(self, inputs):
        if len(inputs) == 3:
            features, labels, sample_weight = inputs
        else:
            features, labels = inputs
            sample_weight = None

        with tf.GradientTape() as tape:
            predictions = self(features, training=True)
            loss = loss_fn(labels, predictions, sample_weight=sample_weight)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Compute our own metrics
        loss_tracker.update_state(loss, sample_weight=sample_weight)

        # Return a dict mapping metric names to current value
        return {"loss": loss_tracker.result()}

    @property
    def metrics(self):
        # We list our `Metric` objects here so that `reset_states()` can be
        # called automatically at the start of each epoch
        # or at the start of `evaluate()`.
        # If you don't implement this property, you have to call
        # `reset_states()` yourself at the time of your choosing.
        return [loss_tracker]


def create_masked_language_bert_model():
    inputs = layers.Input((config.MAX_LEN,), dtype=tf.int64)

    word_embeddings = layers.Embedding(config.VOCAB_SIZE, config.EMBED_DIM, name="word_embedding")(
        inputs)
    position_embeddings = layers.Embedding(input_dim=config.MAX_LEN,
                                           output_dim=config.EMBED_DIM,
                                           weights=[get_pos_encoding_matrix(config.MAX_LEN, config.EMBED_DIM)],
                                           name="position_embedding")(
        tf.range(start=0, limit=config.MAX_LEN, delta=1))
    embeddings = word_embeddings + position_embeddings

    encoder_output = embeddings
    for i in range(config.NUM_LAYERS):
        encoder_output = Transformer_block(encoder_output, encoder_output, encoder_output, i)

    mlm_output = layers.Dense(config.VOCAB_SIZE, name="mlm_cls", activation="softmax")(
        encoder_output)
    
    mlm_model = MaskedLanguageModel(inputs, mlm_output, name="masked_bert_model")

    optimizer = keras.optimizers.Adam(learning_rate=config.LR)
    mlm_model.compile(optimizer=optimizer)
    return mlm_model


id2token = dict(enumerate(vectorize_layer.get_vocabulary()))
token2id = {y: x for x, y in id2token.items()}


class MaskedTextGenerator(keras.callbacks.Callback):
    def __init__(self, sample_tokens, top_k=5):
        super().__init__()
        self.sample_tokens = sample_tokens
        self.k = top_k

    def decode(self, tokens):
        return " ".join([id2token[t] for t in tokens if t != 0])

    def convert_ids_to_tokens(self, id):
        return id2token[id]

    def on_epoch_end(self, epoch, logs=None):
        prediction = self.model.predict(self.sample_tokens)

        masked_index = np.where(self.sample_tokens == mask_token_id)
        masked_index = masked_index[1]
        mask_prediction = prediction[0][masked_index]

        top_indices = mask_prediction[0].argsort()[-self.k :][::-1]
        values = mask_prediction[0][top_indices]

        for i in range(len(top_indices)):
            p = top_indices[i]
            v = values[i]
            # Use the tokens stored on the callback instead of the global variable
            tokens = np.copy(self.sample_tokens[0])
            tokens[masked_index[0]] = p
            result = {
                "input_text": self.decode(self.sample_tokens[0]),
                "prediction": self.decode(tokens),
                "probability": v,
                "predicted mask token": self.convert_ids_to_tokens(p),
            }
            pprint(result)


sample_tokens = vectorize_layer(["I have watched this [mask] and it was awesome"])
generator_callback = MaskedTextGenerator(sample_tokens.numpy())

bert_masked_model = create_masked_language_bert_model()
bert_masked_model.summary()

Model: "masked_bert_model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 256)]        0           []                               
                                                                                                  
 word_embedding (Embedding)     (None, 256, 128)     3840000     ['input_1[0][0]']                
                                                                                                  
 tf.__operators__.add (TFOpLamb  (None, 256, 128)    0           ['word_embedding[0][0]']         
 da)                                                                                              
                                                                                                  
 encoder_0/multiheadattention (  (None, 256, 128)    66048       ['tf.__operators_
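
The two parameter counts visible in the summary can be reproduced by hand. This is a back-of-the-envelope check that assumes the Keras `MultiHeadAttention` defaults (bias terms enabled, value dimension equal to `key_dim`, output projected back to the embedding size):

```python
vocab_size, embed_dim, num_heads = 30000, 128, 8
key_dim = embed_dim // num_heads  # 16, as passed to MultiHeadAttention

# Embedding table: one embed_dim-sized vector per vocabulary entry.
word_embedding_params = vocab_size * embed_dim

# Query, key and value projections: embed_dim -> num_heads * key_dim, with bias.
qkv_params = 3 * (embed_dim * num_heads * key_dim + num_heads * key_dim)
# Output projection: num_heads * key_dim -> embed_dim, with bias.
out_params = num_heads * key_dim * embed_dim + embed_dim
attention_params = qkv_params + out_params

print(word_embedding_params, attention_params)  # 3840000 66048
```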

## Train and Save

In [None]:
%%time
bert_masked_model.fit(mlm_ds, epochs=5, callbacks=[generator_callback])
bert_masked_model.save("bert_mlm_imdb.h5")

Epoch 1/5
 'predicted mask token': 'i',
 'prediction': 'i have watched this i and it was awesome',
 'probability': 0.06641625}
{'input_text': 'i have watched this [mask] and it was awesome',
 'predicted mask token': 'this',
 'prediction': 'i have watched this this and it was awesome',
 'probability': 0.059429348}
{'input_text': 'i have watched this [mask] and it was awesome',
 'predicted mask token': 'a',
 'prediction': 'i have watched this a and it was awesome',
 'probability': 0.033233643}
{'input_text': 'i have watched this [mask] and it was awesome',
 'predicted mask token': 'movie',
 'prediction': 'i have watched this movie and it was awesome',
 'probability': 0.029986538}
{'input_text': 'i have watched this [mask] and it was awesome',
 'predicted mask token': 'it',
 'prediction': 'i have watched this it and it was awesome',
 'probability': 0.022975946}
Epoch 2/5
 'predicted mask token': 'movie',
 'prediction': 'i have watched this movie and it was awesome',
 'probability': 0.3307
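
The loss tracked during this run only counts positions whose sample weight is 1, that is, the tokens selected for prediction. A minimal NumPy sketch of such a weighted cross-entropy, written as an illustration rather than as the Keras implementation (the function name and the toy numbers are hypothetical):

```python
import numpy as np


def masked_sparse_cross_entropy(probs, labels, sample_weights):
    """probs: (batch, seq, vocab) softmax outputs; labels, sample_weights: (batch, seq)."""
    # Probability assigned to the true token at every position.
    true_probs = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    per_token_loss = -np.log(true_probs)
    # Weighted mean: positions with weight 0 contribute nothing.
    return (per_token_loss * sample_weights).sum() / sample_weights.sum()


# Toy batch: 1 sequence, 3 positions, vocabulary of 4 tokens.
probs = np.array([[[0.7, 0.1, 0.1, 0.1],
                   [0.25, 0.25, 0.25, 0.25],
                   [0.1, 0.1, 0.1, 0.7]]])
labels = np.array([[0, 2, 1]])
weights = np.array([[0.0, 1.0, 0.0]])  # only the middle position was selected

loss = masked_sparse_cross_entropy(probs, labels, weights)
print(loss)  # -log(0.25), about 1.386
```

The first and last positions are ignored even though the model's predictions there differ in quality, which is why unselected tokens do not influence training.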

## Fine-tune a sentiment classification model

We will fine-tune our self-supervised model on a downstream task of sentiment classification.
To do this, let's create a classifier by adding a pooling layer and a `Dense` layer on top of the
pretrained BERT features.
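
The pooling step reduces the per-token encoder output to one fixed-size vector per review. A small NumPy sketch of global max pooling over the sequence axis, using random numbers as a stand-in for the encoder output:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, embed_dim = 2, 256, 128

# Stand-in for the encoder output: one embed_dim vector per token position.
sequence_output = rng.normal(size=(batch, seq_len, embed_dim))

# Global max pooling over the sequence axis keeps, for each feature,
# the largest activation found at any position.
pooled_output = sequence_output.max(axis=1)

print(sequence_output.shape, "->", pooled_output.shape)  # (2, 256, 128) -> (2, 128)
```

The pooled vector has the same size regardless of review length, so it can feed a `Dense` layer directly.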

In [None]:
# Load pretrained bert model
mlm_model = keras.models.load_model("bert_mlm_imdb.h5", custom_objects={"MaskedLanguageModel": MaskedLanguageModel})
pretrained_bert_model = tf.keras.Model(mlm_model.input, mlm_model.get_layer("encoder_0/ffn_layernormalization").output)  # keep everything before the final Dense layer

# Freeze it
pretrained_bert_model.trainable = False

def create_classifier_bert_model():
    inputs = layers.Input((config.MAX_LEN,), dtype=tf.int64)
    sequence_output = pretrained_bert_model(inputs)
    pooled_output = layers.GlobalMaxPooling1D()(sequence_output)
    hidden_layer = layers.Dense(64, activation="relu")(pooled_output)
    outputs = layers.Dense(1, activation="sigmoid")(hidden_layer)
    classifer_model = keras.Model(inputs, outputs, name="classification")
    optimizer = keras.optimizers.Adam()
    classifer_model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return classifer_model

classifer_model = create_classifier_bert_model()
classifer_model.summary()

# Train the classifier with frozen BERT stage
classifer_model.fit(
    train_classifier_ds,
    epochs=5,
    validation_data=test_classifier_ds,
)

# Unfreeze the BERT model for fine-tuning
pretrained_bert_model.trainable = True

optimizer = keras.optimizers.Adam()
classifer_model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
classifer_model.fit(
    train_classifier_ds,
    epochs=5,
    validation_data=test_classifier_ds,
)

Model: "classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 256)]             0         
                                                                 
 model (Functional)          (None, 256, 128)          3939584   
                                                                 
 global_max_pooling1d (Globa  (None, 128)              0         
 lMaxPooling1D)                                                  
                                                                 
 dense_2 (Dense)             (None, 64)                8256      
                                                                 
 dense_3 (Dense)             (None, 1)                 65        
                                                                 
Total params: 3,947,905
Trainable params: 8,321
Non-trainable params: 3,939,584
______________________________________

<keras.callbacks.History at 0x7fac2a2be090>
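
The trainable and non-trainable counts in the summary follow from the layer sizes: while the encoder is frozen, only the two `Dense` layers of the classification head are updated. A quick arithmetic check, taking the encoder total from the summary as given:

```python
pooled_dim, hidden_units = 128, 64
encoder_params = 3_939_584  # "model (Functional)" row of the summary

# A Dense layer has (inputs * units) weights plus one bias per unit.
hidden_params = pooled_dim * hidden_units + hidden_units
output_params = hidden_units * 1 + 1
head_params = hidden_params + output_params

print(hidden_params, output_params, head_params, encoder_params + head_params)
# 8256 65 8321 3947905
```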

## Create an end-to-end model and evaluate it

When you want to deploy a model, it's best if it already includes its preprocessing
pipeline, so that you don't have to reimplement the preprocessing logic in your
production environment. Let's create an end-to-end model that incorporates
the `TextVectorization` layer, and let's evaluate. Our model will accept raw strings
as input.

In [None]:

def get_end_to_end(model):
    inputs_string = keras.Input(shape=(1,), dtype="string")
    indices = vectorize_layer(inputs_string)
    outputs = model(indices)
    end_to_end_model = keras.Model(inputs_string, outputs, name="end_to_end_model")
    optimizer = keras.optimizers.Adam(learning_rate=config.LR)
    end_to_end_model.compile(
        optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"]
    )
    return end_to_end_model


end_to_end_classification_model = get_end_to_end(classifer_model)
end_to_end_classification_model.evaluate(test_raw_classifier_ds)



[0.7073333859443665, 0.8427199721336365]
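
`evaluate` returns the loss followed by the compiled metrics, so the two numbers above are the binary cross-entropy and the accuracy on the test set. A NumPy sketch of how both quantities are computed from sigmoid outputs; the labels and probabilities below are hypothetical and not taken from the run above:

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))


# Hypothetical labels and sigmoid outputs.
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.8, 0.4, 0.3])

toy_loss = binary_crossentropy(y_true, y_prob)
toy_acc = binary_accuracy(y_true, y_prob)
print([toy_loss, toy_acc])
```

In the toy data three of the four predictions fall on the correct side of 0.5, so the accuracy is 0.75, while the loss also penalizes how far the third prediction (0.4 for a true label of 1) is from its target.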