# BERT Encoder + Classifier Head Prediction

In this final notebook, we add a custom classification neural network head onto the pre-trained BERT encoder and train the model.

In [1]:
# General Imports
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from os.path import join

# NN-related imports
import tensorflow as tf
import tensorflow_hub as hub 
import tensorflow_text as text 

print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))

True
[]


In [2]:
bert_preprocess = hub.KerasLayer('https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3')
bert = hub.KerasLayer('https://tfhub.dev/google/experts/bert/wiki_books/sst2/2')

In [3]:
data_dir = "data/"
data = pd.read_csv(join(data_dir, "downsampled_train_50000.csv"))[["Rating", "Review"]]
data["Review"] = data["Review"].apply(str)
display(data)

Unnamed: 0,Rating,Review
0,1,This album is a travesty to the songs of the 5...
1,1,"I found this book a complete waist of time, I ..."
2,1,The product I got had scratches on its surface...
3,1,Ok well it may not be the worst book that I ha...
4,1,It was ok. Could of gotten into the other char...
...,...,...
49995,5,Good little memory stick. Currently using as m...
49996,5,How anyone can write such fun tropical songs a...
49997,5,"Sure, its one of The Great Man's best movies, ..."
49998,5,I finally bought and watched this classic epic...


In [4]:
def build_model():
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='txt_input')
    bert_input = bert_preprocess(text_input)
    bert_output = bert(bert_input)
    clf_input = bert_output['pooled_output']
    clf = tf.keras.layers.Dropout(0.1)(clf_input)
    clf = tf.keras.layers.Dense(384, activation='sigmoid')(clf)
    clf = tf.keras.layers.Dropout(0.1)(clf)
    clf = tf.keras.layers.Dense(5, activation='sigmoid', name='clf')(clf)
    return tf.keras.Model(text_input, clf)

model = build_model() 

In [5]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
txt_input (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
keras_layer (KerasLayer)        {'input_mask': (None 0           txt_input[0][0]                  
__________________________________________________________________________________________________
keras_layer_1 (KerasLayer)      {'sequence_output':  109482241   keras_layer[0][0]                
                                                                 keras_layer[0][1]                
                                                                 keras_layer[0][2]                
______________________________________________________________________________________________

In [6]:
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = [tf.metrics.SparseCategoricalAccuracy()]
optimizer = tf.keras.optimizers.Adam()

model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

In [7]:
train_data = shuffle(data)[:10000]
X = train_data["Review"].to_numpy()
y = train_data["Rating"].to_numpy() - 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
print(f"X_train: {X_train.shape} | X_val: {X_val.shape} | X_test: {X_test.shape} | \n" +
    f"y_train: {y_train.shape} | y_val: {y_val.shape} | y_test: {y_test.shape} | ")

X_train: (7200,) | X_val: (1800,) | X_test: (1000,) | 
y_train: (7200,) | y_val: (1800,) | y_test: (1000,) | 


In [21]:
EPOCHS = 25
BATCH_SIZE = 64
history = model.fit(X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(X_val, y_val))

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


In [22]:
tf.saved_model.save(model, "models/bert2")



INFO:tensorflow:Assets written to: models/bert2/assets


INFO:tensorflow:Assets written to: models/bert2/assets


In [12]:
model = tf.keras.models.load_model("models/bert2")













In [13]:
model.evaluate(X_test, y_test)



[1.077552318572998, 0.5339999794960022]

We see this approach achieves a 53% accuracy on the test set, more than doubling the accuracy achieved by th baseline Logistic Regression model. The model's accuracy is still 12% less than the state-of-the-art approach which achieved a 65% accuracy (https://paperswithcode.com/sota/sentiment-analysis-on-amazon-review-full).