## How-to Guide: Using a PIP package for fine-tuning a BERT model

Authors: [Chen Chen](https://github.com/chenGitHuber), [Claire Yao](https://github.com/claireyao-fen)

In this example, we will work through fine-tuning a BERT model using the tensorflow-models PIP package.

## License

Copyright 2020 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## Learning objectives

In this Colab notebook, you will learn how to fine-tune a BERT model using the TensorFlow Model Garden PIP package.

## Enable GPU acceleration

Enable a GPU runtime for better performance:

*   Open the Edit menu.
*   Select Notebook settings.
*   Choose GPU from the "Hardware accelerator" drop-down list and click Save.

## Install and import

### Install the TensorFlow Model Garden pip package

*  `tf-models-nightly` is the nightly Model Garden package, built automatically every day.
*  pip installs all models and their dependencies automatically.

In [0]:
pip install tf-models-nightly


### Import TensorFlow and other libraries

In [0]:
import json
import math

from official.nlp import optimization
from official.nlp.bert import bert_models
from official.nlp.bert import configs as bert_configs
from official.nlp.bert import run_classifier
from official.nlp.bert import tokenization
from official.nlp.data import classifier_data_lib
from official.utils.misc import distribution_utils

import tensorflow as tf

## Get dataset

### Introduction to the dataset

The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

*   Number of labels: 2.
*   Size of training dataset: 3668.
*   Size of evaluation dataset: 408.
*   Maximum sequence length used for training and evaluation in this guide: 128.
*   For details, see https://www.tensorflow.org/datasets/catalog/glue#gluemrpc

### Get dataset from TensorFlow Datasets (TFDS)

In this example, we use the GLUE MRPC dataset from TFDS: https://www.tensorflow.org/datasets/catalog/glue#gluemrpc.

### Preprocess the data and write it to TFRecord files

In [0]:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"

# Vocabulary file of the pretrained BERT model
vocab_file = gs_folder_bert + "/vocab.txt"

# Output paths for the training and evaluation TFRecord files
train_data_output_path = "./mrpc_train.tf_record"
eval_data_output_path = "./mrpc_eval.tf_record"

# WordPiece tokenizer; the model is uncased, so input text is lowercased
tokenizer = tokenization.FullTokenizer(
    vocab_file=vocab_file, do_lower_case=True)

# Processor that reads the GLUE/MRPC sentence pairs from TFDS
processor_text_fn = tokenization.convert_to_unicode
processor = classifier_data_lib.TfdsProcessor(
    tfds_params="dataset=glue/mrpc,text_key=sentence1,text_b_key=sentence2",
    process_text_fn=processor_text_fn)

# Convert the examples to features and write them to TFRecord files.
# data_dir is None because the TFDS processor downloads the data itself.
input_meta_data = classifier_data_lib.generate_tf_record_from_data_file(
    processor,
    None,  # data_dir
    tokenizer,
    train_data_output_path=train_data_output_path,
    eval_data_output_path=eval_data_output_path,
    max_seq_length=128)

Downloading and preparing dataset glue/mrpc/1.0.0 (download: 1.43 MiB, generated: Unknown size, total: 1.43 MiB) to /root/tensorflow_datasets/glue/mrpc/1.0.0...
Dataset glue downloaded and prepared to /root/tensorflow_datasets/glue/mrpc/1.0.0. Subsequent calls will reuse this data.

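Conceptually, `generate_tf_record_from_data_file` tokenizes each sentence pair with WordPiece and packs it into the fixed-length features BERT expects: `[CLS] A [SEP] B [SEP]`, segment ids that distinguish the two sentences, and an input mask that marks real tokens versus padding. The pure-Python sketch below illustrates that idea only; it is not the code of `FullTokenizer` or `classifier_data_lib`. The helpers `wordpiece_tokenize` and `pack_pair` and the toy vocabulary are hypothetical, and the sketch assumes already lowercased, whitespace-split words (the real `FullTokenizer` also performs basic tokenization such as punctuation splitting before WordPiece).

```python
from typing import Dict, List, Tuple


def wordpiece_tokenize(word: str, vocab: Dict[str, int], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of one word (hypothetical helper)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


def pack_pair(tokens_a: List[str], tokens_b: List[str], vocab: Dict[str, int],
              max_seq_length: int) -> Tuple[List[int], List[int], List[int]]:
    """Pack a pair as [CLS] A [SEP] B [SEP], then pad (hypothetical helper)."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve 3 positions for [CLS] and the two [SEP] tokens; trim the longer side first.
    while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS] A [SEP]; segment 1 covers B [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)  # 1 marks real tokens, 0 marks padding
    padding = max_seq_length - len(input_ids)
    return (input_ids + [0] * padding,
            input_mask + [0] * padding,
            segment_ids + [0] * padding)


# Toy vocabulary for illustration only; the real one is vocab.txt.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "cat": 5, "sat": 6, "play": 7, "##ing": 8}
a = [p for w in "the cat sat".split() for p in wordpiece_tokenize(w, vocab)]
b = [p for w in "playing".split() for p in wordpiece_tokenize(w, vocab)]
ids, mask, segs = pack_pair(a, b, vocab, max_seq_length=10)
print(ids, mask, segs)
```

Here "playing" splits into `play` and `##ing`, and the two trailing positions are padding. Trimming the longer sequence one token at a time keeps as much of both sentences as possible when a pair exceeds `max_seq_length`.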

### Get the TensorFlow datasets

In [0]:
# Get dataset information from the metadata returned above
max_seq_length = input_meta_data['max_seq_length']
num_classes = input_meta_data['num_labels']

# Set up batch sizes
batch_size = 32
eval_batch_size = 32

# Create input functions that build tf.data datasets from the TFRecord files
train_input_fn = run_classifier.get_dataset_fn(
    train_data_output_path, max_seq_length, batch_size, is_training=True)
eval_input_fn = run_classifier.get_dataset_fn(
    eval_data_output_path, max_seq_length, eval_batch_size, is_training=False)
training_dataset = train_input_fn()
evaluation_dataset = eval_input_fn()
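The two input functions differ in `is_training`. As we understand the Model Garden input pipeline of this era, the training dataset is shuffled, repeated indefinitely and batched with `drop_remainder=True`, while the evaluation dataset is read once and keeps its final partial batch; treat this as an assumption and check the source of your installed version. The pure-Python sketch below (the function `batch_examples` is hypothetical) only illustrates what dropping the remainder means for the MRPC split sizes.

```python
from typing import List, Sequence


def batch_examples(examples: Sequence[int], batch_size: int,
                   drop_remainder: bool) -> List[List[int]]:
    """Split examples into consecutive batches, optionally dropping a final partial batch."""
    batches = [list(examples[i:i + batch_size])
               for i in range(0, len(examples), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# One pass over each MRPC split, using example indices as stand-ins for records.
train_batches = batch_examples(range(3668), 32, drop_remainder=True)
eval_batches = batch_examples(range(408), 32, drop_remainder=False)
print(len(train_batches), len(eval_batches), len(eval_batches[-1]))
```

Under these assumptions, one pass over the training split yields 114 full batches, the same value `int(train_data_size / batch_size)` gives, and the evaluation split yields 13 batches, the last holding 24 examples.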

## Create, compile and train the model

### Construct a BERT model

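Here, a BERT model is constructed from a JSON file of hyperparameters. `bert_config` defines the core BERT encoder, and `bert_models.classifier_model` returns two Keras models: `classifier_model`, which predicts *num_classes* outputs from inputs of at most *max_seq_length* tokens, and `encoder`, the underlying BERT encoder. Note that this cell builds the model with randomly initialized weights; to fine-tune from the pretrained BERT weights, restore `encoder` from the pretrained checkpoint in `gs_folder_bert` (for example with `tf.train.Checkpoint(model=encoder).restore(...)`) before training.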

In [0]:
bert_config_file = gs_folder_bert + "/bert_config.json"
bert_config = bert_configs.BertConfig.from_json_file(bert_config_file)
classifier_model, encoder = bert_models.classifier_model(
    bert_config, num_classes, max_seq_length)
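The folder name `uncased_L-12_H-768_A-12` encodes the BERT-Base shape: 12 layers (L), hidden size 768 (H) and 12 attention heads (A). The sketch below parses an illustrative config with the standard BERT-Base values using only the `json` module; it is not the verbatim `bert_config.json`, which you should inspect directly. It also computes the per-head size and the parameter count of a single dense layer (with bias) mapping the 768-dimensional pooled output to `num_classes` logits, assuming that is the shape of the classification head.

```python
import json

# Illustrative BERT-Base (uncased) hyperparameters; check the real bert_config.json.
config_text = """
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
"""
config = json.loads(config_text)

# Each attention head works on hidden_size / num_attention_heads dimensions.
head_size = config["hidden_size"] // config["num_attention_heads"]

# Dense layer from the pooled [CLS] vector to num_classes logits: weights plus biases.
num_classes = 2
head_params = config["hidden_size"] * num_classes + num_classes

# The max_seq_length used in this guide must fit in the position-embedding table.
max_seq_length = 128
assert max_seq_length <= config["max_position_embeddings"]

print(head_size, head_params)  # 64 1538
```

The classification head is tiny compared with the encoder, which is why fine-tuning mostly adjusts pretrained encoder weights rather than learning the task from the head alone.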

### Set up an optimizer for the model

In [0]:
# Set up epochs and steps
epochs = 3
train_data_size = input_meta_data['train_data_size']
steps_per_epoch = int(train_data_size / batch_size)
num_train_steps = steps_per_epoch * epochs
# Warm up the learning rate over roughly the first 10% of training steps
warmup_steps = int(epochs * train_data_size * 0.1 / batch_size)

# Set up evaluation steps (eval_batch_size is defined above)
eval_data_size = input_meta_data['eval_data_size']
eval_steps = int(eval_data_size / eval_batch_size)

# Create an optimizer with a learning rate schedule (initial learning rate 2e-5)
optimizer = optimization.create_optimizer(
    2e-5, num_train_steps=num_train_steps, num_warmup_steps=warmup_steps)
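With the MRPC sizes (3668 training and 408 evaluation examples, batch size 32), the step counts work out as shown below. The sketch also reproduces, in pure Python, the schedule we understand `optimization.create_optimizer` to build: a linear warmup from 0 to the initial rate over `num_warmup_steps`, then a linear (power 1.0 polynomial) decay toward 0 measured from step 0 over `num_train_steps`. The helper `learning_rate` is hypothetical and the decay details are an assumption; check `official/nlp/optimization.py` in your installed version.

```python
def learning_rate(step: int, init_lr: float, num_train_steps: int,
                  num_warmup_steps: int, end_lr: float = 0.0) -> float:
    """Linear warmup, then linear decay to end_lr (assumed to mirror create_optimizer)."""
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    # Polynomial decay with power 1.0, clamped at num_train_steps.
    progress = min(step, num_train_steps) / num_train_steps
    return (init_lr - end_lr) * (1.0 - progress) + end_lr


epochs, batch_size, eval_batch_size = 3, 32, 32
train_data_size, eval_data_size = 3668, 408
steps_per_epoch = int(train_data_size / batch_size)              # 114
num_train_steps = steps_per_epoch * epochs                       # 342
warmup_steps = int(epochs * train_data_size * 0.1 / batch_size)  # 34
eval_steps = int(eval_data_size / eval_batch_size)               # 12, int() truncates

for step in (0, 17, 34, 171, 342):
    print(step, learning_rate(step, 2e-5, num_train_steps, warmup_steps))
```

Because `int()` truncates, `eval_steps = 12` covers 384 of the 408 evaluation examples. If the evaluation dataset keeps its last partial batch, `math.ceil(eval_data_size / eval_batch_size)` (13 steps) would cover all of them.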

### Compile and train the model

In [0]:
# Metric: how often the predictions match the integer labels
def metric_fn():
  return tf.keras.metrics.SparseCategoricalAccuracy(
      'test_accuracy', dtype=tf.float32)

# Compile and train the model
classifier_model.compile(optimizer=optimizer,
                         loss=run_classifier.get_loss_fn(num_classes=num_classes),
                         metrics=[metric_fn()])

classifier_model.fit(
    x=training_dataset,
    validation_data=evaluation_dataset,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_steps=eval_steps)

Epoch 1/3
Epoch 2/3
Epoch 3/3
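As we understand it, the loss from `run_classifier.get_loss_fn` is the mean softmax cross-entropy between the logits and one-hot integer labels, and `SparseCategoricalAccuracy` counts how often the arg-max of the model output matches the integer label (applying it to logits gives the same arg-max as applying it to probabilities). The pure-Python sketch below computes both for a tiny hand-made batch; the helper names and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def classification_loss(labels: Sequence[int], logits: Sequence[Sequence[float]]) -> float:
    """Mean negative log-probability of the true class (softmax cross-entropy)."""
    per_example = [-log_softmax(row)[label] for label, row in zip(labels, logits)]
    return sum(per_example) / len(per_example)


def argmax(row: Sequence[float]) -> int:
    # Returns the first index of the largest value.
    return max(range(len(row)), key=lambda i: row[i])


def sparse_categorical_accuracy(labels: Sequence[int],
                                logits: Sequence[Sequence[float]]) -> float:
    """Fraction of examples whose arg-max logit equals the integer label."""
    hits = sum(1 for label, row in zip(labels, logits) if argmax(row) == label)
    return hits / len(labels)


labels = [1, 0, 1, 1]
logits = [[0.0, 2.0], [1.0, -1.0], [0.5, 0.0], [0.0, 3.0]]
print(classification_loss(labels, logits), sparse_categorical_accuracy(labels, logits))
```

Three of the four toy examples have their largest logit at the true label, so the accuracy is 0.75; a uniform prediction over two classes contributes a loss of ln 2 per example.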

### Save the model

In [0]:
classifier_model.save('/tmp/saved_model', include_optimizer=False, save_format='tf')

INFO:tensorflow:Assets written to: /tmp/saved_model/assets


## Use the trained model


In [0]:
# Set up a single-device distribution strategy
strategy = distribution_utils.get_distribution_strategy(
    distribution_strategy='one_device', num_gpus=1)

# Get predictions and labels for the evaluation dataset
eval_predictions, eval_labels = run_classifier.get_predictions_and_labels(
    strategy, classifier_model, eval_input_fn, eval_steps)
print(eval_predictions)
print(eval_labels)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
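The cell above prints the predicted class ids and the true labels as lists, so standard metrics can be computed directly from them. GLUE reports both accuracy and F1 for MRPC; the sketch below computes them in pure Python, treating label 1 (equivalent) as the positive class. The function `accuracy_and_f1` and the toy lists are hypothetical. A constant majority-class baseline is a useful sanity check: a model that predicts 1 for every example, as the printed predictions appear to do as far as they are shown, scores exactly the fraction of positive labels in accuracy.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(predictions: Sequence[int], labels: Sequence[int],
                    positive: int = 1) -> Tuple[float, float]:
    """Accuracy and binary F1, with `positive` as the positive class."""
    pairs = list(zip(predictions, labels))
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy data; substitute eval_predictions and eval_labels from the cell above.
labels = [1, 0, 1, 1, 0]
constant_preds = [1] * len(labels)  # majority-class baseline
model_preds = [1, 0, 0, 1, 0]
print(accuracy_and_f1(constant_preds, labels))
print(accuracy_and_f1(model_preds, labels))
```

The constant baseline reaches perfect recall, so its F1 can look deceptively high; comparing both metrics against the baseline shows whether the fine-tuned model has learned anything beyond the label prior.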