# Finetuning PyTorch BERT with NGC
BERT is a family of powerful natural language understanding (NLU) models built on the transformer architecture introduced in the paper Attention Is All You Need, which you can find here: https://arxiv.org/abs/1706.03762

These models are pre-trained without labels on massive sets of text data, a process that requires an enormous amount of time and compute. Luckily for us, BERT models are built for transfer learning: a pre-trained model can be finetuned to perform many different NLU tasks, such as question answering, sentiment analysis, and document summarization.

For this tutorial, we are going to download a BERT base model, finetune it on the Stanford Question Answering Dataset (SQuAD), and walk through the steps necessary to deploy it to a SageMaker endpoint.
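
For orientation, SQuAD v1.1 files are JSON documents nested as data, paragraphs, qas, and answers, where each answer carries its text and a character offset into the paragraph context. The sketch below builds a tiny made-up record in that layout (the content and the id are illustrative, not taken from the dataset) and checks that the offset points at the answer.

```python
# Minimal sketch of the SQuAD v1.1 JSON layout; the record itself is made up.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Example_Article",
        "paragraphs": [{
            "context": "Steve is a large cat with a very furry belly.",
            "qas": [{
                "id": "example-0001",  # hypothetical id
                "question": "What kind of animal is Steve?",
                "answers": [{"text": "a large cat", "answer_start": 9}],
            }],
        }],
    }],
}


def iter_examples(squad):
    """Yield (question, context, answer_text, answer_start) tuples."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield qa["question"], context, answer["text"], answer["answer_start"]


examples = list(iter_examples(sample))
question, context, text, start = examples[0]
# The character offset should slice the answer text out of the context.
recovered = context[start:start + len(text)]
print(question, "->", recovered)
```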

In [1]:
!wget https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

--2020-05-08 20:51:49--  https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt
Resolving api.ngc.nvidia.com (api.ngc.nvidia.com)... 52.38.124.212, 54.186.237.130
Connecting to api.ngc.nvidia.com (api.ngc.nvidia.com)|52.38.124.212|:443... connected.
HTTP request sent, awaiting response... 302 

In [2]:
import collections
import math
import torch
import os, tarfile, json
import time, datetime
from io import StringIO
import numpy as np
import sagemaker
from sagemaker.pytorch import estimator, PyTorchModel, PyTorchPredictor, PyTorch
from sagemaker.utils import name_from_base
import boto3
from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace
from helper_funcs import *

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()  # or replace with the name of your own S3 bucket
prefix = 'bert_pytorch_ngc'
runtime_client = boto3.client('runtime.sagemaker')

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


## Create our training docker container

Now we are going to create a custom Docker container based on the NGC BERT container and push it to AWS Elastic Container Registry (ECR).

In [6]:
%%sh

# The name of our algorithm
algorithm_name=bert-ngc-torch-train

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

# If the push fails with "not authorized to perform ecr:InitiateLayerUpload",
# the IAM role running this notebook is missing ECR push permissions.
docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  1.857GB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
19.12-py3: Pulling from nvidia/pytorch
7ddbc47eeb70: Pulling fs layer
c1bbdc448b72: Pulling fs layer
8c3b70e39044: Pulling fs layer
45d437916d57: Pulling fs layer
9e8447766f7f: Pulling fs layer
fd3f34199730: Pulling fs layer
53722e4d5abe: Pulling fs layer
253e1d42cb67: Pulling fs layer
70bb07a2696f: Pulling fs layer
464656b14c58: Pulling fs layer
13f754fa3551: Pulling fs layer
07c9a9ec72b3: Pulling fs layer
451748edae8b: Pulling fs layer
cc17258a960f: Pulling fs layer
9e8447766f7f: Waiting
127c00c0fee1: Pulling fs layer
8633f42ac180: Pulling fs layer
fd3f34199730: Waiting
03e32fb8ed26: Pulling fs layer
45d437916d57: Waiting
b2d87cc0b3fc: Pulling fs layer
253e1d42cb67: Waiting
37e67971b9ea: Pulling fs layer
3b05d97480c9: Pulling fs layer
71b6b83de43f: Pulling fs layer
70bb07a2696f: Waiting
464656b14c58: Waiting
db4bc151599




## Instantiate the model

Now we are going to instantiate our model. Here we specify our hyperparameters for training as well as the number of GPUs we are going to use. The ml.p3.16xlarge instances each contain 8 NVIDIA V100 (Volta) GPUs, making them ideal for heavy-duty deep learning training.

Once we have set our hyperparameters, we will instantiate a SageMaker Estimator that we will use to run our training job. We specify the Docker image we just pushed to ECR as well as an entry point that tells the container what to do when it starts up.
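
Inside a SageMaker training container, the hyperparameters passed to the Estimator are made available as a JSON file at /opt/ml/input/config/hyperparameters.json, with every value stored as a string. The container's `train` script is not shown in this notebook, so the following is only a sketch of how such a script might read its settings; `load_hyperparameters` is a hypothetical helper, and the demonstration uses a stand-in file rather than the real path.

```python
import json
import tempfile
from pathlib import Path

# Standard hyperparameter location inside a SageMaker training container.
DEFAULT_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_hyperparameters(path=DEFAULT_PATH):
    """Read the hyperparameter file and decode values where possible (hypothetical helper)."""
    raw = json.loads(Path(path).read_text())
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key] = json.loads(value)  # "8" -> 8, "3e-05" -> 3e-05
        except (TypeError, ValueError):
            decoded[key] = value  # leave plain strings untouched
    return decoded


# Demonstration with a stand-in file instead of the real container path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "hyperparameters.json"
    demo_path.write_text(json.dumps({
        "num_gpus": "8",
        "learning_rate": "3e-05",
        "bert_model": "bert-base-uncased",
    }))
    hp = load_hyperparameters(demo_path)

print(hp)
```

Decoding each value with `json.loads` and falling back to the raw string keeps numeric settings usable without a per-key type table; the exact encoding of the values can differ between estimator classes, so treat the fallback as a safety net rather than a guarantee.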

In [17]:
account=!aws sts get-caller-identity --query Account --output text

# Get the region defined in the current configuration
region=!aws configure get region

algoname = 'bert-ngc-torch-train'

fullname="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname)

fullname

'497456752804.dkr.ecr.us-east-1.amazonaws.com/bert-ngc-torch-train'
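
An ECR image URI follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]`; when the tag is omitted, as in the cell above, Docker resolves it to `latest`. As a small sketch, the same string can be built with an explicit tag. The helper name `ecr_image_uri` and the account ID below are placeholders for illustration.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI with an explicit tag (hypothetical helper)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# 123456789012 is a placeholder account ID.
uri = ecr_image_uri("123456789012", "us-east-1", "bert-ngc-torch-train")
print(uri)
```

Pinning the tag explicitly makes it easier to see which image a training job used once more than one version has been pushed.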

In [18]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  'num_train_epochs': 1, 
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                  'output_dir': '/opt/ml/model',
                  'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                  'num_gpus':8, 'train_batch_size':7, 'max_seq_length':512, 'doc_stride':128, 'seed':1,
                  'learning_rate':3e-5,
                  'save_to_s3':bucket}

# instantiate model
torch_model = PyTorch( role=role,
                      train_instance_count=2,
                      train_instance_type='ml.p3.16xlarge',
                      entry_point='transform_script.py',
                      image_name=fullname,
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters
                     )
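
The `max_seq_length` and `doc_stride` hyperparameters interact when a passage is too long to fit in one input sequence. The sketch below follows the sliding-window scheme used by the reference BERT SQuAD preprocessing, in which the token budget per window is the sequence length minus the query tokens and special tokens. Whether the training script in this container does exactly this is an assumption, and the function name and the numbers are illustrative.

```python
def doc_spans(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Return (start, length) windows that together cover a tokenized document."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        # Each window holds at most max_tokens_for_doc tokens.
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        # Advance by the stride so consecutive windows overlap.
        start += min(length, doc_stride)
    return spans


# Illustrative numbers: a 700-token passage with room for 500 passage tokens per window.
spans = doc_spans(num_doc_tokens=700, max_tokens_for_doc=500, doc_stride=128)
print(spans)
```

A smaller `doc_stride` produces more overlapping windows, which raises the chance that an answer appears whole in at least one window at the cost of more training features.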


## Fine-tune the model

With an instance that has 4 GPUs and a batch size of 3, this finetuning task takes about 15 minutes for 2 epochs, and each additional epoch adds roughly 7 minutes. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with one of the ml.p3.16xlarge or ml.p3dn.24xlarge instances.
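
As a rough worked example, the figures above (4 GPUs, batch size 3) can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name is hypothetical, and the numbers will not carry over to other instance types or batch sizes.

```python
def estimated_minutes(num_epochs, base_minutes=15, base_epochs=2, minutes_per_extra_epoch=7):
    """Rough training-time estimate from the figures quoted in the text."""
    extra_epochs = max(0, num_epochs - base_epochs)
    return base_minutes + extra_epochs * minutes_per_extra_epoch


for epochs in (2, 3, 4):
    print(epochs, "epochs: about", estimated_minutes(epochs), "minutes")
```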

In [None]:
torch_model.fit()

2020-05-08 21:31:51 Starting - Starting the training job...
2020-05-08 21:31:52 Starting - Launching requested ML instances.........
2020-05-08 21:33:30 Starting - Preparing the instances for training......
2020-05-08 21:34:45 Downloading - Downloading input data
2020-05-08 21:34:45 Training - Downloading the training image......................

## Deploy our trained model

In [None]:
endpoint_name = 'bert-endpoint-byoc'
#model_data = f's3://{bucket}/model.tar.gz'
bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1, 
                      endpoint_name=endpoint_name)
# try:
#     bert_end = torch_model.deploy(instance_type='ml.g4dn.12xlarge', initial_instance_count=1, 
#                           endpoint_name=endpoint_name)
# except:
    #print('deploy command failed, using backup method')
#     torch_model = PyTorchModel(model_data=model_data,
#                            role=role,
#                           entry_point='transform_script.py',
#                           framework_version='1.4.0')
#     bert_end = torch_model.deploy(instance_type='ml.g4dn.8xlarge', initial_instance_count=1, 
#                               endpoint_name=endpoint_name)

In [None]:
vocab_file='vocab'
tokenizer = BertTokenizer(vocab_file, do_lower_case=True, max_len=512)
max_seq_length, max_query_length, n_best_size, max_answer_length, null_score_diff_threshold= 384, 64, 1, 30, -11.0
do_lower_case, can_give_negative_answer=True, True


In [None]:
t = time.time()
n_best_size=3
context='Danielle is a girl who really loves her cat, Steve. Steve is a large cat with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.'
question='who loves Steve?'  # 'What kind of food does Steve like?'
doc_tokens = context.split()
query_tokens = tokenizer.tokenize(question)
feature = preprocess_tokenized_text(doc_tokens, 
                                    query_tokens, 
                                    tokenizer, 
                                    max_seq_length=max_seq_length, 
                                    max_query_length=max_query_length)
tensors_for_inference, tokens_for_postprocessing = feature

input_ids = np.array(tensors_for_inference.input_ids, dtype=np.int64)
segment_ids = np.array(tensors_for_inference.segment_ids, dtype=np.int64)
input_mask = np.array(tensors_for_inference.input_mask, dtype=np.int64)   

payload = np.concatenate([np.expand_dims(input_ids, axis=0), np.expand_dims(segment_ids, axis=0), np.expand_dims(input_mask, axis=0)])
#response = bert_end.predict(payload.tobytes(), initial_args={'ContentType':'application/x-npy'}) 
response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                           ContentType='application/x-npy',
                                           Body=payload.tobytes())
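# NOTE: invoke_endpoint returns a dict whose 'Body' is a stream; it must be read and decoded
# into start/end logits before it can be indexed as response[0] and response[1].
answer = get_predictions(doc_tokens, tokens_for_postprocessing, 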
                         response[0], response[1], n_best_size, 
                         max_answer_length, do_lower_case, 
                         can_give_negative_answer, 
                         null_score_diff_threshold)

# print result
print(f'{question} : {answer[0]["text"]}')
print(f'inference took: {round(time.time()-t,4)} seconds')
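
The request body in the cell above stacks the three input arrays into one array of shape (3, max_seq_length) and sends its raw buffer. Note that `tobytes()` carries no dtype or shape header, so the receiving side has to know both in advance. How the serving script in this container decodes the body is not shown here, so the `np.frombuffer` step below is an assumption, and the arrays are stand-in data.

```python
import numpy as np

max_seq_length = 384  # same value as the inference setting in this notebook

# Stand-in arrays with the same dtype and length as the real features.
input_ids = np.arange(max_seq_length, dtype=np.int64)
segment_ids = np.zeros(max_seq_length, dtype=np.int64)
input_mask = np.ones(max_seq_length, dtype=np.int64)

# Stack the three 1-D arrays into a single (3, max_seq_length) array.
payload = np.concatenate(
    [np.expand_dims(a, axis=0) for a in (input_ids, segment_ids, input_mask)]
)
body = payload.tobytes()  # raw buffer only: no dtype or shape information

# Receiver side (assumed): rebuild the array from the known dtype and shape.
restored = np.frombuffer(body, dtype=np.int64).reshape(3, max_seq_length)
print(payload.shape, len(body))
```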

In [None]:
pass_in_data = {'context':context, 'question':question}
response = bert_end.predict(json.dumps(pass_in_data), initial_args={'ContentType':'application/json'}) 
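
For a JSON request body, `json.dumps` returns the serialized string, whereas `json.dump` writes to a file object and needs a second argument. A minimal round trip, using made-up text and assuming the endpoint's handler simply parses the body with `json.loads`, looks like this:

```python
import json

# Illustrative request; the field names match the cell above.
pass_in_data = {"context": "Steve is a large cat.", "question": "Who is Steve?"}

body = json.dumps(pass_in_data)  # str, suitable as a request body
decoded = json.loads(body)       # what a handler might do on the other side
print(body)
```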


In [None]:
with open('s3_bucket.txt','r') as f:
    bucket = f.read()

In [None]:
bucket.split('/')[2]
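
Splitting on `/` works for a bare `s3://bucket` string. As a slightly more defensive sketch, the standard library's `urlparse` separates the bucket from any key prefix and lets us reject non-S3 URIs. The helper name and the example URI are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key); key is '' when there is no path."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri("s3://my-example-bucket/bert_pytorch_ngc/model.tar.gz")
print(bucket_name, key)
```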

In [None]:
!rm bert_base.pt