# Finetuning PyTorch BERT with NGC
The BERT family of models is a powerful set of natural language understanding (NLU) models based on the transformer architecture introduced in the paper Attention Is All You Need (https://arxiv.org/abs/1706.03762).

These models are created by running unsupervised pre-training on massive text corpora, a process that requires an enormous amount of time and compute. Luckily for us, BERT models are built for transfer learning: a pretrained BERT model can be finetuned to perform many different NLU tasks, such as question answering, sentiment analysis, document summarization, and more.

For this tutorial, we download a BERT base model, finetune it on the Stanford Question Answering Dataset (SQuAD), and walk through the steps necessary to deploy it to a SageMaker endpoint.

In [1]:
!wget https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

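The download URL above follows a readable pattern: organization, model name, version, and file name. The hypothetical helper below rebuilds it from those parts, which can make it easier to swap in a different checkpoint. The layout is inferred from this one URL only, not from NGC API documentation, so treat it as an assumption.

```python
from urllib.parse import quote

# Assumed layout, inferred from the single NGC URL used in this notebook.
NGC_API_BASE = "https://api.ngc.nvidia.com/v2/models"


def ngc_model_file_url(org: str, model: str, version: str, filename: str) -> str:
    """Hypothetical helper: build .../{org}/{model}/versions/{version}/files/{filename}."""
    parts = [org, model, "versions", str(version), "files", filename]
    # Percent-encode each path segment so unusual names cannot break the URL.
    return NGC_API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


if __name__ == "__main__":
    print(ngc_model_file_url("nvidia", "bert_base_pyt_amp_ckpt_pretraining_lamb", "1", "bert_base.pt"))
```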

In [2]:
import collections
import math
import torch
import os, tarfile, json
import time, datetime
from io import StringIO
import numpy as np
import sagemaker
from sagemaker.pytorch import estimator, PyTorchModel, PyTorchPredictor, PyTorch
from sagemaker.utils import name_from_base
import boto3
from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace
from helper_funcs import *

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket() # or replace with the name of your own S3 bucket
prefix = 'bert_pytorch_ngc'
runtime_client = boto3.client('runtime.sagemaker')

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')

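The import above may print the warning "Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex." It only concerns the local notebook kernel; the training itself runs inside the NGC-based container we build next.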


## Create our training Docker container

Now we create a custom Docker container based on the NGC BERT container and push it to Amazon Elastic Container Registry (ECR).
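The image is addressed in ECR by a URI built from the AWS account ID, the region, the repository name, and a tag; the shell cell below and the estimator cell later both construct this string. A minimal sketch of that construction, assuming a standard commercial AWS region (China regions use a different domain suffix, which this helper does not handle):

```python
def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    # AWS account IDs are 12-digit numbers; catch obvious copy/paste mistakes early.
    if not (account.isdigit() and len(account) == 12):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account!r}")
    # Assumes a commercial partition region (amazonaws.com suffix).
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    print(ecr_image_uri("123456789012", "us-east-1", "bert-ngc-torch-train"))
```

The account ID in the usage line is a placeholder; in the notebook it comes from `aws sts get-caller-identity`.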

In [3]:
%%sh

# The name of our algorithm
algorithm_name=bert-ngc-torch-train

# Make the container's train and serve scripts executable
chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Log Docker in to ECR (get-login-password replaces the deprecated `aws ecr get-login`)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# Build the Docker image locally with the image name, then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

# If the push fails with "not authorized to perform ecr:InitiateLayerUpload",
# the IAM role running this notebook needs permission to push to ECR.
docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  1.863GB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
 ---> be021446e08c
Step 3/15 : RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract nginx wget
 ---> Using cache
 ---> 5eba79d37532
Step 4/15 : ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 ---> Using cache
 ---> bc8f375dc9e8
Step 5/15 : WORKDIR /workspace
 ---> Using cache
 ---> 1b91f0743faf
Step 6/15 : RUN git clone https://github.com/attardi/wikiextractor.git
 ---> Using cache
 ---> 1d8e4cb1a7f9
Step 7/15 : RUN git clone https://github.com/soskek/bookcorpus.git
 ---> Using cache
 ---> fd2c36c84f8e
Step 8/15 : WORKDIR /workspace/bert
 ---> Using cache
 ---> dda7e7b51295
Step 9/15 : RUN pip install --upgrade --no-cache-dir pip  && pip install --no-cache-dir  gevent flask pathlib gunicorn tqdm boto3 requests six ipdb h5py html2text nltk progressbar onnxruntime git+https://github.com/NVIDIA/dllogger
 ---> Us




## Instantiate the model

Next we instantiate our model. Here we specify the hyperparameters for training as well as the number of GPUs to use. Each ml.p3.16xlarge instance contains eight NVIDIA V100 (Volta) GPUs, making it well suited to heavy-duty deep learning training.

Once we have set our hyperparameters, we instantiate a SageMaker Estimator to run our training job. We specify the Docker image we just pushed to ECR, as well as an entry point script that tells the container what to do when it starts up. Our Docker container provides two commands, train and serve: when we launch a training job, SageMaker runs our Docker container behind the scenes and tells it to run the train command.
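Inside a custom training container, SageMaker writes the job's hyperparameters to `/opt/ml/input/config/hyperparameters.json`, with every value stored as a string. Because this notebook uses the framework `PyTorch` estimator, the SDK may also JSON-encode each value before sending it (so a string arrives with its quotes). The sketch below, which is not the actual `train` script in this image, decodes values defensively: it tries `json.loads` and falls back to the raw string.

```python
import json
from pathlib import Path

# Where SageMaker places the hyperparameters inside a training container.
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def decode_value(value):
    """Undo JSON encoding if present ('8' -> 8, '"abc"' -> 'abc'); otherwise keep the raw value."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Plain, unquoted strings such as /opt/ml/model are not valid JSON.
        return value


def load_hyperparameters(path: Path = HYPERPARAMETERS_PATH) -> dict:
    """Read hyperparameters.json and return a dict of decoded values."""
    with open(path) as f:
        raw = json.load(f)
    return {key: decode_value(val) for key, val in raw.items()}
```

In a `train` script, `load_hyperparameters()["train_batch_size"]` would then yield an `int` rather than a string.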

In [None]:
account=!aws sts get-caller-identity --query Account --output text

# Get the region defined in the current configuration (default to us-east-1, as in the build cell, if none defined)
region=!aws configure get region
region = region[0] if region else 'us-east-1'

algoname = 'bert-ngc-torch-train'

fullname = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(account[0], region, algoname)

fullname

In [5]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',
                   'num_train_epochs': 1,
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file': '/workspace/bert/bert_config.json',
                   'output_dir': '/opt/ml/model',  # SageMaker uploads the contents of /opt/ml/model to S3
                   'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                   'num_gpus': 8, 'train_batch_size': 7, 'max_seq_length': 512, 'doc_stride': 128, 'seed': 1,
                   'learning_rate': 3e-5,
                   'save_to_s3': bucket}

# instantiate model
# (train_instance_count, train_instance_type and image_name are SageMaker Python SDK v1
#  argument names; SDK v2 renamed them instance_count, instance_type and image_uri)
torch_model = PyTorch(role=role,
                      train_instance_count=2,  # 2 x ml.p3.16xlarge = 16 GPUs in total
                      train_instance_type='ml.p3.16xlarge',
                      entry_point='transform_script.py',
                      image_name=fullname,
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters)
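The `max_seq_length` and `doc_stride` hyperparameters control how SQuAD passages longer than one input sequence are split into overlapping windows. The sketch below is modeled on the sliding-window logic of the reference BERT SQuAD preprocessing; the script inside this container may differ in details. It assumes each input is `[CLS] question [SEP] passage [SEP]`, so three positions go to special tokens.

```python
def max_tokens_for_doc(max_seq_length: int, num_query_tokens: int) -> int:
    """Tokens left for the passage after the question and [CLS], [SEP], [SEP]."""
    return max_seq_length - num_query_tokens - 3


def doc_spans(num_doc_tokens: int, max_doc_tokens: int, doc_stride: int) -> list:
    """Return (start, length) windows that together cover a passage of num_doc_tokens tokens."""
    if max_doc_tokens <= 0 or doc_stride <= 0:
        raise ValueError("max_doc_tokens and doc_stride must be positive")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_doc_tokens)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window reaches the end of the passage
        # Slide forward by doc_stride, but never past the end of the current window.
        start += min(length, doc_stride)
    return spans


if __name__ == "__main__":
    room = max_tokens_for_doc(512, 20)  # e.g. a 20-token question leaves 489 passage tokens
    print(doc_spans(1000, room, 128))
```

With `max_seq_length=512` and `doc_stride=128`, a 1,000-token passage becomes five overlapping windows, so most answers appear well inside at least one window.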


## Fine-tune the model

On an instance with 4 GPUs and a batch size of 3, this particular finetuning task takes about 15 minutes for 2 epochs, and each additional epoch adds roughly 7 minutes. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with an ml.p3.16xlarge or ml.p3dn.24xlarge instance.
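To compare configurations like the two in this notebook, it helps to compute the effective (global) batch size and the number of optimizer steps per epoch. This sketch assumes `train_batch_size` is a per-GPU value and that data-parallel training spreads batches across every GPU on every instance; the feature count is a placeholder, since the real number depends on tokenization and `doc_stride`.

```python
import math


def global_batch_size(per_gpu_batch: int, gpus_per_instance: int, instance_count: int) -> int:
    """Examples consumed per optimizer step under plain data parallelism (assumed)."""
    return per_gpu_batch * gpus_per_instance * instance_count


def steps_per_epoch(num_features: int, global_batch: int) -> int:
    """Optimizer steps for one pass over num_features training features (last batch may be partial)."""
    return math.ceil(num_features / global_batch)


if __name__ == "__main__":
    print(global_batch_size(3, 4, 1))   # the 4-GPU example in the text
    print(global_batch_size(7, 8, 2))   # the hyperparameters used above
```

A larger global batch means fewer steps per epoch, which is one reason more GPUs shorten each epoch; learning rate may need retuning when the global batch changes.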

In [6]:
torch_model.fit()

2020-05-09 23:59:18 Starting - Starting the training job...
2020-05-09 23:59:22 Starting - Launching requested ML instances.........
2020-05-10 00:00:56 Starting - Preparing the instances for training......
2020-05-10 00:02:17 Downloading - Downloading input data...
2020-05-10 00:02:29 Training - Downloading the training image.....................
[35m== PyTorch ==[0m
[0m
[35mNVIDIA Release 19.12 (build 9142930)[0m
[35mPyTorch Version 1.4.0a0+a5b4d78
[0m
[35mContainer image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
[0m
[35mCopyright (c) 2014-2019 Facebook Inc.[0m
[35mCopyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)[0m
[35mCopyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)[0m
[35mCopyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)[0m
[35mCopyright (c) 2011-2013 NYU                      (Clement Farabet)[0m
[35mCopyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain M

## Deploy our trained model

Now that we've finetuned our base BERT model, what next? The Docker image we created earlier includes code that starts a Flask app when the serve command is called. Let's deploy our trained model to an endpoint and ask it some questions!
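For hosting, SageMaker expects the container to answer `GET /ping` (health check) and `POST /invocations` (inference) on port 8080. The image in this notebook does this with Flask; the sketch below swaps in Python's standard-library `http.server` so it stays self-contained. `answer_question` is a hypothetical stand-in for the real BERT model, and the response shape (a JSON list of dicts with a `"text"` key) is inferred from how the notebook reads results later.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer_question(context: str, question: str) -> list:
    """Hypothetical stand-in for the BERT QA model: returns the first word of the context."""
    return [{"text": context.split()[0] if context else ""}]


def handle_invocation(body: bytes, content_type: str):
    """Return (status_code, response_bytes) for a POST /invocations request."""
    if content_type.split(";")[0].strip().lower() != "application/json":
        return 415, b"This predictor only supports application/json"
    try:
        payload = json.loads(body)
        result = answer_question(payload["context"], payload["question"])
    except (ValueError, KeyError, TypeError) as err:
        return 400, f"Bad request: {err}".encode()
    return 200, json.dumps(result).encode()


class ServingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: a 200 on /ping tells SageMaker the container is ready.
        self.send_response(200 if self.path == "/ping" else 404)
        self.end_headers()

    def do_POST(self):
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        status, data = handle_invocation(self.rfile.read(length), self.headers.get("Content-Type", ""))
        self.send_response(status)
        self.send_header("Content-Type", "application/json" if status == 200 else "text/plain")
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Block forever, serving requests (what a container's serve command would run)."""
    HTTPServer(("0.0.0.0", port), ServingHandler).serve_forever()
```

Keeping the request handling in `handle_invocation` separates the HTTP plumbing from the model logic, so the latter can be tested without starting a server.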

In [7]:
endpoint_name = 'bert-endpoint-byoc'

# Deploy the finetuned model to a real-time endpoint on a single GPU instance
bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1,
                              endpoint_name=endpoint_name)

# Backup method if deploy() fails: build a PyTorchModel directly from the model
# artifact in S3. Note that without an image argument, PyTorchModel uses SageMaker's
# default PyTorch serving image rather than our custom container.
# model_data = f's3://{bucket}/model.tar.gz'
# torch_model = PyTorchModel(model_data=model_data,
#                            role=role,
#                            entry_point='transform_script.py',
#                            framework_version='1.4.0')
# bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1,
#                               endpoint_name=endpoint_name)

-------------------!

Now that our endpoint has been deployed, let's send it some requests! 
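The request and response formats used below can be wrapped in two small helpers. This is a sketch that assumes the serve script accepts a JSON object with `context` and `question` keys and returns a JSON list whose first element has a `"text"` field, as the notebook's own code implies; `build_payload` and `parse_answer` are hypothetical names.

```python
import json


def build_payload(context: str, question: str) -> str:
    """Serialize the request body in the {'context', 'question'} shape this endpoint expects."""
    return json.dumps({"context": context, "question": question})


def parse_answer(body: bytes) -> str:
    """Decode the endpoint's JSON response and return the top answer's text."""
    predictions = json.loads(body.decode("utf-8"))
    if not predictions:
        raise ValueError("endpoint returned no predictions")
    return predictions[0]["text"]


# Usage with the boto3 runtime client from this notebook:
# response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
#                                           ContentType='application/json',
#                                           Body=build_payload(context, question))
# print(parse_answer(response['Body'].read()))
```

Parsing with `json.loads` instead of `eval` avoids executing whatever text the endpoint sends back.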

In [15]:
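# Alternative: send the same request through the SageMaker predictor object
# (uncomment after defining context and question in the next cell)
# pass_in_data = {'context': context, 'question': question}
# response = bert_end.predict(json.dumps(pass_in_data), initial_args={'ContentType': 'application/json'})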


In [20]:
t = time.time()

context = 'Danielle is a girl who really loves her cat, Steve. Steve is a large cat with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.'
question = 'who loves Steve?'  # or try: 'What kind of food does Steve like?'

pass_in_data = {'context': context, 'question': question}
json_data = json.dumps(pass_in_data)

response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                          ContentType='application/json',
                                          Body=json_data)
# Parse the JSON response (json.loads is safer than eval on data received over the network)
response = json.loads(response['Body'].read().decode('utf-8'))

# print result
print(f'{question} : {response[0]["text"]}')
print(f'inference took: {round(time.time()-t,4)} seconds')

who loves Steve? : Danielle
inference took: 2.5006 seconds
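Behind an answer like "Danielle", a BERT question-answering head scores every token as a possible answer start and as a possible answer end, and the answer is the highest-scoring valid span. The sketch below is a simplified, exhaustive version of that span search (the reference BERT code searches only the top-n start and end candidates and also excludes question tokens); the serving code in this container may differ.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len: int = 30):
    """Return (start, end, score) maximizing start_logit + end_logit
    subject to start <= end < start + max_answer_len."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    n = len(start_logits)
    # scores[i, j] = start_logits[i] + end_logits[j] for every candidate span (i, j)
    scores = start_logits[:, None] + end_logits[None, :]
    i, j = np.indices((n, n))
    valid = (j >= i) & (j - i < max_answer_len)
    scores = np.where(valid, scores, -np.inf)
    # argmax over the flattened matrix, converted back to (row, column)
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end), float(scores[start, end])


if __name__ == "__main__":
    tokens = ["danielle", "loves", "her", "cat"]
    s, e, _ = best_span([5, 0, 0, 0], [4, 1, 0, 0])
    print(" ".join(tokens[s:e + 1]))
```

The `valid` mask is what rules out spans that end before they start, even when such a pair would have the largest raw score.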


In [6]:
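!rm -f bert_base.pt s3_bucket.txt
# Delete the endpoint so it stops incurring charges
# (requires the deploy cell to have run in this kernel session)
bert_end.delete_endpoint()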
