# Optimize models using Automatic Model Tuning

### Introduction

When training ML models, hyperparameter tuning helps you get the best quality out of the trained model. In this lab you will use the random search strategy of Automated Hyperparameter Tuning to train a BERT-based natural language processing (NLP) classifier. The model analyzes customer feedback and classifies each message as positive (1), neutral (0), or negative (-1) sentiment.

### Table of Contents

- [1. Configure dataset and Hyperparameter Tuning Job (HPT)](#c3w1-1.)
  - [1.1. Configure dataset](#c3w1-1.1.)
  - [1.2. Configure Hyperparameter Tuning Job](#c3w1-1.2.)
  - [1.3. Set up evaluation metrics](#c3w1-1.3.)
- [2. Run Tuning Job](#c3w1-2.)
  - [2.1. Set up the RoBERTa and PyTorch script to run on SageMaker](#c3w1-2.1.)
  - [2.2. Launch the Hyperparameter Tuning Job](#c3w1-2.2.)
  - [2.3. Check Tuning Job status](#c3w1-2.3.)
- [3. Evaluate the results](#c3w1-3.)
  - [3.1. Show the best candidate](#c3w1-3.1.)
  - [3.2. Evaluate the best candidate](#c3w1-3.2.)
  - [3.3. Inspect the processed output data](#c3w1-3.3.)

Amazon SageMaker supports Automated Hyperparameter Tuning. It runs multiple training jobs on the training dataset using the hyperparameter ranges specified by the user, then chooses the combination of hyperparameters that leads to the best model candidate. The choice is based on the objective metric, e.g. maximizing the validation accuracy.

To choose hyperparameter combinations, SageMaker supports two types of tuning strategies: random and Bayesian. This capability can be extended further by providing an implementation of a custom tuning strategy as a Docker container.

<img src="images/hpt.png" width="70%" align="center"> 

In this lab you will perform the following three steps:

<img src="images/sagemaker_hpt.png" width="50%" align="center"> 

First, let's install and import required modules.

In [None]:
# please ignore warning messages during the installation
!pip install --disable-pip-version-check -q sagemaker==2.35.0
!conda install -q -y pytorch==1.6.0 -c pytorch
!pip install --disable-pip-version-check -q transformers==3.5.1

In [None]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

# low-level service client of the boto3 session
sm = boto3.Session().client(service_name='sagemaker', region_name=region)

<a name='c3w1-1.'></a>
# 1. Configure dataset and Hyperparameter Tuning Job (HPT)

<a name='c3w1-1.1.'></a>
### 1.1. Configure dataset

Let's copy the data to the S3 bucket. Set up the paths:

In [None]:
processed_train_data_s3_uri = 's3://{}/transformed/data/sentiment-train/'.format(bucket)
processed_validation_data_s3_uri = 's3://{}/transformed/data/sentiment-validation/'.format(bucket)
processed_test_data_s3_uri = 's3://{}/transformed/data/sentiment-test/'.format(bucket)

Upload the data to the S3 bucket:

In [None]:
!aws s3 cp --recursive ./data/sentiment-train $processed_train_data_s3_uri
!aws s3 cp --recursive ./data/sentiment-validation $processed_validation_data_s3_uri
!aws s3 cp --recursive ./data/sentiment-test $processed_test_data_s3_uri

Check the existence of those files in the S3 bucket:

In [None]:
!aws s3 ls --recursive $processed_train_data_s3_uri

In [None]:
!aws s3 ls --recursive $processed_validation_data_s3_uri

In [None]:
!aws s3 ls --recursive $processed_test_data_s3_uri

You will need to set up the input data channels, wrapping the S3 locations in a `TrainingInput` object to use with the SageMaker Tuning Job. This can be organized as a dictionary:

```python
data_channels = {
    'train': ..., # training data
    'validation': ... # validation data
}
```

where training and validation data are the Amazon SageMaker channels for S3 input data sources.

<a name='c3w1-ex-1'></a>
### Exercise 1

Create a train data channel.

**Instructions**: Pass the S3 input path for training data into the `sagemaker.inputs.TrainingInput` function.

In [None]:
s3_input_train_data = sagemaker.inputs.TrainingInput(
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    s3_data=None # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
)

<a name='c3w1-ex-2'></a>
### Exercise 2

Create a validation data channel.

**Instructions**: Pass the S3 input path for validation data into the `sagemaker.inputs.TrainingInput` function.

In [None]:
s3_input_validation_data = sagemaker.inputs.TrainingInput(
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    s3_data=None # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
)

<a name='c3w1-ex-3'></a>
### Exercise 3

Organize the train and validation data channels defined above as a dictionary.

In [None]:
data_channels = {
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    'train': None, # Replace None
    'validation': None # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
}

In [None]:
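# TrainingInput is not imported by name in this notebook, so reference it through the sagemaker package
s3_input_test_data = sagemaker.inputs.TrainingInput(s3_data=processed_test_data_s3_uri)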

<a name='c3w1-1.2.'></a>
### 1.2. Configure Hyperparameter Tuning Job

Model hyperparameters need to be set before model training starts, as they control the learning process. Some hyperparameters you will set as static: they will not be explored during the tuning job. For the non-static hyperparameters you will set the range of possible values to explore.

First, configure the static hyperparameters and training settings, including the instance type, instance count, maximum sequence length, etc.

In [None]:
max_seq_length=128 # maximum number of input tokens passed to BERT model
freeze_bert_layer=False # specifies the depth of training within the network
epochs=3
train_steps_per_epoch=50
validation_batch_size=64
validation_steps_per_epoch=50
seed=42

train_instance_count=1
train_instance_type='ml.c5.9xlarge'
train_volume_size=256
input_mode='File'
run_validation=True

Some of them will be passed to the PyTorch estimator and tuner in the `hyperparameters` argument. Set up the dictionary for that:

In [None]:
hyperparameters={
    'freeze_bert_layer': freeze_bert_layer,
    'max_seq_length': max_seq_length,
    'epochs': epochs,
    'train_steps_per_epoch': train_steps_per_epoch,
    'validation_batch_size': validation_batch_size,
    'validation_steps_per_epoch': validation_steps_per_epoch,
    'seed': seed,
    'run_validation': run_validation
}

Configure hyperparameter ranges to explore in the Tuning Job.

In [None]:
from sagemaker.tuner import IntegerParameter
from sagemaker.tuner import ContinuousParameter
from sagemaker.tuner import CategoricalParameter
                                                
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(0.00001, 0.00005, scaling_type='Linear'), # specifying continuous variable type, the tuning job will explore the range of values
    'train_batch_size': CategoricalParameter([128, 256]), # specifying categorical variable type, the tuning job will explore only listed values
}
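To build intuition for what a random strategy does with these ranges, here is a minimal standard-library sketch. It is not SageMaker's implementation: `sample_candidate` is a hypothetical helper, and it assumes that linear scaling means drawing uniformly between the two bounds.

```python
import random

# Ranges mirroring the cell above: a continuous learning rate and a categorical batch size.
LEARNING_RATE_RANGE = (0.00001, 0.00005)
TRAIN_BATCH_SIZES = [128, 256]


def sample_candidate(rng):
    """Draw one hyperparameter combination at random (hypothetical helper)."""
    low, high = LEARNING_RATE_RANGE
    return {
        # assumption: 'Linear' scaling is treated as a uniform draw over [low, high]
        'learning_rate': rng.uniform(low, high),
        # categorical: only the listed values can ever be chosen
        'train_batch_size': rng.choice(TRAIN_BATCH_SIZES),
    }


rng = random.Random(42)  # fixed seed so the illustration is repeatable
candidates = [sample_candidate(rng) for _ in range(2)]  # two draws, like max_jobs=2 in this lab
for candidate in candidates:
    print(candidate)
```

Each printed dictionary corresponds to the hyperparameters one training job would receive, in addition to the static ones defined earlier.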

<a name='c3w1-1.3.'></a>
### 1.3. Set up evaluation metrics

Choose loss and accuracy as the evaluation metrics. The regular expression in each `Regex` field captures the metric value from the log lines that the training script emits.

In [None]:
metric_definitions = [
     {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
     {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9\\.]+)'},
]

For example, this sample log line...
```
[step: 100] val_loss: 0.76 - val_acc: 70.92%
```

...will produce the following metrics in CloudWatch:

`validation:loss` =  0.76

`validation:accuracy` = 70.92
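You can check the patterns locally before launching any job. The sketch below applies the same two expressions to the sample log line using Python's `re` module. It assumes that SageMaker's matching behaves like `re.search` for these simple patterns.

```python
import re

# Same patterns as in metric_definitions; raw strings avoid the doubled backslash.
metric_patterns = {
    'validation:loss': r'val_loss: ([0-9\.]+)',
    'validation:accuracy': r'val_acc: ([0-9\.]+)',
}

log_line = '[step: 100] val_loss: 0.76 - val_acc: 70.92%'

extracted = {}
for name, pattern in metric_patterns.items():
    match = re.search(pattern, log_line)
    if match:
        # the first capture group holds the numeric value
        extracted[name] = float(match.group(1))

print(extracted)  # {'validation:loss': 0.76, 'validation:accuracy': 70.92}
```

Note that the trailing `%` is not part of the capture group, so the accuracy is reported as `70.92`.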

<img src="images/cloudwatch_validation_metrics.png" align="left">

In the Tuning Job you will maximize validation accuracy as the objective metric.

<a name='c3w1-2.'></a>
# 2. Run Tuning Job

<a name='c3w1-2.1.'></a>
### 2.1. Set up the RoBERTa and PyTorch script to run on SageMaker

Prepare the PyTorch model to run as a SageMaker Training Job. The estimator's `entry_point` argument names a separate Python file that is called during training. You can open and review this file: [src/train.py](src/train.py).

For more information on the `PyTorchEstimator`, see the documentation here: https://sagemaker.readthedocs.io/

In [None]:
from sagemaker.pytorch import PyTorch as PyTorchEstimator

estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='src',
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    volume_size=train_volume_size,
    py_version='py3',
    framework_version='1.6.0',
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    input_mode=input_mode,
)

<a name='c3w1-2.2.'></a>
### 2.2. Launch the Hyperparameter Tuning Job

A hyperparameter tuning job runs a series of training jobs that each test a combination of hyperparameters against a given objective metric (here, `validation:accuracy`). In this lab, you will use a `Random` search strategy to determine the combinations of hyperparameters, within the specified ranges, to use for each training job within the tuning job. For more information on hyperparameter tuning search strategies, please see the following documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html

When the tuning job completes, you can select the hyperparameters used by the best-performing training job relative to the objective metric (here, `validation:accuracy`).

The `max_jobs` parameter is a stopping criterion that limits the total number of training jobs (and therefore hyperparameter combinations) to run within the tuning job.

The `max_parallel_jobs` parameter limits the number of training jobs (and therefore hyperparameter combinations) to run in parallel within the tuning job. This parameter is often used in combination with the `Bayesian` search strategy when you want to run a smaller set of training jobs (fewer than `max_jobs`), learn from their results, and then apply Bayesian methods to determine the hyperparameters for the next set of training jobs. Bayesian methods can improve hyperparameter-tuning performance in some cases.
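The interplay of the two limits can be pictured with simple arithmetic. The sketch below assumes the jobs run in strict batches of `max_parallel_jobs`, which is a simplification: the service's actual scheduling may differ. `minimum_rounds` is a hypothetical helper, not part of the SageMaker SDK.

```python
import math


def minimum_rounds(max_jobs, max_parallel_jobs):
    """Lower bound on sequential rounds, assuming strict batches (a simplification)."""
    return math.ceil(max_jobs / max_parallel_jobs)


print(minimum_rounds(2, 2))   # the values used in this lab -> 1
print(minimum_rounds(20, 4))  # a hypothetical larger budget -> 5
```

With `max_jobs=2` and `max_parallel_jobs=2`, both jobs start together, so neither can learn from the other. This is one reason a `Random` strategy is a natural fit for such a small budget.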


The `early_stopping_type` parameter lets SageMaker hyperparameter tuning jobs automatically stop a training job if it is not improving the objective metric (here, `validation:accuracy`) relative to previous training jobs within the tuning job. For more information on early stopping, please see the following documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-early-stopping.html
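One way to picture early stopping is a median-style rule: compare the current job's metric after an epoch with the median of the running averages that earlier jobs had reached by the same epoch. The sketch below is a simplified illustration of that idea, not the service's actual code. The function names, the data layout, and the accuracy values are all hypothetical.

```python
from statistics import median


def running_average(values):
    """Mean of the metric values observed so far."""
    return sum(values) / len(values)


def should_stop(current_value, previous_jobs, epoch, maximize=True):
    """Return True if the current job looks worse than the median of earlier jobs.

    previous_jobs: per-epoch objective values of earlier jobs (hypothetical layout).
    epoch: 1-based index of the epoch the current job has just completed.
    """
    averages = [
        running_average(history[:epoch])
        for history in previous_jobs
        if len(history) >= epoch
    ]
    if not averages:
        return False  # nothing to compare against yet
    threshold = median(averages)
    return current_value < threshold if maximize else current_value > threshold


# Made-up validation accuracies (%) per epoch for three earlier jobs.
earlier = [[60.0, 68.0, 72.0], [55.0, 63.0, 70.0], [58.0, 66.0, 71.0]]

print(should_stop(50.0, earlier, epoch=2))  # True: 50.0 is below the median (62.0)
print(should_stop(69.0, earlier, epoch=2))  # False: 69.0 is above the median
```

For the exact criterion SageMaker applies, rely on the early-stopping documentation linked above.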

<a name='c3w1-ex-4'></a>
### Exercise 4

Set up the Hyperparameter Tuner.

**Instructions**: Use the `HyperparameterTuner` class, passing the variables defined above. Please use the tuning strategy `'Random'`.

```python
tuner = HyperparameterTuner(
    estimator=..., # estimator
    hyperparameter_ranges=..., # hyperparameter ranges
    metric_definitions=..., # metric definitions
    strategy='...', # tuning strategy
    objective_type='Maximize',
    objective_metric_name='validation:accuracy',
    max_jobs=2, # maximum number of jobs to run
    max_parallel_jobs=2, # maximum number of jobs to run in parallel
    early_stopping_type='Auto' # early stopping criteria
)
``` 

In [None]:
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    estimator=None, # Replace None
    hyperparameter_ranges=None, # Replace None
    metric_definitions=None, # Replace None
    strategy=None, # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
    objective_type='Maximize',
    objective_metric_name='validation:accuracy',
    max_jobs=2, # maximum number of jobs to run
    max_parallel_jobs=2, # maximum number of jobs to run in parallel
    early_stopping_type='Auto' # early stopping criteria
)

<a name='c3w1-ex-5'></a>
### Exercise 5

Launch the SageMaker Hyper-Parameter Tuning (HPT) Job.

**Instructions**: Use the `tuner.fit` function passing the configured train and validation inputs (data channels).

```python
tuner.fit(
    inputs=..., # train and validation input
    include_cls_metadata=False, # to be set as false if the algorithm cannot handle unknown hyperparameters
    wait=False # do not wait for the job to complete before continuing
)
``` 

In [None]:
tuner.fit(
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    inputs=None, # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
    include_cls_metadata=False,
    wait=False
)

<a name='c3w1-2.3.'></a>
### 2.3. Check Tuning Job status
You can see the Tuning Job status in the console. Let's get the Tuning Job name first to construct the link.

In [None]:
tuning_job_name = tuner.latest_tuning_job.job_name
print(tuning_job_name)

Check the status of the Tuning Job.

In [None]:
from IPython.display import display, HTML
    
display(HTML('<b>Review <a target="blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(region, tuning_job_name)))

Wait for the Tuning Job to complete.

### _This cell will take approximately 20-30 minutes to run._

In [None]:
%%time

tuner.wait()

_Wait until the ^^ Tuning Job ^^ completes above_

The results of the SageMaker Hyperparameter Tuning Job are available through the `analytics()` method of the `tuner` object. Its `dataframe()` method returns the results as a pandas DataFrame. You can explore the results with the following code:

In [None]:
import time

time.sleep(10) # slight delay to allow the analytics to be calculated

df_results = tuner.analytics().dataframe()
df_results.shape

In [None]:
df_results.sort_values('FinalObjectiveValue', ascending=0)

<a name='c3w1-3.'></a>
# 3. Evaluate the results

An important part of developing a model is evaluating it on a test dataset that the model has never seen during training. The final metrics from this evaluation can be used to compare competing machine learning models. For a metric such as accuracy, a higher value on the test set indicates a better ability to generalize.
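To make these metrics concrete, here is a small standard-library sketch that computes accuracy and a confusion matrix for the three sentiment classes. The labels and predictions are made up purely for illustration and are not results from this lab. The evaluation script used later may compute its metrics differently.

```python
from collections import Counter

LABELS = [-1, 0, 1]  # negative, neutral, positive, as in this lab


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up data for illustration only.
y_true = [1, 1, 0, -1, -1, 0, 1, -1]
y_pred = [1, 0, 0, -1, 1, 0, 1, -1]

print(accuracy(y_true, y_pred))  # 0.75
for row in confusion_matrix(y_true, y_pred):
    print(row)
# [2, 0, 1]
# [0, 2, 0]
# [0, 1, 2]
```

The diagonal of the matrix counts correct predictions per class. Off-diagonal entries show which classes get confused with each other.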

<a name='c3w1-3.1.'></a>
### 3.1. Show the best candidate

<a name='c3w1-ex-6'></a>
### Exercise 6

Show the best candidate.

**Instructions**: Use the `sort_values` function to sort the results by accuracy which is stored in the column `FinalObjectiveValue`. Put `ascending=0` and `head(1)` for the selection.

```python
df_results.sort_values(
    '...', # column name for sorting
    ascending=0).head(1)
``` 

In [None]:
df_results.sort_values(
    ### BEGIN SOLUTION - DO NOT delete this comment for grading purposes
    None, # Replace None
    ### END SOLUTION - DO NOT delete this comment for grading purposes
    ascending=0).head(1)

<a name='c3w1-3.2.'></a>
### 3.2. Evaluate the best candidate

In [None]:
best_candidate_training_job_name = df_results.sort_values('FinalObjectiveValue', ascending=0).iloc[0]['TrainingJobName']
print('Best candidate training job name: {}'.format(best_candidate_training_job_name))

In [None]:
model_tar_s3_uri = sm.describe_training_job(TrainingJobName=best_candidate_training_job_name)['ModelArtifacts']['S3ModelArtifacts']
print(model_tar_s3_uri)

In [None]:
from sagemaker.sklearn.processing import SKLearnProcessor

processing_instance_type = "ml.c5.2xlarge"
processing_instance_count = 1

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    max_runtime_in_seconds=7200,
)

Upload and Configure S3 Path for Raw Test Data

In [None]:
!head -n 5 ./data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv

In [None]:
raw_test_data_s3_uri = 's3://{}/data/sentiment-test/'.format(bucket)
print(raw_test_data_s3_uri)

In [None]:
!aws s3 cp --recursive ./data/sentiment-test/ $raw_test_data_s3_uri

In [None]:
!aws s3 ls --recursive $raw_test_data_s3_uri

You can review the file [src/evaluate_model_metrics.py](src/evaluate_model_metrics.py) which is used for the model evaluation.

In [None]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor.run(
    code="src/evaluate_model_metrics.py",
    inputs=[
        ProcessingInput(  
            input_name="model-tar-s3-uri",                        
            source=model_tar_s3_uri,                               
            destination="/opt/ml/processing/input/model/"
        ),
        ProcessingInput(
            input_name="evaluation-data-s3-uri",
            source=raw_test_data_s3_uri,                                    
            destination="/opt/ml/processing/input/data/",
        ),
    ],
    outputs=[
        ProcessingOutput(s3_upload_mode="EndOfJob", output_name="metrics", source="/opt/ml/processing/output/metrics"),
    ],
    arguments=["--max-seq-length", str(max_seq_length)],
    logs=True,
    wait=False,
)

In [None]:
scikit_processing_job_name = processor.jobs[-1].describe()["ProcessingJobName"]
print(scikit_processing_job_name)

In [None]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}">Processing Job</a></b>'.format(
            region, scikit_processing_job_name
        )
    )
)

In [None]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="blank" href="https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/ProcessingJobs;prefix={};streamFilter=typeLogStreamPrefix">CloudWatch Logs</a> After About 5 Minutes</b>'.format(
            region, scikit_processing_job_name
        )
    )
)

In [None]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview">S3 Output Data</a> After The Processing Job Has Completed</b>'.format(
            bucket, scikit_processing_job_name, region
        )
    )
)

Monitor the Processing Job

In [None]:
from pprint import pprint

running_processor = sagemaker.processing.ProcessingJob.from_processing_name(
    processing_job_name=scikit_processing_job_name, sagemaker_session=sess
)

processing_job_description = running_processor.describe()

pprint(processing_job_description)

Wait for the processing job to complete.

### _This cell will take approximately 5-10 minutes to run._

In [None]:
%%time

running_processor.wait(logs=False)

<a name='c3w1-3.3.'></a>
### 3.3. Inspect the processed output data

Locate the S3 output of the processing job to make sure the evaluation was successful and the metrics were written.

In [None]:
processing_job_description = running_processor.describe()

output_config = processing_job_description["ProcessingOutputConfig"]
for output in output_config["Outputs"]:
    if output["OutputName"] == "metrics":
        processed_metrics_s3_uri = output["S3Output"]["S3Uri"]

print(processed_metrics_s3_uri)
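If you need the bucket and key prefix separately, for example to browse the output in the S3 console, an S3 URI can be split with the standard library. This is a minimal sketch: `split_s3_uri` is a hypothetical helper and the URI below is a placeholder, not the real output location of your job.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix) (hypothetical helper)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != 's3':
        raise ValueError('not an S3 URI: {}'.format(s3_uri))
    return parsed.netloc, parsed.path.lstrip('/')


# Placeholder URI for illustration; substitute processed_metrics_s3_uri in the notebook.
bucket_name, key_prefix = split_s3_uri('s3://example-bucket/example-job/output/metrics')
print(bucket_name)  # example-bucket
print(key_prefix)   # example-job/output/metrics
```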

In [None]:
!aws s3 ls $processed_metrics_s3_uri/

Show the test accuracy.

In [None]:
import json
from pprint import pprint

metrics_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
    processed_metrics_s3_uri
))

print('Test accuracy: {}'.format(json.loads(metrics_json)))

In [None]:
!aws s3 cp $processed_metrics_s3_uri/confusion_matrix.png ./generated/

import time
time.sleep(10) # Slight delay for our notebook to recognize the newly-downloaded file

Show the confusion matrix generated during model evaluation.

In [None]:
%%html

<img src='./generated/confusion_matrix.png'>

Upload the notebook to the S3 bucket for grading purposes.

**Note:** you may need to click the "Save" button before the upload.

In [None]:
!aws s3 cp ./C3_W1_Assignment.ipynb s3://$bucket/C3_W1_Assignment_Learner.ipynb

### _Graders are not implemented at this stage of testing - please do not submit the assignment for grading_