# Optimize models using Automatic Model Tuning

### Introduction

The model analyzes customer feedback and classifies the messages into positive (1), neutral (0), and negative (-1) sentiments.



Install and import the required modules.

In [2]:
# please ignore warning messages during the installation
!pip install --disable-pip-version-check -q sagemaker==2.35.0
!conda install -q -y pytorch==1.6.0 -c pytorch
!pip install --disable-pip-version-check -q transformers==3.5.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sparkmagic 0.20.4 requires nest-asyncio==1.5.5, but you have nest-asyncio 1.5.6 which is incompatible.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::anaconda-client==1.7.2=py37_0
  - defaults/noarch::anaconda-project==0.8.4=py_0
  - defaults/linux-64::bokeh==1.4.0=py37_0
  - defaults/noarch::dask==2.11.0=py_0
  - defaults/linux-64::distributed==2.11.0=py37_0
  - defaults/linux-64::spyder==4.0.1=py37_0
  - defaults/linux-64::wat

In [3]:
import boto3
import sagemaker
import pandas as pd
import botocore

config = botocore.config.Config(user_agent_extra='dlai-pds/c3/w1')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker', 
                  config=config)

sess = sagemaker.Session(sagemaker_client=sm)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

<a name='c3w1-1.'></a>
# 1. Configure dataset and Hyperparameter Tuning Job (HPT)


### 1.1. Configure dataset

Set up the S3 paths for the training, validation, and test data:

In [4]:
processed_train_data_s3_uri = 's3://{}/transformed/data/sentiment-train/'.format(bucket)
processed_validation_data_s3_uri = 's3://{}/transformed/data/sentiment-validation/'.format(bucket)
processed_test_data_s3_uri = 's3://{}/transformed/data/sentiment-test/'.format(bucket)

Upload the data to the S3 bucket:

In [5]:
!aws s3 cp --recursive ./data/sentiment-train $processed_train_data_s3_uri
!aws s3 cp --recursive ./data/sentiment-validation $processed_validation_data_s3_uri
!aws s3 cp --recursive ./data/sentiment-test $processed_test_data_s3_uri

upload: data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-302997832976/transformed/data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv
upload: data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-302997832976/transformed/data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv
upload: data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-302997832976/transformed/data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv


Check the existence of those files in the S3 bucket:

In [6]:
!aws s3 ls --recursive $processed_train_data_s3_uri

2023-07-18 15:19:57    4894416 transformed/data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [7]:
!aws s3 ls --recursive $processed_validation_data_s3_uri

2023-07-18 15:19:58     276522 transformed/data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [8]:
!aws s3 ls --recursive $processed_test_data_s3_uri

2023-07-18 15:19:59     273414 transformed/data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv



Set up a dictionary of the input training and validation data channels:

In [9]:
from sagemaker.inputs import TrainingInput

data_channels = {
    'train': TrainingInput(s3_data=processed_train_data_s3_uri), 
    'validation':  TrainingInput(s3_data=processed_validation_data_s3_uri) 
    }

### 1.2. Configure Hyperparameter Tuning Job



In [10]:
max_seq_length=128 # maximum number of input tokens passed to BERT model
freeze_bert_layer=False # specifies the depth of training within the network
epochs=3
train_steps_per_epoch=50
validation_batch_size=64
validation_steps_per_epoch=50
seed=42

train_instance_count=1
train_instance_type='ml.c5.9xlarge'
train_volume_size=256
input_mode='File'
run_validation=True

Some of these values are passed to the PyTorch estimator and the tuner through the `hyperparameters` argument. Set up the dictionary:

In [11]:
hyperparameters_static={
    'freeze_bert_layer': freeze_bert_layer,
    'max_seq_length': max_seq_length,
    'epochs': epochs,
    'train_steps_per_epoch': train_steps_per_epoch,
    'validation_batch_size': validation_batch_size,
    'validation_steps_per_epoch': validation_steps_per_epoch,
    'seed': seed,
    'run_validation': run_validation
}
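
SageMaker passes each entry of the `hyperparameters` dictionary to the entry-point script as a command-line argument of the form `--name value`. The training script itself is not shown in this notebook, so the following is only a minimal sketch of how such a script might read the values. The helper names `str_to_bool` and `parse_args` are hypothetical, and only a subset of the hyperparameters is included.

```python
import argparse


def str_to_bool(value):
    # Booleans arrive as strings such as 'False'; bool('False') would be True,
    # so the string has to be interpreted explicitly.
    return str(value).lower() in ('true', '1', 'yes')


def parse_args(argv=None):
    # Hypothetical argument parser for an entry-point script (assumption:
    # the real src/train.py may define these differently).
    parser = argparse.ArgumentParser()
    parser.add_argument('--max_seq_length', type=int, default=128)
    parser.add_argument('--freeze_bert_layer', type=str_to_bool, default=False)
    parser.add_argument('--epochs', type=int, default=3)
    parser.add_argument('--seed', type=int, default=42)
    # Tuned hyperparameters are passed the same way as the static ones.
    parser.add_argument('--learning_rate', type=float, default=0.00001)
    parser.add_argument('--train_batch_size', type=int, default=128)
    # Ignore arguments this sketch does not declare.
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args([
    '--epochs', '3',
    '--freeze_bert_layer', 'False',
    '--learning_rate', '3.8e-05',
    '--train_batch_size', '256',
])
print(args.epochs, args.freeze_bert_layer, args.learning_rate, args.train_batch_size)
```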

Configure hyperparameter ranges to explore in the Tuning Job. 

In [12]:
from sagemaker.tuner import IntegerParameter
from sagemaker.tuner import ContinuousParameter
from sagemaker.tuner import CategoricalParameter
                                                
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(0.00001, 0.00005, scaling_type='Linear'), # specifying continuous variable type, the tuning job will explore the range of values
    'train_batch_size': CategoricalParameter([128, 256]), # specifying categorical variable type, the tuning job will explore only listed values
}


### 1.3. Set up evaluation metrics

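Choose loss and accuracy as the evaluation metrics. Each metric definition pairs a metric name with a regular expression whose capture group extracts the metric value from the training logs.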

In [13]:
metric_definitions = [
     {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
     {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]

For example, these sample log lines...
```
[step: 100] val_loss: 0.76 - val_acc: 70.92%
```

...will produce the following metrics in CloudWatch:

`validation:loss` =  0.76

`validation:accuracy` = 70.92
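
To see how the regular expressions pick the values out of a log line, the extraction can be reproduced locally with Python's `re` module. This is only an illustrative sketch; the helper name `extract_metrics` is hypothetical and is not part of the SageMaker SDK.

```python
import re

metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]


def extract_metrics(log_line, definitions):
    # Apply every regex to the log line and keep the first capture group.
    metrics = {}
    for definition in definitions:
        match = re.search(definition['Regex'], log_line)
        if match:
            metrics[definition['Name']] = float(match.group(1))
    return metrics


sample_log_line = '[step: 100] val_loss: 0.76 - val_acc: 70.92%'
parsed_metrics = extract_metrics(sample_log_line, metric_definitions)
print(parsed_metrics)
```

Note that the character class `[0-9.]+` stops at the `%` sign, so the accuracy is captured as `70.92` rather than as a fraction.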

<a name='c3w1-2.'></a>
# 2. Run Tuning Job


### 2.1. Set up the RoBERTa and PyTorch script to run on SageMaker

Prepare the PyTorch model to run as a SageMaker Training Job. The estimator takes a separate Python file as its entry point (here `train.py` in the `src` directory), which SageMaker calls during training.


In [14]:
from sagemaker.pytorch import PyTorch as PyTorchEstimator

estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='src',
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    volume_size=train_volume_size,
    py_version='py3',
    framework_version='1.6.0',
    hyperparameters=hyperparameters_static,
    metric_definitions=metric_definitions,
    input_mode=input_mode,
)


### 2.2. Launch the Hyperparameter Tuning Job

Use a `Random` search strategy to determine the combinations of hyperparameters to evaluate.




Set up the Hyperparameter Tuner.


In [15]:
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=estimator, 
    hyperparameter_ranges=hyperparameter_ranges, 
    metric_definitions=metric_definitions, 
    strategy='Random', 
    objective_type='Maximize',
    objective_metric_name='validation:accuracy',
    max_jobs=2, # maximum number of jobs to run
    max_parallel_jobs=2, # maximum number of jobs to run in parallel
    early_stopping_type='Auto' # early stopping criteria
)
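
Conceptually, a random search draws each candidate independently from the configured ranges. The sketch below illustrates that idea with Python's `random` module, using the ranges defined in section 1.2. It is an assumption-laden illustration only: the function `sample_random_candidates` is hypothetical, and it does not reproduce how SageMaker samples internally.

```python
import random


def sample_random_candidates(n_jobs, seed=42):
    # Draw n_jobs independent candidates from the tuning ranges.
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_jobs):
        candidates.append({
            # Continuous range with linear scaling: uniform draw.
            'learning_rate': rng.uniform(0.00001, 0.00005),
            # Categorical range: pick one of the listed values.
            'train_batch_size': rng.choice([128, 256]),
        })
    return candidates


candidates = sample_random_candidates(n_jobs=2)
for candidate in candidates:
    print(candidate)
```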



Launch the SageMaker Hyper-Parameter Tuning (HPT) Job.



In [16]:
tuner.fit(
    inputs=data_channels, 
    include_cls_metadata=False,
    wait=False
)


### 2.3. Check Tuning Job status


In [17]:
tuning_job_name = tuner.latest_tuning_job.job_name
print(tuning_job_name)

pytorch-training-230718-1520


Check the status of the Tuning Job.

In [18]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(region, tuning_job_name)))
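
Besides the console link, the status can also be read programmatically. The sketch below assumes the low-level client `sm` created earlier and uses the boto3 call `describe_hyper_parameter_tuning_job`; the wrapper name `summarize_tuning_job` is hypothetical.

```python
def summarize_tuning_job(sm_client, tuning_job_name):
    # Query the tuning job and keep only the overall status and the
    # per-status counts of its training jobs.
    response = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )
    return {
        'status': response['HyperParameterTuningJobStatus'],
        'counters': response.get('TrainingJobStatusCounters', {}),
    }


# Usage inside this notebook (requires AWS credentials):
# print(summarize_tuning_job(sm, tuning_job_name))
```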

In [22]:
%%time

tuner.wait()

!
CPU times: user 0 ns, sys: 7.27 ms, total: 7.27 ms
Wall time: 154 ms


In [23]:
import time

time.sleep(10) # slight delay to allow the analytics to be calculated

df_results = tuner.analytics().dataframe()
df_results.shape

(2, 8)

In [24]:
df_results.sort_values('FinalObjectiveValue', ascending=False)

|   | learning_rate | train_batch_size | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.8e-05 | "256" | pytorch-training-230718-1520-001-5e34af03 | Completed | 71.089996 | 2023-07-18 15:22:00+00:00 | 2023-07-18 15:43:56+00:00 | 1316.0 |
| 0 | 3.9e-05 | "256" | pytorch-training-230718-1520-002-f239d159 | Completed | 68.360001 | 2023-07-18 15:23:15+00:00 | 2023-07-18 15:45:52+00:00 | 1357.0 |


In [25]:
from IPython.display import display, HTML

display(HTML('<b>Review Training Jobs of the <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(region, tuning_job_name)))

# 3. Evaluate the results




### 3.1. Show the best candidate

In [26]:
df_results.sort_values(
    'FinalObjectiveValue', 
    ascending=False).head(1)

|   | learning_rate | train_batch_size | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.8e-05 | "256" | pytorch-training-230718-1520-001-5e34af03 | Completed | 71.089996 | 2023-07-18 15:22:00+00:00 | 2023-07-18 15:43:56+00:00 | 1316.0 |


### 3.2. Evaluate the best candidate


In [27]:
best_candidate = df_results.sort_values('FinalObjectiveValue', ascending=False).iloc[0]

best_candidate_training_job_name = best_candidate['TrainingJobName']
print('Best candidate Training Job name: {}'.format(best_candidate_training_job_name))

Best candidate Training Job name: pytorch-training-230718-1520-001-5e34af03
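
Sorting in descending order works here because the objective type is `Maximize`. For a loss-based objective the direction would flip. The following pure-Python sketch shows one way to make the selection depend on the objective type; the function `pick_best` is hypothetical, and the records are copied from the results table above.

```python
def pick_best(results, objective_type='Maximize'):
    # Only completed jobs have a meaningful final objective value.
    completed = [r for r in results if r['TrainingJobStatus'] == 'Completed']
    if not completed:
        raise ValueError('no completed training jobs to compare')
    choose = max if objective_type == 'Maximize' else min
    return choose(completed, key=lambda r: r['FinalObjectiveValue'])


results = [
    {'TrainingJobName': 'pytorch-training-230718-1520-001-5e34af03',
     'TrainingJobStatus': 'Completed', 'FinalObjectiveValue': 71.089996},
    {'TrainingJobName': 'pytorch-training-230718-1520-002-f239d159',
     'TrainingJobStatus': 'Completed', 'FinalObjectiveValue': 68.360001},
]

best = pick_best(results, objective_type='Maximize')
print(best['TrainingJobName'])
```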




Pull the accuracy result for the best candidate:


In [28]:
best_candidate_accuracy = best_candidate['FinalObjectiveValue']

print('Best candidate accuracy result: {}'.format(best_candidate_accuracy))

Best candidate accuracy result: 71.08999633789062


Use the `describe_training_job` function of the service client to get more information about the best candidate:

In [29]:
best_candidate_description = sm.describe_training_job(TrainingJobName=best_candidate_training_job_name)

best_candidate_training_job_name2 = best_candidate_description['TrainingJobName']

print('Training Job name: {}'.format(best_candidate_training_job_name2))

Training Job name: pytorch-training-230718-1520-001-5e34af03



Pull the Tuning Job and Training Job Amazon Resource Names (ARNs) from the best candidate Training Job description:


In [30]:
print(best_candidate_description.keys())

dict_keys(['TrainingJobName', 'TrainingJobArn', 'TuningJobArn', 'ModelArtifacts', 'TrainingJobStatus', 'SecondaryStatus', 'HyperParameters', 'AlgorithmSpecification', 'RoleArn', 'InputDataConfig', 'OutputDataConfig', 'ResourceConfig', 'StoppingCondition', 'CreationTime', 'TrainingStartTime', 'TrainingEndTime', 'LastModifiedTime', 'SecondaryStatusTransitions', 'FinalMetricDataList', 'EnableNetworkIsolation', 'EnableInterContainerTrafficEncryption', 'EnableManagedSpotTraining', 'TrainingTimeInSeconds', 'BillableTimeInSeconds', 'ProfilingStatus', 'WarmPoolStatus', 'ResponseMetadata'])


In [31]:
best_candidate_tuning_job_arn = best_candidate_description['TuningJobArn'] 
best_candidate_training_job_arn = best_candidate_description['TrainingJobArn'] 
print('Best candidate Tuning Job ARN: {}'.format(best_candidate_tuning_job_arn))
print('Best candidate Training Job ARN: {}'.format(best_candidate_training_job_arn))

Best candidate Tuning Job ARN: arn:aws:sagemaker:us-east-1:302997832976:hyper-parameter-tuning-job/pytorch-training-230718-1520
Best candidate Training Job ARN: arn:aws:sagemaker:us-east-1:302997832976:training-job/pytorch-training-230718-1520-001-5e34af03
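
An ARN has the general layout `arn:partition:service:region:account-id:resource`. If the individual parts are needed, they can be split out as in the sketch below. The helper `parse_arn` is hypothetical, and the example value is the Training Job ARN printed above.

```python
def parse_arn(arn):
    # Split into at most six fields so that any ':' inside the resource
    # part is left untouched.
    parts = arn.split(':', 5)
    if len(parts) != 6 or parts[0] != 'arn':
        raise ValueError('not a valid ARN: {}'.format(arn))
    _, partition, service, region_name, account_id, resource = parts
    resource_type, _, resource_name = resource.partition('/')
    return {
        'partition': partition,
        'service': service,
        'region': region_name,
        'account_id': account_id,
        'resource_type': resource_type,
        'resource_name': resource_name,
    }


arn_parts = parse_arn(
    'arn:aws:sagemaker:us-east-1:302997832976:training-job/'
    'pytorch-training-230718-1520-001-5e34af03'
)
print(arn_parts['resource_type'], arn_parts['resource_name'])
```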


Pull the path of the best candidate model in the S3 bucket. You will use it to set up the Processing Job for the evaluation.

In [32]:
model_tar_s3_uri = sm.describe_training_job(TrainingJobName=best_candidate_training_job_name)['ModelArtifacts']['S3ModelArtifacts']
print(model_tar_s3_uri)

s3://sagemaker-us-east-1-302997832976/pytorch-training-230718-1520-001-5e34af03/output/model.tar.gz
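
When a bucket name and object key are needed separately (for example, for a boto3 S3 call), the URI can be split with the standard library. This is a small sketch; `split_s3_uri` is a hypothetical helper, and the URI is the one printed above.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    # 's3://bucket/key/path' -> ('bucket', 'key/path')
    parsed = urlparse(s3_uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('not an S3 URI: {}'.format(s3_uri))
    return parsed.netloc, parsed.path.lstrip('/')


model_bucket, model_key = split_s3_uri(
    's3://sagemaker-us-east-1-302997832976/'
    'pytorch-training-230718-1520-001-5e34af03/output/model.tar.gz'
)
print(model_bucket)
print(model_key)
```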


Use a scikit-learn-based Processing Job to perform model evaluation.

In [33]:
from sagemaker.sklearn.processing import SKLearnProcessor

processing_instance_type = "ml.c5.2xlarge"
processing_instance_count = 1

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    max_runtime_in_seconds=7200,
)

The model evaluation Processing Job will run the script `evaluate_model_metrics.py`.

Launch the Processing Job, passing the parameters defined above, the custom script, the path to the model artifact, and the S3 location of the test data.

In [34]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor.run(
    code="src/evaluate_model_metrics.py",
    inputs=[
        ProcessingInput(  
            input_name="model-tar-s3-uri",                        
            source=model_tar_s3_uri,                               
            destination="/opt/ml/processing/input/model/"
        ),
        ProcessingInput(
            input_name="evaluation-data-s3-uri",
            source=processed_test_data_s3_uri,                                    
            destination="/opt/ml/processing/input/data/",
        ),
    ],
    outputs=[
        ProcessingOutput(s3_upload_mode="EndOfJob", output_name="metrics", source="/opt/ml/processing/output/metrics"),
    ],
    arguments=["--max-seq-length", str(max_seq_length)],
    logs=True,
    wait=False,
)


Job Name:  sagemaker-scikit-learn-2023-07-18-16-10-27-145
Inputs:  [{'InputName': 'model-tar-s3-uri', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-302997832976/pytorch-training-230718-1520-001-5e34af03/output/model.tar.gz', 'LocalPath': '/opt/ml/processing/input/model/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'evaluation-data-s3-uri', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-302997832976/transformed/data/sentiment-test/', 'LocalPath': '/opt/ml/processing/input/data/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-302997832976/sagemaker-scikit-learn-2023-07-18-16-10-27-145/input/code/evaluate_model_metrics.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S

See the information about the Processing Job using the `describe` function:

In [35]:
scikit_processing_job_name = processor.jobs[-1].describe()["ProcessingJobName"]

print('Processing Job name: {}'.format(scikit_processing_job_name))

Processing Job name: sagemaker-scikit-learn-2023-07-18-16-10-27-145



Pull the Processing Job status from the Processing Job description.


In [36]:
print(processor.jobs[-1].describe().keys())

dict_keys(['ProcessingInputs', 'ProcessingOutputConfig', 'ProcessingJobName', 'ProcessingResources', 'StoppingCondition', 'AppSpecification', 'RoleArn', 'ProcessingJobArn', 'ProcessingJobStatus', 'LastModifiedTime', 'CreationTime', 'ResponseMetadata'])


In [37]:
scikit_processing_job_status = processor.jobs[-1].describe()['ProcessingJobStatus'] 
print('Processing job status: {}'.format(scikit_processing_job_status))

Processing job status: InProgress
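
Because the job was started with `wait=False`, the status is still `InProgress` at this point. One way to block until it finishes is a simple polling loop over `describe()`, sketched below. The function `wait_for_processing_job` and its parameters are hypothetical; it treats `Completed`, `Failed`, and `Stopped` as terminal states.

```python
import time

TERMINAL_STATES = {'Completed', 'Failed', 'Stopped'}


def wait_for_processing_job(describe_fn, poll_seconds=30, max_polls=60):
    # describe_fn is any callable returning a job description dictionary,
    # for example processor.jobs[-1].describe
    for _ in range(max_polls):
        status = describe_fn()['ProcessingJobStatus']
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError('processing job did not finish within the polling budget')


# Usage inside this notebook (requires AWS credentials):
# final_status = wait_for_processing_job(processor.jobs[-1].describe)
```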


Review the created Processing Job in the AWS console.


In [38]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}">Processing Job</a></b>'.format(
            region, scikit_processing_job_name
        )
    )
)

In [39]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/ProcessingJobs;prefix={};streamFilter=typeLogStreamPrefix">CloudWatch Logs</a> after about 5 minutes</b>'.format(
            region, scikit_processing_job_name
        )
    )
)

In [40]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview">S3 output data</a> after the Processing Job has completed</b>'.format(
            bucket, scikit_processing_job_name, region
        )
    )
)

Monitor the Processing Job:

In [41]:
from pprint import pprint

running_processor = sagemaker.processing.ProcessingJob.from_processing_name(
    processing_job_name=scikit_processing_job_name, sagemaker_session=sess
)

processing_job_description = running_processor.describe()

pprint(processing_job_description)

{'AppSpecification': {'ContainerArguments': ['--max-seq-length', '128'],
                      'ContainerEntrypoint': ['python3',
                                              '/opt/ml/processing/input/code/evaluate_model_metrics.py'],
                      'ImageUri': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3'},
 'CreationTime': datetime.datetime(2023, 7, 18, 16, 10, 27, 764000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 7, 18, 16, 10, 28, 280000, tzinfo=tzlocal()),
 'ProcessingInputs': [{'AppManaged': False,
                       'InputName': 'model-tar-s3-uri',
                       'S3Input': {'LocalPath': '/opt/ml/processing/input/model/',
                                   'S3CompressionType': 'None',
                                   'S3DataDistributionType': 'FullyReplicated',
                                   'S3DataType': 'S3Prefix',
                                   'S3InputMode': 'File',
                   

In [42]:
%%time

running_processor.wait(logs=False)

......................................................................!
CPU times: user 313 ms, sys: 38.9 ms, total: 351 ms
Wall time: 5min 54s


### 3.3. Inspect the processed output data

Get the S3 bucket location of the output metrics:

In [43]:
processing_job_description = running_processor.describe()

output_config = processing_job_description["ProcessingOutputConfig"]
for output in output_config["Outputs"]:
    if output["OutputName"] == "metrics":
        processed_metrics_s3_uri = output["S3Output"]["S3Uri"]

print(processed_metrics_s3_uri)

s3://sagemaker-us-east-1-302997832976/sagemaker-scikit-learn-2023-07-18-16-10-27-145/output/metrics
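
The loop above leaves `processed_metrics_s3_uri` undefined if no output is named `metrics`. A slightly more defensive variant, sketched here with a hypothetical helper `find_output_s3_uri`, raises a clear error instead. The sample description contains only the fields the helper reads.

```python
def find_output_s3_uri(job_description, output_name):
    # Look up the S3 URI of a named output in a Processing Job description.
    outputs = job_description['ProcessingOutputConfig']['Outputs']
    for output in outputs:
        if output['OutputName'] == output_name:
            return output['S3Output']['S3Uri']
    raise KeyError('no processing output named {!r}'.format(output_name))


sample_description = {
    'ProcessingOutputConfig': {
        'Outputs': [
            {
                'OutputName': 'metrics',
                'S3Output': {
                    'S3Uri': 's3://sagemaker-us-east-1-302997832976/'
                             'sagemaker-scikit-learn-2023-07-18-16-10-27-145/output/metrics'
                },
            }
        ]
    }
}

metrics_uri = find_output_s3_uri(sample_description, 'metrics')
print(metrics_uri)
```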


List the contents of the folder:

In [44]:
!aws s3 ls $processed_metrics_s3_uri/

2023-07-18 16:16:30      21307 confusion_matrix.png
2023-07-18 16:16:30         56 evaluation.json


Pull the test accuracy from the `evaluation.json` file:

In [45]:
import json
from pprint import pprint

metrics_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
    processed_metrics_s3_uri
))

print('Test accuracy: {}'.format(json.loads(metrics_json)))

Test accuracy: {'metrics': {'accuracy': {'value': 0.7411003236245954}}}
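
The cell above prints the whole parsed dictionary. To get the accuracy as a plain number, index into the nested structure. The sketch below uses a JSON string with the same content as the output above, so that it runs without access to S3.

```python
import json

# Same structure and value as the evaluation.json content printed above.
metrics_json = '{"metrics": {"accuracy": {"value": 0.7411003236245954}}}'

test_accuracy = json.loads(metrics_json)['metrics']['accuracy']['value']
print('Test accuracy: {:.2%}'.format(test_accuracy))
```

Note that this value is a fraction between 0 and 1, whereas the `validation:accuracy` metric reported by the tuning job is a percentage.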


Copy the confusion matrix image into the `generated` folder:

In [46]:
!aws s3 cp $processed_metrics_s3_uri/confusion_matrix.png ./generated/

import time
time.sleep(10) 

download: s3://sagemaker-us-east-1-302997832976/sagemaker-scikit-learn-2023-07-18-16-10-27-145/output/metrics/confusion_matrix.png to generated/confusion_matrix.png


View the confusion matrix:

In [47]:
%%html

<img src='./generated/confusion_matrix.png'>