In [None]:
%%sh
pip -q install --upgrade pip
pip -q install sagemaker awscli boto3 --upgrade

In [None]:
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

# Direct Marketing with Keras and Hyperparameter Tuning

Last update: December 2nd, 2019

In this lab, we're going to use a simple neural network implemented with [Keras](https://keras.io), a popular, beginner-friendly deep learning library.

Here's a high-level overview of the Keras code below:
* Read hyperparameters, architecture parameters (number and width of dense layers), and environment variables passed by SageMaker (as per [script mode](https://sagemaker.readthedocs.io/en/stable/using_tf.html)),
* Read the full data set from the training channel,
* One-hot encode categorical variables,
* Separate samples (X) and labels (Y),
* Apply [min/max](https://en.wikipedia.org/wiki/Feature_scaling) scaling on numerical features,
* Split the data set for training and validation,
* Build the neural network, with 1 to `layers` dense layers, each one with `dense-layer` neurons,
* Train the model, displaying precision, recall and F1 score,
* Score the model,
* Save the model.
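
The first step in the list can be sketched as follows. This is a minimal illustration, not the contents of `dm_keras_tf.py`: the hyperparameter names are the ones tuned later in this lab, `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` are environment variables set by the SageMaker training container, and the default values and fallback paths are assumptions chosen for the example.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided locations (illustrative defaults)."""
    parser = argparse.ArgumentParser()

    # Hyperparameters: SageMaker passes them as command-line arguments
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--learning-rate', type=float, default=0.01)
    parser.add_argument('--batch-size', type=int, default=32)

    # Architecture parameters
    parser.add_argument('--layers', type=int, default=1)
    parser.add_argument('--dense-layer', type=int, default=8)

    # Locations provided by SageMaker through environment variables
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--training', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING', '/opt/ml/input/data/training'))

    # Ignore any extra arguments added by the container
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args(['--epochs', '3', '--layers', '2', '--dense-layer', '32'])
print(args.epochs, args.layers, args.dense_layer)
```

Note that argparse exposes `--dense-layer` as `args.dense_layer`, with the hyphen replaced by an underscore.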


In [None]:
!pygmentize dm_keras_tf.py

In [None]:
import sagemaker
import boto3

print(sagemaker.__version__)

sess   = sagemaker.Session()
bucket = sess.default_bucket()                     
prefix = 'sagemaker/DEMO-hpo-keras-dm'
region = boto3.Session().region_name

# Role when working on a notebook instance
role = sagemaker.get_execution_role()
# Role when working locally
# role = ROLE_ARN

We upload the raw dataset to S3, as the Keras script itself will perform basic preprocessing.

In [None]:
training_input_path = sess.upload_data('bank-additional/bank-additional-full.csv', key_prefix=prefix+'/training')

print(training_input_path)
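
To give an idea of the basic preprocessing left to the script, here is a dependency-free sketch of one-hot encoding and min/max scaling. It is an illustration only: the helper names and the toy values are hypothetical, and the actual script may rely on pandas or scikit-learn instead.

```python
def one_hot(values):
    """One-hot encode a list of categorical values; returns (categories, encoded rows)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def min_max_scale(values):
    """Rescale numerical values to the [0, 1] range."""
    low, high = min(values), max(values)
    if high == low:
        # Constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Toy values, not taken from the actual data set
categories, encoded = one_hot(['married', 'single', 'married', 'divorced'])
scaled = min_max_scale([20, 30, 40, 60])
print(categories)
print(encoded)
print(scaled)
```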

## Configure Automatic Model Tuning

In [None]:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='dm_keras_tf.py', 
                          role=role,
                          train_instance_count=1, 
                          train_instance_type='ml.c5.2xlarge',
                          framework_version='1.14', 
                          py_version='py3',
                          script_mode=True,
                          train_use_spot_instances=True,        # Use managed spot training
                          train_max_run=600,                    # Max training time, in seconds
                          train_max_wait=3600                   # Max training time + spot waiting time, in seconds
                         )
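
Once a training job has finished, the benefit of spot training can be estimated by comparing the training time with the billable time. The sketch below assumes a dictionary shaped like the response of the `describe_training_job` API, which reports `TrainingTimeInSeconds` and `BillableTimeInSeconds`; the helper name and the sample values are hypothetical.

```python
def spot_savings_percent(job_description):
    """Percentage of the training time that was not billed."""
    training_seconds = job_description['TrainingTimeInSeconds']
    billable_seconds = job_description['BillableTimeInSeconds']
    if training_seconds == 0:
        return 0.0
    return 100.0 * (1 - billable_seconds / training_seconds)


# Hypothetical values, for illustration only
sample = {'TrainingTimeInSeconds': 200, 'BillableTimeInSeconds': 50}
print('Managed spot training savings: %.1f%%' % spot_savings_percent(sample))
```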

Let's tune our Keras model on two architecture parameters (number of dense layers and dense layer width), along with the number of epochs, the learning rate and the batch size.

We're using the F1 metric again. It's not natively supported in Keras, and requires the addition of the keras-metrics package. Installation is done in the script itself. We also need to pass a regular expression so that SageMaker can locate and extract the metric from the training log.

In [None]:
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

hyperparameter_ranges = {
    'epochs':        IntegerParameter(1, 5),
    'learning-rate': ContinuousParameter(0.001, 0.1, scaling_type='ReverseLogarithmic'), # reverse log scaling requires a range within [0, 1)
    'batch-size':    IntegerParameter(16, 1024, scaling_type='Logarithmic'),
    'layers':        IntegerParameter(1, 4),
    'dense-layer':   IntegerParameter(4, 64)
}

objective_metric_name = 'f1_score'
objective_type = 'Maximize'
metric_definitions = [{'Name': 'f1_score', 'Regex': 'val_f1_score: ([0-9\\.]+)'}]

tuner = HyperparameterTuner(tf_estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=20,
                            max_parallel_jobs=2,
                            objective_type=objective_type)
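
To check that the regular expression in `metric_definitions` captures the metric, it can be tried on a sample log line. The line below is a hypothetical example of Keras progress output, not an actual log from this job, and `extract_metric` is a helper name introduced for the example.

```python
import re

# Same pattern as in metric_definitions
regex = 'val_f1_score: ([0-9\\.]+)'

# Hypothetical log line, for illustration only
log_line = '50/50 - 2s - loss: 0.2102 - f1_score: 0.4821 - val_loss: 0.2013 - val_f1_score: 0.5432'


def extract_metric(pattern, line):
    """Return the first captured group as a float, or None if the pattern does not match."""
    match = re.search(pattern, line)
    return float(match.group(1)) if match else None


print(extract_metric(regex, log_line))
```

Because the pattern starts with `val_f1_score`, the training `f1_score` value earlier in the line is ignored and only the validation score is captured.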

In [None]:
tuner.fit({'training': training_input_path})

You can repeatedly run the cells below while the job is running.

In [None]:
# Low-level SageMaker client (named sm_client to avoid shadowing the sagemaker module)
sm_client = boto3.Session().client(service_name='sagemaker')

job_name = tuner.latest_tuning_job.job_name

# Run this cell to check the current status of the hyperparameter tuning job
tuning_job_result = sm_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=job_name)

status = tuning_job_result['HyperParameterTuningJobStatus']
if status != 'Completed':
    print('Reminder: the tuning job has not been completed.')
    
job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']
print("%d training jobs have completed" % job_count)
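
The response also counts training jobs in other states. As a sketch, the helper below formats all the counters found in `TrainingJobStatusCounters`; `summarize_counters` is a hypothetical name, and the sample dictionary stands in for a real API response.

```python
def summarize_counters(tuning_job_result):
    """Format the training job counters of a tuning job description."""
    counters = tuning_job_result['TrainingJobStatusCounters']
    parts = ['%s: %d' % (state, count) for state, count in sorted(counters.items())]
    return ', '.join(parts)


# Hypothetical response fragment, for illustration only
sample_result = {
    'HyperParameterTuningJobStatus': 'InProgress',
    'TrainingJobStatusCounters': {
        'Completed': 6,
        'InProgress': 2,
        'RetryableError': 0,
        'NonRetryableError': 0,
        'Stopped': 0,
    },
}
print(summarize_counters(sample_result))
```

In the notebook, the same helper could be called on the `tuning_job_result` dictionary retrieved above.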

## Inspect jobs with the SageMaker SDK analytics module

In [None]:
from sagemaker.analytics import HyperparameterTuningJobAnalytics

exp = HyperparameterTuningJobAnalytics(
    sagemaker_session=sess, 
    hyperparameter_tuning_job_name=tuner.latest_tuning_job.name
)

In [None]:
df = exp.dataframe()

In [None]:
df

'FinalObjectiveValue' is the F1 score. 
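
As a reminder, the F1 score is the harmonic mean of precision and recall. A minimal sketch, with a hypothetical helper name and made-up values:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up values, for illustration only
print(f1_score(0.5, 0.5))
```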

In [None]:
df.sort_values('FinalObjectiveValue', ascending=False).head(1)

How does this compare to what you achieved in the first two labs?