# Direct Marketing with Amazon SageMaker Autopilot

This notebook works well with the `Python 3 (Data Science)` kernel on SageMaker Studio.

---

## Contents

1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [AutoPilot Experiment using SageMaker Studio UI](#AutoPilot-Experiment-using-SageMaker-Studio-UI)
 * [Open Amazon SageMaker Studio](#Open-Amazon-SageMaker-Studio)
 * [Create Autopilot Experiment Job](#Create-Autopilot-Experiment-Job)
 * [Enter information for the AutoPilot Job](#Enter-information-for-the-AutoPilot-Job)
 * [View Autopilot Experiment Job](#View-Autopilot-Experiment-Job)
 * [Test Deployed Model](#Test-Deployed-Model)
1. [Cleanup](#Cleanup)

## Introduction

SageMaker Studio provides a UI that makes it easy to run an [Amazon SageMaker Autopilot](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-create-experiment.html) experiment.

In this notebook, we walk through how to create a SageMaker Autopilot experiment from the Studio UI.

> **_NOTE_** Please complete the [01_sagemaker_autopilot_data_preparation.ipynb](./01_sagemaker_autopilot_data_preparation.ipynb) notebook first so that the training dataset is ready in the S3 bucket.

### Why use the SageMaker Studio UI?

The Studio UI is the fastest and easiest way to kick off an [Amazon SageMaker Autopilot](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-create-experiment.html) experiment. After a few clicks and a handful of experiment details, such as the training data S3 URI, the output location S3 URI, and the experiment settings, you can sit back while the AutoML service searches for the best model candidate for your tabular dataset.

## Setup

Retrieve the shared variables created by the [01_sagemaker_autopilot_data_preparation.ipynb](./01_sagemaker_autopilot_data_preparation.ipynb) notebook and list the S3 URIs needed to set up the Autopilot experiment.

In [None]:
%store -r train_data_s3_path
%store -r test_file_label
%store -r bucket
%store -r prefix

try:
    train_data_s3_path
except NameError:
    raise ValueError("Training dataset S3 URI is missing, please execute the data preparation notebook!")

> **_NOTE_** Note down the following variables:
* `train_data_s3_path` for training data input.
* `using_studio_ui_output_path` for Autopilot experiment output.

In [None]:
train_data_s3_path

In [None]:
using_studio_ui_output_path = f"s3://{bucket}/{prefix}/using-studio-ui-output"
using_studio_ui_output_path
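
Both values are plain `s3://bucket/key` URIs. As a quick sanity check before pasting them into the UI, the sketch below splits a URI into its bucket and key using only the standard library. The helper name `split_s3_uri` and the sample URI are assumptions made for illustration; they are not part of the data preparation notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The parsed path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Sample value only; in this notebook you would pass train_data_s3_path instead.
sample_uri = "s3://example-bucket/autopilot-dm/train/train_data.csv"
print(split_s3_uri(sample_uri))
```

In the notebook itself, calling `split_s3_uri(train_data_s3_path)` should return the same bucket that is stored in `bucket`.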

## AutoPilot Experiment using SageMaker Studio UI

In this section, we walk through the steps to create an Autopilot experiment job via the SageMaker Studio UI, and then invoke the deployed model using boto3 API calls.

### Open Amazon SageMaker Studio

Follow these steps:
* Log on to the AWS Management Console.
* Select the 'Amazon SageMaker' service.
* Select 'Studio' in the left-hand menu, under 'SageMaker Domain'.
* Click the 'Launch App' dropdown box next to a SageMaker user.
* Click the 'Studio' item in the dropdown box.

### Create Autopilot Experiment Job

Under the 'Launcher' tab, choose the **New autopilot experiment** option from the **Build model automatically** box. If you don't have a 'Launcher' tab, you can open one from the 'File' -> 'New Launcher' menu.
 
![New autoPilot experiment](./image/ap_new_autopilot_experiment.png)

### Enter information for the AutoPilot Job

* **Experiment name** - a name that is unique within your account in the current AWS Region and contains at most 63 alphanumeric characters. It can include hyphens (-).
 
 ![experiment name](./image/ap_experiment_name.png)
 
  * Type in 'Experiment name', e.g. 'direct-marketing-autopilot-job'

 * **Connect your data** - Provide the training data S3 URI.
 
 ![S3 bucket location](./image/ap_enter_s3bucket_location.png)
 
  * Select 'Enter S3 bucket location' option
  * Copy and paste the value of `train_data_s3_path` into 'S3 bucket address'. The value looks similar to 's3://sagemaker-ap-southeast-2-123456789012/mlu-workshop/autopilot-dm/train/train_data.csv'.

 * **Is your S3 input a manifest file?** - choose 'off' for this lab, because we don't need a manifest file containing metadata for our training data.

  * **Target** - the target value or label in the training dataset.
  
  ![target](./image/ap_target.png)
  
   * Click the dropdown box and select field 'y', which is the target value.

  * **Output data location** - the name of the S3 bucket and directory where you want to store the output data.
 
  ![output data location](./image/ap_output_data_location.png)
  
  > **_NOTE_**: You may select an S3 bucket (in the current AWS Region) and a directory within it, or provide the S3 folder URI. In our exercise, please use a directory under the SageMaker default S3 bucket, such as the value of `using_studio_ui_output_path`.

  * **Select the machine learning problem type** - Autopilot can select the machine learning problem type automatically, or you can specify it manually. In our exercise, please choose `Binary classification` in the dropdown box.
  
  ![machine learning problem type](./image/ap_ml_problem_type.png)
  
   * Please select [`F1`](https://en.wikipedia.org/wiki/F-score) as the Objective metric. [`F1`](https://en.wikipedia.org/wiki/F-score) is the harmonic mean of precision and recall for a binary classification problem.

  * **Do you want to run complete experiment** - You can specify how to run the experiment.
  
  ![complete experiment](./image/ap_complete_experiment.png)
  
   * If you choose **Yes**, Autopilot runs the full experiment with model training and generates the related trials, and you will be able to deploy the best model to a SageMaker endpoint for real-time inference.
   * If you choose **No**, Autopilot stops after generating the notebooks for dataset analysis and candidate definitions instead of running the entire experiment.

  * **Auto deploy** - Autopilot can automatically deploy the best model from an Autopilot experiment to an endpoint for real-time inference. Accept the default Auto deploy value **On** when creating the experiment, and provide the endpoint name.
  
  ![auto deploy](./image/ap_auto_deploy.png)
  
   * In our exercise, please enter `dm-autopilot-experiment`. This endpoint name is used later to get predictions from the deployed model.

  * **ADVANCED SETTINGS** - These settings let you specify how the experiment should be run. In particular, set the maximum number of candidates to `10` and accept the default values for the other settings.
  
  ![advanced settings](./image/ap_advanced_settings.png)

  * **Auto deploy the best model confirmation?** - If you choose **On** under 'Auto deploy', Autopilot shows a confirmation dialog to remind you that deploying the model to a SageMaker endpoint incurs cost. In our exercise, please click the `Confirm` button.
  
  ![best model deployment confirmation](./image/ap_prompt_best_model_deployment.png)

  * **Autopilot experiment in progress** - Once the Autopilot experiment is kicked off, you can view its progress. It may take 20-40 minutes for the job to finish, depending on the size of the training dataset and the number of candidates you want to evaluate. (Autopilot supports up to 250 candidates.)
  
  ![Autopilot job in progress](./image/ap_auto_pilot_job_in_progress.png)
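
The form fields above map fairly closely onto the parameters of the SageMaker `CreateAutoMLJob` API. The sketch below only assembles the request as a dictionary so that the mapping is visible; it does not start a job. The function name `build_automl_request`, the sample S3 URIs, and the role ARN are placeholders introduced for illustration. The name check encodes the rule as stated in this notebook, and the API itself may enforce different limits.

```python
import re

# Rule as described in the text: alphanumeric characters and hyphens, at most 63 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]{1,63}")


def build_automl_request(job_name, train_uri, output_uri, role_arn, endpoint_name):
    """Assemble CreateAutoMLJob parameters mirroring the Studio form (illustrative)."""
    if not NAME_PATTERN.fullmatch(job_name):
        raise ValueError(f"Invalid experiment name: {job_name!r}")
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # 'S3Prefix' corresponds to leaving the manifest file option off.
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_uri}
                },
                "TargetAttributeName": "y",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_uri},
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "F1"},
        "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": 10}},
        # False runs the complete experiment rather than only generating notebooks.
        "GenerateCandidateDefinitionsOnly": False,
        "ModelDeployConfig": {
            "AutoGenerateEndpointName": False,
            "EndpointName": endpoint_name,
        },
        "RoleArn": role_arn,
    }


# Placeholder values; substitute your own URIs and execution role.
request = build_automl_request(
    job_name="direct-marketing-autopilot-job",
    train_uri="s3://example-bucket/autopilot-dm/train/train_data.csv",
    output_uri="s3://example-bucket/autopilot-dm/using-studio-ui-output",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    endpoint_name="dm-autopilot-experiment",
)
print(request["AutoMLJobName"], request["ProblemType"])
```

Passing the dictionary to `boto3.client("sagemaker").create_auto_ml_job(**request)` would submit the job programmatically, which is the approach to reach for when the experiment needs to be repeatable.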

### View Autopilot Experiment Job 

Once the experiment is completed, we can view the related trials and access the generated notebooks and the deployed endpoint.

* **Autopilot Job Detail** - To access the Autopilot job detail, you may wait until the job from the previous step finishes. Alternatively:
 1. Click the ![icon](./image/sm_studio_sagemaker_resources.png) `SageMaker Resources` icon to open the resources pane.
 2. Select `Experiments and trials` to list the SageMaker experiments.
 3. Right-click the experiment in the list and select `Describe AutoML Job`.
 

  ![To View Autopilot Job Detail](./image/ap_direct_markting_autopilot_job_detail.png)
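
The same job detail can also be read programmatically. The sketch below polls the SageMaker `describe_auto_ml_job` call until the job reaches a terminal state. The function name `wait_for_automl_job`, the polling interval, and the poll limit are assumptions for illustration; the client is passed in as an argument so that the function can be exercised without AWS access.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_automl_job(sm_client, job_name, poll_seconds=60, max_polls=120):
    """Poll describe_auto_ml_job until the job reaches a terminal state."""
    for _ in range(max_polls):
        description = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        status = description["AutoMLJobStatus"]
        secondary = description.get("AutoMLJobSecondaryStatus", "n/a")
        print(f"Status: {status} / {secondary}")
        if status in TERMINAL_STATES:
            return description
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_name!r} did not finish after {max_polls} polls")


# Example usage (requires AWS credentials and an existing job):
# import boto3
# wait_for_automl_job(boto3.client("sagemaker"), "direct-marketing-autopilot-job")
```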

#### To learn more about the generated notebooks

* Click the `Open candidate generation notebook` button to see in more detail how the model candidates are explored.
* Click the `Open data exploration notebook` button to see what the training data statistics look like.

#### To view `Best Model`

The `Best Model` is the candidate with the highest score on the selected `Objective metric`. In our lab, that is the `F1` score.

Right-click the first row, labelled `Best Model`, and select the `Describe in model details` menu item.

  ![To View Best Model](./image/ap_describe_best_model.png)
  
The model details page is then shown. For the `Best Model`, Autopilot provides reports under the `Explainability` and `Performance` tabs. Select them to learn more about model explainability and model performance.

  ![To View Model Details](./image/ap_model_detail_explainability.png)
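
Since candidates in this lab are ranked by `F1`, it can help to see how that number is derived from a confusion matrix. The sketch below computes F1 as the harmonic mean of precision and recall. The function name and the counts are made up for illustration and are not taken from the experiment.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: precision = 30/40 = 0.75, recall = 30/50 = 0.6.
print(round(f1_score(30, 10, 20), 3))  # about 0.667
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by maximizing precision or recall alone.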

### Test Deployed Model

Note down the endpoint name that you provided when creating the Autopilot experiment.

Load the test dataset into a `pandas` DataFrame.

In [None]:
import pandas as pd

column_label = 'y'

test_data = pd.read_csv(test_file_label)

# Move the label to the first column so that predictions are easier to verify.
columns = test_data.columns.tolist()
columns.remove(column_label)
columns.insert(0, column_label)

test_data[columns]

In [None]:
X_test_numpy = test_data.drop(["y"], axis=1).values

Set the endpoint name. If you used a name other than `dm-autopilot-experiment`, please update the value below:

In [None]:
endpoint_name = 'dm-autopilot-experiment'

In [None]:
import boto3

runtime = boto3.client('sagemaker-runtime')


def predict(payload):
    """Send one CSV row to the endpoint and return (label, probability)."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=payload,
    )
    print(response)
    # The body is parsed as a single CSV line: "<label>,<probability>".
    result = response['Body'].read().decode('utf-8').strip()
    pred, pred_probability = result.split(',')
    print(f"Prediction result: {pred} with probability: {pred_probability}")
    return pred, pred_probability

In [None]:
# Update the index to test a different row of the test dataset.
index = 15

label = test_data.iloc[index]['y']
print(f"Row index {index} with Label: {label}")

payload = ','.join(X_test_numpy[index].astype(str).tolist())
predict(payload)
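
Checking one row at a time gives only a rough impression. A minimal sketch for comparing several rows against their labels is shown below. The helper `count_matches` and the stand-in predictor are hypothetical; the predictor is passed in as an argument so that the example runs without a live endpoint.

```python
def count_matches(rows, labels, predict_fn):
    """Return how many predicted labels equal the true labels."""
    matches = 0
    for features, label in zip(rows, labels):
        # Build the same comma-separated payload that the endpoint expects.
        payload = ",".join(str(value) for value in features)
        predicted_label, _probability = predict_fn(payload)
        if str(predicted_label) == str(label):
            matches += 1
    return matches


def fake_predict(payload):
    # Stand-in for the real endpoint: answers "yes" when the first feature exceeds 40.
    first_feature = float(payload.split(",")[0])
    return ("yes" if first_feature > 40 else "no"), "0.5"


# Made-up rows and labels for illustration only.
sample_rows = [[45, "admin."], [30, "services"], [52, "technician"]]
sample_labels = ["yes", "no", "no"]
print(count_matches(sample_rows, sample_labels, fake_predict))  # 2
```

Against the deployed model, a call such as `count_matches(X_test_numpy[:20], test_data["y"][:20], predict)` would apply the same check to the first 20 test rows, assuming the endpoint returns labels in the same form as the `y` column.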

### Cleanup

It's generally good practice to delete endpoints that are no longer in use.

Uncomment the following lines and run the cell to delete the endpoint that was created earlier.

In [None]:
# sm_client = boto3.client('sagemaker')
# sm_client.delete_endpoint(EndpointName=endpoint_name)
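
Deleting the endpoint does not remove the endpoint configuration or the model behind it. If you also want to remove those, the sketch below looks them up before deleting the endpoint. The function name `delete_endpoint_resources` is an assumption for illustration; the boto3 calls it relies on are `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, and `delete_model`.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models it references."""
    # Look up the dependent resources first, while the endpoint still exists.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }


# Example usage (requires AWS credentials; this permanently deletes resources):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "dm-autopilot-experiment")
```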