# Customer Churn Prediction with Amazon SageMaker Autopilot

> *This notebook is designed to work with the `Python 3 (Data Science)` kernel on SageMaker Studio. With the example dataset, the default `ml.t3.medium` (2 vCPU + 4 GiB) instance type should work well.*

<div class="alert alert-info">
    <p>▶️ <b>Workshop fast-start</b></p>
    <p>To get started with this workshop, go to the <a href="#Quickstart"><u>quick-start box below (link)</u></a> and follow the instructions to run the first set of cells.</p>
    <p>This will help ensure your Autopilot job is started promptly and you see results during the session.</p>
</div>

## Contents

1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Settingup)
1. [Autopilot Results](#Results)
1. [Evaluate Top Candidates](#Evaluation)
1. [Cleanup](#Cleanup)


---
## Introduction <a name="Intro"></a>

[Amazon SageMaker Autopilot](https://aws.amazon.com/sagemaker/autopilot/) is an automated machine learning (commonly referred to as AutoML) solution for tabular datasets. You can use SageMaker Autopilot in different ways:

- On autopilot (hence the name) or with human guidance
- Through the no-code UI (in SageMaker Studio), or via the AWS SDKs.

This notebook will demonstrate Autopilot via code with the high-level [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) - training, evaluating, and deploying an ML model for **churn prediction**.

Losing customers is costly for any business.  Identifying unhappy customers early on gives you a chance to offer them incentives to stay.  This notebook describes using machine learning for the automated identification of unhappy customers, also known as customer churn prediction.

We use an example of churn that is familiar to many: leaving a mobile phone operator. Seems like I can always find fault with my provider du jour! And if my provider knows that I'm thinking of leaving, it can offer timely incentives, for example a phone upgrade or activating a new feature. Incentives like this could keep me around, and are often much more cost effective for the business than losing and reacquiring a customer.

---
## Setup

Before getting started, we'll check & upgrade a few installed library versions to avoid some past incompatibilities (see [numpy#18355](https://github.com/numpy/numpy/issues/18355) and [aiobotocore#905](https://github.com/aio-libs/aiobotocore/issues/905)).

> ⚠️ **Note:** If you have any other notebooks running in the same Studio "app" (same kernel and instance type) that have already imported these libraries into memory, you might see unexpected errors and need to restart those notebook kernels after this install.

In [2]:
!pip install "pandas>=1.0.5" "s3fs>=2022.01.0"

Collecting pandas>=1.0.5
  Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Collecting s3fs>=2022.01.0
  Downloading s3fs-2022.2.0-py3-none-any.whl (26 kB)
Collecting aiobotocore~=2.1.0
  Downloading aiobotocore-2.1.2.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Collecting fsspec==2022.02.0
  Downloading fsspec-2022.2.0-py3-none-any.whl (134 kB)
Collecting botocore<1.23.25,>=1.23.24
  Downloading botocore-1.23.24-py3-none-any.whl (8.4 MB)
Building wheels for collected packages: aiobotocore
  Building wheel for aiobotocore (setup.py) ...

With the right library versions installed, we can import the ones we'll use in the exercise:

In [3]:
# Python Built-Ins:
import io
import json
from pprint import pprint
import time
from urllib.parse import urlparse

# External Dependencies:
import boto3  # The general-purpose AWS SDK for Python
import matplotlib.pyplot as plt  # Graph plotting
import numpy as np  # Numeric & matrix utilities
import pandas as pd  # DataFrame (tabular data) tools
import sagemaker  # High-level Python SDK for Amazon SageMaker
from sagemaker import AutoML
from sklearn import metrics as skmetrics  # Model evaluation metrics

print(f"sagemaker SDK v{sagemaker.__version__}")

sagemaker SDK v2.70.0


Next, we can connect to AWS services and **configure**:

- The [Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) and prefix to use for storing data and artifacts.
- The [IAM role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) ARN to be used for accessing data and other resources.

In [4]:
sm_session = sagemaker.Session() 

bucket = sm_session.default_bucket()
prefix = "sagemaker/DEMO-autopilot-churn"
print(f"Saving S3 data to: s3://{bucket}/{prefix}")

role = sagemaker.get_execution_role()
print(f"Using IAM role: {role}")

Saving S3 data to: s3://sagemaker-us-east-1-581573883648/sagemaker/DEMO-autopilot-churn
Using IAM role: arn:aws:iam::581573883648:role/mod-43805e59c2374e9b-SageMakerExecutionRole-1J4223PLLF338


---
## Data

Mobile operators have historical records on which customers ultimately ended up churning and which continued using the service. We can use this historical information to construct an ML model of one mobile operator’s churn using a process called training. After training the model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model, and have the model predict whether this customer is going to churn. Of course, we expect the model to make mistakes–after all, predicting the future is tricky business! But I’ll also show how to deal with prediction errors.

The dataset we will use is synthetically generated, but indicative of the types of features you'd see in this use case.

In [5]:
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt ./data/churn.txt

download: s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt to data/churn.txt


### Inspect your Dataset

Before you run Autopilot on the dataset, first check that it has no obvious errors. An Autopilot job can take a long time, so it's generally good practice to inspect the dataset before you start one. This particular dataset is small, so you can simply load it into notebook memory using Pandas.

If you have a larger dataset that won't fit into notebook memory, you could inspect the dataset offline using a big data analytics tool like Apache Spark. [Deequ](https://github.com/awslabs/deequ) is a library built on top of Apache Spark that can be helpful for performing checks on large datasets.

In [6]:
raw_data = pd.read_csv("./data/churn.txt")
pd.set_option("display.max_columns", 500)
raw_data

Unnamed: 0,State,Account Length,Area Code,Phone,Int'l Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,Eve Mins,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,Intl Mins,Intl Calls,Intl Charge,CustServ Calls,Churn?
0,PA,163,806,403-2562,no,yes,300,8.162204,3,7.579174,3.933035,4,6.508639,4.065759,100,5.111624,4.928160,6,5.673203,3,True.
1,SC,15,836,158-8416,yes,no,0,10.018993,4,4.226289,2.325005,0,9.972592,7.141040,200,6.436188,3.221748,6,2.559749,8,False.
2,MO,131,777,896-6253,no,yes,300,4.708490,3,4.768160,4.537466,3,4.566715,5.363235,100,5.142451,7.139023,2,6.254157,4,False.
3,WY,75,878,817-5729,yes,yes,700,1.268734,3,2.567642,2.528748,5,2.333624,3.773586,450,3.814413,2.245779,6,1.080692,6,False.
4,WY,146,878,450-4942,yes,no,0,2.696177,3,5.908916,6.015337,3,3.670408,3.751673,250,2.796812,6.905545,4,7.134343,6,True.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,NH,4,787,151-3162,yes,yes,800,10.862632,5,7.250969,6.936164,1,8.026482,4.921314,350,6.748489,4.872570,8,2.122530,9,False.
4996,SD,140,836,351-5993,no,no,0,1.581127,8,3.758307,7.377591,7,1.328827,0.939932,300,4.522661,6.938571,2,4.600473,4,False.
4997,SC,32,836,370-3127,no,yes,700,0.163836,5,4.243980,5.841852,3,2.340554,0.939469,450,5.157898,4.388328,7,1.060340,6,False.
4998,MA,142,776,604-2108,yes,yes,600,2.034454,5,3.014859,4.140554,3,3.470372,6.076043,150,4.362780,7.173376,3,4.871900,7,True.
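
As a minimal sketch of the kind of "obvious error" checks meant above, the snippet below counts missing values and duplicate rows using only the Python standard library. The inline sample rows and the `summarize_csv` helper are made up for illustration and are not taken from `churn.txt`. In the notebook itself, `raw_data.isna().sum()` and `raw_data.duplicated().sum()` would give similar information directly from the DataFrame.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only (not taken from churn.txt).
SAMPLE_CSV = """State,Account Length,Churn?
PA,163,True.
SC,,False.
PA,163,True.
MO,131,False.
"""


def summarize_csv(text):
    """Return row count, missing-value counts per column, and duplicate row count."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    # A row counts as a duplicate if an identical row appeared earlier.
    seen = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


summary = summarize_csv(SAMPLE_CSV)
print(summary)
```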


By modern standards, it’s a relatively small dataset, with only 5,000 records, where each record uses 21 attributes to describe the profile of a customer of an unknown US mobile operator. The attributes are:

- `State`: the US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ
- `Account Length`: the number of days that this account has been active
- `Area Code`: the three-digit area code of the corresponding customer’s phone number
- `Phone`: the remaining seven-digit phone number
- `Int'l Plan`: whether the customer has an international calling plan: yes/no
- `VMail Plan`: whether the customer has a voice mail feature: yes/no
- `VMail Message`: presumably the average number of voice mail messages per month
- `Day Mins`: the total number of calling minutes used during the day
- `Day Calls`: the total number of calls placed during the day
- `Day Charge`: the billed cost of daytime calls
- `Eve Mins`, `Eve Calls`, `Eve Charge`: the minutes used, number of calls, and billed cost for calls placed during the evening
- `Night Mins`, `Night Calls`, `Night Charge`: the minutes used, number of calls, and billed cost for calls placed during nighttime
- `Intl Mins`, `Intl Calls`, `Intl Charge`: the minutes used, number of calls, and billed cost for international calls
- `CustServ Calls`: the number of calls placed to Customer Service
- `Churn?`: whether the customer left the service: true/false

The last attribute, `Churn?`, is known as the target attribute–the attribute that we want the ML model to predict. We'll save this information to use later:

In [7]:
# Configure which column is the target and which of the values is 'true':
target_attribute_name = "Churn?"
target_attribute_true_value = "True."

# Print out summary of the target column:
target_attribute_values_counts = raw_data[target_attribute_name].value_counts()
print("Target column value counts:")
print(target_attribute_values_counts)
print(f"\nTarget 'True' value is: '{target_attribute_true_value}'")

# Check the configuration is good:
assert target_attribute_true_value in target_attribute_values_counts, (
    f"Couldn't find '{target_attribute_true_value}' in target column!"
)

Target column value counts:
False.    2502
True.     2498
Name: Churn?, dtype: int64

Target 'True' value is: 'True.'


### Reserve some data for final testing of the model

Divide the data into training and testing splits. The training split is used by SageMaker Autopilot. The testing split is reserved to perform inference using the suggested model.

In [8]:
train_data = raw_data.sample(frac=0.8, random_state=200)

test_data = raw_data.drop(train_data.index)

test_data_no_target = test_data.drop(columns=[target_attribute_name])
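
The split above samples 80% of the rows for training and keeps everything else for testing, so the two sets should be disjoint and together cover the whole dataset. The sketch below illustrates that property on plain row indices with the standard library. `split_indices` is a hypothetical helper, and its random selection will not match the rows chosen by pandas with `random_state=200`.

```python
import random


def split_indices(n_rows, train_frac, seed):
    """Sample a training subset without replacement; the rest becomes the test set."""
    rng = random.Random(seed)
    all_indices = list(range(n_rows))
    train = set(rng.sample(all_indices, int(round(n_rows * train_frac))))
    test = [i for i in all_indices if i not in train]
    return sorted(train), test


train_idx, test_idx = split_indices(n_rows=5000, train_frac=0.8, seed=200)

# No row may appear in both splits, and no row may be lost.
assert set(train_idx).isdisjoint(test_idx)
assert len(train_idx) + len(test_idx) == 5000
print(len(train_idx), len(test_idx))  # 4000 1000
```

In the notebook, an equivalent check would be that `train_data.index` and `test_data.index` have no values in common.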

Let's save these train and test splits to local CSV files:

In [9]:
train_file = "data/train_data.csv"
train_data.to_csv(train_file, index=False, header=True)

test_file = "data/test_data.csv"
test_data_no_target.to_csv(test_file, index=False, header=False)

---
## Setting up the SageMaker Autopilot Job<a name="Settingup"></a>

We'll use the [`AutoML` class](https://sagemaker.readthedocs.io/en/stable/api/training/automl.html#sagemaker.automl.automl.AutoML) from the SageMaker Python SDK to invoke Autopilot to find the best ML pipeline for this dataset. 

The required inputs for invoking an Autopilot job are:

* Local or S3 location for input dataset (if local, the dataset will be uploaded to S3)
* Name of the column of the dataset you want to predict (`Churn?` in this case) 
* An IAM role

You can find more information about input dataset format requirements for Autopilot [in the SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-datasets-problem-types.html).

> ⏰ **Note:** Below we override the `max_candidates` for Autopilot to generate and train to be significantly lower than the default (250). This allows us to speed up the job run significantly for the workshop (from ~4 hours to ~20 minutes), at the expense of the accuracy of the output model.

In [10]:
# Create a unique job name (in case this experiment is run multiple times)
timestamp_suffix = time.strftime("%d-%H-%M-%S", time.gmtime())
base_job_name = "automl-churn-sdk-" + timestamp_suffix

automl = AutoML(
    role=role,
    target_attribute_name=target_attribute_name,
    base_job_name=base_job_name,
    sagemaker_session=sm_session,
    max_candidates=5,  # Maximum candidates Autopilot will generate (default 250)
)

You can also specify the type of problem you want to solve with your dataset (`Regression, MulticlassClassification, BinaryClassification`) with the `problem_type` keyword argument. In case you are not sure, SageMaker Autopilot will infer the problem type based on statistics of the target column (the column you want to predict). 

Because the target attribute, `Churn?`, is binary, our model will be performing binary prediction, also known as binary classification. In this example we will let Autopilot infer the type of problem for us.

You have the option to limit the running time of a SageMaker Autopilot job by providing either the maximum number of pipeline evaluations or "candidates" (one pipeline evaluation is called a `Candidate` because it generates a candidate model) or providing the total time allocated for the overall Autopilot job. Under default settings, this job takes about four hours to run. This varies between runs because of the exploratory nature of the process Autopilot uses to find optimal training parameters.
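
The options described above map onto keyword arguments of the `AutoML` constructor. The sketch below only collects them in a dictionary so that it runs without AWS access. The numeric limits are illustrative assumptions rather than recommendations. As far as we know, the underlying `CreateAutoMLJob` API expects an objective metric whenever a problem type is stated explicitly, so the two are shown together; treat that pairing as something to verify against the API reference.

```python
# Keyword arguments accepted by the SageMaker Python SDK's AutoML constructor.
# The numeric values are illustrative assumptions, not recommendations.
automl_limits = {
    "max_candidates": 5,  # stop after this many candidates
    "max_runtime_per_training_job_in_seconds": 600,  # cap each training job
    "total_job_runtime_in_seconds": 3600,  # cap the overall Autopilot job
}

# Optional: state the problem type instead of letting Autopilot infer it.
# Assumption: the objective metric is supplied alongside the problem type.
automl_problem = {
    "problem_type": "BinaryClassification",
    "job_objective": {"MetricName": "F1"},
}

automl_kwargs = {**automl_limits, **automl_problem}
print(automl_kwargs)

# In the notebook these could be passed on like this (not executed here):
# automl = AutoML(role=role, target_attribute_name=target_attribute_name, **automl_kwargs)
```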

### Launching the SageMaker Autopilot Job<a name="Launching"></a>

You can now launch the Autopilot job by calling the `fit` method of the `AutoML` estimator.

In [11]:
automl.fit(train_file, job_name=base_job_name, wait=False, logs=False)

> **Note: Using a previous Autopilot Job**
> 
> If you want to retrieve a previous Autopilot job, or one launched outside of this notebook (for example from the SageMaker Studio UI or the CLI), you can un-comment and run the following line before continuing to the next cell. If you are using a different dataset, you must also override the following variables defined in the [Data](#Data) section in order to run the batch jobs and perform the analysis: `test_data`, `test_data_no_target`, `test_file`, `target_attribute_name`, `target_attribute_values_counts`, and `target_attribute_true_value`.

In [12]:
# automl = AutoML.attach("automl-churn-sdk-21-09-03-39")

### Tracking SageMaker Autopilot Job Progress<a name="Tracking"></a>
A SageMaker Autopilot job consists of the following high-level steps:
* Analyzing Data, where the dataset is analyzed and Autopilot comes up with a list of ML pipelines that should be tried out on the dataset. The dataset is also split into train and validation sets.
* Feature Engineering, where Autopilot performs feature transformation on individual features of the dataset as well as at an aggregate level.
* Model Tuning, where the top performing pipeline is selected along with the optimal hyperparameters for the training algorithm (the last stage of the pipeline). 

We can use the `describe_auto_ml_job` method to check the status of our SageMaker Autopilot job.

In [13]:
print("JobStatus - Secondary Status")
print("----------------------------", end="")

job_run_status = None
secondary_status = None

while job_run_status not in ("Failed", "Completed", "Stopped"):
    if job_run_status is not None:
        time.sleep(60)
    describe_response = automl.describe_auto_ml_job()
    new_status = describe_response["AutoMLJobStatus"]
    new_secondary_status = describe_response["AutoMLJobSecondaryStatus"]
    
    if new_status == job_run_status and new_secondary_status == secondary_status:
        print(".", end="")
    else:
        print(f"\n{new_status} - {new_secondary_status} ", end="")
        job_run_status = new_status
        secondary_status = new_secondary_status

JobStatus - Secondary Status
----------------------------
InProgress - AnalyzingData ........
InProgress - FeatureEngineering ..........
InProgress - ModelTuning ........
InProgress - GeneratingExplainabilityReport .........
InProgress - GeneratingModelInsightsReport .......
Completed - Completed 
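
The polling loop above runs until the job reaches a terminal status. If you reuse this pattern elsewhere, one option is to factor it into a helper with an upper bound on the number of polls, as sketched below. `wait_for_job` and its parameters are hypothetical names, and the API call is replaced by a stand-in so the example runs without AWS access.

```python
import time

TERMINAL_STATUSES = ("Failed", "Completed", "Stopped")


def wait_for_job(describe, poll_seconds=60, max_polls=240, sleep=time.sleep):
    """Poll `describe()` until it reports a terminal status or polls run out.

    `describe` is any zero-argument callable returning a dict with an
    "AutoMLJobStatus" key, such as `automl.describe_auto_ml_job`.
    """
    for attempt in range(max_polls):
        if attempt > 0:
            sleep(poll_seconds)
        status = describe()["AutoMLJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
    raise TimeoutError(f"Job still running after {max_polls} polls")


# Stand-in for the real API call: two in-progress responses, then completion.
fake_responses = iter(
    [
        {"AutoMLJobStatus": "InProgress"},
        {"AutoMLJobStatus": "InProgress"},
        {"AutoMLJobStatus": "Completed"},
    ]
)
final_status = wait_for_job(lambda: next(fake_responses), sleep=lambda seconds: None)
print(final_status)  # Completed
```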

<a name="Quickstart"></a>
<div class="alert alert-info">
    <p>▶️ <b>Workshop fast-start</b></p>
    <p>To get started quickly with this workshop:</p>
    <ol>
        <li><b>Click</b> here to select this cell (a blue bar will appear to the left of it)</li>
        <li>In the menu bar above, select <b><i>Run</i></b> and <b><i>Run All Above Selected Cell</i></b></li>
    </ol>
    <p>This will automatically run all code cells above here, to make sure you get your AutoML job kicked off quickly and see results during the workshop.</p>
    <p>Once that's done, head back up to the <a href="#Intro"><u>Introduction</u></a> to explore the code.</p>
</div>

> ▶️ **While the job runs:** Are you able to re-create the same Autopilot job parameters using the SageMaker Studio UI?
>
> You'll find *Create Autopilot Experiment* under the `SageMaker Resources > Experiments and trials` Section of the left sidebar menu. ⬅

---
## Describing the SageMaker Autopilot Job Results <a name="Results"></a>

We can use the `describe_auto_ml_job` method to look up the best candidate generated by the SageMaker Autopilot job. This notebook demonstrates Autopilot end-to-end, so we already have an initialized `automl` object.

In [14]:
best_candidate = automl.describe_auto_ml_job()["BestCandidate"]
best_candidate_name = best_candidate["CandidateName"]
pprint(best_candidate)
print("\n")
print("CandidateName: " + best_candidate_name)
print(
    "FinalAutoMLJobObjectiveMetricName: "
    + best_candidate["FinalAutoMLJobObjectiveMetric"]["MetricName"]
)
print(
    "FinalAutoMLJobObjectiveMetricValue: "
    + str(best_candidate["FinalAutoMLJobObjectiveMetric"]["Value"])
)

{'CandidateName': 'automl-churn-sdk-24-07-07-07ockN-004-5da525bd',
 'CandidateProperties': {'CandidateArtifactLocations': {'Explainability': 's3://sagemaker-us-east-1-581573883648/automl-churn-sdk-24-07-07-07/documentation/explainability/output'},
                         'CandidateMetrics': [{'MetricName': 'F1',
                                               'Set': 'Validation',
                                               'Value': 0.9363700151443481},
                                              {'MetricName': 'AUC',
                                               'Set': 'Validation',
                                               'Value': 0.9809200167655945},
                                              {'MetricName': 'Accuracy',
                                               'Set': 'Validation',
                                               'Value': 0.9355000257492065}]},
 'CandidateStatus': 'Completed',
 'CandidateSteps': [{'CandidateStepArn': 'arn:aws:sagemaker:us-east-1:5815

Due to some randomness in the algorithms involved, different runs will provide slightly different results, but accuracy will be around or above $93\%$, which is a good result.

### Check Top Candidates

In addition to the `best_candidate`, we can also explore the other top candidates generated by SageMaker Autopilot. 

We use the `list_candidates` method to see our other top candidates.

In [None]:
# Number of Autopilot candidates to evaluate and run batch transform jobs.
# Make sure that you do not put a larger TOP_N_CANDIDATES than the Batch Transform limit for ml.m5.xlarge instances in your account.
TOP_N_CANDIDATES = 5

In [None]:
candidates = automl.list_candidates(
    sort_by="FinalObjectiveMetricValue", sort_order="Descending", max_results=TOP_N_CANDIDATES
)
candidates_df = pd.DataFrame([
    {
        "Candidate Name": c["CandidateName"],
        "Objective Metric Name": c["FinalAutoMLJobObjectiveMetric"]["MetricName"],
        "Objective Metric Value": c["FinalAutoMLJobObjectiveMetric"]["Value"],
    }
    for c in candidates
])

candidates_df.sort_values(["Objective Metric Value"], ascending=False)

---
## Evaluate Top Candidates <a name="Evaluation"></a>

Once our SageMaker Autopilot job has finished, we can start running inference on the top candidates. In SageMaker, you can perform inference in two ways: online endpoint inference or batch transform inference. Let's focus on batch transform inference.

We'll perform batch transform on our top candidates and analyze some custom metrics from our top candidates' prediction results.

### Upload Data for Transform Jobs

We'll use the `test_data` which we defined when we split our data into train and test splits. We need to upload this data (without the target column, as saved in `test_file`) to S3. As a refresher, here's `test_data`:

In [None]:
test_data.head()

In [None]:
input_data_transform = sm_session.upload_data(path=test_file, bucket=bucket, key_prefix=prefix)
print("Uploaded transform data to {}".format(input_data_transform))

### Customize the Inference Response

For classification problem types, the inference containers generated by SageMaker Autopilot allow you to select the response content for predictions. The valid response keys for binary and multiclass classification problem types are:

- `'predicted_label'` - predicted class
- `'probability'` - In binary classification, the probability that the result is predicted as the second or `True` class in the target column. In multiclass classification, the probability of the winning class.
- `'labels'` - list of all possible classes
- `'probabilities'` - list of all probabilities for all classes (order corresponds with `'labels'`)

By default the inference containers are configured to generate the `'predicted_label'`.

In this example we use `'predicted_label'` and `'probability'` to demonstrate how to evaluate the models with custom metrics. For the Churn dataset, the second or `True` class is the string `'True.'`


In [None]:
inference_response_keys = ["predicted_label", "probability"]
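
With these keys, each record in the response is one CSV row whose columns follow the order of `inference_response_keys`. The sketch below parses such a response into dictionaries. The response body is made up for illustration, and `parse_response` is a hypothetical helper rather than part of the SageMaker SDK.

```python
import csv
import io

inference_response_keys = ["predicted_label", "probability"]

# Made-up response body: one CSV row per input record, with columns in the
# same order as inference_response_keys.
sample_response = "True.,0.93\nFalse.,0.12\n"


def parse_response(body, keys):
    """Turn a CSV response body into one dict per record, keyed by response key."""
    records = []
    for row in csv.reader(io.StringIO(body)):
        record = dict(zip(keys, row))
        if "probability" in record:
            record["probability"] = float(record["probability"])
        records.append(record)
    return records


parsed = parse_response(sample_response, inference_response_keys)
print(parsed)
```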

### Create the Models and Transform Estimators

Let's create our Models and Batch Transform Estimators using the `create_model` method. We can specify our inference response using the `inference_response_keys` keyword argument.

In [None]:
s3_transform_output_path = "s3://{}/{}/inference-results/".format(bucket, prefix)

transformers = []

for candidate in candidates:
    model = automl.create_model(
        name=candidate["CandidateName"],
        candidate=candidate,
        inference_response_keys=inference_response_keys,
    )

    output_path = s3_transform_output_path + candidate["CandidateName"] + "/"

    transformers.append(
        model.transformer(
            instance_count=1,
            instance_type="ml.m5.xlarge",
            assemble_with="Line",
            output_path=output_path,
        )
    )

print("Setting up {} Batch Transform Jobs in `transformers`".format(len(transformers)))

### Start the Transform Jobs

Let's start all the transform jobs.

In [None]:
for transformer in transformers:
    transformer.transform(
        data=input_data_transform, split_type="Line", content_type="text/csv", wait=False
    )
    print("Starting transform job {}".format(transformer.latest_transform_job.name))

Now we wait for our transform jobs to finish.

In [None]:
running_job_names = [t.latest_transform_job.name for t in transformers]
print(f"Polling {len(running_job_names)} transform jobs ", end="")

while len(running_job_names):
    time.sleep(30)
    remaining = []
    for name in running_job_names:
        # Describe the job being checked in this iteration (not always the first one):
        job_desc = sm_session.describe_transform_job(name)
        job_status = job_desc["TransformJobStatus"]
        if "fail" in job_status.lower():
            print(job_desc)
            raise RuntimeError(f"Transform job {name} failed with status {job_status}")
        elif job_status == "Completed":
            print(f"\nJob {name} completed ", end="")
        else:
            remaining.append(name)
    if len(remaining) != len(running_job_names):
        print(f"\n{len(remaining)} jobs still running ", end="")
    else:
        print(".", end="")
    running_job_names = remaining
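
When a job completes, Batch Transform writes one output object per input object under the transformer's output path, named after the input object with `.out` appended. The sketch below shows how the result location can be derived from the input URI. The bucket and candidate names are hypothetical, and `transform_output_uri` is an illustrative helper, not an SDK function.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_uri(input_uri, output_path):
    """Build the S3 URI of the Batch Transform result for a single input object."""
    input_key = urlparse(input_uri).path
    filename = posixpath.basename(input_key)
    return output_path.rstrip("/") + "/" + filename + ".out"


# Hypothetical bucket and candidate names, for illustration only.
uri = transform_output_uri(
    "s3://example-bucket/sagemaker/DEMO-autopilot-churn/test_data.csv",
    "s3://example-bucket/sagemaker/DEMO-autopilot-churn/inference-results/candidate-1/",
)
print(uri)
```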

### Evaluate the Inference Results

Now we analyze our inference results. Note that Pandas is able to read files directly from `s3://...` URIs.

> ⚠️ **Note:** Since this is a binary classification task, the `probability` output from Autopilot is the probability of the `True.` label, **not** of whichever label was assigned. For example, with a well-performing model the probabilities for `True.` records should be near 1 and those for `False.` records near 0.

In [None]:
predictions = []

for transformer in transformers:
    output_path = transformer.output_path + "test_data.csv.out"
    print(output_path)
    predictions.append(
        pd.read_csv(output_path, header=None, names=inference_response_keys)
    )

print("Example output:")
predictions[0].head()
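
Because `probability` always refers to the `True.` class, the model's confidence in a `False.` prediction is one minus that value. A minimal sketch of the conversion, using made-up rows and a hypothetical `label_confidence` helper:

```python
def label_confidence(predicted_label, probability, true_value="True."):
    """Return the probability of the label that was actually assigned."""
    return probability if predicted_label == true_value else 1.0 - probability


# Made-up (predicted_label, probability) pairs, for illustration only.
sample_rows = [("True.", 0.93), ("False.", 0.12)]
for label, prob in sample_rows:
    print(label, round(label_confidence(label, prob), 2))
```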

Since there may be *different costs* associated with *different kinds of error*, there are several ways to describe the performance of classifier models. Here we'll use some metrics and utilities provided by the `sklearn.metrics` library to drill in to our prediction results.

In [None]:
# Convert messy text target value to true boolean:
labels = test_data[target_attribute_name] == target_attribute_true_value

# Calculate candidate AUC and other basic metrics:
for prediction, candidate in zip(predictions, candidates):
    roc_auc = skmetrics.roc_auc_score(labels, prediction["probability"])
    ap = skmetrics.average_precision_score(labels, prediction["probability"])
    print(
        "%s's ROC AUC = %.2f, Average Precision = %.2f" % (candidate["CandidateName"], roc_auc, ap)
    )
    print(
        skmetrics.classification_report(
            test_data[target_attribute_name],
            prediction["predicted_label"],
        )
    )
    print()
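
To make the idea of different costs for different kinds of error concrete, the sketch below counts the four outcome types by hand and attaches a cost to each. The labels, predictions, and both cost figures are made up for illustration. In the notebook, `skmetrics.confusion_matrix` would give the same counts.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return tp, fp, fn, tn


# Made-up data: True means "churned" (label) or "predicted to churn" (prediction).
sample_labels = [True, True, True, False, False, False, False, True]
sample_predictions = [True, True, False, True, False, False, False, True]

tp, fp, fn, tn = confusion_counts(sample_labels, sample_predictions)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Hypothetical costs: every predicted churner receives an incentive, and every
# missed churner is a lost customer.
COST_INCENTIVE = 50
COST_LOST_CUSTOMER = 500
total_cost = COST_INCENTIVE * (tp + fp) + COST_LOST_CUSTOMER * fn

print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} cost={total_cost}")
```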

Plot the [Receiver Operating Characteristic (ROC)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) curve for the model candidates:

In [None]:
fpr_tpr = []
for prediction in predictions:
    fpr, tpr, _ = skmetrics.roc_curve(labels, prediction["probability"])
    fpr_tpr.append(fpr)
    fpr_tpr.append(tpr)

plt.figure(num=None, figsize=(16, 9), dpi=160, facecolor="w", edgecolor="k")
plt.plot(*fpr_tpr)
plt.legend([candidate["CandidateName"] for candidate in candidates], loc="lower right")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.show()

Plot the [precision-recall curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) for the model candidates:

In [None]:
precision_recall = []
for prediction in predictions:
    precision, recall, _ = skmetrics.precision_recall_curve(labels, prediction["probability"])
    precision_recall.append(recall)
    precision_recall.append(precision)

plt.figure(num=None, figsize=(16, 9), dpi=160, facecolor="w", edgecolor="k")
plt.plot(*precision_recall)
plt.legend([candidate["CandidateName"] for candidate in candidates], loc="lower left")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()

To control costs, perhaps we'd like to select a minimum target **[precision](https://en.wikipedia.org/wiki/Precision_and_recall)** (percentage of customers offered an incentive who would actually have gone on to churn) and find the model that can yield the best **recall** (percentage of churning customers offered a retention incentive) at that operating point:

In [None]:
target_min_precision = 0.75

best_recall = 0
best_candidate_idx = -1
best_candidate_threshold = -1
candidate_idx = 0
for prediction in predictions:
    precision, recall, thresholds = skmetrics.precision_recall_curve(
        labels, prediction["probability"],
    )
    threshold_idx = np.argmax(precision >= target_min_precision)
    if recall[threshold_idx] > best_recall:
        best_recall = recall[threshold_idx]
        best_candidate_threshold = thresholds[threshold_idx]
        best_candidate_idx = candidate_idx
    candidate_idx += 1

print("Best Candidate Name: {}".format(candidates[best_candidate_idx]["CandidateName"]))
print("Best Candidate Threshold (Operation Point): {}".format(best_candidate_threshold))
print("Best Candidate Recall: {}".format(best_recall))
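
The selection logic above can be spelled out without NumPy or scikit-learn: try each observed score as a threshold, compute precision and recall there, and keep the threshold with the highest recall among those that meet the minimum precision. The sketch below does this on made-up scores. `best_threshold_for_precision` is a hypothetical helper and is not how `precision_recall_curve` is implemented.

```python
def best_threshold_for_precision(labels, scores, min_precision):
    """Return (threshold, precision, recall) with the highest recall whose
    precision is at least min_precision, or None if no threshold qualifies."""
    total_positives = sum(labels)
    best = None
    for threshold in sorted(set(scores)):
        predicted = [score >= threshold for score in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y and p)
        fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
        # tp + fp >= 1 here, because the score equal to the threshold is positive.
        precision = tp / (tp + fp)
        recall = tp / total_positives
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


# Made-up labels and scores, for illustration only.
sample_labels = [True, True, False, True, False, False]
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

operating_point = best_threshold_for_precision(sample_labels, sample_scores, 0.75)
print(operating_point)  # (0.6, 0.75, 1.0)
```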

Get predictions of the best model based on the selected operating point.

In [None]:
prediction_default = \
    predictions[best_candidate_idx]["predicted_label"] == target_attribute_true_value
prediction_updated = predictions[best_candidate_idx]["probability"] >= best_candidate_threshold

print(
    "Default Operating Point: recall={}, precision={}".format(
        skmetrics.recall_score(labels, prediction_default),
        skmetrics.precision_score(labels, prediction_default)
    )
)
print(
    "Updated Operating Point: recall={}, precision={}".format(
        skmetrics.recall_score(labels, prediction_updated),
        skmetrics.precision_score(labels, prediction_updated)
    )
)

### Deploy the Selected Candidate

After performing the analysis above, we can deploy the candidate that provides the best recall. We will use the `deploy` method to create the online inference endpoint. We'll use the same `inference_response_keys` from our batch transform jobs, but you can customize this as you wish. If `inference_response_keys` is not specified, only the `'predicted_label'` will be returned.


In [None]:
inference_response_keys

In [None]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

predictor = automl.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    candidate=candidates[best_candidate_idx],
    inference_response_keys=inference_response_keys,
    predictor_cls=Predictor,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
)

print("Created endpoint: {}".format(predictor.endpoint_name))

Once we have created our endpoint, we can send real-time prediction requests to it. The inference output will contain the model's `predicted_label` and `probability`. We use the custom threshold calculated above to determine our own `custom_predicted_label` based on the probability score in the inference response. If a `probability` is less than the `best_candidate_threshold`, the `custom_predicted_label` is the `False.` class. If a `probability` is greater than or equal to the `best_candidate_threshold`, the `custom_predicted_label` is the `True.` class.

In [None]:
best_candidate_threshold

In [None]:
%%time
prediction = predictor.predict(test_data_no_target.to_csv(sep=",", header=False, index=False))
prediction_df = pd.DataFrame(prediction, columns=inference_response_keys)
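custom_predicted_labels = prediction_df["probability"].astype(float).values >= best_candidate_threshold
prediction_df["custom_predicted_label"] = custom_predicted_labels
# Map booleans to label strings explicitly: value_counts() orders by frequency,
# so positional indexing into it would not be reliable on other datasets.
target_attribute_false_value = next(
    v for v in target_attribute_values_counts.index if v != target_attribute_true_value
)
prediction_df["custom_predicted_label"] = prediction_df["custom_predicted_label"].map(
    {False: target_attribute_false_value, True: target_attribute_true_value}
)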
prediction_df
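
The thresholding rule described above can also be shown in isolation. The sketch below assumes the endpoint response has been deserialized from CSV into rows of strings, which is why the probability is converted with `float`. The rows and the threshold are made up, and `apply_threshold` is a hypothetical helper.

```python
def apply_threshold(rows, threshold, true_value="True.", false_value="False."):
    """Relabel [predicted_label, probability] rows using a custom threshold.

    Values are expected as strings, as they would arrive from a CSV response.
    """
    relabelled = []
    for predicted_label, probability in rows:
        custom = true_value if float(probability) >= threshold else false_value
        relabelled.append((predicted_label, float(probability), custom))
    return relabelled


# Made-up endpoint output and threshold, for illustration only.
sample_rows = [["True.", "0.93"], ["True.", "0.55"], ["False.", "0.12"]]
result = apply_threshold(sample_rows, threshold=0.6)
for row in result:
    print(row)
```

Note how the second row keeps the model's default label `True.` but receives the custom label `False.`, because its probability falls below the chosen operating point.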

---
## Cleanup <a name="Cleanup"></a>

The Autopilot job creates many underlying artifacts, such as dataset splits, preprocessing scripts, and preprocessed data. The code below, when un-commented, deletes them. This operation also deletes all the generated models and the auto-generated notebooks.

In [None]:
# s3 = boto3.resource('s3')
# s3_bucket = s3.Bucket(bucket)

# s3_bucket.objects.filter(Prefix=prefix).delete()

Finally, we delete the created SageMaker Models and the deployed SageMaker Endpoint.

In [None]:
# predictor.delete_endpoint(delete_endpoint_config=True)
# predictor.delete_model()

# for transformer in transformers:
#     transformer.delete_model()