# Challenge 3: Configuring and running an Amazon SageMaker Inference Recommender job

In this notebook, you use SageMaker Inference Recommender, your production model, and a payload of data to find the best instance type for your endpoint.

SageMaker Inference Recommender is a capability of SageMaker that reduces the time required to get machine learning (ML) models in production by automating load tests and optimizing model performance across instance types. You use Inference Recommender to select a real-time inference endpoint that delivers the best performance at the lowest cost.

## Task 3.1: Environment setup

In this task, you set up your environment.

In [None]:
#install-dependencies
%matplotlib inline
import json
import boto3
import time
import pandas as pd
import matplotlib.pyplot as plt
from sagemaker import get_execution_role, session, image_uris

region = boto3.Session().region_name
role = get_execution_role()
sm_session = session.Session(boto3.Session())
sm = boto3.Session().client("sagemaker")
sm_runtime = boto3.Session().client("sagemaker-runtime")
cw = boto3.Session().client("cloudwatch")

bucket = sm_session.default_bucket()
prefix = 'sagemaker/abalone'
payload_archive_name = "payload.tar.gz"


In [None]:
#list-model-name
models = sm.list_models(NameContains='Abalone')
model_details = pd.json_normalize(models['Models'])
model_name = model_details['ModelName'][0]
print(model_name)
#get-model-s3uri
desc_model = sm.describe_model(ModelName=model_name)
model_attr = pd.json_normalize(desc_model['PrimaryContainer'])
model_url = model_attr['ModelDataUrl'][0]
print(model_url)
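
The cell above uses pd.json_normalize to read single fields from the API responses. Because boto3 returns plain Python dictionaries, you can also index them directly. The sketch below uses hand-written dictionaries shaped like the ListModels and DescribeModel responses, so it runs without AWS credentials. The model name and S3 URI are placeholders, not values from this lab, and the helper names are hypothetical.

```python
# Placeholder responses shaped like boto3's list_models() and describe_model() output.
# The names and URIs are hypothetical, for illustration only.
example_list_response = {"Models": [{"ModelName": "Abalone-example-model"}]}
example_describe_response = {
    "ModelName": "Abalone-example-model",
    "PrimaryContainer": {
        "Image": "example-xgboost-image-uri",
        "ModelDataUrl": "s3://example-bucket/sagemaker/abalone/model.tar.gz",
    },
}


def first_model_name(list_models_response):
    """Return the name of the first model in a ListModels response, or None if there are none."""
    summaries = list_models_response.get("Models", [])
    return summaries[0]["ModelName"] if summaries else None


def model_data_url(describe_model_response):
    """Return the S3 URI of the model artifact from a DescribeModel response."""
    return describe_model_response["PrimaryContainer"]["ModelDataUrl"]


print(first_model_name(example_list_response))
print(model_data_url(example_describe_response))
```

If more than one model name contains the search string, both this sketch and the cell above use the first entry returned, so check that it is the model you expect.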

In [None]:
#list-endpoint-config
endpoint_config = sm.list_endpoint_configs(NameContains='abalone')
print(endpoint_config)

In [None]:
#list-endpoint
endpoint_name = sm.list_endpoints(NameContains='abalone')
print(endpoint_name)

## Task 3.2: Create a payload archive

In this task, you create an archive that contains individual files that Inference Recommender can send to the SageMaker endpoints. 

Inference Recommender randomly samples files from this archive, so make sure the archive contains a distribution of payloads similar to what you expect in production.

In [None]:
!tar -cvzf {payload_archive_name} "data/abalone_data_new_nolabel.csv"
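
The tar command above packs the whole CSV file as a single payload file. If you want each sampled file to represent a single request instead, one option is to write each CSV row to its own file inside the archive. The following is a minimal sketch using Python's standard tarfile module. It assumes each file in the archive is sent as one request body; the archive name payload_per_row.tar.gz, the member names, the helper write_payload_archive, and the inline rows are all hypothetical placeholders rather than the lab's data.

```python
import io
import tarfile


def write_payload_archive(rows, archive_path):
    """Write each CSV row to its own file (payload_0.csv, payload_1.csv, ...) in a gzipped tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for i, row in enumerate(rows):
            data = (row + "\n").encode("utf-8")
            info = tarfile.TarInfo(name="payload_{}.csv".format(i))
            info.size = len(data)  # tarfile needs the size before reading the bytes
            tar.addfile(info, io.BytesIO(data))


# Placeholder feature rows (no header, no label); these are not values from the abalone dataset.
example_rows = ["0.1,0.2,0.3", "0.4,0.5,0.6", "0.7,0.8,0.9"]
write_payload_archive(example_rows, "payload_per_row.tar.gz")

# List the files that were written into the archive
with tarfile.open("payload_per_row.tar.gz", "r:gz") as tar:
    print(tar.getnames())
```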

In [None]:
#upload-payload-archive-to-S3
sample_payload_url = sm_session.upload_data(
    path=payload_archive_name, key_prefix="payload"
)

print(sample_payload_url)

## Task 3.3: Register model in Model Registry

To use Inference Recommender, you must have a versioned model in SageMaker Model Registry. To register a model in Model Registry, you need a model artifact packaged in a tarball and an inference container image.

Registering a model includes the following steps:

- **Create Model Group**: This is a one-time task per machine learning use case. A Model Group contains one or more versions of your packaged model.

- **Register Model Version/Package**: This task is performed for each new packaged model version.


In [None]:
#set-image-uri
image_uri = image_uris.retrieve("xgboost", boto3.Session().region_name, "1.5-1")

### Challenge: Model group configuration

In the next cells, you create a model package group. What information is important to include in the model package group so you know the framework and task required?

Troubleshoot the next cells until you can create the model package group.

In [None]:
# ML framework details
#framework = "XGBOOST"
framework_version = "1.5-1"

# ML model details
ml_domain = "MACHINE_LEARNING"
#ml_task = "REGRESSION"

In [None]:
#Create Model Group

model_package_group_name = "{}-cpu-models-".format(framework) + str(round(time.time()))
model_package_group_description = "{} models".format(ml_task.lower())

model_package_group_input_dict = {
    "ModelPackageGroupName": model_package_group_name,
    "ModelPackageGroupDescription": model_package_group_description,
}

create_model_package_group_response = sm.create_model_package_group(
    **model_package_group_input_dict
)
print(
    "ModelPackageGroup ARN : {}".format(create_model_package_group_response["ModelPackageGroupArn"])
)

<i class="far fa-eye" style="color:#262262" aria-hidden="true"></i> **Hint:** Identify the missing parameters, declare them in the **ML Framework Details** cell, and then re-run the cells.

When the model package group is created successfully, you should see the model package group ARN.

### Register Model

In this step, you register the model that you packaged in the prior steps as a new version in SageMaker Model Registry.

First, you configure the model package version: you identify the model package group that this new version should be registered in and set its initial approval status. You also identify the domain and task for your model. These values were set earlier in the notebook, where ml_domain = 'MACHINE_LEARNING' and ml_task = 'REGRESSION'.

In [None]:
#register-model-Package

model_package_description = "{} {} inference recommender".format(framework, model_name)

model_approval_status = "PendingManualApproval"

create_model_package_input_dict = {
    "ModelPackageGroupName": model_package_group_name,
    "Domain": ml_domain.upper(),
    "Task": ml_task.upper(),
    "SamplePayloadUrl": sample_payload_url,
    "ModelPackageDescription": model_package_description,
    "ModelApprovalStatus": model_approval_status,
}


## Task 3.4: Set up inference specification

In this task, you set up the inference specification configuration for your model version. This contains information on how the model should be hosted.

Inference Recommender expects a single input MIME type for sending requests.

In [None]:
#set-MIME-type
input_mime_types = ["text/csv"]
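
With text/csv, each request body is plain comma-separated feature values. The sketch below shows one way to serialize a single feature row into such a body with the standard csv module. It assumes the model expects features only, with no header row and no label column; the helper name to_csv_body and the sample values are hypothetical.

```python
import csv
import io


def to_csv_body(features):
    """Serialize one row of numeric features into a text/csv request body (no header, no label)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(features)
    return buffer.getvalue().encode("utf-8")


body = to_csv_body([0.455, 0.365, 0.095])
print(body)
```

A body like this could be sent to a deployed endpoint with sm_runtime.invoke_endpoint(EndpointName=..., ContentType="text/csv", Body=body), where the endpoint name is whichever endpoint you want to test.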

Now, you specify a set of instance types. Inference Recommender provides recommendations only within the set of instance types that you select.

In [None]:
#set-inference-types
supported_realtime_inference_types = [
    "ml.m5.large",
    "ml.m5.xlarge",
    "ml.m4.xlarge",
    "ml.m5.2xlarge",
    "ml.m5.4xlarge"
]

In [None]:
#define-inference-specification
modelpackage_inference_specification = {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": image_uri,
                "Framework": framework.upper(),
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
            }
        ],
        "SupportedContentTypes": input_mime_types,  # required, must be non-null
        "SupportedResponseMIMETypes": [],
        "SupportedRealtimeInferenceInstanceTypes": supported_realtime_inference_types,  # optional
    }
}

# Specify the model data
modelpackage_inference_specification["InferenceSpecification"]["Containers"][0][
    "ModelDataUrl"
] = model_url


Now that you have configured the model package, the next step is to create the model package version in SageMaker Model Registry.

In [None]:
#create-model-package
create_model_package_input_dict.update(modelpackage_inference_specification)
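
Because dict.update adds the InferenceSpecification key to the existing input dictionary, the merged dictionary now carries both the registration fields and the hosting details. Before calling create_model_package, a quick local check can catch a missing value. The helper missing_model_package_fields below is hypothetical, and its list of expected keys only mirrors the fields this notebook sets; it is not SageMaker's full validation.

```python
def missing_model_package_fields(input_dict):
    """Return the expected keys that are absent or empty in a model package input dictionary.

    The expected keys mirror what this notebook sets; they are not SageMaker's full set of rules.
    """
    missing = [
        key
        for key in ("ModelPackageGroupName", "Domain", "Task", "SamplePayloadUrl", "InferenceSpecification")
        if not input_dict.get(key)
    ]
    spec = input_dict.get("InferenceSpecification") or {}
    containers = spec.get("Containers", [])
    if not containers or not containers[0].get("ModelDataUrl"):
        missing.append("InferenceSpecification.Containers[0].ModelDataUrl")
    if not spec.get("SupportedContentTypes"):
        missing.append("InferenceSpecification.SupportedContentTypes")
    return missing


# Deliberately incomplete example with placeholder values
example_input = {
    "ModelPackageGroupName": "example-group",
    "Domain": "MACHINE_LEARNING",
    "Task": "REGRESSION",
    "InferenceSpecification": {
        "Containers": [{"Image": "example-image"}],
        "SupportedContentTypes": ["text/csv"],
    },
}
print(missing_model_package_fields(example_input))
```

In this notebook, missing_model_package_fields(create_model_package_input_dict) should return an empty list once every value is set.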

In [None]:
create_model_package_response = sm.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print("ModelPackage Version ARN : {}".format(model_package_arn))
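
The printed ARN identifies a specific version inside the model package group. Versioned model package ARNs typically end in model-package/group-name/version; assuming that layout, the hypothetical helper below splits out the group name and the version number. The ARN in the example is a placeholder.

```python
def parse_model_package_version_arn(arn):
    """Split a versioned model package ARN into (group_name, version).

    Assumes the resource part has the form 'model-package/<group-name>/<version>'.
    Raises ValueError if the ARN does not have that shape.
    """
    resource = arn.split(":", 5)[5]  # text after 'arn:aws:sagemaker:<region>:<account>:'
    kind, group_name, version = resource.split("/")
    if kind != "model-package":
        raise ValueError("Not a model package ARN: {}".format(arn))
    return group_name, int(version)


example_arn = "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-cpu-models-1700000000/1"
print(parse_model_package_version_arn(example_arn))
```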


## Task 3.5: Create an Inference Recommender Default Job

Now that the model is registered in Model Registry, you run a 'Default' job to get instance recommendations.

This job requires the model package version ARN (ModelPackageVersionArn) and returns recommendations within about **45** minutes.

The output is a list of instance type recommendations with associated *environment variables*, *cost*, *throughput* and *latency metrics*.

In [None]:
#set-job-name-job-type
job_name = model_name + "-instance-" + str(round(time.time()))
job_description = "{} {}".format(framework, model_name)
job_type = "Default"
print(job_name)
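
The job name concatenates the model name with a timestamp, so a long model name produces a long job name. SageMaker limits the length and characters of job names; the sketch below assumes a 64-character maximum made of letters, digits, and hyphens (confirm the exact rule in the CreateInferenceRecommendationsJob API reference) and truncates only the model-name part so the timestamp suffix is kept. The helper make_job_name and the constant MAX_JOB_NAME_LENGTH are hypothetical.

```python
import re
import time

MAX_JOB_NAME_LENGTH = 64  # assumption: verify against the API reference


def make_job_name(model_name, timestamp=None, max_length=MAX_JOB_NAME_LENGTH):
    """Build '<model_name>-instance-<timestamp>', truncating model_name so the result fits max_length."""
    if timestamp is None:
        timestamp = round(time.time())
    suffix = "-instance-{}".format(timestamp)
    # Replace characters outside [a-zA-Z0-9-], truncate, and drop hyphens left dangling by truncation
    prefix = re.sub(r"[^a-zA-Z0-9-]", "-", model_name)[: max_length - len(suffix)].rstrip("-")
    return prefix + suffix


print(make_job_name("Abalone-example-model", timestamp=1700000000))
print(len(make_job_name("a" * 100, timestamp=1700000000)))
```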

In [None]:
#create-inference-recommendation-job
response = sm.create_inference_recommendations_job(
    JobName=job_name,
    JobDescription=job_description,  # optional
    JobType=job_type,
    RoleArn=role,
    InputConfig={"ModelPackageVersionArn": model_package_arn},
)

print(response)

## Task 3.6: Instance Recommendation Results

Each inference recommendation includes the InstanceType, the InitialInstanceCount, and EnvironmentParameters (environment variables tuned for better performance). It also includes metrics: performance metrics such as MaxInvocations and ModelLatency, utilization metrics such as CpuUtilization and MemoryUtilization, and cost metrics such as CostPerHour and CostPerInference.

These metrics can help you narrow down the endpoint configuration that best suits your use case.
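
One common way to narrow the list is to set a latency target and then pick the cheapest instance that meets it. The sketch below applies that rule to recommendation dictionaries with InstanceType, ModelLatency, and CostPerHour keys. The numbers are made-up placeholders, not results from this lab; ModelLatency is assumed to be in milliseconds and the 10 ms target is an arbitrary example, so check the units in your own results before choosing a threshold.

```python
def cheapest_within_latency(recommendations, max_latency):
    """Return the recommendation with the lowest CostPerHour whose ModelLatency <= max_latency, or None."""
    candidates = [r for r in recommendations if r["ModelLatency"] <= max_latency]
    return min(candidates, key=lambda r: r["CostPerHour"]) if candidates else None


# Placeholder values for illustration only; they are not measured results.
example_recommendations = [
    {"InstanceType": "ml.m5.large", "ModelLatency": 14, "CostPerHour": 0.1},
    {"InstanceType": "ml.m5.xlarge", "ModelLatency": 9, "CostPerHour": 0.2},
    {"InstanceType": "ml.m5.2xlarge", "ModelLatency": 7, "CostPerHour": 0.4},
]

best = cheapest_within_latency(example_recommendations, max_latency=10)
print(best["InstanceType"] if best else "No instance meets the latency target")
```

Filtering first and then minimizing cost keeps the latency requirement as a hard constraint, rather than trading it off against price.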

Because the Inference Recommender job that you started above takes about **45** minutes to finish, view an inference recommendations report from a file that was generated by an earlier Inference Recommender job run and preloaded into this environment.

In [None]:
df = pd.read_csv('data/inference_recommendations.csv')
pd.set_option("max_colwidth", 400)
df.head()

Notice that the five instances you included in the *supported_realtime_inference_types* list are included in the Inference Recommender job results. 

Which instance has the smallest *CostPerHour* value?

The **ml.m5.large** instance has the smallest *CostPerHour* value.

Which instance has the lowest *ModelLatency*?

The **ml.m5.4xlarge** instance has the lowest *ModelLatency*.
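
You can also answer these questions in code instead of by inspection. The sketch below uses the standard csv module on a small inline CSV with placeholder values; it assumes the report has InstanceType, CostPerHour, and ModelLatency columns, which you should confirm against the actual header of data/inference_recommendations.csv. With the df DataFrame already loaded, df.loc[df["CostPerHour"].idxmin(), "InstanceType"] is the pandas equivalent.

```python
import csv
import io

# Placeholder report contents; these values are not the lab's actual results.
example_report = """InstanceType,CostPerHour,ModelLatency
ml.m5.large,0.1,14
ml.m5.xlarge,0.2,9
ml.m5.4xlarge,0.8,5
"""

report_rows = list(csv.DictReader(io.StringIO(example_report)))

# csv returns strings, so convert the numeric columns before comparing
cheapest = min(report_rows, key=lambda r: float(r["CostPerHour"]))
fastest = min(report_rows, key=lambda r: float(r["ModelLatency"]))
print("Lowest CostPerHour:", cheapest["InstanceType"])
print("Lowest ModelLatency:", fastest["InstanceType"])
```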

Take a moment to review the other columns and values for each instance.

<i class="fas fa-sticky-note" style="color:#ff6633"></i> **Note:** You can continue to **Challenge 4** while you are waiting for the Inference Recommender job to complete. You can also monitor the Inference Recommender job status in the SageMaker console under the Inference tab.

<i class="fas fa-sticky-note" style="color:#ff6633" aria-hidden="true"></i> **Note:** The Inference Recommender job takes approximately *45* minutes to generate a report. After **Challenge 4** is complete, you can come back and wait for the job to complete if you want to review the job results.

In [None]:
#Optional: get-inference-recommender-job-status
# Poll the job every 5 minutes until it reaches a terminal state
terminal_states = ["COMPLETED", "STOPPED", "FAILED"]
while True:
    inference_recommender_job = sm.describe_inference_recommendations_job(JobName=job_name)
    status = inference_recommender_job["Status"]
    if status in terminal_states:
        break
    print("In progress ({})".format(status))
    time.sleep(300)

if status == "FAILED":
    print("Inference recommender job failed")
    print("Failure reason: {}".format(inference_recommender_job.get("FailureReason", "unknown")))
else:
    # COMPLETED or STOPPED
    print("Inference recommender job finished with status: {}".format(status))
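
The loop above polls until the job reaches a terminal state, with no upper bound on how long it waits. The sketch below adds a timeout and takes the describe call as a parameter so it can be tried without AWS. The helper wait_for_recommendations_job and its default values are hypothetical; in this notebook you could pass lambda: sm.describe_inference_recommendations_job(JobName=job_name) as describe.

```python
import time

TERMINAL_STATES = {"COMPLETED", "STOPPED", "FAILED"}


def wait_for_recommendations_job(describe, poll_seconds=300, timeout_seconds=3600, sleep=time.sleep):
    """Call describe() until the job's Status is terminal; raise TimeoutError once timeout_seconds is reached.

    The elapsed time is approximate: it counts only the time spent sleeping between calls.
    """
    waited = 0
    while True:
        job = describe()
        if job["Status"] in TERMINAL_STATES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError("Job still {} after about {} seconds".format(job["Status"], waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Fake describe function that reports IN_PROGRESS twice and then COMPLETED
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
example_job = wait_for_recommendations_job(lambda: {"Status": next(fake_statuses)}, poll_seconds=0)
print(example_job["Status"])
```

Injecting describe and sleep keeps the waiting logic separate from the AWS call, which makes the timeout behavior easy to check.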

In [None]:
#Optional: print-inference-recommender-job-results
# Flatten each recommendation's endpoint configuration, model configuration, and metrics into one row
data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
# VariantName does not help compare instances, so drop it if it is present
df = df.drop(columns=["VariantName"], errors="ignore")
pd.set_option("max_colwidth", 400)
df.head()

In this notebook, you learned how to use SageMaker Inference Recommender with an XGBoost model to help determine the right CPU instance to reduce costs and maximize performance.

### Cleanup

When you have completed this notebook and viewed the Inference Recommender job results, do the following:

- Close this notebook file.
- Return to the lab session and continue with **Challenge 4**.