# Batch Transform for Explanations

In this notebook, we'll use Amazon SageMaker Batch Transform to obtain
explanations for our complete dataset.

**Note**: When running this notebook on SageMaker Studio, you should make
sure the 'SageMaker JumpStart Data Science 1.0' image/kernel is used. You
can run all cells or step through them one at a time.

<p align="center">
  <img src="https://github.com/awslabs/sagemaker-explaining-credit-decisions/raw/master/docs/architecture_diagrams/stage_4.png" width="1000px">
</p>

We import a variety of packages that will be used throughout
the notebook. One of the most important is the Amazon SageMaker
Python SDK (i.e. `import sagemaker`). We also import modules from
our own custom package, which can be found at `./package`.

This solution relies on a config file that records the outputs of the provisioned AWS resources. Run the cell below to generate that file.

In [2]:
import boto3
import os
import json

# Derive the provisioned product name from the working directory, which is
# expected to look like .../S3Downloads/<provisioned-product-name>/...
client = boto3.client('servicecatalog')
cwd = os.getcwd().split('/')
i = cwd.index('S3Downloads')
pp_name = cwd[i + 1]

# Look up the outputs of the last successful provisioning record.
pp = client.describe_provisioned_product(Name=pp_name)
record_id = pp['ProvisionedProductDetail']['LastSuccessfulProvisioningRecordId']
record = client.describe_record(Id=record_id)

# Keep only outputs that have both a key and a value.
stack_output = {
    x['OutputKey']: x['OutputValue']
    for x in record['RecordOutputs']
    if 'OutputKey' in x and 'OutputValue' in x
}

with open(f'/root/S3Downloads/{pp_name}/stack_outputs.json', 'w') as f:
    json.dump(stack_output, f)
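
The dictionary-building step in the cell above can be exercised without any AWS calls. The sketch below assumes a `RecordOutputs` list shaped like the one `describe_record` returns (a list of dicts with `OutputKey` and `OutputValue` entries). The helper name `outputs_to_dict` and the sample values are made up for illustration.

```python
def outputs_to_dict(record_outputs):
    """Flatten a list of {'OutputKey': ..., 'OutputValue': ...} dicts.

    Entries missing either field are skipped. `outputs_to_dict` is a
    hypothetical helper, not part of boto3 or this solution's package.
    """
    return {
        item['OutputKey']: item['OutputValue']
        for item in record_outputs
        if 'OutputKey' in item and 'OutputValue' in item
    }


# Made-up sample data for illustration only.
sample_outputs = [
    {'OutputKey': 'S3Bucket', 'OutputValue': 'example-bucket'},
    {'OutputKey': 'SolutionPrefix', 'OutputValue': 'example-prefix'},
    {'OutputKey': 'Incomplete'},  # no OutputValue, so it is skipped
]
stack_output_example = outputs_to_dict(sample_outputs)
print(stack_output_example)
```

Checking both keys explicitly matters here: an entry that lacks either field would otherwise raise a `KeyError` or end up misaligned in the result.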

In [3]:
import boto3
from pathlib import Path
import sagemaker
from sagemaker.transformer import Transformer
import sys

sys.path.insert(0, '../package')
from package import config, utils



Up next, we define the current folder, a SageMaker session and a
SageMaker client (from `boto3`).

In [4]:
current_folder = utils.get_current_folder(globals())
sagemaker_session = sagemaker.Session()
sagemaker_client = boto3.client('sagemaker')

We define a function below to retrieve the same model that was created in
the last stage. Here, "model" refers to the package of model assets and
deployment code. We could have created another model (using the same model
data from the training stage), but let's reuse the existing one to avoid
duplication.

In [5]:
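def get_latest_model(name_contains):
    # Ask for the newest models first so the first match is the latest one.
    paginator = sagemaker_client.get_paginator('list_models')
    pages = paginator.paginate(
        NameContains=name_contains,
        SortBy='CreationTime',
        SortOrder='Descending'
    )
    for page in pages:
        models = page['Models']
        if models:
            return models[0]['ModelName']
    # No page contained a match, so fail loudly instead of returning None.
    raise ValueError("Couldn't find any models with '{}' in name.".format(name_contains))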
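
To see what the paginator loop is doing, the following sketch replays the same kind of selection over hand-written pages. It assumes each page has a `Models` list whose entries carry `ModelName` and `CreationTime`, as in the `list_models` response. `pick_latest_model` and the sample pages are hypothetical, and this version picks by creation time explicitly rather than relying on response order.

```python
from datetime import datetime, timezone


def pick_latest_model(pages, name_contains):
    """Return the newest model name containing `name_contains`.

    `pages` is any iterable of dicts with a 'Models' list, mimicking
    the pages yielded by a `list_models` paginator. Hypothetical helper.
    """
    matches = [
        model
        for page in pages
        for model in page.get('Models', [])
        if name_contains in model['ModelName']
    ]
    if not matches:
        raise ValueError(
            "Couldn't find any models with '{}' in name.".format(name_contains)
        )
    # Pick by creation time explicitly instead of relying on page order.
    return max(matches, key=lambda m: m['CreationTime'])['ModelName']


# Made-up pages for illustration only.
fake_pages = [
    {'Models': [{'ModelName': 'demo-explainer-old',
                 'CreationTime': datetime(2023, 6, 1, tzinfo=timezone.utc)}]},
    {'Models': [{'ModelName': 'demo-explainer-new',
                 'CreationTime': datetime(2023, 6, 30, tzinfo=timezone.utc)},
                {'ModelName': 'unrelated',
                 'CreationTime': datetime(2023, 7, 1, tzinfo=timezone.utc)}]},
]
latest_example = pick_latest_model(fake_pages, 'demo-explainer')
print(latest_example)
```

Raising when nothing matches keeps a missing model from surfacing later as a confusing `None` model name.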

In [6]:
latest_model = get_latest_model(config.SOLUTION_PREFIX)
print("latest model: {}".format(latest_model))
job_name = latest_model

latest model: sagemaker-soln-ecd-js-j4zo3t-explainer


Unlike the last stage, where we deployed an endpoint, we define a
`Transformer` to perform the batch computation. We specify the instance
type that should be used for the computation (i.e. `ml.c5.xlarge`) and a
number of other parameters. `strategy='SingleRecord'` means that records
will be processed by the explainer one at a time, `assemble_with='Line'`
places each output record on its own line, and `output_path` defines
where the Batch Transform output should be saved.

In [7]:
batch_explainer = Transformer(
    model_name=latest_model,
    instance_count=1,
    instance_type='ml.c5.xlarge',
    strategy='SingleRecord',
    assemble_with='Line',
    output_path='s3://' + str(Path(config.S3_BUCKET, 'explanations', job_name)) + '/',
    accept='application/json',
    base_transform_job_name=config.SOLUTION_PREFIX,
    sagemaker_session=sagemaker_session,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}]
)
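
`pathlib.Path` is designed for filesystem paths, so `str(Path(...))` would produce backslashes on Windows. As an alternative sketch, S3 URIs can be assembled with plain string joins. `s3_uri` is a hypothetical helper and the bucket and job names are placeholders.

```python
from pathlib import PurePosixPath, PureWindowsPath


def s3_uri(bucket, *parts, trailing_slash=True):
    """Build an s3:// URI from a bucket and key parts (hypothetical helper)."""
    segments = [bucket.strip('/')]
    segments += [p.strip('/') for p in parts if p.strip('/')]
    uri = 's3://' + '/'.join(segments)
    return uri + '/' if trailing_slash else uri


# Placeholder names for illustration only.
output_path_example = s3_uri('example-bucket', 'explanations', 'example-job')
print(output_path_example)

# The same parts joined as paths differ by platform flavour.
print(str(PurePosixPath('example-bucket', 'explanations')))
print(str(PureWindowsPath('example-bucket', 'explanations')))
```

In a Linux-based notebook kernel the `Path` approach used above behaves the same way, so this is a portability consideration rather than a required change.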

We haven't yet started the Batch Transform Job. Calling `.transform` does
that below. We also specify the `content_type` at this stage, which
controls which entities the explainer returns. As an example, we have
requested SHAP interaction values during this batch job.

In [8]:
entities = [
    'data',
    'features',
    'prediction',
    'explanation_shap_values',
    'explanation_shap_interaction_values'
]
batch_explainer.transform(
    data='s3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'data_test')) + '/',
    content_type="application/json; entities={}".format(",".join(entities)),
    split_type='Line',
    wait=True
)
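
The `content_type` string is an ordinary media type followed by an `entities` parameter. The sketch below shows how such a string can be composed and split apart again. The parsing half is only an assumption about what a receiving side might do; the explainer's actual request handling is not shown in this notebook, and both function names are hypothetical.

```python
def build_content_type(entities, media_type='application/json'):
    """Compose a content type such as 'application/json; entities=a,b'."""
    return '{}; entities={}'.format(media_type, ','.join(entities))


def parse_content_type(content_type):
    """Split a content type into (media_type, entities).

    Hypothetical parser: assumes at most one 'entities=' parameter.
    """
    media_type, _, params = content_type.partition(';')
    entities = []
    for param in params.split(';'):
        key, _, value = param.strip().partition('=')
        if key == 'entities' and value:
            entities = [e.strip() for e in value.split(',')]
    return media_type.strip(), entities


content_type_example = build_content_type(['prediction', 'explanation_shap_values'])
print(content_type_example)
print(parse_content_type(content_type_example))
```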

INFO:sagemaker:Creating transform job with name: sagemaker-soln-ecd-js-j4zo3t-2023-06-30-22-44-48-934


.........................Processing /opt/ml/code
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: entry-point
  Building wheel for entry-point (setup.py): started
  Building wheel for entry-point (setup.py): finished with status 'done'
  Created wheel for entry-point: filename=entry_point-1.0.0-py2.py3-none-any.whl size=40803 sha256=c95bec84974acfc4d3fba8b35b828264dfa2d5e372d8ae4c984a71795c5dc48e
  Stored in directory: /tmp/pip-ephem-wheel-cache-luf_feoi/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be563353d111dca6
Successfully built entry-point
Installing collected packages: entry-point
Successfully installed entry-point-1.0.0
  import imp
  LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
[2023-06-30 22:48:54 +0000] [37] [INFO] Starting gunicorn 20.1.0
[2023-06-30 22:48:54 +0000] [37] [INFO] Listening at: 

After the Batch Transform Job has completed successfully, we will have a
complete set of explanations sitting in the Amazon S3 bucket.
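
Because the job was configured with `assemble_with='Line'`, each output object should contain one record per line. The sketch below parses such text once it has been downloaded. It assumes every line is a JSON object, and the sample records and their values are invented for illustration (real output will have whatever fields the explainer emits). `parse_json_lines` is a hypothetical helper.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Invented sample output for illustration only.
sample_output = (
    '{"prediction": 0.12, "explanation_shap_values": {"feature_a": -0.03}}\n'
    '{"prediction": 0.87, "explanation_shap_values": {"feature_a": 0.21}}\n'
)
records = parse_json_lines(sample_output)
print(len(records), records[0]['prediction'])
```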

## Clean Up

You can now choose to delete the model we created.

In [9]:
explainer_name = "{}-explainer".format(config.SOLUTION_PREFIX)
sagemaker_client.delete_model(ModelName=explainer_name)

{'ResponseMetadata': {'RequestId': 'c707f407-81bf-4997-bc6c-4c68dc5f658d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c707f407-81bf-4997-bc6c-4c68dc5f658d',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Fri, 30 Jun 2023 22:49:45 GMT'},
  'RetryAttempts': 0}}


## Next Stage

Up next we'll wrap things up and discuss how to clean up the solution.

[Click here to continue.](./5_conclusion.ipynb)