# Part 3: Mitigate Bias, Train Another Unbiased Model, and Put It in the Model Registry

<a id='aup-overview'></a>

## [Overview](./0-AutoClaimFraudDetection.ipynb)
* [Notebook 0: Overview, Architecture and Data Exploration](./0-AutoClaimFraudDetection.ipynb)
* [Notebook 1: Data Prep, Process, Store Features](./1-data-prep-e2e.ipynb)
* [Notebook 2: Train, Check Bias, Tune, Record Lineage, and Register a Model](./2-lineage-train-assess-bias-tune-registry-e2e.ipynb)
* **[Notebook 3: Mitigate Bias, Train New Model, Store in Registry](./3-mitigate-bias-train-model2-registry-e2e.ipynb)**
  * **[Architecture](#train2)**
  * **[Develop a second model](#second-model)**
  * **[Analyze the Second Model for Bias](#analyze-second-model)**
  * **[View Results of Clarify Bias Detection Job](#view-second-clarify-job)**
  * **[Configure and Run Clarify Explainability Job](#explainability)**
  * **[Create Model Package for the Second Trained Model](#model-package)**
* [Notebook 4: Deploy Model, Run Predictions](./4-deploy-run-inference-e2e.ipynb)
* [Notebook 5: Create and Run an End-to-End Pipeline to Deploy the Model](./5-pipeline-e2e.ipynb)

In this notebook, we describe how to detect bias using Clarify, mitigate it with SMOTE, train another model, and put it in the Model Registry along with the lineage of all the artifacts created along the way: data, code, and model metadata.

### Install required and/or update third-party libraries

In [2]:
import sys
import IPython
!{sys.executable} -m pip install -U sagemaker smdebug
IPython.Application.instance().kernel.do_shutdown(True)

Collecting sagemaker
  Using cached sagemaker-2.68.0-py2.py3-none-any.whl
Collecting smdebug
  Downloading smdebug-1.0.12-py2.py3-none-any.whl (270 kB)
Collecting pyinstrument==3.4.2
  Downloading pyinstrument-3.4.2-py2.py3-none-any.whl (83 kB)
Collecting pyinstrument-cext>=0.2.2
  Downloading pyinstrument_cext-0.2.4-cp37-cp37m-manylinux2010_x86_64.whl (20 kB)
Installing collected packages: pyinstrument-cext, pyinstrument, smdebug, sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.66.0
    Uninstalling sagemaker-2.66.0:
      Successfully uninstalled sagemaker-2.66.0
Successfully installed pyinstrument-3.4.2 pyinstrument-cext-0.2.4 sagemaker-2.68.0 smdebug-1.0.12


{'status': 'ok', 'restart': True}

### Import libraries

In [1]:
import json
import time
import boto3
import sagemaker
import numpy as np
import pandas as pd
import awswrangler as wr
import matplotlib.pyplot as plt
from imblearn.over_sampling import SMOTE
from sagemaker.xgboost.estimator import XGBoost

from model_package_src.inference_specification import InferenceSpecification

%matplotlib inline

### Load stored variables
Run the cell below to load any previously created variables. You should see a printout of the existing variables. If you don't see anything, you may need to create them again, or this may be your first time running this notebook.

In [2]:
%store -r
%store

Stored variables and their in-db values:
bucket                             -> 'sagemaker-us-east-1-875692608981'
claims_fg_name                     -> 'fraud-detect-demo-claims'
claims_preprocessed                ->       policy_id  incident_severity  num_vehicles_i
claims_table                       -> 'fraud-detect-demo-claims-1636518800'
clarify_expl_job_name              -> 'Clarify-Explainability-2021-11-10-14-35-21-747'
col_order                          -> ['fraud', 'authorities_contacted_none', 'policy_li
customers_fg_name                  -> 'fraud-detect-demo-customers'
customers_preprocessed             ->       policy_id  customer_age  customer_education 
customers_table                    -> 'fraud-detect-demo-customers-1636518803'
database_name                      -> 'sagemaker_featurestore'
endpoint_config_name               -> 'fraud-detect-demo-xgboost-post-smote-endpoint-con
endpoint_name                      -> 'fraud-detect-demo-xgboost-post-smote-endpoint'
hyperp

**<font color='red'>Important</font>: You must have run the previous sequential notebooks to retrieve variables using the StoreMagic command.**

### Set region, boto3 and SageMaker SDK variables

In [3]:
# Uses the region of the default SageMaker session; to use another region, assign it to `region` here instead
import sagemaker

region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))

Using AWS Region: us-east-1


In [4]:
boto3.setup_default_session(region_name=region)

boto_session = boto3.Session(region_name=region)

s3_client = boto3.client("s3", region_name=region)

sagemaker_boto_client = boto_session.client("sagemaker")

sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_boto_client
)

sagemaker_role = sagemaker.get_execution_role()

account_id = boto3.client("sts").get_caller_identity()["Account"]

In [60]:
# variables used for parameterizing the notebook run
model_2_name = f"{prefix}-xgboost-post-smote"

train_data_upsampled_s3_path = f"s3://{bucket}/{prefix}/data/train/upsampled/train.csv"
test_sample = f"s3://{bucket}/{prefix}/data/test/test.csv"
bias_report_2_output_path = f"s3://{bucket}/{prefix}/clarify-output/bias-2"
explainability_output_path = f"s3://{bucket}/{prefix}/clarify-output/explainability"
bucket_path = f"s3://{bucket}/{prefix}/debugger"
train_instance_count = 1
train_instance_type = "ml.m4.xlarge"

clarify_instance_count = 1
clarify_instance_type = "ml.c5.xlarge"

<a id ='train2'> </a>

## Architecture for this ML Lifecycle Stage: Train, Check Bias, Tune, Record Lineage, Register Model
[overview](#aup-overview)
----

![train-assess-tune-register](./images/e2e-2-pipeline-v3b.png)

<a id='second-model'></a>

## Develop a second model

[overview](#aup-overview)
----
For this second model, you will fix the gender imbalance in the dataset using SMOTE and train another model using XGBoost. This model will also be saved to the Model Registry and eventually approved for deployment.

In [7]:
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")

In [8]:
train

Unnamed: 0,fraud,authorities_contacted_none,policy_liability,customer_gender_female,incident_severity,total_claim_amount,driver_relationship_child,policy_state_az,police_report_available,driver_relationship_self,...,policy_state_wa,collision_type_rear,incident_dow,policy_state_id,auto_year,driver_relationship_na,customer_age,authorities_contacted_fire,driver_relationship_other,collision_type_na
0,0,0,3,1,1,9500.0,0,0,1,1,...,0,1,5,0,2012,0,43,0,0,0
1,0,0,2,0,1,13500.0,0,0,1,1,...,0,0,0,0,2009,0,38,0,0,0
2,0,0,3,0,1,9000.0,0,0,0,1,...,0,0,6,0,2020,0,66,0,0,0
3,0,1,1,1,0,15500.0,0,0,0,1,...,0,1,6,0,2019,0,37,0,0,0
4,0,0,3,0,2,47000.0,0,0,1,1,...,0,0,3,0,2018,0,49,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995,0,0,1,0,1,10500.0,0,1,1,1,...,0,0,2,0,2017,0,45,0,0,0
3996,0,1,2,0,0,14000.0,0,0,0,1,...,0,0,4,0,2018,0,43,0,0,0
3997,0,0,1,0,1,34000.0,0,1,0,0,...,0,0,3,0,2018,0,60,0,0,0
3998,0,1,1,0,0,11500.0,0,0,0,1,...,0,1,3,0,2020,0,36,0,0,0


In [9]:
test

Unnamed: 0,fraud,authorities_contacted_none,policy_liability,customer_gender_female,incident_severity,total_claim_amount,driver_relationship_child,policy_state_az,police_report_available,driver_relationship_self,...,policy_state_wa,collision_type_rear,incident_dow,policy_state_id,auto_year,driver_relationship_na,customer_age,authorities_contacted_fire,driver_relationship_other,collision_type_na
0,1,0,1,1,1,7500.0,0,0,0,1,...,1,1,2,0,2009,0,24,0,0,0
1,0,0,0,0,1,21500.0,0,0,0,1,...,0,0,0,0,2010,0,50,0,0,0
2,0,0,2,1,0,15000.0,0,0,1,1,...,0,0,0,0,2019,0,56,0,0,0
3,0,0,0,0,1,7000.0,0,0,0,0,...,0,0,1,0,2008,1,35,0,0,1
4,0,1,0,0,0,13000.0,0,0,0,0,...,0,1,4,0,2014,0,38,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,0,0,1,0,1,14500.0,0,0,0,1,...,0,0,0,0,2017,0,31,0,0,0
996,0,0,2,0,1,10500.0,0,0,0,0,...,0,0,4,0,2009,1,33,0,0,1
997,0,1,2,0,0,5000.0,0,0,0,1,...,0,0,2,0,2015,0,28,0,0,0
998,1,0,1,1,0,14500.0,0,0,1,0,...,0,0,0,0,2020,1,41,0,0,1


<a id='smote'></a>

### Resolve class imbalance using SMOTE

To handle the imbalance, we can over-sample (i.e., upsample) the minority group, here customers with `customer_gender_female = 1`, using [SMOTE (Synthetic Minority Over-sampling Technique)](https://arxiv.org/pdf/1106.1813.pdf). After installing the imbalanced-learn module, if you receive an `ImportError` when importing SMOTE, try restarting the kernel.
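SMOTE creates each synthetic sample by taking a minority-group row `x_i`, choosing one of its `k` nearest minority-group neighbours `x_nn`, and interpolating between them: `x_new = x_i + λ * (x_nn - x_i)`, with `λ` drawn uniformly from `[0, 1)`. The NumPy sketch below illustrates that core step under simplifying assumptions (brute-force neighbour search, a hypothetical `smote_sketch` helper); it is not imbalanced-learn's implementation, which the notebook calls through `SMOTE(random_state=42)`. Note that every column is interpolated, so the sketch treats all features as numeric.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows and their neighbours.

    Hypothetical helper for illustration only; the notebook uses imbalanced-learn's SMOTE.
    """
    minority = np.asarray(minority, dtype=float)
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = np.random.default_rng(seed)

    # Brute-force pairwise Euclidean distances between minority rows
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    k = min(k, len(minority) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbours

    synthetic = np.empty((n_new, minority.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(minority))      # pick a random minority row
        nn = neighbours[i, rng.integers(k)]  # pick one of its k nearest neighbours
        lam = rng.random()                   # interpolation weight in [0, 1)
        synthetic[j] = minority[i] + lam * (minority[nn] - minority[i])
    return synthetic


# Toy minority group with two numeric features (corners of the unit square)
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote_sketch(minority, n_new=3, k=2)
print(new_rows.shape)  # (3, 2)
```

With `k=2`, each corner's neighbours are the two adjacent corners, so every synthetic point lands on an edge of the square rather than anywhere inside it.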

#### Gender balance before SMOTE

In [10]:
gender = train["customer_gender_female"]
gender.value_counts()

0    2803
1    1197
Name: customer_gender_female, dtype: int64

#### Gender balance after SMOTE

In [11]:
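# Use customer_gender_female as the target so SMOTE balances the gender groups;
# synthetic rows are generated for every column of `train`, including the fraud label
sm = SMOTE(random_state=42)
train_data_upsampled, gender_res = sm.fit_resample(train, gender)
train_data_upsampled["customer_gender_female"].value_counts()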

1    2803
0    2803
Name: customer_gender_female, dtype: int64
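With its default `sampling_strategy` (`"auto"`), imbalanced-learn's `SMOTE` oversamples every group except the majority up to the majority count. A small sketch of that arithmetic, using the counts printed before and after SMOTE above:

```python
# Counts of customer_gender_female printed before SMOTE
counts_before = {0: 2803, 1: 1197}

majority_count = max(counts_before.values())
# "auto" strategy: every non-majority group is oversampled up to the majority count
synthetic_rows = {group: majority_count - n for group, n in counts_before.items()}
counts_after = {group: majority_count for group in counts_before}

print(synthetic_rows)                                            # {0: 0, 1: 1606}
print(sum(counts_before.values()), sum(counts_after.values()))  # 4000 5606
```

So the upsampled training file contains 1,606 synthetic rows on top of the original 4,000.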

### Train new model


In [12]:
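# Save the balanced data locally (index dropped, header row kept) and upload it to train_data_upsampled_s3_path
train_data_upsampled.to_csv("data/upsampled_train.csv", index=False)
s3_client.upload_file(
    Filename="data/upsampled_train.csv",
    Bucket=bucket,
    Key=f"{prefix}/data/train/upsampled/train.csv",
)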

In [52]:
# Built-in XGBoost 1.3-1 container image for the region selected earlier
container = sagemaker.image_uris.retrieve("xgboost", region, "1.3-1")

In [53]:
from sagemaker.debugger import (
    CollectionConfig,
    DebuggerHookConfig,
    FrameworkProfile,
    ProfilerConfig,
    Rule,
    rule_configs,
)
from sagemaker.estimator import Estimator

# Save Debugger tensors every `save_interval` steps
save_interval = 2

# Framework profiling with default settings
profiler_config = ProfilerConfig(framework_profile_params=FrameworkProfile())

# Debugger hook: where to write tensors (S3) and which collections to save
debugger_hook_config = DebuggerHookConfig(
    s3_output_path=bucket_path,  # Required
    collection_configs=[
        CollectionConfig(name=name, parameters={"save_interval": str(save_interval)})
        for name in ["metrics", "feature_importance", "full_shap", "average_shap"]
    ],
)

# Built-in rule: check whether the loss in the "metrics" collection stops decreasing
rules = [
    Rule.sagemaker(
        rule_configs.loss_not_decreasing(),
        rule_parameters={"collection_names": "metrics", "num_steps": str(save_interval * 2)},
    )
]

# Estimator for the XGBoost container retrieved above, with profiling and debugging enabled
xgb_estimator = Estimator(
    hyperparameters=hyperparameters,
    role=sagemaker_role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    image_uri=container,
    profiler_config=profiler_config,
    debugger_hook_config=debugger_hook_config,
    rules=rules,
)

In [61]:
xgb_estimator.fit(inputs={"train": train_data_upsampled_s3_path, "validation": test_sample}, wait=True)
training_job_3_name = xgb_estimator.latest_training_job.job_name
%store training_job_3_name

2021-11-11 02:27:24 Starting - Starting the training job...
2021-11-11 02:27:50 Starting - Launching requested ML instances
LossNotDecreasing: InProgress
ProfilerReport-1636597644: InProgress
......
2021-11-11 02:28:50 Starting - Preparing the instances for training............
2021-11-11 02:30:50 Downloading - Downloading input data...
2021-11-11 02:31:11 Training - Downloading the training image..
[2021-11-11 02:31:32.856 ip-10-0-153-183.ec2.internal:1 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2021-11-11:02:31:32:INFO] Imported framework sagemaker_xgboost_container.training
[2021-11-11:02:31:32:INFO] Failed to parse hyperparameter sagemaker_program value xgboost_starter_script.py to Json.
Returning the value itself
[2021-11-11:02:31:32:INFO] Failed to parse hyperparameter sagemaker_region value us-east-1 to Json.
Returning the value itself
[2021-11-11:02:31:32:INFO] Failed to parse hyperparameter sagemaker_job_name

In [26]:
if "training_job_3_name" not in locals():
    # No stored job name: start a new training job without blocking
    xgb_estimator.fit(inputs={"train": train_data_upsampled_s3_path, "validation": test_sample}, wait=False)
    training_job_3_name = xgb_estimator.latest_training_job.job_name
    %store training_job_3_name
else:
    print(f"Using previous training job: {training_job_3_name}")

Using previous training job: sagemaker-xgboost-2021-11-11-01-20-16-590


### Result

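As a result of the command above, Amazon SageMaker starts **the training job plus rule-evaluation jobs**: one for the `LossNotDecreasing` rule you configured, and a `ProfilerReport` rule job for the profiler configuration (both appear in the training log above). The training job produces the model and is configured to save the debugging tensors to be analyzed. The `LossNotDecreasing` job analyzes the `metrics` collection to check whether the loss (for example, `train-error` and `validation-error`, depending on the evaluation metrics in the hyperparameters) stops decreasing at any point during training.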
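The idea behind the `LossNotDecreasing` check can be sketched in plain Python: compare the loss at a saved step with the loss at least `num_steps` steps earlier, and flag the run if it has not gone down. This is a simplified illustration with a hypothetical `loss_not_decreasing` function and made-up loss values, not SageMaker Debugger's rule code; the real rule reads tensors from S3 and has more parameters (for example, a minimum percentage decrease). With `save_interval = 2` and `num_steps = save_interval * 2` as configured above, each comparison spans 4 training steps, that is, two saved points.

```python
def loss_not_decreasing(losses_by_step, num_steps):
    """Return the first step at which the loss failed to decrease over num_steps steps, else None.

    losses_by_step: dict mapping training step -> scalar loss (only saved steps present).
    Simplified illustration of the LossNotDecreasing idea, not the actual rule implementation.
    """
    last_checked = None
    for step in sorted(losses_by_step):
        if last_checked is None:
            last_checked = step  # first saved step is the initial reference point
            continue
        if step - last_checked >= num_steps:
            if losses_by_step[step] >= losses_by_step[last_checked]:
                return step  # loss did not go down over the window
            last_checked = step
    return None


num_steps = 4  # save_interval * 2, with save_interval = 2 as configured above

# Hypothetical losses saved every 2 steps
healthy = {0: 0.60, 2: 0.52, 4: 0.45, 6: 0.41, 8: 0.38}
plateau = {0: 0.60, 2: 0.52, 4: 0.45, 6: 0.45, 8: 0.46}

print(loss_not_decreasing(healthy, num_steps))  # None
print(loss_not_decreasing(plateau, num_steps))  # 8
```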

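Check the status of the training job below. After the training job starts, Amazon SageMaker starts a rule-execution job to run the `LossNotDecreasing` rule.

The cell below polls every 10 seconds and prints the training job status together with a rule-evaluation status. It stops once the job finishes, or after 36 polls (roughly six minutes), so it can return before training is complete.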

In [41]:
import time

# Look up the job and client once, then poll every 10 seconds (at most 36 times, about 6 minutes)
job_name = xgb_estimator.latest_training_job.name
client = xgb_estimator.sagemaker_session.sagemaker_client

for _ in range(36):
    description = client.describe_training_job(TrainingJobName=job_name)
    training_job_status = description["TrainingJobStatus"]
    rule_job_summary = xgb_estimator.latest_training_job.rule_job_summary()
    # Status of the first rule listed in the summary
    rule_evaluation_status = rule_job_summary[0]["RuleEvaluationStatus"]
    print(
        "Training job status: {}, Rule Evaluation Status: {}".format(
            training_job_status, rule_evaluation_status
        )
    )

    # Stop polling once the training job reaches a terminal state
    if training_job_status in ["Completed", "Failed", "Stopped"]:
        break

    time.sleep(10)

Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation Status: InProgress
Training job status: InProgress, Rule Evaluation
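The polling loop above gives up after 36 polls whether or not training has finished. A more general pattern, sketched below, polls until a terminal state or a timeout and reports which one happened. It assumes you pass in any zero-argument function that returns the current status string; the helper name `poll_until_terminal` is hypothetical. `Completed`, `Failed`, and `Stopped` are the final values of `TrainingJobStatus` returned by `describe_training_job`. For a fully blocking wait, the SageMaker Python SDK's `xgb_estimator.latest_training_job.wait()` also blocks until the job ends.

```python
import time

# Final values of TrainingJobStatus in describe_training_job responses
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def poll_until_terminal(get_status, interval_s=10.0, timeout_s=600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or roughly timeout_s seconds pass.

    Returns (status, finished); finished is False when the timeout was reached first.
    Elapsed time only counts the sleeps, so it slightly underestimates wall-clock time.
    """
    waited = 0.0
    status = get_status()
    while status not in TERMINAL_STATES:
        if waited >= timeout_s:
            return status, False
        sleep(interval_s)
        waited += interval_s
        status = get_status()
    return status, True


# Usage in this notebook (requires AWS access, so not run here):
# client = xgb_estimator.sagemaker_session.sagemaker_client
# job_name = xgb_estimator.latest_training_job.name
# status, finished = poll_until_terminal(
#     lambda: client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
# )
# print(status, "finished" if finished else "still running after timeout")
```

Injecting `sleep` keeps the helper easy to exercise without waiting in real time.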

### Check the Status of the Rule Evaluation Job

To get the rule evaluation jobs that Amazon SageMaker started for you, run the command below. For each rule, the results show the `RuleConfigurationName`, `RuleEvaluationJobArn`, `RuleEvaluationStatus`, `StatusDetails`, and `LastModifiedTime`.
If the model parameters meet a rule evaluation condition, the rule execution job throws a client error with `RuleEvaluationConditionMet`.

The logs of the rule evaluation job are available in the CloudWatch log group `/aws/sagemaker/ProcessingJobs`, in the log stream of the processing job identified by `RuleEvaluationJobArn`.

When the loss stops decreasing, the rule execution job identifies that situation, raises the `RuleEvaluationConditionMet` exception, and ends. In the run shown below, however, the rule job ended with a `RuleEvaluationStatus` of `'Error'` because no debugging data was saved by the training job, so there was nothing for the rule to evaluate.
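To read the rule results programmatically instead of scanning the raw list, you can reduce the summary to one status per rule. The sketch below relies only on the dictionary keys that appear in the output that follows (`RuleConfigurationName`, `RuleEvaluationStatus`, `StatusDetails`); the helper name `summarize_rule_jobs` is hypothetical, and the sample input is abbreviated from that output.

```python
def summarize_rule_jobs(rule_job_summary, max_detail_chars=80):
    """Map each rule name to (status, short detail) from a rule_job_summary() list."""
    summary = {}
    for rule in rule_job_summary:
        detail = rule.get("StatusDetails", "")
        # Keep only the first line of long tracebacks, truncated for display
        first_line = detail.splitlines()[0] if detail else ""
        summary[rule["RuleConfigurationName"]] = (
            rule["RuleEvaluationStatus"],
            first_line[:max_detail_chars],
        )
    return summary


# Abbreviated from the rule_job_summary() output shown in this notebook
sample = [
    {
        "RuleConfigurationName": "LossNotDecreasing",
        "RuleEvaluationStatus": "Error",
        "StatusDetails": (
            "ClientError: No debugging data was saved by the training job. "
            "Check that the debugger hook was configured correctly before starting the training job. "
            "Exception: Training job has ended. All the collection files could not be loaded\n"
            "Traceback (most recent call last):\n..."
        ),
    }
]

for name, (status, detail) in summarize_rule_jobs(sample).items():
    print(f"{name}: {status} - {detail}")
```

In the notebook, you would pass `xgb_estimator.latest_training_job.rule_job_summary()` instead of `sample`.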

In [69]:
xgb_estimator.latest_training_job.rule_job_summary()

[{'RuleConfigurationName': 'LossNotDecreasing',
  'RuleEvaluationJobArn': 'arn:aws:sagemaker:us-east-1:875692608981:processing-job/sagemaker-xgboost-2021-11--lossnotdecreasing-ea7013db',
  'RuleEvaluationStatus': 'Error',
  'StatusDetails': 'ClientError: No debugging data was saved by the training job. Check that the debugger hook was configured correctly before starting the training job. Exception: Training job has ended. All the collection files could not be loaded\nTraceback (most recent call last):\n  File "evaluate.py", line 119, in _create_trials\n    range_steps=(self.start_step, self.end_step))\n  File "/usr/local/lib/python3.7/site-packages/smdebug/trials/utils.py", line 25, in create_trial\n    return LocalTrial(name=name, dirname=path, **kwargs)\n  File "/usr/local/lib/python3.7/site-packages/smdebug/trials/local_trial.py", line 36, in __init__\n    self._load_collections()\n  File "/usr/local/lib/python3.7/site-packages/smdebug/trials/trial.py", line 168, in _load_collectio

## Analyze Debugger Output <a id='analyze-debugger-ouput'></a>

Now that the model is trained, analyze the debugging data saved during training. This section focuses on after-the-fact analysis.

You import the SageMaker Debugger analysis library (`smdebug`), which defines the concept of a *trial*: an object that represents a single training run.

### Retrieving and Analyzing tensors

Before getting to analysis, here are some notes on concepts used in Amazon SageMaker Debugger that help with analysis.
- ***Trial*** - The central object of the SageMaker Debugger API for accessing model parameters. It is a top-level abstraction that represents a single run of a training job. All model parameters emitted by a training job are associated with its trial.
- ***Tensor*** - An object that represents model parameters, such as weights, gradients, accuracy, and loss, saved during the training job.

For more details on aforementioned concepts as well as on SageMaker Debugger API in general (including examples) see [SageMaker Debugger Analysis API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/analysis.md) documentation.

In the following code cell, use a ***Trial*** to access model parameters. You do this by inspecting the current training job and extracting, from its debugger configuration, the S3 location where SageMaker Debugger should look for the data. Keep in mind the following:
- Model parameters are stored in your own S3 bucket, which you can navigate to inspect its contents manually if desired.
- You might notice a slight delay before the trial object is created. This is normal: SageMaker Debugger monitors the corresponding bucket and waits for model parameters to appear. The delay comes from the time it takes to upload model parameters from the training container to your S3 bucket.

In [48]:
from smdebug.trials import create_trial

s3_output_path = xgb_estimator.latest_job_debugger_artifacts_path()
trial = create_trial(s3_output_path)

[2021-11-11 02:00:33.527 datascience-1-0-ml-t3-medium-1abf3407f667f989be9d86559395:556 INFO s3_trial.py:42] Loading trial debug-output at path s3://sagemaker-us-east-1-875692608981/fraud-detect-demo/debugger/sagemaker-xgboost-2021-11-11-01-52-56-032/debug-output


MissingCollectionFiles: Training job has ended. All the collection files could not be loaded

----

### Next Notebook: [Deploy Model, Run Predictions](./4-deploy-run-inference-e2e.ipynb)