![experiment example](./images/0.png)

---

## __Amazon SageMaker Studio__

- Feature 1 - Amazon SageMaker Notebooks

- Feature 2 - Amazon SageMaker Experiments

- Feature 3 - Amazon SageMaker Debugger

- Feature 4 - Amazon SageMaker Model Monitor

- Feature 5 - Amazon SageMaker Autopilot

---

## __Amazon SageMaker Studio overview__
![experiment example](./images/1.png)

---

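## __Feature 1 - Amazon SageMaker Notebooks__

- Built-in environments (demo)

    - Data Science
    
    - Base Python-3.6
    
    - MXNet (CPU-optimized)-1.6-py36
    
    - MXNet (GPU-optimized)-1.6-py36
    
    - PyTorch (CPU-optimized)-1.4-py36
    
    - PyTorch (GPU-optimized)-1.4-py36
    
    - TensorFlow (CPU-optimized)-1.15-py36
    
    - TensorFlow (GPU-optimized)-1.15-py36
    
    - TensorFlow (CPU-optimized)-2.1-py36
    
    - TensorFlow (GPU-optimized)-2.1-py36
    
    - __Custom!!__
    

- Fast startup
    
    - Studio notebooks start 5-10x faster than instance-based notebooks
    
    
- Team sharing (demo)

    - Each member has a home directory that is independent of any instance
    
    - Shared managed environments: when a notebook is shared, the kernel configuration it depends on is included in the notebook's environment settings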
    
![experiment example](./images/2.png)

---

## __Case demo: predicting customer churn with an XGBoost model__


- Feature 2 - Amazon SageMaker Experiments

- Feature 3 - Amazon SageMaker Debugger

- Feature 4 - Amazon SageMaker Model Monitor

---


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import os
import sys
import time
import json
from IPython.display import display
from time import strftime, gmtime
import boto3
import re

!{sys.executable} -m pip install sagemaker -U
!{sys.executable} -m pip install sagemaker-experiments


import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
from sagemaker.debugger import rule_configs, Rule, DebuggerHookConfig
from sagemaker.model_monitor import DataCaptureConfig, DatasetFormat, DefaultModelMonitor
from sagemaker.s3 import S3Uploader, S3Downloader

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

Collecting sagemaker
  Downloading sagemaker-1.58.4.tar.gz (304 kB)
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-1.58.4-py2.py3-none-any.whl size=400797 sha256=2f200234c2065094958709c78fa24651673beb8d4653dc30b51e75a1bb349a29
  Stored in directory: /root/.cache/pip/wheels/52/c8/7a/3ffe244116cb281fcf99ab2867ff60b85b3017dd6393d3c1c6
Successfully built sagemaker
Installing collected packages: sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 1.58.3
    Uninstalling sagemaker-1.58.3:
      Successfully uninstalled sagemaker-1.58.3
Successfully installed sagemaker-1.58.4


In [4]:
sess = boto3.Session()
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()

---
## Stage 1: Data preparation

#### 1-1 Data preview: customer behavior attributes and profile

In [5]:
local_data_path = './data/training-dataset-with-header.csv'
data = pd.read_csv(local_data_path)
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 10)         # Keep the output on one page
data

Unnamed: 0,Churn,Account Length,VMail Message,Day Mins,Day Calls,Eve Mins,Eve Calls,Night Mins,Night Calls,Intl Mins,Intl Calls,CustServ Calls,State_AK,State_AL,State_AR,State_AZ,State_CA,State_CO,State_CT,State_DC,State_DE,State_FL,State_GA,State_HI,State_IA,State_ID,State_IL,State_IN,State_KS,State_KY,State_LA,State_MA,State_MD,State_ME,State_MI,State_MN,State_MO,State_MS,State_MT,State_NC,State_ND,State_NE,State_NH,State_NJ,State_NM,State_NV,State_NY,State_OH,State_OK,State_OR,State_PA,State_RI,State_SC,State_SD,State_TN,State_TX,State_UT,State_VA,State_VT,State_WA,State_WI,State_WV,State_WY,Area Code_408,Area Code_415,Area Code_510,Int'l Plan_no,Int'l Plan_yes,VMail Plan_no,VMail Plan_yes
0,0,106,0,274.4,120,198.6,82,160.8,62,6.0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
1,0,28,0,187.8,94,248.6,86,208.8,124,10.6,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,1,0
2,1,148,0,279.3,104,201.6,87,280.8,99,7.9,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0
3,0,132,0,191.9,107,206.9,127,272.0,88,12.6,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
4,0,92,29,155.4,110,188.5,104,254.9,118,8.0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2328,0,106,0,194.8,133,213.4,73,190.8,92,11.5,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,1,0
2329,1,125,0,143.2,80,88.1,94,233.2,135,8.8,7,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
2330,0,129,0,143.7,114,297.8,98,212.6,86,11.4,8,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,0
2331,0,159,0,198.8,107,195.5,91,213.3,120,16.5,7,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0


#### 1-2 Upload the data to S3

In [6]:
account_id = sess.client('sts', region_name=sess.region_name).get_caller_identity()["Account"]
bucket = 'sagemaker-studio-{}-{}'.format(sess.region_name, account_id)
prefix = 'live-coding-xgboost-churn'

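from botocore.exceptions import ClientError

try:
    if sess.region_name == "us-east-1":
        sess.client('s3').create_bucket(Bucket=bucket)
    else:
        sess.client('s3').create_bucket(Bucket=bucket, 
                                        CreateBucketConfiguration={'LocationConstraint': sess.region_name})
except ClientError as e:
    # Only ignore the "you already own this bucket" case; re-raise anything else
    # (for example, missing permissions) instead of hiding it.
    if e.response['Error']['Code'] != 'BucketAlreadyOwnedByYou':
        raise
    print("Looks like you already have a bucket of this name. That's good. Uploading the data files...")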

s3url = S3Uploader.upload('data/train.csv', 's3://{}/{}/{}'.format(bucket, prefix,'train'))
print(s3url)
s3url = S3Uploader.upload('data/validation.csv', 's3://{}/{}/{}'.format(bucket, prefix,'validation'))
print(s3url)

Looks like you already have a bucket of this name. That's good. Uploading the data files...
s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/train/train.csv
s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/validation/validation.csv
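The URIs printed above follow a simple `s3://<bucket>/<prefix>/<channel>/<file>` layout. As a minimal sketch of that naming scheme, the helper below (`s3_uri` is a hypothetical name, not part of the SageMaker SDK) rebuilds the training URI from the region and account ID shown in the output. In the notebook those two values come from `sess` and STS; here they are hard-coded for illustration.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI (hypothetical helper)."""
    key = "/".join(p.strip("/") for p in parts if p)
    return "s3://{}/{}".format(bucket, key) if key else "s3://{}".format(bucket)


# Values copied from the output above, hard-coded for illustration only.
region = "cn-northwest-1"
account_id = "542319707026"
bucket = "sagemaker-studio-{}-{}".format(region, account_id)
prefix = "live-coding-xgboost-churn"

train_uri = s3_uri(bucket, prefix, "train", "train.csv")
print(train_uri)
```

Stripping slashes from each part keeps the result stable whether or not a prefix is written with a trailing `/`, which is how the two `s3_input` paths later in the notebook differ.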



## Stage 2: Training

#### 2-1 Algorithm choice: the built-in SageMaker XGBoost

In [7]:
from sagemaker.amazon.amazon_estimator import get_image_uri
docker_image_name = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='0.90-2')
docker_image_name

	get_image_uri(region, 'xgboost', '1.0-1').


'451049120500.dkr.ecr.cn-northwest-1.amazonaws.com.cn/sagemaker-xgboost:0.90-2-cpu-py3'

#### 2-2 Build the S3 input objects

In [8]:
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')

### __Feature 2 demo__ - Amazon SageMaker Experiments
__Use Studio to compare metrics across different training jobs for the same problem__

![experiment example](./images/4.png)

#### 2-3 Create the experiment customer_churn_experiment

In [12]:
sess = sagemaker.session.Session()

create_date = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
customer_churn_experiment = Experiment.create(experiment_name="bowl-{}".format(create_date), 
                                              description="Using xgboost to predict customer churn", 
                                              sagemaker_boto_client=boto3.client('sagemaker'))

#### 2-4 Hyperparameters

In [16]:
hyperparams = {"max_depth":5,
               "subsample":0.8,
               "num_round":200,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'binary:logistic'}

#### 2-5 Create a single Trial and add it to the Experiment

In [17]:
trial = Trial.create(trial_name="single-trial-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime())), 
                     experiment_name=customer_churn_experiment.experiment_name,
                     sagemaker_boto_client=boto3.client('sagemaker'))

xgb = sagemaker.estimator.Estimator(image_name=docker_image_name,
                                    role=role,
                                    hyperparameters=hyperparams,
                                    train_instance_count=1,
                                    train_instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    base_job_name="demo-xgboost-customer-churn",
                                    sagemaker_session=sess)

xgb.fit({'train': s3_input_train,
         'validation': s3_input_validation}, 
        experiment_config={
            "ExperimentName": customer_churn_experiment.experiment_name, 
            "TrialName": trial.trial_name,
            "TrialComponentDisplayName": "Training",
        }
       )

INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-21-01-35-38-393


2020-05-21 01:35:38 Starting - Starting the training job...
2020-05-21 01:35:49 Starting - Launching requested ML instances......
2020-05-21 01:36:47 Starting - Preparing the instances for training......
2020-05-21 01:37:44 Downloading - Downloading input data...
2020-05-21 01:38:30 Training - Downloading the training image..
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[01:38:45] 2333x69 matrix with 160977 entries loaded from /opt/ml/input/data/

### (Demo) Inspect this job's progress in the Experiment

#### 2-6 Launch several Trials at once and add them all to the Experiment

In [35]:
min_child_weights = [1, 5, 8, 10]

for weight in min_child_weights:
    hyperparams["min_child_weight"] = weight
    trial = Trial.create(trial_name="hyper-trial-{}-weight-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime()), weight), 
                         experiment_name=customer_churn_experiment.experiment_name,
                         sagemaker_boto_client=boto3.client('sagemaker'))

    t_xgb = sagemaker.estimator.Estimator(image_name=docker_image_name,
                                          role=role,
                                          hyperparameters=hyperparams,
                                          train_instance_count=1,
                                          train_instance_type='ml.m4.xlarge',
                                          output_path='s3://{}/{}/output'.format(bucket, prefix),
                                          base_job_name="demo-xgboost-customer-churn",
                                          sagemaker_session=sess)

    t_xgb.fit({'train': s3_input_train,
               'validation': s3_input_validation},
                wait=False,
                experiment_config={
                    "ExperimentName": customer_churn_experiment.experiment_name, 
                    "TrialName": trial.trial_name,
                    "TrialComponentDisplayName": "Training",
                }
               )

INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-25-07-39-07-339
INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-25-07-39-07-581
INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-25-07-39-10-365
INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-25-07-39-11-541
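Note that the loop above overwrites `min_child_weight` in the shared `hyperparams` dict, so after the loop the dict holds the last swept value rather than the original one. A minimal sketch of an alternative, assuming you would rather keep the base configuration untouched: build one independent dict per trial. The helper name `sweep` is hypothetical and not part of any SDK.

```python
import copy

# Base configuration, copied from section 2-4.
base_hyperparams = {"max_depth": 5,
                    "subsample": 0.8,
                    "num_round": 200,
                    "eta": 0.2,
                    "gamma": 4,
                    "min_child_weight": 6,
                    "silent": 0,
                    "objective": "binary:logistic"}


def sweep(base, name, values):
    """Return one independent hyperparameter dict per value of `name`."""
    configs = []
    for value in values:
        cfg = copy.deepcopy(base)   # never mutate the shared base dict
        cfg[name] = value
        configs.append(cfg)
    return configs


configs = sweep(base_hyperparams, "min_child_weight", [1, 5, 8, 10])
for cfg in configs:
    print(cfg["min_child_weight"], cfg["max_depth"])
```

Each element of `configs` could then be passed as `hyperparameters=` to its own estimator, one per Trial.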


---

### __(Demo) Experiments results__

---

### __Feature 3 demo__ - Amazon SageMaker Debugger
__Use debugging rules to catch problems during training as they happen__

![debugger example](./images/5.png)

#### 2-7 Specify the debugging rules

In [9]:
from sagemaker.debugger import rule_configs, Rule, DebuggerHookConfig, CollectionConfig

save_interval = 5

_debugger_hook_config=DebuggerHookConfig(
  s3_output_path='s3://{}'.format(bucket),
  collection_configs=[
      CollectionConfig(
          name="metrics",
          parameters={
              "save_interval": str(save_interval)
          }
      ),
      CollectionConfig(
          name="feature_importance",
          parameters={
              "save_interval": str(save_interval)
          }
      ),
      CollectionConfig(
          name="average_shap",
          parameters={
              "save_interval": str(save_interval)
          }
      ),
  ],
)

_rules=[
  Rule.sagemaker(
      rule_configs.loss_not_decreasing(),
      rule_parameters={
          "collection_names": "metrics",
          "num_steps": str(save_interval * 2),
      },
  ),
  Rule.sagemaker(
      rule_configs.overtraining()
  ),
  Rule.sagemaker(
      rule_configs.overfit()
  )
]
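To give a feel for what a rule such as `loss_not_decreasing` looks for, here is a deliberately simplified sketch. It is not the implementation of the built-in Debugger rule, and the function name, signature and `min_improvement` parameter are assumptions made for illustration. It flags a run when the best loss among the most recent `num_steps` recorded values fails to improve on the best loss seen before them.

```python
def loss_not_decreasing(losses, num_steps, min_improvement=0.0):
    """Simplified, hypothetical check: True if the last `num_steps` recorded
    losses did not improve on the best earlier loss by more than
    `min_improvement`."""
    if len(losses) <= num_steps:
        return False                      # not enough history to judge yet
    best_before = min(losses[:-num_steps])
    best_recent = min(losses[-num_steps:])
    return best_before - best_recent <= min_improvement


improving = [0.6, 0.5, 0.4, 0.3, 0.2]
stalled = [0.6, 0.5, 0.4, 0.4, 0.41]

print(loss_not_decreasing(improving, num_steps=2))
print(loss_not_decreasing(stalled, num_steps=2))
```

In this sketch `num_steps` counts recorded values, whereas the notebook passes `num_steps` to the real rule in training-step units (`save_interval * 2`).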

#### 2-8 Create a Trial, add it to the Experiment, and start a training job with the debugging rules attached

In [10]:
_hyperparams = {"max_depth":6,
               "subsample":0.8,
               "num_round":5100,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'binary:logistic'}

In [13]:
trial = Trial.create(trial_name="debugger-trial-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime())), 
                     experiment_name=customer_churn_experiment.experiment_name,
                     sagemaker_boto_client=boto3.client('sagemaker'))

framework_xgb = sagemaker.estimator.Estimator(role=role,
                                              base_job_name="demo-xgboost-customer-churn",
                                              train_instance_count=1,
                                              train_instance_type='ml.m5.xlarge',
                                              image_name=docker_image_name,
                                              hyperparameters=_hyperparams,
                                              train_max_run=1800,
                                              debugger_hook_config=_debugger_hook_config,
                                              rules=_rules
                                             )

framework_xgb.fit({'train': s3_input_train,
                   'validation': s3_input_validation}, 
                  experiment_config={
                      "ExperimentName": customer_churn_experiment.experiment_name, 
                      "TrialName": trial.trial_name,
                      "TrialComponentDisplayName": "Training",
                  },
                 wait=False)

INFO:sagemaker:Creating training-job with name: demo-xgboost-customer-churn-2020-05-21-01-24-08-535


In [14]:
import time

sns = boto3.client('sns')

for _ in range(360):
    job_name = framework_xgb.latest_training_job.name
    client = framework_xgb.sagemaker_session.sagemaker_client
    description = client.describe_training_job(TrainingJobName=job_name)
    training_job_status = description["TrainingJobStatus"]
    rule_job_summary = framework_xgb.latest_training_job.rule_job_summary()
    
    for i in rule_job_summary:
        print("Training job status: {}, debugger rule '{}' status: {}".format(training_job_status, i["RuleConfigurationName"],i["RuleEvaluationStatus"]))
        if i["RuleEvaluationStatus"] == "IssuesFound":
            sns_message = i['StatusDetails']
            # trigger sns
            response = sns.publish(
                TopicArn='arn:aws-cn:sns:cn-northwest-1:542319707026:training-debug-notification',    
                Message=sns_message
            )
            break
    else:
        print('========================================================')
        time.sleep(10)
        continue
    
    break

Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status: InProgress
Training job status: InProgress, debugger rule 'Overtraining' status: InProgress
Training job status: InProgress, debugger rule 'Overfit' status: InProgress
Training job status: InProgress, debugger rule 'LossNotDecreasing' status:
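The polling cell relies on Python's `for ... else` to tell "an issue was found" apart from "nothing yet, sleep and retry". The same control flow can be sketched as a small function that is easy to test offline. Everything here is hypothetical scaffolding: `poll_until` and its parameters are illustrative names, and `fetch_statuses` stands in for the call to `rule_job_summary()`.

```python
import time


def poll_until(fetch_statuses, stop_status="IssuesFound",
               max_polls=360, delay=10.0, sleep=time.sleep):
    """Call `fetch_statuses()` repeatedly until any rule reports `stop_status`.

    Returns (poll_index, rule_name) on a hit, or None once `max_polls`
    polls have passed without one.
    """
    for attempt in range(max_polls):
        for rule_name, status in fetch_statuses().items():
            if status == stop_status:
                return attempt, rule_name
        sleep(delay)                      # nothing found yet: wait, then retry
    return None


# Fake status source standing in for the real rule job summary.
_responses = iter([
    {"LossNotDecreasing": "InProgress", "Overfit": "InProgress"},
    {"LossNotDecreasing": "InProgress", "Overfit": "IssuesFound"},
])
result = poll_until(lambda: next(_responses), delay=0, sleep=lambda s: None)
print(result)
```

Injecting `sleep` keeps the sketch fast to run; with the default arguments it waits 10 seconds between polls, for at most 360 polls, like the loop in the cell.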

---

### __(Demo) Debugger results__

---

## Stage 3: Deployment

### __Feature 4 demo__ - Amazon SageMaker Model Monitor
__Periodically monitor drift in the input and output data of models deployed to production__


![model_monitor example](./images/6.png)

#### 3-1 Deploy the model

In [19]:
data_capture_prefix = '{}/datacapture'.format(prefix)

endpoint_name = "endpoint-customer-churn-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName = {}".format(endpoint_name))

EndpointName = endpoint-customer-churn-2020-05-21-08-17-20


In [20]:
xgb_predictor = xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=20,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )

INFO:sagemaker:Creating model with name: demo-xgboost-customer-churn-2020-05-21-01-35-38-393
INFO:sagemaker:Creating endpoint with name endpoint-customer-churn-2020-05-21-08-17-20


-------------!

#### 3-2 Call the online endpoint and watch production traffic being persisted

In [21]:
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None

In [37]:
print("Sending production traffic to endpoint {}. \nPlease wait...".format(endpoint_name))

with open('data/test_sample.csv', 'r') as f:
    for row in f:
        payload = row.rstrip('\n')
        response = xgb_predictor.predict(data=payload)
        print('Model inference result: ' + str(response))
        time.sleep(0.5)

Sending production traffic to endpoint endpoint-customer-churn-2020-05-21-08-17-20. 
Please wait...
Model inference result: b'0.017154159024357796'
Model inference result: b'0.008340253494679928'
Model inference result: b'0.009035231545567513'
Model inference result: b'0.1718788594007492'
Model inference result: b'0.005973238963633776'
Model inference result: b'0.024217693135142326'
Model inference result: b'0.8408358097076416'
Model inference result: b'0.02202780544757843'
Model inference result: b'0.1613418161869049'
Model inference result: b'0.016460275277495384'
Model inference result: b'0.011541333980858326'
Model inference result: b'0.03409477323293686'
Model inference result: b'0.00764442328363657'
Model inference result: b'0.03019275702536106'
Model inference result: b'0.014996585436165333'
Model inference result: b'0.008365938439965248'
Model inference result: b'0.02354218252003193'
Model inference result: b'0.025254514068365097'
Model inference result: b'0.2682512700557709'
Model inference result: b'0.027242988348007202'
Model inference result: b'0.005637806374579668'
Model inference result: b'0.7075712084770203'
Model inference result: b'0.014109030365943909'
Model inference result: b'0.022757399827241898'
Model inference result: b'0.02336835488677025'
Model inference result: b'0.09939386695623398'
Model inference result: b'0.011612221598625183'
Model inference result: b'0.9939847588539124'
Model inference result: b'0.011626388877630234'
Model inference result: b'0.11378251761198044'
Model inference result: b'0.0107175391167
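With `objective` set to `binary:logistic`, each response is a churn probability returned as raw bytes. A minimal sketch of turning those bytes into a 0/1 churn label follows. The 0.5 threshold and the helper name `to_label` are assumptions for illustration; the notebook itself does not fix a threshold, and a real deployment would pick one based on the cost of false positives versus false negatives.

```python
def to_label(raw, threshold=0.5):
    """Decode a raw CSV response such as b'0.84' and apply a decision threshold."""
    score = float(raw.decode("utf-8").strip())
    return 1 if score >= threshold else 0


# A few responses copied from the output above.
responses = [b'0.017154159024357796',
             b'0.8408358097076416',
             b'0.2682512700557709',
             b'0.9939847588539124']

labels = [to_label(r) for r in responses]
print(labels)
```

Lowering the threshold (for example `to_label(r, threshold=0.2)`) would flag the third customer as well, trading precision for recall.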

#### 3-3 Check that model inputs and outputs are captured to S3

In [38]:
current_endpoint_capture_prefix = '{}/{}'.format(data_capture_prefix, endpoint_name)
print("Found data capture files:")
capture_files = S3Downloader.list("s3://{}/{}".format(bucket, current_endpoint_capture_prefix))
print(capture_files)

Found data capture files:
['s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/datacapture/endpoint-customer-churn-2020-05-21-08-17-20/AllTraffic/2020/05/21/08/47-19-279-b66cb582-6f15-48f0-97b1-e12f75cb6218.jsonl', 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/datacapture/endpoint-customer-churn-2020-05-21-08-17-20/AllTraffic/2020/05/21/09/07-50-811-a68278d3-4f8b-410b-b959-77bf21bf5ce5.jsonl', 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/datacapture/endpoint-customer-churn-2020-05-21-08-17-20/AllTraffic/2020/05/21/09/08-57-193-c6c53f56-d85e-46da-9409-beea2c7e27c5.jsonl', 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/datacapture/endpoint-customer-churn-2020-05-21-08-17-20/AllTraffic/2020/05/21/09/09-59-837-f3d31ad4-4780-46ba-9f77-1c2c3efa6cd9.jsonl', 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/datacapture/endpoint-customer-churn-2020-05-21-08-17-20/AllT

In [39]:
capture_file = S3Downloader.read_file(capture_files[-1])

print("===== Single record =====")
print(json.dumps(json.loads(capture_file.split('\n')[0]), indent=2)[:2000])

===== Single record =====
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "178,35,175.4,88,190.0,65,138.7,94,10.5,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "0.02202780544757843",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "7cae4ba9-080d-45b4-a9a5-18ad14e414d9",
    "inferenceTime": "2020-05-25T08:39:21Z"
  },
  "eventVersion": "0"
}
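Each line of a capture file is one JSON record shaped like the one above, so pairing inputs with predictions takes only the standard library. The sketch below uses an abbreviated record (three features instead of the full row) built in the same shape. The `parse_capture_line` helper and the keys of the dict it returns are hypothetical names.

```python
import json

# Abbreviated record in the same shape as the captured output above;
# the feature list is shortened to three values for readability.
line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "178,35,175.4", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv; charset=utf-8",
                           "mode": "OUTPUT", "data": "0.02202780544757843",
                           "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "7cae4ba9-080d-45b4-a9a5-18ad14e414d9",
                      "inferenceTime": "2020-05-25T08:39:21Z"},
    "eventVersion": "0",
})


def parse_capture_line(line):
    """Extract features, prediction and timestamp from one capture record."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        "features": [float(x) for x in capture["endpointInput"]["data"].split(",")],
        "prediction": float(capture["endpointOutput"]["data"]),
        "inference_time": record["eventMetadata"]["inferenceTime"],
    }


parsed = parse_capture_line(line)
print(parsed)
```

Applied to every line of a downloaded `.jsonl` file, this yields rows that can be loaded into a DataFrame for ad hoc inspection.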


#### 3-4 Build the baseline data paths

In [41]:
baseline_prefix = prefix + '/baselining'
baseline_data_prefix = baseline_prefix + '/data'
baseline_results_prefix = baseline_prefix + '/results'

baseline_data_uri = 's3://{}/{}'.format(bucket,baseline_data_prefix)
baseline_results_uri = 's3://{}/{}'.format(bucket, baseline_results_prefix)
print('Baseline data URI: {}'.format(baseline_data_uri))
print('Baseline analysis results URI: {}'.format(baseline_results_uri))
baseline_data_path = S3Uploader.upload("data/training-dataset-with-header.csv", baseline_data_uri)

Baseline data URI: s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/data
Baseline analysis results URI: s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/results


#### 3-5 Create the baseline analysis job
The job automatically generates the suggested constraints and the statistics for the baseline data.

In [27]:
my_default_monitor = DefaultModelMonitor(role=role,
                                         instance_count=1,
                                         instance_type='ml.m5.xlarge',
                                         volume_size_in_gb=20,
                                         max_runtime_in_seconds=3600,
                                        )

baseline_job = my_default_monitor.suggest_baseline(baseline_dataset=baseline_data_path,
                                                   dataset_format=DatasetFormat.csv(header=True),
                                                   output_s3_uri=baseline_results_uri,
                                                   wait=True
)

INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2020-05-21-08-52-15-816



Job Name:  baseline-suggestion-job-2020-05-21-08-52-15-816
Inputs:  [{'InputName': 'baseline_dataset_input', 'S3Input': {'S3Uri': 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/data/training-dataset-with-header.csv', 'LocalPath': '/opt/ml/processing/input/baseline_dataset_input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'monitoring_output', 'S3Output': {'S3Uri': 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/results', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]
...........................
2020-05-21 08:56:27,984 - __main__ - INFO - All params:{'ProcessingJobArn': 'arn:aws-cn:sagemaker:cn-northwest-1:542319707026:processing-job/baseline-suggestion-job-2020-05-21-08-52-15-816', 'ProcessingJobName': 'baseline-suggestion-job-2020-05-21-08-52-15-816', 'Environment': {'d

In [42]:
S3Downloader.list("s3://{}/{}".format(bucket, baseline_results_prefix))

['s3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/results/constraints.json',
 's3://sagemaker-studio-cn-northwest-1-542319707026/live-coding-xgboost-churn/baselining/results/statistics.json']

 - `constraints.json` holds the suggested drift checks and their thresholds
 - `statistics.json` holds the summary statistics of the baseline data

In [43]:
baseline_job = my_default_monitor.latest_baselining_job
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])  # pd.io.json.json_normalize is deprecated since pandas 1.0
schema_df.head(10)

  


Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,Churn,Integral,2333,0,0.139306,325.0,0.346265,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou...",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,..."
1,Account Length,Integral,2333,0,101.276897,236279.0,39.552442,1.0,243.0,"[{'lower_bound': 1.0, 'upper_bound': 25.2, 'co...",0.64,2048.0,"[[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70...."
2,VMail Message,Integral,2333,0,8.214316,19164.0,13.776908,0.0,51.0,"[{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou...",0.64,2048.0,"[[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ..."
3,Day Mins,Fractional,2333,0,180.226489,420468.4,53.987179,0.0,350.8,"[{'lower_bound': 0.0, 'upper_bound': 35.08, 'c...",0.64,2048.0,"[[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23..."
4,Day Calls,Integral,2333,0,100.259323,233905.0,20.165008,0.0,165.0,"[{'lower_bound': 0.0, 'upper_bound': 16.5, 'co...",0.64,2048.0,"[[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122...."
5,Eve Mins,Fractional,2333,0,200.050107,466716.9,50.015928,31.2,361.8,"[{'lower_bound': 31.2, 'upper_bound': 64.26, '...",0.64,2048.0,"[[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29..."
6,Eve Calls,Integral,2333,0,99.573939,232306.0,19.675578,12.0,170.0,"[{'lower_bound': 12.0, 'upper_bound': 27.8, 'c...",0.64,2048.0,"[[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,..."
7,Night Mins,Fractional,2333,0,201.388598,469839.6,50.627961,23.2,395.0,"[{'lower_bound': 23.2, 'upper_bound': 60.37999...",0.64,2048.0,"[[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20..."
8,Night Calls,Integral,2333,0,100.227175,233830.0,19.282029,42.0,175.0,"[{'lower_bound': 42.0, 'upper_bound': 55.3, 'c...",0.64,2048.0,"[[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112...."
9,Intl Mins,Fractional,2333,0,10.253065,23920.4,2.778766,0.0,18.4,"[{'lower_bound': 0.0, 'upper_bound': 1.8399999...",0.64,2048.0,"[[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ..."
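The dotted column names in the table above (for example `numerical_statistics.common.num_present`) come from flattening nested JSON. As a rough sketch of that idea, and not of pandas' actual implementation, the hypothetical `flatten` helper below joins nested keys with a dot. The sample feature is trimmed down from the `Churn` row shown above.

```python
def flatten(obj, parent="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = parent + sep + key if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))   # recurse into sub-dicts
        else:
            flat[path] = value
    return flat


# Trimmed-down version of the "Churn" feature from statistics.json.
feature = {
    "name": "Churn",
    "inferred_type": "Integral",
    "numerical_statistics": {
        "common": {"num_present": 2333, "num_missing": 0},
        "mean": 0.139306,
        "min": 0.0,
        "max": 1.0,
    },
}

row = flatten(feature)
for key in sorted(row):
    print(key, "=", row[key])
```

Lists are left as-is in this sketch, which matches how the `kll.buckets` column above still shows a raw list in each cell.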


In [47]:
constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict["monitoring_config"])  # pd.io.json.json_normalize is deprecated since pandas 1.0
# constraints_df.head(20)
constraints_df


Unnamed: 0,evaluate_constraints,emit_metrics,datatype_check_threshold,domain_content_threshold,distribution_constraints.perform_comparison,distribution_constraints.comparison_threshold,distribution_constraints.comparison_method
0,Enabled,Enabled,1.0,1.0,Enabled,0.1,Robust


#### 3-6 Attach the baseline results to a monitoring schedule and start it

In [31]:
code_prefix = '{}/code'.format(prefix)
pre_processor_script = S3Uploader.upload('preprocessor.py', 's3://{}/{}'.format(bucket,code_prefix))
s3_code_postprocessor_uri = S3Uploader.upload('postprocessor.py', 's3://{}/{}'.format(bucket,code_prefix))

In [32]:
from sagemaker.model_monitor import CronExpressionGenerator
from time import gmtime, strftime

reports_prefix = '{}/reports'.format(prefix)
s3_report_path = 's3://{}/{}'.format(bucket,reports_prefix)

mon_schedule_name = 'demo-xgboost-customer-churn-model-schedule-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(monitor_schedule_name=mon_schedule_name,
                                              endpoint_input=xgb_predictor.endpoint,
                                              #record_preprocessor_script=pre_processor_script,
                                              #post_analytics_processor_script=s3_code_postprocessor_uri,
                                              output_s3_uri=s3_report_path,
                                              statistics=my_default_monitor.baseline_statistics(),
                                              constraints=my_default_monitor.suggested_constraints(),
                                              schedule_cron_expression=CronExpressionGenerator.hourly(),
                                              enable_cloudwatch_metrics=True,
                                             )

INFO:sagemaker:Creating monitoring schedule name demo-xgboost-customer-churn-model-schedule-2020-05-21-09-03-21.



Creating Monitoring Schedule with name: demo-xgboost-customer-churn-model-schedule-2020-05-21-09-03-21


#### 3-7 Simulate anomalous traffic

In [48]:
from threading import Thread
from time import sleep
import time

runtime_client = boto3.client('runtime.sagemaker')

# (just repeating code from above for convenience/ able to run this section independently)
def invoke_endpoint(ep_name, file_name, runtime_client):
    with open(file_name, 'r') as f:
        for row in f:
            payload = row.rstrip('\n')
            response = runtime_client.invoke_endpoint(EndpointName=ep_name,
                                          ContentType='text/csv', 
                                          Body=payload)
            time.sleep(1)
            
def invoke_endpoint_forever():
    while True:
#         invoke_endpoint(endpoint_name, 'data/test-dataset-input-missing-cols.csv', runtime_client)
        invoke_endpoint(endpoint_name, 'data/test-dataset-input-cols.csv', runtime_client)
        
thread = Thread(target = invoke_endpoint_forever)
thread.start()
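The thread above loops forever, so it keeps calling the endpoint until the kernel is restarted. A minimal sketch of a stoppable variant, assuming you want to end the traffic before cleaning up: pass a `threading.Event` and check it between requests. The names `replay_rows` and `fake_send` are hypothetical, and `fake_send` stands in for `runtime_client.invoke_endpoint` so the sketch runs without AWS access.

```python
import threading


def replay_rows(rows, send, stop_event, delay=1.0):
    """Send `rows` in a loop until `stop_event` is set."""
    while not stop_event.is_set():
        for row in rows:
            if stop_event.is_set():
                break
            send(row)
            stop_event.wait(delay)   # sleeps, but wakes early once stop is set


sent_rows = []
stop = threading.Event()


def fake_send(row):
    """Stand-in for the real endpoint call; stops the replay after 5 rows."""
    sent_rows.append(row)
    if len(sent_rows) >= 5:
        stop.set()


worker = threading.Thread(target=replay_rows,
                          args=(["1,2,3", "4,5,6"], fake_send, stop, 0.01),
                          daemon=True)
worker.start()
worker.join(timeout=5)
print(len(sent_rows), worker.is_alive())
```

In the notebook, calling `stop.set()` from another cell would end the traffic; `daemon=True` additionally keeps the thread from blocking interpreter shutdown.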

---

### __(Demo) Model monitoring results__

---

In [None]:
mon_executions = my_default_monitor.list_executions()
if len(mon_executions) == 0:
    print("We created an hourly schedule above and it will kick off executions ON the hour.\nWe will have to wait till we hit the hour...")

while len(mon_executions) == 0:
    print("Waiting for the 1st execution to happen...")
    time.sleep(60)
    mon_executions = my_default_monitor.list_executions()  
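Because the schedule fires on the hour, the wait above can be anywhere from a few seconds to nearly sixty minutes. A small sketch for estimating it with the standard library is below; `seconds_until_next_hour` is a hypothetical helper, and the actual execution may start somewhat later than the hour boundary it computes.

```python
from datetime import datetime, timedelta


def seconds_until_next_hour(now):
    """Seconds from `now` until the next top of the hour."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


# Example: the baseline job above was created at 08:52:15.
wait_seconds = seconds_until_next_hour(datetime(2020, 5, 21, 8, 52, 15))
print(wait_seconds)
```

Printing this estimate before entering the polling loop gives a better idea of how long the cell is likely to block.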

##### Show the report

In [None]:
latest_execution = mon_executions[-1]
print("Latest execution result: {}".format(latest_execution.describe()['ExitMessage']))
report_uri = latest_execution.output.destination

print("Found Report Files:")
S3Downloader.list(report_uri)

##### Show the violations

In [None]:
violations = my_default_monitor.latest_monitoring_constraint_violations()
pd.set_option('display.max_colwidth', None)   # None means "no limit"; -1 is deprecated since pandas 1.0
constraints_df = pd.json_normalize(violations.body_dict["violations"])
constraints_df.head(10)

### __Feature 5 demo__ - Amazon SageMaker Autopilot
__Fully or semi-automated development, from raw data to a model__


![model_monitor example](./6.png)

---

### __(Demo) Autopilot__

---

## Clean up the online resources

In [None]:
sess.delete_monitoring_schedule(mon_schedule_name)
sess.delete_endpoint(xgb_predictor.endpoint)
def cleanup(experiment):
    '''Clean up everything in the given experiment object'''
    for trial_summary in experiment.list_trials():
        trial = Trial.load(trial_name=trial_summary.trial_name)
        
        for trial_comp_summary in trial.list_trial_components():
            trial_step=TrialComponent.load(trial_component_name=trial_comp_summary.trial_component_name)
            print('Starting to delete TrialComponent..' + trial_step.trial_component_name)
            sm.disassociate_trial_component(TrialComponentName=trial_step.trial_component_name, TrialName=trial.trial_name)
            trial_step.delete()
            time.sleep(1)
         
        trial.delete()
    
    experiment.delete()

cleanup(customer_churn_experiment)

