Example of ML pipeline with model monitor in AWS SagemMaker for STS
The configuration can be done by environment variables:
PIPELINE_NAME
: the name of the SM pipeline, defaults tostsPipeline
MODEL_PACKAGE_GROUP_NAME
: Model package group name for registering the model, defaults tostsPackageGroup
BASE_JOB_PREFIX
: used as a prefix for varius resources, like job names and S3 buckets keys, defaults tosts
This will use the defaul sagemaker bucket if not exits, a default bucket will be created based on the following format: sagemaker-{region}-{aws-account-id}
.
You can create a .env
file with the variables values (Don't include in the VCS repository):
PIPELINE_NAME=DDDDDDDDDDDDDDD
MODEL_PACKAGE_GROUP_NAME=DDDDDDDDDDDDD
BASE_JOB_PREFIX=DDDDDDDDDDDDDDDDDDDDDDDDD
Example run from 0 to model quality monitor:
-
Create an virtualenv
python3 -m venv env source env/bin/activate
-
Install requirements
pip install -U pip pip install -r dev-requirements.txt
-
Train the model, this will define the ML pipeline and execute it on AWS Sagemaker, after the model is trained it will be added to the model group specified by
MODEL_PACKAGE_GROUP_NAME
.python trainmodel.py
Opcional: inspect the
trainmodel_out.json
file. -
After training the model tou can deploy an AWS Sagemaker Endpoint:
python deploymodel.py --capture
The
--capture
flag is opcional, if not provided the endpoint will not have data capture configured and the model quality monitor will not run.Opcional: inspect the
deploymodel_out.json
file. -
Setup model quality monitor (MQM):
python setupmq.py
By default it will use the JSON files from the previus steps as input, and will store additional information to the
deploymodel_out.json
file. -
Send traffic to the endpoint:
python testendpoint.py
Opcional: inspect the
testendpoint_out.json
file.Wait for 3 to 5 minutes until all the rows in the
test.csv
data set (838) appear in the data capture path in S3 (sees3_capture_upload_path
in thedeploymodel_out.json
file for the S3 Uri) should be several files there. -
Generate fake ground truth labels for the MQM:
python gen_fake_ground_truth.py
This script will read the data captures from the S3 Uri and generate labels for each inferenceId present in the captures.
-
Go to the
Sagemaker Studio
/Sagemaker Components and registries
and selectEnpoints
, select the endpoint created bydeploymodel.py
(should by something likests-sklearn-YYYYMMDDHHMM
). Under Monitoring job history wait until the schedule MQM run (should be more or less than 1 hour) and under Model quality you will se the metrics and chars once the first job run. -
When done cleanup:
python cleanup.py
example_data
: some examples of pipeline definitions, as a form of documentationsts
: main py packagebaseline.py
: a processing script that generates a baseline dataset for the model quality monitor.evaluate.py
: using test.csv dataset evaluates the model metrics for Model registration on AWSpipeline.py
: defines the ML pipeline for sagemakerpreprocess.py
: a processing script for the sts dataset (s3://sts-datwit-dataset/stsmsrpc.txt
)utils.py
: define some usefull functions
trainmodel.py
: sends to AWS SageMaker the ML pipeline definition and wait for the training to be done. It will output some information to the filetrainmodel_out.json
deploymodel.py
: deploys the latest version of the model if any and optionally setup data capture on the endpoint. It will output some information to the filedeploymodel_out.json
.setupmq.py
: example setup of model quality monitor for the endpoint deployed indeploymodel.py
, this require the filestrainmodel_out.json
anddeploymodel_out.json
. It will add information todeploymodel_out.json
.gen_fake_ground_truth.py
: generate fake ground truth for the model quality monitor.cleanup.py
: will remove the schedule model quality monitor, endpoint config, model endpoint and the model from the sagemaker registries.testendpoint.py
: will call the model endpoint passing to it thetest.csv
dataset, it will ouput the inferences to the filetestendpoint_out.json
This section is for admins or infraestructure managers, this example need's the credentials configured for aws cli or the equivalent environment variables.
Anyway, the user should be able tu use/assume a rol with full sagemaker access
For local exceution you must configure the aws credentials, additionally the user must have the permissition to assume a role that should have the following permissions,
- Full access to the S3 bucket that will be used to store training and output data.
- Full access to launch training instances.
- Full access to deploy models.
- Full access to launch monitoring instances and schedules.
- Access to write to CloudWatch logs and metrics.
See Using an IAM role in the AWS CLI to configure this for the local enviroment and aws cli
In order to assume a role, the IAM user for the static credentials must have the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"sts:AssumeRole",
"sts:TagSession"
],
"Resource": "<SAGEMAKER_EXECUTION_ROLE_ARN>",
"Effect": "Allow"
}
]
}
The role's trust policy must allow the IAM user to assume the role:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowIamUserAssumeRole",
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {"AWS": "<MY_USER_IAM_ARN>"},
},
{
"Sid": "AllowPassSessionTags",
"Effect": "Allow",
"Action": "sts:TagSession",
"Principal": {"AWS": "<MY_USER_IAM_ARN>"}
}
]
}