# Challenge 4: SageMaker Model Monitoring misconfiguration

In this notebook, you identify issues with the implementation of SageMaker Model Monitoring with a production model and endpoint. Next, you troubleshoot an Amazon CloudWatch alarm. Then, you troubleshoot an automated solution that retrains the model if there is model drift.

## Task 4.1: Environment setup

In this task, you set up your environment.

In [None]:
#install-dependencies
%matplotlib inline
from datetime import datetime, timedelta
import json
import boto3
import time
import pandas as pd
import matplotlib.pyplot as plt
from sagemaker import get_execution_role, session
from sagemaker.s3 import S3Uploader,S3Downloader
from sagemaker.image_uris import retrieve
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
from time import sleep
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.model_monitor import CronExpressionGenerator
from sagemaker.model_monitor import ModelQualityMonitor
from sagemaker.model_monitor import EndpointInput
from threading import Thread
import random


region = boto3.Session().region_name
role = get_execution_role()
sm_session = session.Session(boto3.Session())
sm = boto3.Session().client("sagemaker")
sm_runtime = boto3.Session().client("sagemaker-runtime")
cw = boto3.Session().client("cloudwatch")

bucket = sm_session.default_bucket()
prefix = 'sagemaker/abalone'
data_capture_prefix = "{}/datacapture".format(prefix)
s3_capture_upload_path = "s3://{}/{}".format(bucket, data_capture_prefix)
#Available Capture Modes - Input, Output
#capture_modes = [ "Input",  "Output" ]
ground_truth_upload_path = (
    f"s3://{bucket}/{prefix}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
)


In [None]:
#list-model
models = sm.list_models(NameContains='abalone',SortOrder='Ascending')
model_details = pd.json_normalize(models['Models'])
model_name = model_details['ModelName'][0]
print (model_name)

#model-s3-uri
desc_model= sm.describe_model(ModelName=model_name)
model_attr = pd.json_normalize(desc_model['PrimaryContainer'])
model_url = model_attr['ModelDataUrl'][0]
print (model_url)

## Task 4.2: Create a production endpoint config with Data Capture enabled

To log the inputs to your endpoint and the inference outputs from your deployed model to Amazon S3, you can enable a feature called Data Capture. Data Capture records information that can be used for training, debugging, and monitoring. Amazon SageMaker Model Monitor automatically parses this captured data and compares metrics from this data with a baseline that you create for the model.

In this task, you create a new endpoint config to enable data capture for monitoring model data quality.

### Challenge: Define the data capture configuration

You can capture the request payload, the response payload, or both with this configuration. The capture config applies to all variants.

There is an issue with the current endpoint configuration. The variable that defines the capture modes, which the data science team expected to be set, is missing. Find the error and address it by creating the correct variable.


<i class="fas fa-info-circle" style="color:#008296"></i> **Learn more:** Refer to [CreateEndpointConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) for more information about how to correctly set up an endpoint configuration.

<i class="fas fa-info-circle" style="color:#008296"></i> **Learn more:** Refer to [DataCaptureConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DataCaptureConfig.html) for more information about how to correctly set up a data capture configuration within an endpoint configuration.

<i class="fas fa-info-circle" style="color:#008296"></i> **Learn more:** Refer to [CaptureOption](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CaptureOption.html) for more information about which values are valid for a *CaptureOption*.

In [None]:
#create-endpoint-configuration
variant_name = 'AllTraffic'
endpoint_config_name = f'Abalone-Endpoint-1-{datetime.now():%Y-%m-%d-%H-%M-%S}'
endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[
        {
            'ModelName':model_name,
            'InstanceType':'ml.t2.large',
            'InitialInstanceCount':1,
            'VariantName':variant_name
        }
    ],
    
        DataCaptureConfig= {
        'EnableCapture': True, # Whether data should be captured or not.
        'InitialSamplingPercentage' : 100,
        'CaptureContentTypeHeader': {'CsvContentTypes': [ 'text/csv' ]},
        'DestinationS3Uri': s3_capture_upload_path,
        'CaptureOptions': [{"CaptureMode" : capture_mode} for capture_mode in capture_modes] # Example - Use list comprehension to capture both Input and Output
    }
)
print(f"Created the Production Model Endpoint Config: {endpoint_config_name}")

<i class="far fa-eye" style="color:#262262" aria-hidden="true"></i> **Hint:** Declare the **capture_modes** to capture both the Input and Output in the endpoint DataCapture configuration and re-run the prior cell.

<i class="fas fa-flag-checkered"></i> **Answer:** Add this variable before the endpoint configuration is set to define the capture modes.

`capture_modes = [ "Input",  "Output" ]`
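As a quick check of what that variable produces, the following minimal sketch expands the same list comprehension that the endpoint configuration cell uses for *CaptureOptions*. It makes no AWS calls, and `capture_options` is a name introduced here only for illustration.

```python
# Illustrative only: the value passed as 'CaptureOptions' once capture_modes is defined.
capture_modes = ["Input", "Output"]

# Same list comprehension as in the create_endpoint_config call.
capture_options = [{"CaptureMode": capture_mode} for capture_mode in capture_modes]

print(capture_options)
```

Each entry in the list is one capture option, so listing both modes captures the request payload and the response payload.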

After the challenge is complete, list the endpoint and update the endpoint with the new endpoint configuration.

In [None]:
#list-endpoint
endpoints= sm.list_endpoints(NameContains='abalone',SortOrder='Ascending')
endpoint_details = pd.json_normalize(endpoints['Endpoints'])
endpoint_name = endpoint_details['EndpointName'][0]
print (endpoint_name)

<i class="fas fa-sticky-note" style="color:#ff6633"></i> **Note:** Endpoint update takes approximately 5 minutes to complete.

In [None]:
#update-endpoint
endpoint_response = sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
print('EndpointArn = {}'.format(endpoint_response['EndpointArn']))

def wait_for_endpoint_update_complete(endpoint_name):
    """Helper function to wait for the completion of updating an endpoint"""
    response = sm.describe_endpoint(EndpointName=endpoint_name)
    status = response.get("EndpointStatus")
    while status == "Updating":
        print("Waiting for Endpoint to Update")
        time.sleep(15)
        response = sm.describe_endpoint(EndpointName=endpoint_name)
        status = response.get("EndpointStatus")

    # Read the ARN from the describe_endpoint response instead of relying on a global variable
    endpoint_arn = response.get("EndpointArn")
    if status != "InService":
        print(f"Failed to update endpoint, response: {response}")
        failureReason = response.get("FailureReason", "")
        raise SystemExit(
            f"Failed to update endpoint {endpoint_arn}, status: {status}, reason: {failureReason}"
        )
    print(f"Endpoint {endpoint_arn} successfully updated.")

wait_for_endpoint_update_complete(endpoint_name)

When the cell completes, an endpoint ARN is returned that looks like *arn:aws:sagemaker:us-west-2:012345678910:endpoint/abalone-2040-10-11-10-11-12*.

Your endpoint is currently configured with one variant, the production model. You can view the endpoint configuration using *describe_endpoint*.

In [None]:
#describe-the-endpoint
sm.describe_endpoint(EndpointName=endpoint_name)

## Task 4.3: Generate a baseline for model quality performance

In this task, you invoke the endpoint created above using validation data. Predictions from the deployed model on this validation data are used as a baseline dataset. You then use SageMaker Model Monitoring to run a baseline job that computes model performance data and suggests model quality constraints based on the baseline dataset.

In [None]:
#invoke-the-endpoint
cutoff = 0.8
# Create the SageMaker Predictor object
predictor = Predictor(endpoint_name=endpoint_name,
                        serializer=CSVSerializer(),
                        deserializer=CSVDeserializer())

validate_dataset = "validation_with_predictions.csv"

limit = 200  # Need at least 200 samples to compute standard deviations
i = 0
with open(f"data/{validate_dataset}", "w") as validation_file:
    validation_file.write("probability,prediction,label\n")  # CSV header
    with open("data/abalone_data_new.csv", "r") as f:
        for row in f:
            (label, input_cols) = row.split(",", 1)
            probability = float(predictor.predict(input_cols)[0][0])
            prediction = "1" if probability > cutoff else "0"
            validation_file.write(f"{probability},{prediction},{label}\n")
            i += 1
            if i > limit:
                break
            print(".", end="", flush=True)
            sleep(0.1)

print("\nDone!")
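The loop above relies on each row of the input file starting with the label, followed by the feature columns. As a small sketch of that split, using a made-up row (the values are hypothetical and not taken from the lab data):

```python
# Hypothetical row: the first column is the label, the remaining columns are features.
row = "9,0.455,0.365,0.095,0.514\n"

# maxsplit=1 splits only on the first comma, so the features stay together as one CSV string.
label, input_cols = row.split(",", 1)

print(label)
print(input_cols.strip())
```

Note that `input_cols` keeps the trailing newline from the file, which is why the sketch strips it before printing.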

In [None]:
#configure-baseline-variables
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"

baseline_data_uri = "s3://{}/{}".format(bucket, baseline_data_prefix)
baseline_dataset_uri = "s3://{}/{}".format(bucket, baseline_data_prefix)
baseline_results_uri = "s3://{}/{}".format(bucket, baseline_results_prefix)
print("Baseline data uri: {}".format(baseline_data_uri))
print("Baseline results uri: {}".format(baseline_results_uri))

#### Upload the predictions as a baseline dataset.

Now, you upload the predictions made using the validation dataset to S3. These predictions are used to create the model quality baseline statistics and constraints.

In [None]:
#upload-predictions-to-s3
baseline_dataset_uri = S3Uploader.upload(f"data/{validate_dataset}", baseline_data_uri)
baseline_dataset_uri

## Task 4.4: Create a baselining job with validation dataset predictions

Now, you define the model quality monitoring object and execute the model quality monitoring baseline job. Model Monitor automatically generates baseline statistics and constraints based on the validation dataset provided.

In [None]:
#create-model-quality-monitoring-object
model_quality_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=sm_session
)

<i class="fas fa-sticky-note" style="color:#ff6633" aria-hidden="true"></i> **Note:** The baselining job can take 7-8 minutes to complete.

In [None]:
#suggest-baseline
baseline_job_name = f"model-baseline-job-{datetime.utcnow():%Y-%m-%d-%H%M}"

# specify the problem type, in this case regression, and provide other required attributes.
job = model_quality_monitor.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset="data/validation_with_predictions.csv", 
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri = baseline_results_uri,  # the S3 location to store the results.
    problem_type="Regression",
    inference_attribute= "prediction",  # the column in the dataset that contains predictions.
    probability_attribute= "probability",  # the column in the dataset that contains probabilities.
    ground_truth_attribute= "label"  # the column in the dataset that contains ground truth labels.
)
job.wait(logs=True)


<i class="fas fa-sticky-note" style="color:#ec7211"></i> **Note:** This code returns a lengthy response. You can ignore any warnings or error messages.

When the cell completes, a message is returned that looks like *2025-10-11 12:13:14,156 - DefaultDataAnalyzer - INFO - Spark job completed*.
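For a regression problem, the baseline statistics include metrics such as MAE, MSE, RMSE, and R2. The following sketch shows how these metrics can be computed from paired labels and predictions in plain Python. It is an illustration with made-up numbers, not the implementation that Model Monitor uses, and `compute_regression_metrics` is a hypothetical helper name.

```python
import math


def compute_regression_metrics(labels, predictions):
    """Compute MAE, MSE, RMSE, and R2 for paired labels and predictions."""
    n = len(labels)
    errors = [p - y for y, p in zip(labels, predictions)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 compares the residual error with the variance of the labels around their mean.
    mean_label = sum(labels) / n
    ss_tot = sum((y - mean_label) ** 2 for y in labels)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up labels and predictions for illustration.
example_metrics = compute_regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(example_metrics)
```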

## Task 4.5: Explore the results of the baselining job

In this task, you explore the results of the baselining job.

The baseline constraints and statistics files are uploaded to the S3 location.

List details from the baseline statistics and constraints files from the latest baselining job.

In [None]:
#view-statistics
baseline_job = model_quality_monitor.latest_baselining_job
regression_metrics = baseline_job.baseline_statistics().body_dict["regression_metrics"]
pd.json_normalize(regression_metrics).T

In [None]:
#view-constraints
pd.DataFrame(baseline_job.suggested_constraints().body_dict["regression_constraints"]).T


Now that you have a baseline created and have viewed the statistics and constraints, create a Model Monitor model quality monitoring job to track new inference records against the baseline.
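To make the idea of tracking metrics against constraints concrete, here is a minimal sketch that compares observed metrics with threshold-style constraints. The dictionary layout (a `threshold` and a `comparison_operator` per metric) is assumed to mirror the suggested constraints viewed above, the numbers are made up, and `find_violated_metrics` is a hypothetical helper rather than part of the SageMaker SDK.

```python
# Hypothetical constraints and observed metrics; all numbers are made up.
example_constraints = {
    "mae": {"threshold": 1.5, "comparison_operator": "GreaterThanThreshold"},
    "r2": {"threshold": 0.6, "comparison_operator": "LessThanThreshold"},
}
observed_metrics = {"mae": 2.1, "r2": 0.7}


def find_violated_metrics(constraints, observed):
    """Return the names of metrics whose observed value breaches its constraint."""
    violated = []
    for name, rule in constraints.items():
        value = observed[name]
        operator = rule["comparison_operator"]
        if operator == "GreaterThanThreshold" and value > rule["threshold"]:
            violated.append(name)
        elif operator == "LessThanThreshold" and value < rule["threshold"]:
            violated.append(name)
    return violated


violated_metrics = find_violated_metrics(example_constraints, observed_metrics)
print(violated_metrics)
```

In this example only `mae` is flagged, because the observed error is above its threshold while the observed `r2` is still above its lower bound.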

## Task 4.6: Create model monitoring to identify model quality drift

In addition to the generated baseline, model quality monitoring needs two additional inputs: the predictions made by the deployed model endpoint and the ground truth data provided by the application that consumes the model. Since you already enabled data capture on the endpoint, prediction data is captured in S3.

The ground truth data depends on what your model is predicting and what the business use case is. In this case, since the model is predicting abalone rings, ground truth data indicates the number of rings of an abalone. For the purposes of this notebook, you generate synthetic data as ground truth.

In this task, you send some artificial traffic. If there is no traffic, the monitoring jobs are marked as *Failed* since there is no data to process.

In [None]:
#send-artificial-traffic
endpoint_name = predictor.endpoint_name
runtime_client = sm_session.sagemaker_runtime_client
limit = 200
i = 0

# repeating code from above to run this section independently
def invoke_endpoint(ep_name, file_name, runtime_client):
    i = 0
    with open(file_name, "r") as f:
        for row in f:
            (label, payload) = row.strip("\n").split(",", 1)  

            response = runtime_client.invoke_endpoint(
                EndpointName=ep_name, ContentType="text/csv", Body=payload, InferenceId=str(i),
            )
            response["Body"].read()
            i += 1
            if i > limit:
                break
            print(".", end="", flush=True)
            time.sleep(0.1)


invoke_endpoint(endpoint_name, "data/abalone_data_skewed.csv", runtime_client)
print("\nDone!")

Notice the new attribute *InferenceId*, which you are setting when invoking the endpoint. This is used to join the prediction data with the ground truth data.
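The join itself is performed by the monitoring job, but the idea can be sketched in a few lines of plain Python. The IDs and values below are made up, and this sketch only keeps IDs that appear on both sides. It does not show how the service treats unmatched records.

```python
# Hypothetical captured predictions, keyed by the InferenceId sent with each request.
predictions = {"0": 9.8, "1": 11.2, "2": 7.4}
# Hypothetical ground truth labels, keyed by the same IDs.
ground_truth = {"0": 10, "2": 7, "3": 12}

# Pair each prediction with its label wherever the ID exists in both sources.
merged = [
    {"inference_id": key, "prediction": predictions[key], "label": ground_truth[key]}
    for key in sorted(predictions.keys() & ground_truth.keys())
]
print(merged)
```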

## Task 4.7: View captured data

In this task, you list and view the data capture files stored in Amazon S3.

In [None]:
#view-captured-data
print("Waiting for captures to show up", end="")
capture_files = []
for _ in range(120):  # poll for up to about two minutes
    capture_files = sorted(S3Downloader.list(f"{s3_capture_upload_path}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        break
    print(".", end="", flush=True)
    sleep(1)
print("\nFound Capture Files:")
print("\n ".join(capture_files[-3:]))

Next, view the contents of a single capture file. Here you should see the data captured in an Amazon SageMaker specific JSON Lines formatted file. Take a quick peek at a few lines from the captured file.

In [None]:
print("\n".join(capture_file[-3:-1]))

Finally, the contents of a single line are shown below as formatted JSON so that they are easier to read.

In [None]:
print(json.dumps(capture_record, indent=2))

Again, notice the *InferenceId* attribute that is set as part of the *invoke_endpoint* call. If it is present, the captured record can be joined with ground truth data.
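As a sketch of how such a record can be read programmatically, the following example parses one capture line and pulls out the inference ID and the model output. The sample record is hypothetical and trimmed down, and its field names are assumed to follow the layout of the record printed above.

```python
import json

# Hypothetical capture line; real records hold your own payloads and more metadata.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "0.455,0.365,0.095", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": "9.7", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-event-id", "inferenceId": "42"},
    "eventVersion": "0",
})

record = json.loads(sample_line)
# .get returns None when the request was sent without an InferenceId.
inference_id = record["eventMetadata"].get("inferenceId")
model_output = record["captureData"]["endpointOutput"]["data"]
print(inference_id, model_output)
```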

## Task 4.8: Generate synthetic ground truth

In this task, you generate ground truth data. The model quality job fails if there is no ground truth data to merge.

To build and maintain a more accurate model, you sometimes need a combination of ground truth data and synthetic data. This synthetic (fake) data can be created from a baselining job and constraints so that it closely mimics real ground truth.

Synthetic data can be a good solution when there is not enough ground truth data. It can also help when there are privacy concerns.

In this task, since the model has not been in production for a long time, ground truth data has not yet been captured and labeled, so you use synthetic data for the Model Monitoring job.

In [None]:
#generate-ground-truth
def ground_truth_with_id(inference_id):
    random.seed(inference_id)  # to get consistent results
    rand = random.randrange(20)
    return {
        "groundTruthData": {
            "data": rand,
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(records, upload_time):
    fake_records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(fake_records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    print(f"Uploading {len(fake_records)} records to", target_s3_uri)
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)
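To see what these helpers produce without uploading anything, the following sketch builds one record in the same shape and formats the time-based part of the S3 key for a fixed timestamp. `example_ground_truth_record` is a hypothetical stand-in for *ground_truth_with_id*, and the timestamp is arbitrary.

```python
import json
import random
from datetime import datetime


def example_ground_truth_record(inference_id):
    """Build one synthetic ground truth record in the same shape as the cell above."""
    rng = random.Random(inference_id)  # local generator, seeded for repeatable values
    return {
        "groundTruthData": {"data": rng.randrange(20), "encoding": "CSV"},
        "eventMetadata": {"eventId": str(inference_id)},
        "eventVersion": "0",
    }


record = example_ground_truth_record(7)
upload_time = datetime(2025, 10, 11, 12, 13, 14)  # arbitrary fixed time
key_suffix = f"{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"

print(json.dumps(record))
print(key_suffix)
```

The key suffix is partitioned by year, month, day, and hour, with the minutes and seconds forming the file name.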

In [None]:
#upload-ground-truth-data
def generate_fake_ground_truth():
    while True:
        fake_records = [ground_truth_with_id(i) for i in range(200)]
        upload_ground_truth(fake_records, datetime.utcnow())
        sleep(60 * 60)  # do this once an hour


gt_thread = Thread(target=generate_fake_ground_truth)
gt_thread.start()

## Task 4.9: Create a monitoring schedule

In this task, you create a monitoring schedule to run a model quality monitoring job.

In [None]:
#set-monitoring-schedule-name
monitor_schedule_name = (
    f"monitoring-schedule-{datetime.utcnow():%Y-%m-%d-%H%M}"
)

In [None]:
#create-endpoint-input
endpointInput = EndpointInput(
    endpoint_name=predictor.endpoint_name,
    probability_attribute="0",
    probability_threshold_attribute=0.5,
    destination="/opt/ml/processing/input_data",
    inference_attribute = "0"
)

In [None]:
#create-monitoring-schedule
from sagemaker.model_monitor import CronExpressionGenerator

response = model_quality_monitor.create_monitoring_schedule(
    monitor_schedule_name=monitor_schedule_name,
    endpoint_input=endpointInput,
    output_s3_uri=baseline_results_uri,
    problem_type="Regression",
    ground_truth_input=ground_truth_upload_path,
    constraints=baseline_job.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

In [None]:
#describe-monitoring-schedule
model_quality_monitor.describe_schedule()

## Task 4.10: Examine monitoring schedule executions

The monitor schedule starts jobs at the previously specified hourly interval. Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule your execution. You might see your execution start anywhere from 0 to 20 minutes from the hour boundary. This is expected and done for load balancing in the backend.
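As a rough illustration of that timing, the sketch below computes the window in which the next hourly execution could start, assuming the 20 minute buffer described above. `expected_start_window` is a hypothetical helper and does not query SageMaker.

```python
from datetime import datetime, timedelta


def expected_start_window(now, buffer_minutes=20):
    """Return (earliest, latest) expected start times for the next hourly execution."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return next_hour, next_hour + timedelta(minutes=buffer_minutes)


# Arbitrary example time: 12:13:14 on 2025-10-11.
earliest, latest = expected_start_window(datetime(2025, 10, 11, 12, 13, 14))
print(earliest, latest)
```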

This execution takes approximately one hour to be able to generate the violations report. For the purpose of the lab, the next cells have code snippets for you to view and sample output is shared for reference. In the last step of this task, you view the violations report from a file that was generated and pre-loaded from an earlier monitoring run.

In this task, you list the executions of the monitoring schedules and review the status of each of the executions.

- With the hourly schedule that was configured in the previous step, the monitor looks for new data capture files in S3 for the previous hour.
- If no files are found, then the status of that execution is reported as 'Failed'. This can happen if no traffic was sent to the endpoint in the previous hour.
- If the execution status is reported as 'CompletedWithViolations', you should see the corresponding violations file in S3 in the path specified in previous steps.
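The status handling described in the list above can be summarized in a small helper. This is only a sketch: `next_step_for_status` is a hypothetical function, and the suggested actions paraphrase the bullets rather than any SageMaker API behavior.

```python
def next_step_for_status(status):
    """Map a monitoring execution status to a suggested next step (illustrative)."""
    if status in ("Pending", "InProgress"):
        return "wait for the execution to finish"
    if status == "Completed":
        return "no violations were reported"
    if status == "CompletedWithViolations":
        return "review the violations file in S3"
    if status == "Failed":
        return "check the failure reason and whether traffic was captured in the previous hour"
    return "inspect the execution details"


for example_status in ("InProgress", "CompletedWithViolations", "Failed"):
    print(example_status, "->", next_step_for_status(example_status))
```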

If you want to list and view the current status of an execution, you can use code similar to this:

```python
# List monitoring schedules for endpoint
print('Monitoring schedules for endpoint \'{}\':\n'.format(endpoint_name))
print(sm.list_monitoring_schedules(EndpointName=endpoint_name))
```

```python
# Model Quality Monitor - monitoring executions
print('Monitoring executions for schedule \'{}\':\n'.format(monitor_schedule_name))
print(sm.list_monitoring_executions(MonitoringScheduleName=monitor_schedule_name,
                                           EndpointName=predictor.endpoint_name,
                                           MonitoringTypeEquals='ModelQuality',
                                           SortBy='CreationTime',
                                           SortOrder='Descending'))
```


```python
# Wait for the first execution of the monitoring_schedule
print("Waiting for first execution", end="")
while True:
    execution = model_quality_monitor.describe_schedule().get(
        "LastMonitoringExecutionSummary"
    )
    if execution:
        break
    print(".", end="", flush=True)
    sleep(10)
print()
print("Execution found!")
```

```python
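# Get the status of the Monitoring job and the execution summary with or without violations
status = execution["MonitoringExecutionStatus"]
latest_execution = model_quality_monitor.list_executions()[-1]  # define before it is used in the loop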

while status in ["Pending", "InProgress"]:
    print("Waiting for execution to finish", end="")
    latest_execution.wait(logs=False)
    latest_job = latest_execution.describe()
    print()
    print(f"{latest_job['ProcessingJobName']} job status:", latest_job["ProcessingJobStatus"])
    print(
        f"{latest_job['ProcessingJobName']} job exit message, if any:",
        latest_job.get("ExitMessage"),
    )
    print(
        f"{latest_job['ProcessingJobName']} job failure reason, if any:",
        latest_job.get("FailureReason"),
    )
    sleep(
        30
    )  # model quality executions consist of two Processing jobs, wait for second job to start
    latest_execution = model_quality_monitor.list_executions()[-1]
    execution = model_quality_monitor.describe_schedule()["LastMonitoringExecutionSummary"]
    status = execution["MonitoringExecutionStatus"]

print("Execution status is:", status)

if status != "Completed":
    print(execution)
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )
```

```python
# Get the violations report S3uri
latest_execution = model_quality_monitor.list_executions()[-1]
report_uri = latest_execution.describe()["ProcessingOutputConfig"]["Outputs"][0]["S3Output"][
    "S3Uri"
]
print("Report Uri:", report_uri)
```

To view the generated violation report, you can use code similar to this:

```python
# View violations generated by monitoring schedule
pd.options.display.max_colwidth = None
violations = latest_execution.constraint_violations().body_dict["violations"]
violations_df = pd.json_normalize(violations)
violations_df.head(10)
```

Since the execution of the monitoring job you started above does not finish for 60-80 minutes, view a violations report from a file that was generated and pre-loaded from an earlier monitoring run.

In [None]:
#print-violations-report
pd.set_option('display.max_colwidth', None)
with open('data/constraint_violations.json') as violations_file:
    violations = json.load(violations_file)
constraints_df=pd.json_normalize(violations, record_path=['violations'])
constraints_df.head(10)

## Task 4.11: Analyze model quality CloudWatch metrics

In addition to the violations, the monitoring schedule also emits CloudWatch metrics.

You use the built-in Amazon SageMaker Model Monitor container for CloudWatch metrics. SageMaker emits the model quality metrics in the *aws/sagemaker/Endpoints/model-metrics* namespace with *Endpoint* and *MonitoringSchedule* dimensions.

In this task, you view the metrics generated and set up a CloudWatch alarm to be triggered when the model quality drifts from the baseline thresholds.

In [None]:
#create-CloudWatch-client
cw_client = boto3.Session().client("cloudwatch")

namespace = "aws/sagemaker/Endpoints/model-metrics"

cw_dimensions = [
    {"Name": "Endpoint", "Value": endpoint_name},
    {"Name": "MonitoringSchedule", "Value": monitor_schedule_name},
]

Now, you create an alarm that is triggered if the r2 value of the model falls below the threshold suggested by the baseline constraints.

In [None]:
#create-CloudWatch-alarm
previous_date = f'{datetime.today() - timedelta(days=1):%Y-%m-%d}'
print (previous_date)
alarm_name = "MODEL_QUALITY_R2_SCORE"
alarm_desc = (
    "Trigger a CloudWatch alarm when the r2 score drifts away from the baseline constraints"
)
model_quality_r2_drift_threshold = (
    0.625  ##Setting this threshold purposefully low to see the alarm quickly.
)
metric_name = "r2"
namespace = "aws/sagemaker/Endpoints/model-metrics"
endpoint_name = 'Abalone-' + previous_date
print (endpoint_name)
monitoring_schedule_name = 'model-monitor-schedule-' + previous_date

cw_client.put_metric_alarm(
    AlarmName=alarm_name,
    AlarmDescription=alarm_desc,
    ActionsEnabled=True,
    MetricName=metric_name,
    Namespace=namespace,
    Statistic="Average",
    Dimensions=[
        {"Name": "Endpoint", "Value": endpoint_name},
        {"Name": "MonitoringSchedule", "Value": monitor_schedule_name},
    ],
    Period=600,
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=model_quality_r2_drift_threshold,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="breaching",
)

You created a CloudWatch alarm. You can use this alarm to notify you of any model drift issues and trigger automatic model retraining.
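To reason about when this alarm fires, the following simplified sketch mimics a single evaluation period with the settings used above: a *LessThanOrEqualToThreshold* comparison and missing data treated as breaching. `alarm_state` is a hypothetical helper and leaves out most of CloudWatch's real evaluation logic.

```python
def alarm_state(r2_datapoint, threshold=0.625):
    """Return 'ALARM' or 'OK' for one evaluation period (simplified illustration)."""
    if r2_datapoint is None:
        # TreatMissingData="breaching": a missing datapoint counts as breaching.
        return "ALARM"
    # ComparisonOperator="LessThanOrEqualToThreshold"
    return "ALARM" if r2_datapoint <= threshold else "OK"


for value in (0.9, 0.625, 0.4, None):
    print(value, "->", alarm_state(value))
```

With these settings, a period with no r2 datapoint also puts the alarm into the ALARM state, which is worth keeping in mind when no monitoring execution has emitted metrics yet.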

### Cleanup

In [None]:
#cleanup
model_quality_monitor.delete_monitoring_schedule()
predictor.delete_model()
predictor.delete_endpoint()

You have completed this notebook. To move to the next part of the lab, do the following:

- Close this notebook file.
- Return to the lab session and continue with the **Conclusion**.