# This is a tutorial on SageMaker Pipelines. SageMaker recently received a new feature called SageMaker Unified Studio.

# With regular SageMaker, if you run this code outside the SageMaker domain (for example on your local machine or in a notebook instance outside the domain),
# you can still see the pipeline in a SageMaker domain that was created separately. However, this is no longer the case with SageMaker Unified Studio.

# Since SageMaker Unified Studio is meant to be a unified solution for multiple tools, its console only shows the resources it can associate with a Unified Studio project.
# If you run this notebook's first section inside a SageMaker Unified Studio project's space (Jupyter notebook), Unified Studio tags the created resources for you so that you can access them conveniently in the console of
# your project. However, resources created outside of your Unified Studio project are not tagged automatically: currently, SageMaker Unified Studio
# requires custom tagging of those resources before it can show them in the console.

# You can skip to the second section of this notebook to see what has been done to enable visualization inside a SageMaker Unified Studio project's console.

# Section 1: creating a pipeline and executing it to populate resources. The pipeline runs the following steps:

1. processing step
2. training step
3. evaluation step
4. create model step
5. batch transform step
6. register model step
7. conditional step
8. create a model definition step.

In [1]:
from datetime import datetime

def create_name_with_timestamp(prefix):
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return f"{prefix}{timestamp}"

In [2]:
import sys

import boto3
import sagemaker
from sagemaker.workflow.pipeline_context import PipelineSession

sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()
pipeline_session = PipelineSession()
default_bucket = sagemaker_session.default_bucket()
model_package_group_name = create_name_with_timestamp("AbaloneModelPackageGroupName")


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [3]:
!mkdir -p data

In [4]:
local_path = "data/abalone-dataset.csv"

s3 = boto3.resource("s3")
s3.Bucket(f"sagemaker-example-files-prod-{region}").download_file(
    "datasets/tabular/uci_abalone/abalone.csv", local_path
)

base_uri = f"s3://{default_bucket}/abalone"
input_data_uri = sagemaker.s3.S3Uploader.upload(
    local_path=local_path,
    desired_s3_uri=base_uri,
)
print(input_data_uri)

s3://sagemaker-us-east-1-794038231401/abalone/abalone-dataset.csv


In [5]:
local_path = "data/abalone-dataset-batch"

s3 = boto3.resource("s3")
s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
    "dataset/abalone-dataset-batch", local_path
)

base_uri = f"s3://{default_bucket}/abalone"
batch_data_uri = sagemaker.s3.S3Uploader.upload(
    local_path=local_path,
    desired_s3_uri=base_uri,
)
print(batch_data_uri)

s3://sagemaker-us-east-1-794038231401/abalone/abalone-dataset-batch


In [6]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
instance_type = ParameterString(name="TrainingInstanceType", default_value="ml.m5.xlarge")
model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)
input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)
batch_data = ParameterString(
    name="BatchData",
    default_value=batch_data_uri,
)
mse_threshold = ParameterFloat(name="MseThreshold", default_value=6.0)

In [7]:
!mkdir -p code


In [8]:
%%writefile code/preprocessing.py
import argparse
import os
import requests
import tempfile

import numpy as np
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder


# Since we get a headerless CSV file, we specify the column names here.
feature_columns_names = [
    "sex",
    "length",
    "diameter",
    "height",
    "whole_weight",
    "shucked_weight",
    "viscera_weight",
    "shell_weight",
]
label_column = "rings"

feature_columns_dtype = {
    "sex": str,
    "length": np.float64,
    "diameter": np.float64,
    "height": np.float64,
    "whole_weight": np.float64,
    "shucked_weight": np.float64,
    "viscera_weight": np.float64,
    "shell_weight": np.float64,
}
label_column_dtype = {"rings": np.float64}


def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z


if __name__ == "__main__":
    base_dir = "/opt/ml/processing"

    df = pd.read_csv(
        f"{base_dir}/input/abalone-dataset.csv",
        header=None,
        names=feature_columns_names + [label_column],
        dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype),
    )
    numeric_features = list(feature_columns_names)
    numeric_features.remove("sex")
    numeric_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="median")), ("scaler", StandardScaler())]
    )

    categorical_features = ["sex"]
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )

    preprocess = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )

    y = df.pop("rings")
    X_pre = preprocess.fit_transform(df)
    y_pre = y.to_numpy().reshape(len(y), 1)

    X = np.concatenate((y_pre, X_pre), axis=1)

    np.random.shuffle(X)
    train, validation, test = np.split(X, [int(0.7 * len(X)), int(0.85 * len(X))])

    pd.DataFrame(train).to_csv(f"{base_dir}/train/train.csv", header=False, index=False)
    pd.DataFrame(validation).to_csv(
        f"{base_dir}/validation/validation.csv", header=False, index=False
    )
    pd.DataFrame(test).to_csv(f"{base_dir}/test/test.csv", header=False, index=False)

Overwriting code/preprocessing.py
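The preprocessing script cuts the shuffled array at 70% and 85% of its length, which yields train, validation, and test sets of roughly 70/15/15. The following is a minimal sketch of that index arithmetic only, without NumPy; the helper names `split_boundaries` and `split_sizes` are hypothetical and do not appear in the script.

```python
def split_boundaries(n_rows, train_frac=0.7, validation_end_frac=0.85):
    """Return the two cut indices that the script passes to np.split."""
    return int(train_frac * n_rows), int(validation_end_frac * n_rows)


def split_sizes(n_rows):
    """Return the (train, validation, test) row counts for n_rows rows."""
    first_cut, second_cut = split_boundaries(n_rows)
    return first_cut, second_cut - first_cut, n_rows - second_cut


train_rows, validation_rows, test_rows = split_sizes(1000)
print(train_rows, validation_rows, test_rows)
```

Because the cut indices are truncated with `int`, any leftover rows fall into the test set, and the three sizes always add up to the input length.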


In [9]:
from sagemaker.sklearn.processing import SKLearnProcessor


framework_version = "1.2-1"

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count,
    base_job_name="sklearn-abalone-process",
    role=role,
    sagemaker_session=pipeline_session,
)

In [10]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

processor_args = sklearn_processor.run(
    inputs=[
        ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="code/preprocessing.py",
)

step_process = ProcessingStep(name="AbaloneProcess", step_args=processor_args)



In [11]:
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

model_path = f"s3://{default_bucket}/AbaloneTrain"
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type="ml.m5.xlarge",
)
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=instance_type,
    instance_count=1,
    output_path=model_path,
    role=role,
    sagemaker_session=pipeline_session,
)
xgb_train.set_hyperparameters(
    objective="reg:squarederror",  # "reg:linear" is the deprecated name of this objective
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
)

train_args = xgb_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    }
)

In [12]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep


step_train = TrainingStep(
    name="AbaloneTrain",
    step_args=train_args,
)

In [13]:
%%writefile code/evaluation.py
import json
import pathlib
import pickle
import tarfile

import joblib
import numpy as np
import pandas as pd
import xgboost

from sklearn.metrics import mean_squared_error


if __name__ == "__main__":
    model_path = "/opt/ml/processing/model/model.tar.gz"
    with tarfile.open(model_path) as tar:
        tar.extractall(path=".")

    # Load the trained booster that was extracted from the archive.
    with open("xgboost-model", "rb") as model_file:
        model = pickle.load(model_file)

    test_path = "/opt/ml/processing/test/test.csv"
    df = pd.read_csv(test_path, header=None)

    y_test = df.iloc[:, 0].to_numpy()
    df.drop(df.columns[0], axis=1, inplace=True)

    X_test = xgboost.DMatrix(df.values)

    predictions = model.predict(X_test)

    mse = mean_squared_error(y_test, predictions)
    std = np.std(y_test - predictions)
    report_dict = {
        "regression_metrics": {
            "mse": {"value": mse, "standard_deviation": std},
        },
    }

    output_dir = "/opt/ml/processing/evaluation"
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)

    evaluation_path = f"{output_dir}/evaluation.json"
    with open(evaluation_path, "w") as f:
        f.write(json.dumps(report_dict))

Overwriting code/evaluation.py
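To make the shape of `evaluation.json` concrete, here is a small sketch that computes the same two numbers (mean squared error and the standard deviation of the residuals) with the standard library only. The function name `regression_report` and the sample values are made up for illustration; the script above uses scikit-learn and NumPy instead.

```python
import json
import math


def regression_report(y_true, y_pred):
    """Build a report dict with the same layout as the evaluation script."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mean_residual = sum(residuals) / n
    # Population standard deviation, which is what np.std computes by default.
    std = math.sqrt(sum((r - mean_residual) ** 2 for r in residuals) / n)
    return {"regression_metrics": {"mse": {"value": mse, "standard_deviation": std}}}


# Made-up labels and predictions, only to show the resulting JSON layout.
report = regression_report([9.0, 10.0, 7.0], [8.5, 11.0, 7.0])
print(json.dumps(report, indent=2))
```

The nesting `regression_metrics` → `mse` → `value` is what the condition step later reads through its `json_path`.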


In [14]:
from sagemaker.processing import ScriptProcessor


script_eval = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="script-abalone-eval",
    role=role,
    sagemaker_session=pipeline_session,
)

eval_args = script_eval.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
    ],
    code="code/evaluation.py",
)

In [15]:
from sagemaker.workflow.properties import PropertyFile


evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)
step_eval = ProcessingStep(
    name="AbaloneEval",
    step_args=eval_args,
    property_files=[evaluation_report],
)

In [16]:
from sagemaker.model import Model

model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)

In [17]:
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.model_step import ModelStep

step_create_model = ModelStep(
    name="AbaloneCreateModel",
    step_args=model.create(instance_type="ml.m5.large", accelerator_type="ml.eia1.medium"),
)

In [18]:
from sagemaker.transformer import Transformer


transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=f"s3://{default_bucket}/AbaloneTransform",
)

In [19]:
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep


step_transform = TransformStep(
    name="AbaloneTransform", transformer=transformer, inputs=TransformInput(data=batch_data)
)

In [20]:
from sagemaker.model_metrics import MetricsSource, ModelMetrics

model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="{}/evaluation.json".format(
            step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
        ),
        content_type="application/json",
    )
)

register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)

step_register = ModelStep(name="AbaloneRegisterModel", step_args=register_args)



In [21]:
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import Join

step_fail = FailStep(
    name="AbaloneMSEFail",
    error_message=Join(on=" ", values=["Execution failed due to MSE >", mse_threshold]),
)

In [22]:
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet


cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value",
    ),
    right=mse_threshold,
)

step_cond = ConditionStep(
    name="AbaloneMSECond",
    conditions=[cond_lte],
    if_steps=[step_register, step_create_model, step_transform],
    else_steps=[step_fail],
)
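The condition step reads `regression_metrics.mse.value` from the evaluation report and compares it with `MseThreshold`. The sketch below imitates that check locally on a plain dict. It is only an illustration of the logic, not how the service evaluates the condition: the helper names are hypothetical, and it handles dotted keys only, not any richer path syntax.

```python
def resolve_json_path(document, json_path):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value


def passes_mse_check(report, threshold, json_path="regression_metrics.mse.value"):
    """Return True when the reported MSE is less than or equal to the threshold."""
    return resolve_json_path(report, json_path) <= threshold


# Made-up report with the same layout as evaluation.json.
sample_report = {"regression_metrics": {"mse": {"value": 4.9, "standard_deviation": 2.2}}}
print(passes_mse_check(sample_report, threshold=6.0))
```

When the check passes, the pipeline runs the register, create-model, and transform steps; otherwise it runs the fail step.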

In [23]:
from sagemaker.workflow.pipeline import Pipeline


pipeline_name = create_name_with_timestamp("AbalonePipeline")
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_count,
        instance_type,
        model_approval_status,
        input_data,
        batch_data,
        mse_threshold,
    ],
    steps=[step_process, step_train, step_eval, step_cond],
)

In [24]:
import json


definition = json.loads(pipeline.definition())
definition



{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingInstanceCount',
   'Type': 'Integer',
   'DefaultValue': 1},
  {'Name': 'TrainingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ModelApprovalStatus',
   'Type': 'String',
   'DefaultValue': 'PendingManualApproval'},
  {'Name': 'InputData',
   'Type': 'String',
   'DefaultValue': 's3://sagemaker-us-east-1-794038231401/abalone/abalone-dataset.csv'},
  {'Name': 'BatchData',
   'Type': 'String',
   'DefaultValue': 's3://sagemaker-us-east-1-794038231401/abalone/abalone-dataset-batch'},
  {'Name': 'MseThreshold', 'Type': 'Float', 'DefaultValue': 6.0}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
  'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Name': 'AbaloneProcess',
   'Type': 'Processing',
   'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.xlarge',
      'InstanceCount': {'Get': 'Para
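The full definition is long, so a compact summary can be easier to read. The sketch below lists the parameters with their defaults and the steps with their types. It assumes that the steps of a condition step are nested under `Arguments` as `IfSteps` and `ElseSteps`; `summarize_definition` and the small `example_definition` are hypothetical and only mirror the layout printed above.

```python
def summarize_definition(definition):
    """Return ({parameter name: default}, [(step name, step type), ...])."""
    parameters = {p["Name"]: p.get("DefaultValue") for p in definition.get("Parameters", [])}
    steps = []

    def walk(step_list):
        for step in step_list:
            steps.append((step["Name"], step["Type"]))
            arguments = step.get("Arguments", {})
            if isinstance(arguments, dict):
                # Assumption: condition steps nest their branches here.
                for branch in ("IfSteps", "ElseSteps"):
                    walk(arguments.get(branch, []))

    walk(definition.get("Steps", []))
    return parameters, steps


# Hand-written stand-in for the dict returned by json.loads(pipeline.definition()).
example_definition = {
    "Parameters": [{"Name": "MseThreshold", "Type": "Float", "DefaultValue": 6.0}],
    "Steps": [
        {"Name": "AbaloneProcess", "Type": "Processing", "Arguments": {}},
        {
            "Name": "AbaloneMSECond",
            "Type": "Condition",
            "Arguments": {
                "IfSteps": [{"Name": "AbaloneTransform", "Type": "Transform", "Arguments": {}}],
                "ElseSteps": [{"Name": "AbaloneMSEFail", "Type": "Fail", "Arguments": {}}],
            },
        },
    ],
}
print(summarize_definition(example_definition))
```

In the notebook, the same function could be called as `summarize_definition(definition)`.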

In [25]:
pipeline_arn = pipeline.upsert(role_arn=role)['PipelineArn']



In [26]:
##### Section 2
# Tagging resources to enable their visualization in the SageMaker Unified Studio project

In [27]:
tags = [
    {'Key': 'AmazonDataZoneDomain', 'Value': 'dzd_amkwelbodmx2av'},
    {'Key': 'AmazonDataZoneProject', 'Value': '55v4yp8pzuqs6f'},
]
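The two values above belong to the author's own domain and project, so they need to be replaced when this notebook is reused. A small helper can build the same list from any pair of IDs; `build_project_tags` is a hypothetical name, and only the two tag keys are taken from the cell above.

```python
def build_project_tags(domain_id, project_id):
    """Return the two tags used in this notebook for a given domain and project."""
    if not domain_id or not project_id:
        raise ValueError("Both the domain ID and the project ID are required.")
    return [
        {"Key": "AmazonDataZoneDomain", "Value": domain_id},
        {"Key": "AmazonDataZoneProject", "Value": project_id},
    ]


# Same IDs as in the cell above; replace them with your own.
print(build_project_tags("dzd_amkwelbodmx2av", "55v4yp8pzuqs6f"))
```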

In [28]:
# Tag the pipeline itself so that it shows up in the Unified Studio project

client = boto3.client('sagemaker')

response = client.add_tags(
    ResourceArn=pipeline_arn,
    Tags=tags
)

print(f"Tags added: {response}")

Tags added: {'Tags': [{'Key': 'AmazonDataZoneDomain', 'Value': 'dzd_amkwelbodmx2av'}, {'Key': 'AmazonDataZoneProject', 'Value': '55v4yp8pzuqs6f'}], 'ResponseMetadata': {'RequestId': 'f75848ed-d42d-48f5-87d3-49e227eaafe9', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f75848ed-d42d-48f5-87d3-49e227eaafe9', 'content-type': 'application/x-amz-json-1.1', 'content-length': '127', 'date': 'Sat, 28 Jun 2025 06:48:18 GMT'}, 'RetryAttempts': 0}}


In [29]:
# Now the pipeline shows up in SageMaker Unified Studio, and so do its executions.

In [30]:
# Next we make the resources created by a pipeline execution visible as well.
# Their ARNs only exist once the pipeline has run, so we start an execution with pipeline.start().
execution = pipeline.start()

In [31]:
execution.describe()

{'PipelineArn': 'arn:aws:sagemaker:us-east-1:794038231401:pipeline/AbalonePipeline20250628064817',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:794038231401:pipeline/AbalonePipeline20250628064817/execution/9s75hla164dq',
 'PipelineExecutionDisplayName': 'execution-1751093298240',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2025, 6, 28, 6, 48, 18, 183000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2025, 6, 28, 6, 48, 18, 183000, tzinfo=tzlocal()),
 'CreatedBy': {'IamIdentity': {'Arn': 'arn:aws:sts::794038231401:assumed-role/SageMaker-ExecutionRole-20250401T103257/SageMaker',
   'PrincipalId': 'AROA3RYC54FU7YQXFCZ2W:SageMaker'}},
 'LastModifiedBy': {'IamIdentity': {'Arn': 'arn:aws:sts::794038231401:assumed-role/SageMaker-ExecutionRole-20250401T103257/SageMaker',
   'PrincipalId': 'AROA3RYC54FU7YQXFCZ2W:SageMaker'}},
 'ResponseMetadata': {'RequestId': '1dcb1518-8c68-471e-b8d8-4c8ecdd825c9',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {

In [32]:
execution.wait()

In [33]:
execution.list_steps()

[{'StepName': 'AbaloneTransform',
  'StartTime': datetime.datetime(2025, 6, 28, 6, 55, 46, 615000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2025, 6, 28, 7, 0, 10, 893000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-east-1:794038231401:transform-job/pipelines-9s75hla164dq-AbaloneTransform-CDq0FZt7Hy'}},
  'AttemptCount': 1},
 {'StepName': 'AbaloneRegisterModel-RegisterModel',
  'StartTime': datetime.datetime(2025, 6, 28, 6, 55, 44, 611000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2025, 6, 28, 6, 55, 46, 135000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-1:794038231401:model-package/AbaloneModelPackageGroupName20250628064815/1'}},
  'AttemptCount': 1},
 {'StepName': 'AbaloneCreateModel-CreateModel',
  'StartTime': datetime.datetime(2025, 6, 28, 6, 55, 44, 611000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2025, 6, 28, 6, 55, 46,
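The list returned by `execution.list_steps()` is verbose. The sketch below condenses entries of that shape into a name, a status, and a duration in seconds, and lists the steps that did not succeed. The helper names are hypothetical, and the sample entry reuses the `AbaloneTransform` timestamps from the output above without the time zone.

```python
from datetime import datetime


def summarize_steps(steps):
    """Return one compact dict per step: name, status, and duration in seconds."""
    summary = []
    for step in steps:
        start, end = step.get("StartTime"), step.get("EndTime")
        seconds = (end - start).total_seconds() if start and end else None
        summary.append(
            {"StepName": step["StepName"], "StepStatus": step.get("StepStatus"), "Seconds": seconds}
        )
    return summary


def steps_not_succeeded(steps):
    """Return the names of steps whose status is anything other than 'Succeeded'."""
    return [s["StepName"] for s in steps if s.get("StepStatus") != "Succeeded"]


sample_steps = [
    {
        "StepName": "AbaloneTransform",
        "StartTime": datetime(2025, 6, 28, 6, 55, 46, 615000),
        "EndTime": datetime(2025, 6, 28, 7, 0, 10, 893000),
        "StepStatus": "Succeeded",
    }
]
print(summarize_steps(sample_steps))
```

In the notebook, `summarize_steps(execution.list_steps())` would give the same overview for the real execution.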

In [34]:
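# ARN of an earlier, completed execution of this pipeline. To use the execution started above instead,
# set execution_arn = execution.describe()['PipelineExecutionArn'].
execution_arn = 'arn:aws:sagemaker:us-east-1:794038231401:pipeline/AbalonePipeline20250628054840/execution/j87a80i5euq3'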

In [35]:
def get_pipeline_resource_arns(execution_arn):
    """Return one {'Step', 'Type', 'ARN'} entry per step that created a SageMaker resource."""
    sm = boto3.client('sagemaker')
    # Maps the key found in a step's Metadata to the resource type label we report.
    metadata_types = {
        'ProcessingJob': 'ProcessingJob',
        'TrainingJob': 'TrainingJob',
        'Model': 'Model',
        'RegisterModel': 'ModelPackage',
        'TransformJob': 'TransformJob',
    }
    arns = []
    try:
        kwargs = {'PipelineExecutionArn': execution_arn}
        while True:
            response = sm.list_pipeline_execution_steps(**kwargs)
            for step in response['PipelineExecutionSteps']:
                # Steps such as the condition step carry no resource ARN.
                metadata = step.get('Metadata', {})
                for metadata_key, resource_type in metadata_types.items():
                    if metadata_key in metadata:
                        arns.append({
                            'Step': step['StepName'],
                            'Type': resource_type,
                            'ARN': metadata[metadata_key]['Arn'],
                        })
                        break
            # Follow pagination so that long pipelines are not truncated.
            if 'NextToken' not in response:
                break
            kwargs['NextToken'] = response['NextToken']
    except Exception as e:
        print(f"Error retrieving pipeline steps: {e}")
        return []
    return arns

In [36]:
arns = get_pipeline_resource_arns(execution_arn)
for arn in arns:
    print(f"Step: {arn['Step']}, Type: {arn['Type']}, ARN: {arn['ARN']}")

Step: AbaloneTransform, Type: TransformJob, ARN: arn:aws:sagemaker:us-east-1:794038231401:transform-job/pipelines-j87a80i5euq3-AbaloneTransform-3RGSkDjiWx
Step: AbaloneRegisterModel-RegisterModel, Type: ModelPackage, ARN: arn:aws:sagemaker:us-east-1:794038231401:model-package/AbaloneModelPackageGroupName20250628054838/1
Step: AbaloneCreateModel-CreateModel, Type: Model, ARN: arn:aws:sagemaker:us-east-1:794038231401:model/pipelines-j87a80i5euq3-AbaloneCreateModel-C-UOTS6cdK8r
Step: AbaloneEval, Type: ProcessingJob, ARN: arn:aws:sagemaker:us-east-1:794038231401:processing-job/pipelines-j87a80i5euq3-AbaloneEval-4WblKETpH7
Step: AbaloneTrain, Type: TrainingJob, ARN: arn:aws:sagemaker:us-east-1:794038231401:training-job/pipelines-j87a80i5euq3-AbaloneTrain-6ouSfLb17D
Step: AbaloneProcess, Type: ProcessingJob, ARN: arn:aws:sagemaker:us-east-1:794038231401:processing-job/pipelines-j87a80i5euq3-AbaloneProcess-RFnBTffMAz
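Each ARN printed above follows the layout `arn:aws:sagemaker:<region>:<account>:<resource-type>/<name>`. The following sketch splits an ARN of that shape into its parts, which can help when checking that a resource belongs to the expected region or account. `parse_sagemaker_arn` is a hypothetical helper, not part of boto3.

```python
def parse_sagemaker_arn(arn):
    """Split an ARN of the form arn:partition:service:region:account:type/name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    return {
        "service": parts[2],
        "region": parts[3],
        "account": parts[4],
        "resource_type": resource_type,
        "resource_name": resource_name,
    }


# ARN copied from the output above.
print(parse_sagemaker_arn(
    "arn:aws:sagemaker:us-east-1:794038231401:training-job/pipelines-j87a80i5euq3-AbaloneTrain-6ouSfLb17D"
))
```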


In [37]:
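# Collect one ARN per resource type.
# Note: keying by type keeps only the last ARN of each type. This pipeline has two processing jobs
# (AbaloneEval and AbaloneProcess), so only the AbaloneProcess job ends up in `resources` and gets tagged below.
resources = {}
for arn in arns:
    resources[arn['Type']] = arn['ARN']

print(resources)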

{'TransformJob': 'arn:aws:sagemaker:us-east-1:794038231401:transform-job/pipelines-j87a80i5euq3-AbaloneTransform-3RGSkDjiWx', 'ModelPackage': 'arn:aws:sagemaker:us-east-1:794038231401:model-package/AbaloneModelPackageGroupName20250628054838/1', 'Model': 'arn:aws:sagemaker:us-east-1:794038231401:model/pipelines-j87a80i5euq3-AbaloneCreateModel-C-UOTS6cdK8r', 'ProcessingJob': 'arn:aws:sagemaker:us-east-1:794038231401:processing-job/pipelines-j87a80i5euq3-AbaloneProcess-RFnBTffMAz', 'TrainingJob': 'arn:aws:sagemaker:us-east-1:794038231401:training-job/pipelines-j87a80i5euq3-AbaloneTrain-6ouSfLb17D'}
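The printed dictionary holds a single `ProcessingJob` entry although the pipeline ran two processing jobs, because a dict key can hold only one value. One possible alternative, sketched here with a hypothetical helper `group_arns_by_type`, is to keep a list of ARNs per type so that no job is dropped.

```python
from collections import defaultdict


def group_arns_by_type(arns):
    """Group entries shaped like {'Step', 'Type', 'ARN'} into {type: [arn, ...]}."""
    grouped = defaultdict(list)
    for entry in arns:
        grouped[entry["Type"]].append(entry["ARN"])
    return dict(grouped)


# Shortened stand-ins for the entries returned by get_pipeline_resource_arns.
sample_arns = [
    {"Step": "AbaloneEval", "Type": "ProcessingJob", "ARN": "arn:example:eval"},
    {"Step": "AbaloneTrain", "Type": "TrainingJob", "ARN": "arn:example:train"},
    {"Step": "AbaloneProcess", "Type": "ProcessingJob", "ARN": "arn:example:process"},
]
print(group_arns_by_type(sample_arns))
```

With this layout, a tagging loop would iterate over every ARN in every list rather than over one ARN per key.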


In [38]:
# Now we look up the model package group that the registered model package belongs to
sagemaker_client = boto3.client('sagemaker')

# ARN of the model package registered by the pipeline
model_package_arn = resources['ModelPackage']

# Describe the model package to get its details
response = sagemaker_client.describe_model_package(
    ModelPackageName=model_package_arn
)

package_group_name = response['ModelPackageGroupName']

# Describe the group to get its ARN, which is what we will tag
response = sagemaker_client.describe_model_package_group(
    ModelPackageGroupName=package_group_name
)

resources['ModelPackageGroup'] = response['ModelPackageGroupArn']
print(resources['ModelPackageGroup'])

arn:aws:sagemaker:us-east-1:794038231401:model-package-group/AbaloneModelPackageGroupName20250628054838


In [39]:
print(resources)

{'TransformJob': 'arn:aws:sagemaker:us-east-1:794038231401:transform-job/pipelines-j87a80i5euq3-AbaloneTransform-3RGSkDjiWx', 'ModelPackage': 'arn:aws:sagemaker:us-east-1:794038231401:model-package/AbaloneModelPackageGroupName20250628054838/1', 'Model': 'arn:aws:sagemaker:us-east-1:794038231401:model/pipelines-j87a80i5euq3-AbaloneCreateModel-C-UOTS6cdK8r', 'ProcessingJob': 'arn:aws:sagemaker:us-east-1:794038231401:processing-job/pipelines-j87a80i5euq3-AbaloneProcess-RFnBTffMAz', 'TrainingJob': 'arn:aws:sagemaker:us-east-1:794038231401:training-job/pipelines-j87a80i5euq3-AbaloneTrain-6ouSfLb17D', 'ModelPackageGroup': 'arn:aws:sagemaker:us-east-1:794038231401:model-package-group/AbaloneModelPackageGroupName20250628054838'}


In [40]:
# Remove the entries that do not need tags
resources.pop('ModelPackage')

'arn:aws:sagemaker:us-east-1:794038231401:model-package/AbaloneModelPackageGroupName20250628054838/1'

In [41]:
for key in resources:
    print(f'\nkey: {key}')
    response = client.add_tags(
        ResourceArn = resources[key],
        Tags = tags
    )
    print(f"======\n Tags added: {response}")


key: TransformJob
 Tags added: {'Tags': [{'Key': 'AmazonDataZoneDomain', 'Value': 'dzd_amkwelbodmx2av'}, {'Key': 'AmazonDataZoneProject', 'Value': '55v4yp8pzuqs6f'}], 'ResponseMetadata': {'RequestId': '20788862-5491-44c4-b869-c7f8e225d065', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '20788862-5491-44c4-b869-c7f8e225d065', 'content-type': 'application/x-amz-json-1.1', 'content-length': '127', 'date': 'Sat, 28 Jun 2025 07:00:21 GMT'}, 'RetryAttempts': 0}}

key: Model
 Tags added: {'Tags': [{'Key': 'AmazonDataZoneDomain', 'Value': 'dzd_amkwelbodmx2av'}, {'Key': 'AmazonDataZoneProject', 'Value': '55v4yp8pzuqs6f'}], 'ResponseMetadata': {'RequestId': 'babf5c5c-f129-4068-9505-7459fcc93cd7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'babf5c5c-f129-4068-9505-7459fcc93cd7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '127', 'date': 'Sat, 28 Jun 2025 07:00:21 GMT'}, 'RetryAttempts': 0}}

key: ProcessingJob
 Tags added: {'Tags': [{'Key': 'Amazo
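The loop above stops at the first resource that cannot be tagged. As a sketch of a more forgiving variant, the hypothetical helper below takes the SageMaker client as an argument, keeps going after a failure, and reports which keys failed. It catches `Exception` broadly to stay free of imports; with boto3 you would probably narrow this to `botocore.exceptions.ClientError`.

```python
def tag_resources(client, resources, tags):
    """Call add_tags for every ARN in resources; return (tagged keys, {failed key: error})."""
    tagged, failed = [], {}
    for key, resource_arn in resources.items():
        try:
            client.add_tags(ResourceArn=resource_arn, Tags=tags)
            tagged.append(key)
        except Exception as error:  # broad on purpose in this sketch
            failed[key] = str(error)
    return tagged, failed


# Usage in the notebook (not executed here):
# tagged, failed = tag_resources(client, resources, tags)
# print(f"tagged: {tagged}, failed: {failed}")
```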

In [42]:
### Once the resources are tagged successfully, we should expect to see them in SageMaker Unified Studio (SMUS). If they still do not show up, we can walk through these steps again to find out whether we are facing a service bug.
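Before suspecting a service bug, it can help to confirm that the tags really are attached to each resource. The sketch below uses the SageMaker client's `list_tags` call and follows `NextToken` in case the tags span several pages. `missing_tags` is a hypothetical helper, and the client is passed in as an argument so that the function itself has no dependencies.

```python
def missing_tags(client, resource_arn, expected_tags):
    """Return the expected tags that are absent, or have a different value, on the resource."""
    found = {}
    kwargs = {"ResourceArn": resource_arn}
    while True:
        page = client.list_tags(**kwargs)
        for tag in page.get("Tags", []):
            found[tag["Key"]] = tag["Value"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return [tag for tag in expected_tags if found.get(tag["Key"]) != tag["Value"]]


# Usage in the notebook (not executed here):
# for key, resource_arn in resources.items():
#     print(key, missing_tags(client, resource_arn, tags))
```

An empty list for every resource means the tags are in place, so a resource that is still not visible would point to something other than tagging.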

In [43]:
!pip install jupytext

Collecting jupytext
  Using cached jupytext-1.17.2-py3-none-any.whl.metadata (14 kB)
Collecting mdit-py-plugins (from jupytext)
  Using cached mdit_py_plugins-0.4.2-py3-none-any.whl.metadata (2.8 kB)
Using cached jupytext-1.17.2-py3-none-any.whl (164 kB)
Using cached mdit_py_plugins-0.4.2-py3-none-any.whl (55 kB)
Installing collected packages: mdit-py-plugins, jupytext
Successfully installed jupytext-1.17.2 mdit-py-plugins-0.4.2


In [44]:
!jupytext --to py:percent sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb

[jupytext] Reading sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb in format ipynb
[jupytext] Writing sagemaker-pipelines-preprocess-train-evaluate-batch-transform.py in format py:percent
