<h1> Method 1: Train with custom XGB Container </h1>

This notebook demonstrates how to build and use a custom Docker container for training with Amazon SageMaker. The container relies on the <strong>Script Mode</strong> execution implemented by the sagemaker-containers library. Reference documentation is available at https://github.com/aws/sagemaker-containers

We start by defining some variables: the execution role, the ECR repository to which we will push the custom Docker container, and a default Amazon S3 bucket to be used by Amazon SageMaker.

In [3]:
import boto3
import sagemaker
from sagemaker import get_execution_role

ecr_namespace = 'sagemaker-training-containers/'
prefix = 'script-mode-container-xgb'

ecr_repository_name = ecr_namespace + prefix
role = "arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154"
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)
print(ecr_repository_name)

342474125894
ap-southeast-1
arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154
sagemaker-ap-southeast-1-342474125894
sagemaker-training-containers/script-mode-container-xgb


<h3>Build and push the container</h3>
We are now ready to build this container and push it to Amazon ECR. This task is executed using a shell script stored in the ../scripts/ folder, which takes the account ID, the region, and the ECR repository name as arguments.

In [2]:
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name

Sending build context to Docker daemon  13.31kB
Step 1/16 : FROM ubuntu:16.04
 ---> 13c9f1285025
Step 2/16 : LABEL maintainer="Giuseppe A. Porcelli"
 ---> Using cache
 ---> 6bbf3d07c68d
Step 3/16 : ARG PYTHON=python3
 ---> Using cache
 ---> 8e254b9ef0a0
Step 4/16 : ARG PYTHON_PIP=python3-pip
 ---> Using cache
 ---> 84c928b11bb3
Step 5/16 : ARG PIP=pip3
 ---> Using cache
 ---> 65e780b1f9d7
Step 6/16 : ARG PYTHON_VERSION=3.6.6
 ---> Using cache
 ---> 03bab72f170e
Step 7/16 : RUN apt-get update && apt-get install -y --no-install-recommends software-properties-common &&     add-apt-repository ppa:deadsnakes/ppa -y &&     apt-get update && apt-get install -y --no-install-recommends     build-essential     ca-certificates     curl     wget     git     libopencv-dev     openssh-client     openssh-server     vim     zlib1g-dev &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> e29c159657d9
Step 8/16 : RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz

<h3>Training with Amazon SageMaker</h3>

Once the container has been pushed to Amazon ECR, we are ready to start training with Amazon SageMaker. Starting a training job requires the ECR path of the Docker container used for training as a parameter.

In [2]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(container_image_uri)

342474125894.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-training-containers/script-mode-container-xgb:latest
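The URI above follows a fixed pattern, so it can be wrapped in a small helper. This is only a sketch: the function name `ecr_image_uri` is made up for illustration, and it assumes a standard commercial region where the registry host ends in `amazonaws.com`.

```python
def ecr_image_uri(role_arn, region, repository, tag="latest"):
    """Build an ECR image URI, taking the account ID from an IAM role ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    account_id = role_arn.split(":")[4]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


uri = ecr_image_uri(
    "arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154",
    "ap-southeast-1",
    "sagemaker-training-containers/script-mode-container-xgb",
)
print(uri)
```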


Note that the training code is implemented as a standard Python script, which the sagemaker-containers library invokes with the hyperparameters passed as command-line arguments. This way of invoking a training script is called <strong>Script Mode</strong> for Amazon SageMaker containers.
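The training script itself is not shown in this notebook, so the following is only a hypothetical sketch of how a Script Mode entry point might read its inputs. The hyperparameter names `hp1`, `hp2`, and `hp3` are the ones used later in this notebook. The `SM_MODEL_DIR` and `SM_CHANNEL_*` environment variables are the ones sagemaker-containers sets inside the container, and the fallback paths are the conventional `/opt/ml` locations.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and data locations the way a Script Mode script might."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, e.g. --hp2 300
    parser.add_argument("--hp1", type=str, default="value1")
    parser.add_argument("--hp2", type=int, default=300)
    parser.add_argument("--hp3", type=float, default=0.001)
    # Data and model locations are exposed through SM_* environment variables
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument(
        "--validation",
        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"),
    )
    # parse_known_args tolerates extra flags that the script does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--hp1", "value1", "--hp2", "300", "--hp3", "0.001"])
print(args.hp1, args.hp2, args.hp3)
```

The channel names `train` and `validation` match the keys passed to `fit()` further down, which is why the corresponding variables are `SM_CHANNEL_TRAIN` and `SM_CHANNEL_VALIDATION`.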

<h3>Prepare Data</h3>

Now, we upload some dummy data to Amazon S3, in order to define our S3-based training channels.

In [3]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import pandas as pd

X, y = make_classification(
    n_samples=100, n_features=5, n_redundant=0, n_informative=2, n_clusters_per_class=1, n_classes=3
)
print(f"X: {X.shape}, y:{y.shape}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_train = pd.DataFrame(X_train)
X_test = pd.DataFrame(X_test)
y_train = pd.DataFrame(y_train)
y_test = pd.DataFrame(y_test)
train_df = pd.concat([y_train, X_train], axis=1)
train_df.columns = range(train_df.shape[1])
test_df = pd.concat([y_test, X_test], axis=1)
test_df.columns = range(test_df.shape[1])

print(f"train_df: {train_df.shape}, test_df:{test_df.shape}")

X: (100, 5), y:(100,)
train_df: (70, 6), test_df:(30, 6)


In [6]:
train_filename = "../test_data/train/train.csv"
test_filename = "../test_data/val/test.csv"
train_df.to_csv(train_filename, header=True, index=False)
test_df.to_csv(test_filename, header=True, index=False)

train_uri = sagemaker_session.upload_data(train_filename, bucket, prefix + '/train')
test_uri = sagemaker_session.upload_data(test_filename, bucket, prefix + '/val')
print(train_uri)
print(test_uri)
#! rm $train_filename $test_filename

s3://sagemaker-ap-southeast-1-342474125894/script-mode-container-xgb/train/train.csv
s3://sagemaker-ap-southeast-1-342474125894/script-mode-container-xgb/val/test.csv


<h3>Training</h3>

Finally, we can execute the training job by calling the fit() method of the generic Estimator object defined in the Amazon SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py). This corresponds to calling the CreateTrainingJob API (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html).
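To make the correspondence with CreateTrainingJob more concrete, here is a rough sketch of the kind of request that such a call carries. The helper `build_training_request` is hypothetical, and the account, bucket, and job names are placeholders. The volume size and runtime limit are illustrative values, and the SDK fills in more fields than shown here. The API accepts only string values for hyperparameters, so this sketch applies the same JSON encoding as the notebook's `json_encode_hyperparameters` helper.

```python
import json


def build_training_request(job_name, image_uri, role_arn, channels, output_path,
                           hyperparameters, instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble a dict shaped like a CreateTrainingJob request."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": name,
                "ContentType": "text/csv",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
            for name, uri in channels.items()
        ],
        "OutputDataConfig": {"S3OutputPath": output_path},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # illustrative value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},  # illustrative value
        # Hyperparameters must be strings, so each value is JSON-encoded
        "HyperParameters": {str(k): json.dumps(v) for k, v in hyperparameters.items()},
    }


request = build_training_request(
    job_name="script-mode-container-xgb-example",  # placeholder
    image_uri="123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/example-repo:latest",  # placeholder
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder
    channels={
        "train": "s3://example-bucket/example-prefix/train/",
        "validation": "s3://example-bucket/example-prefix/val/",
    },
    output_path="s3://example-bucket/example-prefix/output/",
    hyperparameters={"hp1": "value1", "hp2": 300, "hp3": 0.001},
)
print(json.dumps(request["HyperParameters"], indent=2))
# Submitting it would look like: boto3.client("sagemaker").create_training_job(**request)
```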

In [10]:
import sagemaker
import json

# JSON-encode hyperparameters to avoid the "Failed to parse hyperparameter ... to Json"
# info messages logged by the sagemaker-containers library.
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

# Encoded variant, left disabled here; the plain dict below is what this run uses.
# hyperparameters = json_encode_hyperparameters({
#     "hp1": "value1",
#     "hp2": 300,
#     "hp3": 0.001})

hyperparameters = {
    "hp1": "value1",
    "hp2": 300,
    "hp3": 0.001}

est = sagemaker.estimator.Estimator(container_image_uri,
                                    role,
                                    instance_count=1,
                                    # Use instance_type='local' instead to train in local mode.
                                    instance_type='ml.m5.xlarge',
                                    base_job_name=prefix,
                                    hyperparameters=hyperparameters)

train_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/train/'.format(bucket, prefix), content_type='text/csv')
val_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/val/'.format(bucket, prefix), content_type='text/csv')
est.fit({'train': train_config, 'validation': val_config })

2020-08-11 06:33:49 Starting - Starting the training job...
2020-08-11 06:33:52 Starting - Launching requested ML instances......
2020-08-11 06:35:01 Starting - Preparing the instances for training...
2020-08-11 06:35:42 Downloading - Downloading input data
2020-08-11 06:35:42 Training - Downloading the training image.....
2020-08-11 06:36:29,233 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-08-11 06:36:29,234 sagemaker-containers INFO     Failed to parse hyperparameter hp1 value value1 to Json.
Returning the value itself
2020-08-11 06:36:29,252 sagemaker-containers INFO     Failed to parse hyperparameter hp1 value value1 to Json.
Returning the value itself
2020-08-11 06:36:29,255 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-08-11 06:36:29,266 sagemaker-containers INFO     Failed to parse hyperparameter hp1 value value1 to Json.
Returning the value

<h3>Check model artifact</h3>

We download the model artifact and make sure the trained model is of type Booster.

In [11]:
artifact_path = est.latest_training_job.describe()['ModelArtifacts']['S3ModelArtifacts']
print(artifact_path)


s3://sagemaker-ap-southeast-1-342474125894/script-mode-container-xgb-2020-08-11-06-33-54-847/output/model.tar.gz


In [14]:
! aws s3 cp $artifact_path .
! tar -xvf model.tar.gz
! rm model.tar.gz

download: s3://sagemaker-ap-southeast-1-342474125894/script-mode-container-xgb-2020-08-11-06-33-54-847/output/model.tar.gz to ./model.tar.gz
x model.pth


In [15]:
import pickle
with open('model.pth', 'rb') as f:
    model = pickle.load(f)
    
model

<xgboost.core.Booster at 0x124ea6d60>
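The download, extract, and load steps above can also be done from Python alone. The sketch below reads one member straight out of the archive with the standard `tarfile` module. The helper name `load_model_from_archive` is hypothetical, and the demo uses a stand-in object instead of a real Booster so that it runs without xgboost installed.

```python
import io
import pickle
import tarfile
import tempfile
from pathlib import Path


def load_model_from_archive(archive_path, member="model.pth"):
    """Unpickle one member of a model.tar.gz archive without extracting it to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        fileobj = tar.extractfile(member)
        if fileobj is None:
            raise ValueError(f"{member} is not a regular file in {archive_path}")
        # Only unpickle artifacts you produced yourself: pickle can execute code on load.
        return pickle.load(fileobj)


# Round-trip demo with a stand-in object, so the sketch runs without xgboost.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "model.tar.gz"
    payload = pickle.dumps({"stand_in": "model"})
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo(name="model.pth")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    restored = load_model_from_archive(archive)

print(type(restored).__name__, restored)
```

For the artifact downloaded above, calling `load_model_from_archive("model.tar.gz")` and then checking `isinstance(model, xgboost.Booster)` would perform the type check. This requires xgboost to be installed locally so that pickle can resolve the class.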

<h1> Method 2: Train with Prebuilt container </h1>

The prebuilt container has both sagemaker-containers and sagemaker-inference installed in a single image, so the same image can be used for training and for serving.


In [23]:
from sagemaker.xgboost.estimator import XGBoost

hyperparameters = {
    "hp1": "value1",
    "hp2": 300,
    "hp3": 0.001}

xgb_estimator = XGBoost(
    entry_point="train.py",
    source_dir="../docker/code",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.0-1",
)

train_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/train/'.format(bucket, prefix), content_type='text/csv')
val_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/val/'.format(bucket, prefix), content_type='text/csv')
xgb_estimator.fit({'train': train_config, 'validation': val_config })

2020-08-11 06:48:22 Starting - Starting the training job...
2020-08-11 06:48:24 Starting - Launching requested ML instances......
2020-08-11 06:49:36 Starting - Preparing the instances for training......
2020-08-11 06:50:41 Downloading - Downloading input data
2020-08-11 06:50:41 Training - Downloading the training image...
2020-08-11 06:51:19 Uploading - Uploading generated training model
2020-08-11 06:51:19 Completed - Training job completed
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Invoking user training script.
INFO:sagemaker-containers:Module train does not provide a setup.py. 
Generating setup.py
INFO:sagemaker-containers:Generating setup.cfg
INFO:sagemaker-containers:Generating MANIFEST.in
INFO:sagemaker-containers:Installing module with the following command:

In [25]:
from sagemaker.serializers import CSVSerializer

# CSVSerializer already sends requests with the "text/csv" content type.
serializer = CSVSerializer()

predictor = xgb_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    serializer=serializer
)


-------------------!

In [None]:
with open("file.csv") as f:
    payload = f.read()

predictor.predict(payload)

<h3> Get Predictor </h3>

In [33]:
from sagemaker.xgboost.model import XGBoostPredictor
import numpy as np

endpoint_name = "sagemaker-xgboost-2020-08-11-06-52-30-545"
payload = "1,2,3,4,5\n2,3,4,5,6"

xgb_predictor = XGBoostPredictor(endpoint_name)

xgb_predictor.predict(payload)

[['0.335604', '0.32953852', '0.3348575'],
 ['0.3353785', '0.329989', '0.3346325']]
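The predictor returns each value as a string. Assuming each row holds per-class probabilities in class-index order (this matches the three-class dummy data, but it depends on the objective used by the training script, which is not shown here), a minimal sketch for turning the response into predicted classes could look like this. The helper name `to_class_predictions` is made up for illustration.

```python
def to_class_predictions(rows):
    """Convert rows of probability strings to (predicted_class, probabilities) pairs."""
    results = []
    for row in rows:
        probs = [float(p) for p in row]
        # Index of the largest probability is taken as the predicted class
        predicted = max(range(len(probs)), key=probs.__getitem__)
        results.append((predicted, probs))
    return results


raw = [['0.335604', '0.32953852', '0.3348575'],
       ['0.3353785', '0.329989', '0.3346325']]
for predicted, probs in to_class_predictions(raw):
    print(predicted, probs)
```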

<h3> Low level API - inference </h3>

In [27]:
import boto3

runtime_client = boto3.client('sagemaker-runtime')


payload = "1,2,3,4,5"
endpoint_name = 'sagemaker-xgboost-2020-08-11-06-52-30-545'

response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='text/csv',
                                   Body=payload)

result = response['Body'].read().decode('ascii')
print(result)

[[0.3306555449962616, 0.33722642064094543, 0.33211803436279297]]
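With the low-level API the response body is plain text. The output printed above happens to be valid JSON, so under that assumption it can be parsed with the standard `json` module. The helper `parse_invoke_response` below is a hypothetical sketch, not part of boto3.

```python
import json


def parse_invoke_response(body_text):
    """Parse a JSON list-of-lists response body into rows of floats."""
    rows = json.loads(body_text)
    return [[float(value) for value in row] for row in rows]


result = "[[0.3306555449962616, 0.33722642064094543, 0.33211803436279297]]"
rows = parse_invoke_response(result)
print(rows[0])
print(sum(rows[0]))
```

If the values are class probabilities, each row should sum to approximately 1, which is a cheap sanity check on the parsed output.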
