<h1>Script-mode Custom Training Container</h1>

This notebook demonstrates how to build and use a custom Docker container for training with Amazon SageMaker that leverages the <strong>Script Mode</strong> execution implemented by the sagemaker-containers library. Reference documentation is available at https://github.com/aws/sagemaker-containers
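In Script Mode, sagemaker-containers runs the user's entry-point script and passes the training job's hyperparameters to it as command-line arguments (for example `--hp2 300`), while data and model locations are exposed through `SM_*` environment variables. The following is a minimal sketch of the argument handling such a script might contain. It assumes the `SM_MODEL_DIR`, `SM_CHANNEL_TRAIN` and `SM_CHANNEL_VALIDATION` variables described in the sagemaker-containers README; the `parse_args` helper and its fallback paths are hypothetical, not code taken from this container:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters and paths a Script Mode entry point receives.

    Hypothetical helper: hp1/hp2/hp3 mirror the hyperparameters passed to the
    Estimator in this notebook; the defaults only matter outside a container.
    """
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, e.g. --hp2 300.
    parser.add_argument("--hp1", type=str, default="value1")
    parser.add_argument("--hp2", type=int, default=300)
    parser.add_argument("--hp3", type=float, default=0.001)
    # Paths come from SM_* environment variables (assumed names, see README).
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--validation",
                        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"))
    # parse_known_args ignores any extra flags instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(args.hp1, args.hp2, args.hp3, args.model_dir, args.train, args.validation)
```

Using `parse_known_args` rather than `parse_args` is a defensive choice, so that an unexpected extra flag does not abort the training job.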

We start by defining some variables, such as the execution role, the ECR repository to which we will push the custom Docker container, and the default Amazon S3 bucket used by Amazon SageMaker.

In [1]:
import boto3
import sagemaker
from sagemaker import get_execution_role

ecr_namespace = 'sagemaker-training-containers/'
prefix = 'script-mode-container'

ecr_repository_name = ecr_namespace + prefix
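# Hard-coded role ARN; on a SageMaker notebook instance, get_execution_role() returns the attached role instead.
role = "arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154"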
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print("account_id", account_id)
print("region", region)
print("role", role)
print("bucket", bucket)
print("ecr_repository_name", ecr_repository_name)

account_id 342474125894
region ap-southeast-1
role arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154
bucket sagemaker-ap-southeast-1-342474125894
ecr_repository_name sagemaker-training-containers/script-mode-container


<h3>Build and push the container</h3>
We are now ready to build this container and push it to Amazon ECR. This task is executed by a shell script stored in the ../scripts/ folder, which takes the account ID, the region and the ECR repository name as arguments.
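The contents of ../scripts/build_and_push.sh are not reproduced here. As an assumption about what it does, a typical sequence for publishing an image to Amazon ECR is: create the repository if needed, authenticate Docker against the registry, then build, tag and push. The sketch below only assembles those AWS CLI (v2) and Docker commands as strings, using the account ID, region and repository name from above; the `ecr_push_commands` helper is hypothetical and the real script may differ:

```python
def ecr_push_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands a typical ECR build-and-push script runs.

    Hypothetical sketch: ../scripts/build_and_push.sh may differ in details
    such as the build context or how it authenticates to ECR.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = f"{registry}/{repository}:{tag}"
    return [
        # Create the repository only if it does not exist yet.
        f"aws ecr describe-repositories --repository-names {repository} --region {region}"
        f" || aws ecr create-repository --repository-name {repository} --region {region}",
        # Authenticate the local Docker client against the account's registry.
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        # Build locally, tag with the full ECR path, then push.
        f"docker build -t {repository}:{tag} .",
        f"docker tag {repository}:{tag} {image}",
        f"docker push {image}",
    ]


for cmd in ecr_push_commands("342474125894", "ap-southeast-1",
                             "sagemaker-training-containers/script-mode-container"):
    print(cmd)
```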

In [2]:
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name

[5/9] RUN ln -s $(which python3) /usr/local/bin/python                 0.3s
 => [6/9] RUN pip3 install --no-cache --upgrade     numpy==1.14.5     pa  28.0s
 => [7/9] RUN pip3 install --no-cache --upgrade     sagemaker-container  146.8s
 => [8/9] COPY code/* /opt/ml/code/                                        0.1s
 => exporting to image                                                    11.0s
 => => exporting layers                                                   11.0s
[+] Building 980.5s (13/14)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 2.27kB                                     0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B

<h3>Training with Amazon SageMaker</h3>

Once the container has been pushed to Amazon ECR, we are ready to start training with Amazon SageMaker. Starting a training job requires, as a parameter, the ECR path of the Docker container used for training.
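An ECR image URI follows the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A hypothetical helper that splits such a URI into its parts makes the structure explicit and can catch a malformed path before a job is submitted. The regular expression is a simplified assumption: it covers tag-based URIs in the standard `amazonaws.com` domain only, not digest references or China-region endpoints:

```python
import re

# Simplified pattern for <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9._/-]+):(?P<tag>[A-Za-z0-9._-]+)$"
)


def parse_ecr_uri(uri):
    """Split an ECR image URI into account, region, repository and tag.

    Hypothetical helper; raises ValueError when the URI does not match.
    """
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return match.groupdict()


print(parse_ecr_uri("342474125894.dkr.ecr.ap-southeast-1.amazonaws.com/"
                    "sagemaker-training-containers/script-mode-container:latest"))
```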

In [3]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(container_image_uri)

342474125894.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-training-containers/script-mode-container:latest


In [4]:
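# Create a dummy CSV file and upload it under the train/ and val/ prefixes so that each input channel has some data.
! echo "val1, val2, val3" > dummy.csv
print(sagemaker_session.upload_data('dummy.csv', bucket, prefix + '/train'))
print(sagemaker_session.upload_data('dummy.csv', bucket, prefix + '/val'))
! rm dummy.csv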

s3://sagemaker-ap-southeast-1-342474125894/script-mode-container/train/dummy.csv
s3://sagemaker-ap-southeast-1-342474125894/script-mode-container/val/dummy.csv


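Finally, we can execute the training job by calling the fit() method of the generic Estimator class defined in the Amazon SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py). This corresponds to calling the CreateTrainingJob() API (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html). Because instance_type is set to 'local' below, the SDK runs in local mode: it starts the container on this machine with Docker instead of submitting the job to the SageMaker service.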
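To illustrate what that API call contains, the sketch below assembles a request dictionary using field names from the CreateTrainingJob API reference, filled with this notebook's values. It is an approximation, not the SDK's actual code path: `build_training_job_request`, the job name, the output path, the volume size and the runtime limit are illustrative assumptions. It also shows that the API's `HyperParameters` field is a map of string values, which is why every value has to be converted to a string (here with `json.dumps`, as in the cell below):

```python
import json


def build_training_job_request(job_name, image_uri, role_arn, channels, hyperparameters,
                               output_path, instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble a request for the SageMaker CreateTrainingJob API.

    Hypothetical sketch of roughly what Estimator.fit() submits; the SDK sets
    further fields. `channels` maps a channel name to an S3 prefix URI.
    """
    input_data_config = [
        {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "ContentType": "text/csv",
        }
        for name, uri in channels.items()
    ]
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        # HyperParameters is a string-to-string map, so every value is JSON encoded.
        "HyperParameters": {str(k): json.dumps(v) for k, v in hyperparameters.items()},
        "InputDataConfig": input_data_config,
        "OutputDataConfig": {"S3OutputPath": output_path},
        # Example values for the volume size and the maximum runtime.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 60 * 60},
    }


demo_bucket = "sagemaker-ap-southeast-1-342474125894"
demo_prefix = "script-mode-container"
request = build_training_job_request(
    job_name="script-mode-container-example",  # hypothetical job name
    image_uri="342474125894.dkr.ecr.ap-southeast-1.amazonaws.com/"
              "sagemaker-training-containers/script-mode-container:latest",
    role_arn="arn:aws:iam::342474125894:role/service-role/"
             "AmazonSageMaker-ExecutionRole-20190405T234154",
    channels={"train": f"s3://{demo_bucket}/{demo_prefix}/train/",
              "validation": f"s3://{demo_bucket}/{demo_prefix}/val/"},
    hyperparameters={"hp1": "value1", "hp2": 300, "hp3": 0.001},
    output_path=f"s3://{demo_bucket}/{demo_prefix}/output",  # hypothetical output location
)
print(json.dumps(request["HyperParameters"]))
```

With valid credentials, such a dictionary could be submitted with `boto3.client("sagemaker").create_training_job(**request)`.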

In [5]:
import sagemaker
import json

# JSON encode hyperparameters to avoid showing some info messages raised by the sagemaker-containers library.
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

hyperparameters = json_encode_hyperparameters({
    "hp1": "value1",
    "hp2": 300,
    "hp3": 0.001})

est = sagemaker.estimator.Estimator(container_image_uri,
                                    role, 
                                    instance_count=1, 
                                    instance_type='local', # local mode: run the container on this machine with Docker
                                    #instance_type='ml.m5.xlarge', # use instead of 'local' to train on a managed instance
                                    base_job_name=prefix,
                                    hyperparameters=hyperparameters)

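# sagemaker.session.s3_input is a deprecated alias in sagemaker>=2 (hence the warnings in the output);
# sagemaker.inputs.TrainingInput is the current equivalent.
train_config = sagemaker.session.s3_input('s3://{0}/{1}/train/'.format(bucket, prefix), content_type='text/csv')
val_config = sagemaker.session.s3_input('s3://{0}/{1}/val/'.format(bucket, prefix), content_type='text/csv')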

est.fit({'train': train_config, 'validation': val_config })

The class sagemaker.session.s3_input has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
The class sagemaker.session.s3_input has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
Creating tmptraqfog4_algo-1-1dexd_1 ... 
Attaching to tmptraqfog4_algo-1-1dexd_1
algo-1-1dexd_1  | 2021-01-04 07:30:40,259 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-1dexd_1  | 2021-01-04 07:30:40,299 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-1dexd_1  | 2021-01-04 07:30:40,322 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-1dexd_1  | 2021-01-04 07:30:40,336 sagemaker-containers INFO     Invoking user script
algo-1-1dexd_1  | 
algo-1-1dexd_1  | Training Env:
algo-1-1dexd_1  | 
algo-1-1dexd_1  | {
algo-1-1dexd_1  |