## Step 1: Create custom container using SageMaker PyTorch Deep Learning Framework

Set `role` to the ARN of your SageMaker execution role.

In [10]:
!pip --version

pip 20.1 from /Users/yihyap/anaconda3/envs/sandbox36/lib/python3.6/site-packages/pip (python 3.6)


In [4]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
import warnings
warnings.filterwarnings('ignore')

ecr_namespace = 'sagemaker-training-containers/'
prefix = 'pytorch-training'
ecr_repository_name = ecr_namespace + prefix


# Replace with the ARN of your own SageMaker execution role.
role = "arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154"
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print('Account: {}'.format(account_id))
print('Region: {}'.format(region))
print('Role: {}'.format(role))
print('S3 Bucket: {}'.format(bucket))
print('Repo: {}'.format(ecr_repository_name))

Account: 342474125894
Region: ap-southeast-1
Role: arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154
S3 Bucket: sagemaker-ap-southeast-1-342474125894
Repo: sagemaker-training-containers/pytorch-training
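
The account ID above comes from `role.split(':')[4]`, which relies on the colon-separated ARN layout `arn:partition:service:region:account-id:resource`. The following is a minimal sketch of the same extraction with a basic format check. `account_id_from_arn` is a hypothetical helper, not part of the SageMaker SDK, and the example ARN is a placeholder.

```python
def account_id_from_arn(arn: str) -> str:
    """Return the account ID field of an ARN (hypothetical helper)."""
    # Split into at most 6 fields so colons inside the resource part are preserved.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not a valid ARN: {!r}".format(arn))
    account_id = parts[4]
    # AWS account IDs are 12-digit numbers.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("unexpected account ID field: {!r}".format(account_id))
    return account_id


# Placeholder ARN for illustration only.
example_role = "arn:aws:iam::123456789012:role/service-role/ExampleSageMakerRole"
print(account_id_from_arn(example_role))
```

Validating the ARN up front gives a clear error when `role` is left unset or mistyped, instead of an `IndexError` or a malformed ECR URI later on.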


### Build training container

Next, run the script that builds the custom container image and pushes it to Amazon ECR. The image must be stored in the same region as the training job that uses it.

In [11]:
# ./build_and_push.sh 342474125894 ap-southeast-1 sagemaker-training-containers/pytorch-training
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name

Sending build context to Docker daemon  18.43kB
Step 1/16 : FROM ubuntu:16.04
 ---> 13c9f1285025
Step 2/16 : LABEL maintainer="Giuseppe A. Porcelli"
 ---> Using cache
 ---> 6bbf3d07c68d
Step 3/16 : ARG PYTHON=python3
 ---> Using cache
 ---> 8e254b9ef0a0
Step 4/16 : ARG PYTHON_PIP=python3-pip
 ---> Using cache
 ---> 84c928b11bb3
Step 5/16 : ARG PIP=pip3
 ---> Using cache
 ---> 65e780b1f9d7
Step 6/16 : ARG PYTHON_VERSION=3.6.6
 ---> Using cache
 ---> 03bab72f170e
Step 7/16 : RUN apt-get update && apt-get install -y --no-install-recommends software-properties-common &&     add-apt-repository ppa:deadsnakes/ppa -y &&     apt-get update && apt-get install -y --no-install-recommends         build-essential         ca-certificates         curl         wget         git         libopencv-dev         openssh-client         openssh-server         vim         zlib1g-dev &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 0b3f66ca4c73
Step 8/16 : RUN wget https://www.python.org/ftp/python/$P
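
The contents of `build_and_push.sh` are not shown in this notebook. As a rough sketch of what a script like it typically does, the helper below only assembles the usual AWS CLI and Docker commands as strings, without running anything. The function name `build_and_push_commands` and the exact command sequence are assumptions for illustration, and the real script may differ.

```python
def build_and_push_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands a typical build-and-push script runs (assumed sequence)."""
    registry = "{}.dkr.ecr.{}.amazonaws.com".format(account_id, region)
    image_uri = "{}/{}:{}".format(registry, repository, tag)
    return [
        # Create the repository; an "already exists" error can be ignored.
        "aws ecr create-repository --repository-name {} --region {}".format(
            repository, region),
        # Authenticate the local Docker client against the ECR registry.
        "aws ecr get-login-password --region {} | "
        "docker login --username AWS --password-stdin {}".format(region, registry),
        # Build the image from the Dockerfile in the current directory, tagged for ECR.
        "docker build -t {} .".format(image_uri),
        # Upload the image to ECR.
        "docker push {}".format(image_uri),
    ]


# Placeholder account ID for illustration only.
for command in build_and_push_commands(
        "123456789012", "ap-southeast-1",
        "sagemaker-training-containers/pytorch-training"):
    print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. Note that `aws ecr get-login-password` requires a reasonably recent AWS CLI.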

In [12]:
train_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print('ECR training image URI: {}'.format(train_image_uri))

ECR training image URI: 342474125894.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-training-containers/pytorch-training:latest
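
Because the image has to live in the same region as the training job, it can help to check the region embedded in the image URI before launching. This is a small sketch using only the standard library. `ecr_image_region` is a hypothetical helper, the account ID is a placeholder, and the regular expression covers only the standard `amazonaws.com` registry hostname layout shown above.

```python
import re

# Expected layout: account.dkr.ecr.region.amazonaws.com/repository[:tag]
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+)(?::(?P<tag>.+))?$"
)


def ecr_image_region(image_uri):
    """Return the region encoded in an ECR image URI (hypothetical helper)."""
    match = _ECR_URI.match(image_uri)
    if match is None:
        raise ValueError("not an ECR image URI: {!r}".format(image_uri))
    return match.group("region")


uri = ("123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/"
       "sagemaker-training-containers/pytorch-training:latest")
job_region = "ap-southeast-1"
if ecr_image_region(uri) != job_region:
    raise RuntimeError("image and training job are in different regions")
print("Image region matches job region: {}".format(job_region))
```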


The Docker image is now in ECR. The next section shows how to train an acoustic classification model using this custom container.

## Step 2: Training on SageMaker PyTorch custom container

In [13]:
import sagemaker

hyperparameters = {
    "seed": "1",
    "epochs": 50,
}

# Argument names follow SageMaker Python SDK v2;
# v1 used train_instance_count / train_instance_type.
est = sagemaker.estimator.Estimator(train_image_uri,
                                    role,
                                    instance_count=1,
                                    # instance_type='local',  # uncomment to use local mode instead
                                    instance_type='ml.m5.xlarge',
                                    base_job_name=prefix,
                                    hyperparameters=hyperparameters)

# No input channels are passed to this job.
est.fit()

# To train on data stored in S3, pass input channels instead:
# train_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/train/'.format(bucket, prefix), content_type='text/csv')
# val_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/val/'.format(bucket, prefix), content_type='text/csv')
# est.fit({'train': train_config, 'validation': val_config})

2020-08-11 15:04:48 Starting - Starting the training job...
2020-08-11 15:04:50 Starting - Launching requested ML instances......
2020-08-11 15:06:17 Starting - Preparing the instances for training......
2020-08-11 15:07:11 Downloading - Downloading input data
2020-08-11 15:07:11 Training - Downloading the training image......
2020-08-11 15:08:16 Uploading - Uploading generated training model.
2020-08-11 15:08:11,128 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-08-11 15:08:11,148 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-08-11 15:08:11,161 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-08-11 15:08:11,172 sagemaker-containers INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {},
    "current_host": "algo-1",
    "framework_module": null,
    "hosts": [
        "a

### Retrieve model location

In [14]:
model_location = est.model_data
print(model_location)

s3://sagemaker-ap-southeast-1-342474125894/pytorch-training-2020-08-11-15-05-07-606/output/model.tar.gz
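
The archive at this location is built from whatever the training script wrote to the container's model directory. The training script used in this container is not shown. The sketch below illustrates how a script-mode entry point could read the `seed` and `epochs` hyperparameters, which the SageMaker training toolkit typically passes as command-line arguments, and locate the model directory through the `SM_MODEL_DIR` environment variable. The function name `parse_training_args` and the default values are assumptions.

```python
import argparse
import os


def parse_training_args(argv=None):
    """Parse hyperparameters passed as command-line arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as strings, so declare the target types explicitly.
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=10)
    # SM_MODEL_DIR is set inside the training container; fall back to the
    # conventional path when running outside SageMaker.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Ignore any extra arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments produced by the hyperparameters used in Step 2.
args = parse_training_args(["--seed", "1", "--epochs", "50"])
print(args.seed, args.epochs, args.model_dir)
```

Files saved under `args.model_dir` during training are what SageMaker packages into `model.tar.gz` when the job finishes.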


## Step 3: Inference

For inference, we use the default SageMaker PyTorch inference image. The mandatory `model_fn` is implemented in `inference.py`, and `PyTorchModel` is used to deploy the model trained in the previous step.
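
The PyTorch serving container calls `model_fn(model_dir)` to load the model from the extracted `model.tar.gz`. The actual `inference.py` is not shown here. The following is a minimal sketch assuming the training script saved a TorchScript model as `model.pt`; both the file name and the use of TorchScript are assumptions.

```python
import os

MODEL_FILE = "model.pt"  # assumed file name inside model.tar.gz


def model_fn(model_dir):
    """Load the model from model_dir and return it ready for inference."""
    # Imported lazily so this module can be imported where torch is not installed.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # torch.jit.load restores a TorchScript model without needing the model class.
    model = torch.jit.load(os.path.join(model_dir, MODEL_FILE), map_location=device)
    model.eval()  # switch off dropout and batch-norm updates for inference
    return model
```

If `input_fn`, `predict_fn`, or `output_fn` are not defined in the entry point, the container falls back to its default implementations.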

### Deploy model

In [20]:
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data=model_location,
                             role=role,
                             entry_point='inference.py',
                             source_dir='../docker/code',
                             py_version='py3',
                             framework_version='1.5.1')


In [21]:
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', wait=True)


---------------!

In [None]:
pytorch_model.endpoint_name

### Get Predictor

In [1]:
from sagemaker.pytorch.model import PyTorchPredictor

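# Name of an existing endpoint; replace with the name of your own endpoint.
endpoint_name = "pytorch-inference-2020-08-12-08-52-57-488"
# Example CSV-style payload (not used below; the next cell sends a NumPy array).
payload = "1,2,3,4,5\n2,3,4,5,6"

# Attach a predictor to the existing endpoint.
predictor = PyTorchPredictor(endpoint_name)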



In [None]:
import numpy as np
payload = np.array([1,2,3,4,5])
response = predictor.predict(payload)
prediction = response[0].argmax(axis=1)
print(prediction)

## Step 4: Optional Cleanup

When you are done with the endpoint, delete it so that it stops incurring charges.

All of the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account.

In [18]:
predictor.delete_endpoint()