### Building and registering the container

The `build-and-push.sh` script builds the container image with `docker build` and pushes it to ECR with `docker push`.

If the `gpu` argument is passed to `build-and-push.sh`, the GPU Dockerfile is used to build the GPU image. Otherwise, the CPU image is built.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it. In addition, since we are using the SageMaker PyTorch image as the base, we will need to retrieve ECR credentials to pull this public image.
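The "create the repository if it doesn't exist" step can also be sketched in Python. This is a minimal illustration, not the script's actual implementation: the helper name `ensure_ecr_repository` is hypothetical, and it assumes an ECR client created with `boto3.client("ecr")` in the current default region.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Create the ECR repository if it is missing.

    `ecr_client` is assumed to be a boto3 ECR client.
    Returns True if the repository was created, False if it already existed.
    """
    try:
        # Raises RepositoryNotFoundException when the repository is absent.
        ecr_client.describe_repositories(repositoryNames=[repository_name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repository_name)
        return True


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# ensure_ecr_repository(boto3.client("ecr"), "rl-portfolio-optimization")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching a real AWS account.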

In [7]:
!./build_and_push_sagemaker.sh

Requesting RL image
Login Succeeded
Login Succeeded
Sending build context to Docker daemon  281.1kB
Step 1/11 : ARG REGION=us-east-1
Step 2/11 : FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-training:1.8.1-cpu-py36
 ---> 82ca317d0c5e
Step 3/11 : RUN apt-get update && apt-get -y install cmake libopenmpi-dev zlib1g-dev
 ---> Using cache
 ---> fc5dcc3bc043
Step 4/11 : RUN pip install --upgrade pip
 ---> Using cache
 ---> ca4f4c9b99c3
Step 5/11 : COPY requirements.txt requirements.txt
 ---> Using cache
 ---> 50cf28c845d7
Step 6/11 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> 3c9723108cfd
Step 7/11 : ENV PATH="/opt/ml/code:${PATH}"
 ---> Using cache
 ---> 08ac6cd8073a
Step 8/11 : COPY /src /opt/ml/code
 ---> Using cache
 ---> 49a335700f62
Step 9/11 : RUN chmod -R 755 /opt/ml/code
 ---> Using cache
 ---> 8324db4b949d
Step 10/11 : ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
 ---> Using cache
 ---> d97e7939811e
Step 11/11 : ENV SAGEMAKER_PROGRAM models/train_da_rnn_m

## Testing your algorithm on your local machine

When you're packaging your first algorithm to use with Amazon SageMaker, you probably want to test it yourself to make sure it's working correctly. We use the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to test both locally and on SageMaker. For more examples with the SageMaker Python SDK, see [Amazon SageMaker Examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk). In order to test our algorithm, we need our dataset.

## SageMaker Python SDK Local Training
To represent our training, we use the Estimator class, which we configure with five arguments:
1. role - our AWS IAM execution role.
2. instance_count - number of instances to use for training.
3. instance_type - type of instance to use for training. For training locally, we specify `local`.
4. image_uri - the custom PyTorch Docker image we created.
5. hyperparameters - hyperparameters we want to pass.

Let's start by setting up our IAM role. We use a helper function from the Python SDK. This function throws an exception if run outside of a SageMaker notebook instance, because it reads metadata from the notebook instance.
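If you run this notebook somewhere other than a SageMaker notebook instance, one way to cope with that exception is to fall back to an explicitly supplied role ARN. The sketch below is an assumption, not part of the original workflow: `resolve_execution_role`, `fallback_role_arn`, and `get_role` are hypothetical names, and the ARN in the usage comment is a placeholder.

```python
def resolve_execution_role(fallback_role_arn, get_role=None):
    """Return the notebook's execution role, or a fallback ARN if lookup fails.

    `get_role` defaults to sagemaker.get_execution_role; it is a parameter
    only so the lookup can be replaced in tests.
    """
    if get_role is None:
        # Imported lazily so the helper can be defined without the SDK installed.
        from sagemaker import get_execution_role
        get_role = get_execution_role
    try:
        return get_role()
    except Exception:
        # Assumption: any failure here means we are outside a notebook
        # instance, so we use the role the caller provided.
        return fallback_role_arn


# Example usage (placeholder ARN):
# role = resolve_execution_role("arn:aws:iam::123456789012:role/ExampleSageMakerRole")
```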

### Training the Reinforcement Learning Model Locally
Note that we are only training for 200 iterations, which is too few to see any increase in the average score. We are purely checking for mechanical errors.

In [8]:
from sagemaker.estimator import Estimator
from sagemaker import get_execution_role

role = get_execution_role()
estimator = Estimator(role=role,
                      instance_count=1,
                      instance_type='local',
                      image_uri='rl-portfolio-optimization:latest',
                      hyperparameters={'epochs': 1000})

estimator.fit()

Creating dwkohi7omq-algo-1-3rm0l ... 
Creating dwkohi7omq-algo-1-3rm0l ... done
Attaching to dwkohi7omq-algo-1-3rm0l
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,800 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,802 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,813 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,817 sagemaker_pytorch_container.training INFO     Invoking user training script.
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,820 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
dwkohi7omq-algo-1-3rm0l | 2021-09-26 22:57:36,834 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
dwkohi7omq-algo-1-

## Training on SageMaker
Training a model on SageMaker with the Python SDK works much like training locally. We change `instance_type` from `local` to one of the [supported EC2 instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/).
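To make the difference between the two modes explicit, the Estimator arguments can be assembled by a small helper in which only the instance type and image reference vary. This is a sketch under stated assumptions: `estimator_kwargs` is a hypothetical helper, not part of the SageMaker Python SDK, and it only builds a dictionary of the keyword arguments used in this notebook.

```python
def estimator_kwargs(role, image_uri, hyperparameters, instance_type="local"):
    """Build keyword arguments for sagemaker.estimator.Estimator.

    Use instance_type="local" to run in a local container, or an
    instance type such as "ml.m4.xlarge" to run on SageMaker.
    """
    return {
        "role": role,
        "instance_count": 1,
        "instance_type": instance_type,
        "image_uri": image_uri,
        # Copy so later changes to the caller's dict don't affect the config.
        "hyperparameters": dict(hyperparameters),
    }


# Example usage (needs the SageMaker SDK and a valid role, so it is commented out):
# from sagemaker.estimator import Estimator
# estimator = Estimator(**estimator_kwargs(role, ecr_image, {"epochs": 200},
#                                          instance_type="ml.m4.xlarge"))
```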

### Locate the ECR image just built and pushed

In [9]:
import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
region = boto3.Session().region_name
ecr_image = '{}.dkr.ecr.{}.amazonaws.com/rl-portfolio-optimization:latest'.format(account, region)

print(ecr_image)

662572584943.dkr.ecr.us-east-1.amazonaws.com/rl-portfolio-optimization:latest


### Submit the training job

In [11]:
from sagemaker.estimator import Estimator

# Run on a managed ml.m4.xlarge instance, pulling the image from ECR.
estimator = Estimator(role=role,
                      instance_count=1,
                      instance_type='ml.m4.xlarge',
                      image_uri=ecr_image,
                      hyperparameters={'epochs': 200})
estimator.fit()

2021-09-26 23:23:50 Starting - Starting the training job...
2021-09-26 23:23:54 Starting - Launching requested ML instancesProfilerReport-1632698630: InProgress
.........
2021-09-26 23:25:41 Starting - Preparing the instances for training......
2021-09-26 23:26:48 Downloading - Downloading input data
2021-09-26 23:26:48 Training - Downloading the training image..............
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-09-26 23:29:06,887 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-09-26 23:29:06,889 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-09-26 23:29:06,900 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-09-26 23:29:06,909 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-09-26 23:29