## Training on SageMaker with the base PyTorch image

This section sets up the libraries and variables needed to start the training.
It covers training with the default PyTorch image on a single GPU instance. The training code is in the `code` directory.
!!! NOTE: for VPC training, some additional settings are required, and a different image with the proxy configured is needed.

In [None]:
import boto3
import pandas as pd
import sagemaker
import time

# Get SageMaker session & default S3 bucket
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name


bucket = "yourbucket"
kms_key = "kmskey"
output_path = "s3://{}/out".format(bucket)
module_path = "s3://{}/module".format(bucket)
training_data = "s3://pathtoyourdata"  # the scheme needs the colon: s3://
tensorboard_logs = "s3://{}/tensorboard/".format(bucket)  # a stray "}" here makes str.format raise ValueError

named_tuple = time.localtime() # get struct_time
base_name = "pytorch-custom-"
training_job_name = "{0}{1}".format(base_name,time.strftime("%m-%d-%Y-%H-%M-%S", named_tuple))
checkpoint_s3_uri = "s3://{0}/checkpoint/{1}".format(bucket,training_job_name)
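
The S3 locations in the cell above are assembled by hand, so a missing colon or a stray brace is easy to overlook and only surfaces once the job is submitted. A minimal sketch of a pre-flight check follows. `is_valid_s3_uri` is a hypothetical helper, not part of the SageMaker SDK, and the bucket pattern is a simplified version of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens).

```python
import re
from urllib.parse import urlparse

# Simplified bucket-name pattern (assumption: general-purpose bucket rules,
# without the extra checks such as "no consecutive dots").
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_s3_uri(uri):
    """Return True if uri looks like s3://<bucket>/<optional key>."""
    parsed = urlparse(uri)
    return parsed.scheme == "s3" and bool(_BUCKET_RE.match(parsed.netloc))


# Hypothetical example values, only for illustration.
for candidate in ["s3://my-bucket/out", "s3//my-bucket/out", "s3://my-bucket}/tensorboard/"]:
    print(candidate, is_valid_s3_uri(candidate))
```

Running such a check over `output_path`, `module_path`, `training_data`, `tensorboard_logs` and `checkpoint_s3_uri` before creating the estimator is one way to fail fast.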

Set up the SageMaker TensorBoard configuration, which is used during training. To view the results, use a TensorFlow 2.0 workspace on SageMaker and run TensorBoard against your TensorBoard logs.

In [None]:
from sagemaker.debugger import TensorBoardOutputConfig

tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path=tensorboard_logs,
    container_local_output_path='/opt/tensorboard/'
)

Set the hyperparameters. They can be accessed from within the container.

In [None]:
import json

# JSON encode hyperparameters.
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

# s3paths = S3 paths of the validation files. Unfortunately, I did not find a good way to pipe them, so it is better to download them locally.
# train.py is adapted for this: it downloads these files locally and loads them during validation.
hyperparameters = json_encode_hyperparameters({
    "epochs": 2,
    "learning_rate" : 2e-5,
    "max_len" : 128,
    "eps" : 1e-8,
    "batch_size" : 20,
    "steps_epoch" : 1250,
    "s3paths" : {
        "english" : "s3://yourvalidationdata/file.txt",
    }
    
})
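
In script mode the hyperparameters reach the entry point as command-line arguments, so `train.py` has to decode the JSON strings produced by `json_encode_hyperparameters`. The sketch below shows one way to do that with `argparse` and `json.loads`. `parse_hyperparameters` is a hypothetical function, not the actual content of `train.py`, and the exact quoting the script receives may differ between SDK and container versions, so treat it as an illustration of the round trip only.

```python
import argparse
import json


def json_encode_hyperparameters(hyperparameters):
    # Same helper as in the notebook cell above.
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


def parse_hyperparameters(argv):
    """Decode JSON-encoded hyperparameters passed as --name value pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=json.loads, default=1)
    parser.add_argument("--learning_rate", type=json.loads, default=2e-5)
    parser.add_argument("--s3paths", type=json.loads, default={})
    # parse_known_args ignores hyperparameters this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the argument list the container would build (assumption).
encoded = json_encode_hyperparameters({
    "epochs": 2,
    "learning_rate": 2e-5,
    "s3paths": {"english": "s3://yourvalidationdata/file.txt"},
})
argv = []
for key, value in encoded.items():
    argv += ["--" + key, value]

args = parse_hyperparameters(argv)
print(args.epochs, args.learning_rate, args.s3paths["english"])
```

Using `json.loads` as the `type` keeps nested values such as `s3paths` as a dictionary instead of a raw string.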

###### What the code below does:
1. Creates an estimator object, a high-level abstraction class for the PyTorch framework
2. Packs your code (the entire content of the `code` directory) and uploads it to the `module_path` S3 location (passed as `code_location`)
3. Launches a training job in Pipe mode
4. In the code directory you can include a requirements.txt listing the Python modules to install. The PyTorch container installs them for you

###### A note on Pipe mode with PyTorch:
1. It reads bytes from the file you set as the training file. In train.py you specify how many bytes are read at a time.
2. I wrote a helper that transforms the bytes into a dataframe. Because the bytes are read from the object in fixed-size pieces, a read can end halfway through a row, or halfway through a value within a row. The helper is not perfect (it drops the cases it cannot decode), but it helps with the transformation.
3. Data is read sequentially until EOF, and reading starts again on the next epoch. Each chunk is transformed into a dataframe (by the train.py script) and fed to the training.
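
The chunk handling described above can be sketched as follows. This is not the helper used in `train.py`, only a minimal illustration under the assumption that rows are newline-delimited UTF-8 text: the incomplete tail of each chunk is carried over to the next read, and rows that cannot be decoded are dropped. `iter_rows` and the sample bytes are hypothetical.

```python
import io


def iter_rows(stream, chunk_size=1024):
    """Yield decoded text rows from a binary stream read in fixed-size chunks."""
    leftover = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF
            break
        lines = (leftover + chunk).split(b"\n")
        # The last piece may be an incomplete row: keep it for the next chunk.
        leftover = lines.pop()
        for line in lines:
            try:
                yield line.decode("utf-8")
            except UnicodeDecodeError:
                continue  # drop rows that cannot be decoded
    if leftover:
        try:
            yield leftover.decode("utf-8")
        except UnicodeDecodeError:
            pass


# In-memory stand-in for the Pipe mode stream; the third row is invalid UTF-8.
sample = io.BytesIO(b"a,1\nb,2\n\xff\xfe\nc,3")
rows = list(iter_rows(sample, chunk_size=4))
print(rows)
```

Inside the container the stream would be the channel FIFO rather than a `BytesIO` object; Pipe mode exposes it under a path of the form `/opt/ml/input/data/<channel>_<epoch>`, which is worth checking against the SageMaker documentation for your setup. The yielded rows can then be collected into a dataframe batch by batch.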

In [None]:
from sagemaker.pytorch import PyTorch
from sagemaker.estimator import Estimator

estimator = PyTorch(
    source_dir='code',
    entry_point='train.py',
    code_location=module_path,
    output_path=output_path,
    framework_version="1.6.0",
    py_version="py3",
    output_kms_key=kms_key,
    role=role,
    tensorboard_output_config=tensorboard_output_config,
    checkpoint_s3_uri=checkpoint_s3_uri,
    instance_count=1,
    hyperparameters=hyperparameters,
    instance_type='ml.p2.xlarge',
    input_mode='Pipe'
)
# Start the training job; "training" is the name of the input channel.
estimator.fit(job_name=training_job_name, inputs={"training": training_data})