## SageMaker Training

Now imagine that you have chosen a model and performed hyperparameter tuning, and that all the code has been placed into a Python script.

Instead of running locally, you are now ready to train the model with a SageMaker framework training job on a managed EC2 instance.


### Reference

https://sagemaker.readthedocs.io/en/stable/overview.html#train-a-model-with-the-sagemaker-python-sdk

- `SM_CHANNEL_XXXX`: A string that represents the path to the directory that contains the input data for the specified channel.

- Example: 

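```python
# Trigger the training job with the SageMaker framework estimator
sklearn.fit({"train": s3_input_train, "test": s3_input_validation}, wait=False)

# Retrieve the train data location in the Python script.
# The three options below are alternatives; use only one per argument.
# Option 1: hard-code the container path
# parser.add_argument("--train", type=str, default="/opt/ml/input/data/train")
# Option 2: read the environment variable (raises KeyError outside SageMaker)
# parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
# Option 3: environment variable with a fallback path
parser.add_argument("--train", type=str, default=os.getenv("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
```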
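To make the channel mechanism concrete, here is a minimal sketch of how a training script might resolve its input and output directories. The `--train`, `--test`, and `--model-dir` flags match the command used in the local verification step of this notebook. The helper name `parse_args` is hypothetical, and using `SM_MODEL_DIR` as the default for `--model-dir` is an assumption about how `train.py` is written, not something confirmed by the source.

```python
import argparse
import os


def parse_args(argv=None):
    """Resolve data and model directories from CLI flags, then SageMaker env vars.

    Hypothetical helper for illustration; the real train.py may differ.
    """
    parser = argparse.ArgumentParser()
    # SageMaker sets SM_CHANNEL_<NAME> for every channel passed to fit().
    # The literal paths are fallbacks for runs outside a SageMaker container.
    parser.add_argument(
        "--train",
        type=str,
        default=os.getenv("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--test",
        type=str,
        default=os.getenv("SM_CHANNEL_TEST", "/opt/ml/input/data/test"),
    )
    # Assumption: the model output directory follows SM_MODEL_DIR.
    parser.add_argument(
        "--model-dir",
        type=str,
        default=os.getenv("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)


# Explicit flags always win over environment variables and fallbacks.
args = parse_args(["--train", "/tmp/train", "--test", "/tmp/test"])
print(args.train, args.test, args.model_dir)
```

With this pattern the same script runs unchanged on a laptop (pass the flags explicitly) and inside the training container (rely on the environment variables).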

### Setup

Verify that the processing job has completed and that the transformed data is available in `/tmp/{train, test, model}`.
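One way to perform this check programmatically is sketched below. The function name `missing_inputs` is hypothetical, and the choice to require only `train` and `test` by default is an assumption; add `"model"` to `required` if your processing step also writes artifacts there.

```python
from pathlib import Path


def missing_inputs(base_dir="/tmp", required=("train", "test")):
    """Return the required sub-directories that are absent or empty."""
    missing = []
    for name in required:
        path = Path(base_dir) / name
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(str(path))
    return missing


problems = missing_inputs()
print("Missing or empty:", problems if problems else "none")
```

An empty result means the expected directories exist and contain at least one file each; it does not validate the file contents.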

### Verify python script

Before submitting a job, verify that the script runs locally without errors. Catching bugs here is much faster than waiting for a training instance to start.

In [1]:
%%bash

python ../src/mlmax/train.py --train /tmp/train --test /tmp/test --model-dir /tmp/model


Received arguments Namespace(inspect=False, model_dir='/tmp/model', test='/tmp/test', train='/tmp/train')
Reading train data from /tmp/train
Reading test data from /tmp/test
Training LR model
Validating LR model
Creating classification evaluation report
Classification report:
{'0': {'precision': 0.9405025868440503, 'recall': 0.7476498237367802, 'f1-score': 0.8330605564648118, 'support': 17020}, '1': {'precision': 0.38254744105807936, 'recall': 0.7677437968840162, 'f1-score': 0.510650546919977, 'support': 3466}, 'accuracy': 0.751049497217612, 'macro avg': {'precision': 0.6615250139510649, 'recall': 0.7576968103103983, 'f1-score': 0.6718555516923944, 'support': 20486}, 'weighted avg': {'precision': 0.8461028731227688, 'recall': 0.751049497217612, 'f1-score': 0.7785124214905661, 'support': 20486}, 'roc_auc': 0.7576968103103983}

  return f(*args, **kwargs)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


In [2]:
# Quick check on output files
!ls -l /tmp/model

total 8
-rw-rw-r-- 1 ec2-user ec2-user 1462 Sep 21 00:15 model.joblib
-rw-rw-r-- 1 ec2-user ec2-user 2672 Sep 21 00:14 proc_model.tar.gz


### Run on SageMaker training

Ref: https://github.com/aws/sagemaker-scikit-learn-container

Import the required packages

In [3]:
import sagemaker
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.inputs import TrainingInput



Set up the execution role and S3 bucket

In [4]:
role = "arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154"
s3_bucket = "wy-project-template"

Create the SKLearn estimator

In [5]:
# Set to True to run the training container locally (requires Docker)
local_mode = False

if local_mode:
    instance_type = "local"
else:
    instance_type = "ml.m5.xlarge"

sklearn = SKLearn(
    entry_point="../src/mlmax/train.py",
    instance_type=instance_type,  # honours the local_mode switch above
    role=role,
    py_version="py3",
    framework_version="0.23-1",
)


print(f"Container image: {sklearn.image_uri}")

Container image: 121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3


### S3 - Input data

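Each channel passed to `fit()` maps an S3 prefix to a local directory inside the training container: the `train` channel is downloaded to `/opt/ml/input/data/train` and the `test` channel to `/opt/ml/input/data/test`.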
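The naming convention behind this mapping can be sketched as a small helper. The function name `channel_paths` is hypothetical and exists only for illustration; the base path `/opt/ml/input/data` and the `SM_CHANNEL_` prefix are the ones shown in the Reference section above, and the sketch assumes the default File input mode.

```python
def channel_paths(channel_names, base="/opt/ml/input/data"):
    """Map each fit() channel name to its container directory and env var."""
    return {
        name: {
            # The environment variable uses the upper-cased channel name.
            "env_var": f"SM_CHANNEL_{name.upper()}",
            # The data itself is placed under <base>/<channel name>.
            "container_dir": f"{base}/{name}",
        }
        for name in channel_names
    }


for name, info in channel_paths(["train", "test"]).items():
    print(f"{name}: ${info['env_var']} -> {info['container_dir']}")
```

The dictionary keys used in `fit()` are therefore not arbitrary labels: they determine both the directory name and the environment variable that the training script reads.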

In [6]:
content_type = "csv"
s3_input_train = TrainingInput(
    s3_data=f"s3://{s3_bucket}/sklearn/processed/train_data/", content_type=content_type
)
s3_input_validation = TrainingInput(
    s3_data=f"s3://{s3_bucket}/sklearn/processed/test_data", content_type=content_type
)

sklearn.fit({"train": s3_input_train, "test": s3_input_validation}, wait=False)


# fit() was called with wait=False, so the job is likely still starting
training_job_description = sklearn.jobs[-1].describe()
print(training_job_description)


{'TrainingJobName': 'sagemaker-scikit-learn-2021-09-21-00-15-34-278', 'TrainingJobArn': 'arn:aws:sagemaker:ap-southeast-1:342474125894:training-job/sagemaker-scikit-learn-2021-09-21-00-15-34-278', 'TrainingJobStatus': 'InProgress', 'SecondaryStatus': 'Starting', 'HyperParameters': {'sagemaker_container_log_level': '20', 'sagemaker_job_name': '"sagemaker-scikit-learn-2021-09-21-00-15-34-278"', 'sagemaker_program': '"train.py"', 'sagemaker_region': '"ap-southeast-1"', 'sagemaker_submit_directory': '"s3://sagemaker-ap-southeast-1-342474125894/sagemaker-scikit-learn-2021-09-21-00-15-34-278/source/sourcedir.tar.gz"'}, 'AlgorithmSpecification': {'TrainingImage': '121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3', 'TrainingInputMode': 'File', 'EnableSageMakerMetricsTimeSeries': False}, 'RoleArn': 'arn:aws:iam::342474125894:role/service-role/AmazonSageMaker-ExecutionRole-20190405T234154', 'InputDataConfig': [{'ChannelName': 'train', 'DataSource': {'S3Data