# Now, we can start a new training job

We'll upload a zip file called **trainingjob.zip** with the following structure:
 - trainingjob.json (SageMaker training job descriptor)
 - assets/deploy-model-prd.yml (CloudFormation template for deploying our model into Production)
 - assets/deploy-model-dev.yml (CloudFormation template for deploying our model into Development)

## Let's start by defining the hyperparameters for both algorithms

In [None]:
hyperparameters = {
    "logistic_max_iter": 100,
    "logistic_solver": "lbfgs",

    "random_forest_max_depth": 10,
    "random_forest_n_jobs": 5,
    "random_forest_verbose": 1
}
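
SageMaker hands hyperparameters to a custom training container as strings, in a JSON file under `/opt/ml/input/config/`. As a minimal sketch of the receiving side, the training script could cast them back to Python types as shown below. The `CASTS` table and both function names are hypothetical; the actual training script for this model may do this differently.

```python
import json
from pathlib import Path

# Location where SageMaker writes the job's hyperparameters inside the container.
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")

# Hypothetical cast table: the type each hyperparameter is converted back to.
CASTS = {
    "logistic_max_iter": int,
    "logistic_solver": str,
    "random_forest_max_depth": int,
    "random_forest_n_jobs": int,
    "random_forest_verbose": int,
}


def parse_hyperparameters(raw):
    """Cast string-valued hyperparameters to the types listed in CASTS."""
    return {name: CASTS.get(name, str)(value) for name, value in raw.items()}


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Read the hyperparameters file and return the typed values."""
    return parse_hyperparameters(json.loads(Path(path).read_text()))


# Example with the string form of two of the values defined above.
example = parse_hyperparameters({"logistic_max_iter": "100", "logistic_solver": "lbfgs"})
```

Unknown keys fall back to `str`, so an extra hyperparameter does not break the parsing.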

## Then, let's create the training job descriptor

In [None]:
import time
import sagemaker
import boto3

sts_client = boto3.client("sts")

model_prefix = 'iris-model'
account_id = sts_client.get_caller_identity()["Account"]
region = boto3.session.Session().region_name
training_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, model_prefix)
roleArn = "arn:aws:iam::{}:role/Airliquide".format(account_id)
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = model_prefix + timestamp
sagemaker_session = sagemaker.Session()

training_params = {}

# Here we set the reference to our training Docker image, stored on ECR (https://aws.amazon.com/pt/ecr/)
training_params["AlgorithmSpecification"] = {
    "TrainingImage": training_image,
    "TrainingInputMode": "File"
}

# The IAM role with the permissions granted to SageMaker
training_params["RoleArn"] = roleArn

# Here SageMaker will store the final trained model
training_params["OutputDataConfig"] = {
    "S3OutputPath": 's3://{}/{}'.format(sagemaker_session.default_bucket(), model_prefix)
}

# This is the config of the instance that will execute the training
training_params["ResourceConfig"] = {
    "InstanceCount": 1,
    "InstanceType": "ml.t2.medium",
    "VolumeSizeInGB": 30
}

# The job name. You'll see this name in the Jobs section of the SageMaker console
training_params["TrainingJobName"] = job_name

# SageMaker expects every hyperparameter value to be a string
hyperparameters = {name: str(value) for name, value in hyperparameters.items()}

# Here you configure the hyperparameters used for training your model.
training_params["HyperParameters"] = hyperparameters

# Training timeout: stop the job after at most 360000 seconds (100 hours)
training_params["StoppingCondition"] = {
    "MaxRuntimeInSeconds": 360000
}

# The input channels for the training job. We use the FullyReplicated
# distribution type, where the data is copied onto each training instance
training_params["InputDataConfig"] = []

# Please notice that we're using text/csv as the content type,
# given that our dataset is a headerless CSV file

# Here we set the training dataset
# The data is read from the "input" prefix under the model prefix in the default bucket
training_params["InputDataConfig"].append({
    "ChannelName": "training",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}/{}/input'.format(sagemaker_session.default_bucket(), model_prefix),
            "S3DataDistributionType": "FullyReplicated"
        }
    },
    "ContentType": "text/csv",
    "CompressionType": "None"
})
training_params["Tags"] = []
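
Before shipping the descriptor, it can help to check it locally. The sketch below is an assumption, not part of the pipeline: `check_descriptor` and `EXPECTED_KEYS` are hypothetical names, and the key list only mirrors what this notebook sets. It checks the job name against the pattern documented for `CreateTrainingJob`, that every hyperparameter is a string, and that the whole structure is JSON serializable.

```python
import json
import re

# Job name pattern documented for CreateTrainingJob (at most 63 characters).
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

# Keys that this notebook sets on the descriptor (hypothetical checklist).
EXPECTED_KEYS = {
    "AlgorithmSpecification", "RoleArn", "OutputDataConfig", "ResourceConfig",
    "TrainingJobName", "HyperParameters", "StoppingCondition", "InputDataConfig",
}


def check_descriptor(params):
    """Return a list of problems found in the descriptor (empty if none)."""
    problems = []
    missing = EXPECTED_KEYS - set(params)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if not JOB_NAME_PATTERN.match(params.get("TrainingJobName", "")):
        problems.append("invalid TrainingJobName")
    for name, value in params.get("HyperParameters", {}).items():
        if not isinstance(value, str):
            problems.append("hyperparameter %s is not a string" % name)
    try:
        json.dumps(params)
    except TypeError as exc:
        problems.append("not JSON serializable: %s" % exc)
    return problems
```

In the notebook, `check_descriptor(training_params)` should return an empty list before the descriptor is zipped.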

## Before we start the training process, we need to upload our dataset to S3

In [None]:
import sagemaker

# Get the current Sagemaker session
sagemaker_session = sagemaker.Session()

default_bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()


!mkdir -p input/data/training

import pandas as pd
import numpy as np

from sklearn import datasets
iris = datasets.load_iris()

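# Put the target in the first column, followed by the four features
dataset = np.insert(iris.data, 0, iris.target, axis=1)

# Use a separate name for the DataFrame so the pandas module (pd) is not overwritten
df = pd.DataFrame(data=dataset, columns=['iris_id'] + iris.feature_names)
df.to_csv('input/data/training/iris.csv', header=False, index=False, sep=',', encoding='utf-8')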

data_location = sagemaker_session.upload_data(path='input/data/training', key_prefix='iris-model/input')
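
The CSV written above has no header and carries the class label in its first column. In File mode, SageMaker makes the `training` channel available inside the container under `/opt/ml/input/data/training`. As a rough sketch of how a training script could parse that layout, using only the standard library (the function name is hypothetical, and the two sample rows are the first and last rows of the iris dataset in the format written above):

```python
import csv
import io


def split_labels_and_features(csv_text):
    """Split headerless CSV rows into integer labels and float feature lists."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        labels.append(int(float(row[0])))  # label is stored as a float, e.g. "0.0"
        features.append([float(value) for value in row[1:]])
    return labels, features


# Two rows in the same layout as iris.csv: label first, then four features.
sample = "0.0,5.1,3.5,1.4,0.2\n2.0,5.9,3.0,5.1,1.8\n"
labels, features = split_labels_and_features(sample)
```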

## Alright! Now it's time to start the training process

In [None]:
import boto3
import io
import zipfile
import json

s3 = boto3.client('s3')
sts_client = boto3.client("sts")

session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name

bucket_name = "mlops-%s-%s" % (region, account_id)
key_name = "training_jobs/iris_model/trainingjob.zip"

# Build the archive in memory: the descriptor plus both CloudFormation templates
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zf:
    zf.writestr('trainingjob.json', json.dumps(training_params))
    for asset in ('deploy-model-prd.yml', 'deploy-model-dev.yml'):
        with open('../../assets/' + asset, 'r') as f:
            zf.writestr('assets/' + asset, f.read())

# Upload the archive to S3
s3.put_object(Bucket=bucket_name, Key=key_name, Body=zip_buffer.getvalue())
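
To double-check what goes into the archive, the same in-memory zip can be read back before it is uploaded. This is only a sketch with stand-in content: `build_archive` and `list_archive` are hypothetical helpers, and the template strings below are placeholders rather than real CloudFormation templates.

```python
import io
import json
import zipfile


def build_archive(descriptor, assets):
    """Return the bytes of a zip holding trainingjob.json plus the given assets."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("trainingjob.json", json.dumps(descriptor))
        for name, content in assets.items():
            zf.writestr(name, content)
    return buffer.getvalue()


def list_archive(payload):
    """Return the sorted member names of a zip given as bytes."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        return sorted(zf.namelist())


# Stand-in content; the real notebook uses training_params and the asset files.
payload = build_archive(
    {"TrainingJobName": "iris-model-example"},
    {
        "assets/deploy-model-prd.yml": "# placeholder",
        "assets/deploy-model-dev.yml": "# placeholder",
    },
)
members = list_archive(payload)
```

`members` should contain exactly the three entries listed at the top of this notebook.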

### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our build pipeline
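
The same status is also available programmatically through the CodePipeline `get_pipeline_state` API. The sketch below only summarizes a response of that shape. The function name, the stage names in the sample, and the `NotStarted` label for stages without an execution are all assumptions, since the pipeline's actual name and stages are not given here.

```python
def summarize_pipeline_state(state):
    """Map each stage name to the status of its latest execution."""
    summary = {}
    for stage in state.get("stageStates", []):
        execution = stage.get("latestExecution", {})
        # "NotStarted" is our own label for stages that have not run yet.
        summary[stage["stageName"]] = execution.get("status", "NotStarted")
    return summary


# In the notebook, the response would come from a call such as:
#   boto3.client("codepipeline").get_pipeline_state(name=<your pipeline name>)
# Here we use a hand-written sample with hypothetical stage names.
sample_state = {
    "pipelineName": "example-pipeline",
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "Train", "latestExecution": {"status": "InProgress"}},
        {"stageName": "DeployDev"},
    ],
}
summary = summarize_pipeline_state(sample_state)
```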

> Finally, open the next notebook, [NOTEBOOK](04_Check%20Progress%20and%20Test%20the%20endpoint.ipynb), to see the progress and test your endpoint.