# 02 - Project Template Applied Use Case: SageMaker

**Note**: This notebook only works inside the SageMaker environment. Do not try to run it locally or anywhere else; it won't run.

This notebook shows how to train a model, deploy it, and get predictions for new values through a SageMaker endpoint. These steps require a fully functional custom algorithm container hosted on AWS ECR, as well as the data stored in the right AWS S3 buckets.

FYI, we're working with the **testebella** bucket, using **houses_train.csv** for training purposes and **houses_test.csv** for prediction.

In [1]:
import boto3
import re
import io
import os
import numpy as np
import pandas as pd
from time import gmtime, strftime

import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer

Declare the bucket and the filename (key) of each CSV dataset. Each pair is combined into a string holding the dataset's AWS S3 location.
Also declare the account and region, which are used to build the URI of the custom container (the image variable). The image is passed as a parameter to the estimator object that trains the model.


In [2]:
role = get_execution_role()
sess = sage.Session()

bucket='testebella'
train_data_key = 'houses_train.csv'
test_data_key = 'houses_test.csv'
train_data_location = 's3://{}/{}'.format(bucket, train_data_key)
test_data_location = 's3://{}/{}'.format(bucket, test_data_key)

account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
image = "{}.dkr.ecr.{}.amazonaws.com/houses-regression:latest".format(account, region)
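
The string formatting in the cell above can be wrapped in two small helpers. This is only a sketch: the function names `s3_uri` and `ecr_image_uri`, the account ID, and the region are placeholders introduced for illustration and are not part of the notebook.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build an s3://bucket/key URI, tolerating a leading slash in the key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the ECR image URI in the same shape as the `image` variable."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and region, for illustration only.
print(s3_uri("testebella", "houses_train.csv"))
print(ecr_image_uri("123456789012", "us-east-1", "houses-regression"))
```

In the notebook itself, the account and region would still come from `sess.boto_session`, as shown in the cell above.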

The estimator object provides the **fit()** method, and the predictor it deploys provides the **predict()** method, both familiar from machine learning libraries.
Using the **fit()** method, we can point the algorithm at the houses_train.csv dataset to train it. Notice that the container does not include the model binary, and the binary isn't hosted in any AWS S3 bucket beforehand. It's generated by this training job and then saved to an S3 bucket of your choice.

The model metrics (Mean Absolute Error in this case) are expected to match those obtained in the other environments where the container was built and tested. Do pay attention to that. Here, we can see the **Test Mean Absolute Error** is the same as the one obtained in the git repo notebook.

In [4]:
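model = sage.estimator.Estimator(
    image,  # ECR URI of the custom algorithm container
    role,  # IAM execution role assumed by the training job
    1,  # number of training instances
    "ml.c4.2xlarge",  # training instance type
    output_path="s3://{}/output".format(sess.default_bucket()),  # where the model artifact is saved
    sagemaker_session=sess,
)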

model.fit(train_data_location)

2021-12-09 12:47:07 Starting - Starting the training job...
2021-12-09 12:47:30 Starting - Launching requested ML instances......
ProfilerReport-1639054026: InProgress
2021-12-09 12:48:30 Starting - Preparing the instances for training............
2021-12-09 12:50:31 Downloading - Downloading input data
2021-12-09 12:50:31 Training - Downloading the training image...
2021-12-09 12:51:03 Training - Training image download completed. Training in progress..
Train Mean Absolute Error    : 15257.468036901593
Test Mean Absolute Error    : 38853.808498214305
Training complete.

2021-12-09 12:51:31 Uploading - Uploading generated training model
2021-12-09 12:51:31 Completed - Training job completed
Training seconds: 65
Billable seconds: 65
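
To make the metric comparison less manual, the MAE lines can be pulled out of the training log and checked against a reference value. The sketch below assumes the log format shown above; `parse_mae` and `reference_test_mae` are hypothetical names, and the reference value should be replaced with the figure from the git repo notebook.

```python
import math
import re

# Matches lines such as "Test Mean Absolute Error    : 38853.808498214305".
METRIC_RE = re.compile(r"(Train|Test) Mean Absolute Error\s*:\s*([0-9.]+)")


def parse_mae(log_text: str) -> dict:
    """Return {'Train': value, 'Test': value} for the MAE lines found in the log."""
    return {split: float(value) for split, value in METRIC_RE.findall(log_text)}


# Metric lines copied from the training log above.
log_text = """
Train Mean Absolute Error    : 15257.468036901593
Test Mean Absolute Error    : 38853.808498214305
Training complete.
"""

metrics = parse_mae(log_text)

# Hypothetical reference: replace with the value from the git repo notebook.
reference_test_mae = 38853.808498214305
assert math.isclose(metrics["Test"], reference_test_mae, rel_tol=1e-9)
print(metrics)
```

A relative tolerance is used rather than strict equality, since small floating-point differences between environments would not indicate a different model.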


After the training job finishes, call the **deploy()** method to create an endpoint for your model, then call **predict()** on the returned predictor to make predictions.

In [5]:
predictor = model.deploy(1, "ml.t2.medium", serializer=csv_serializer)

--------!
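
The `csv_serializer` passed to **deploy()** turns the rows sent to **predict()** into CSV text before the request reaches the endpoint. The sketch below only illustrates that idea with the standard library; it is not the SDK's implementation, and `rows_to_csv` and the feature rows are hypothetical. In sagemaker>=2, the equivalent class is `sagemaker.serializers.CSVSerializer`.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize an iterable of rows into CSV text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")


# Hypothetical feature rows; the real ones come from houses_test.csv.
payload = rows_to_csv([[1200, 3, 2], [850, 2, 1]])
print(payload)
```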

In [10]:
test_data = pd.read_csv(test_data_location, header=None)

In [13]:
print(predictor.predict(test_data.values).decode("utf-8"))

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


201561.77
179703.24833333335
242042.8525
249770.68
131698.89107142857
188006.5125
126082.26441666666
165787.7454761905
274305.2825
134462.8775
175266.11091269844
105223.98950396826
105223.98950396826
109479.70684523809
113089.96828373018
249615.02000000005
188601.52416666664
296827.15
279271.645
397350.36
428829.355
249710.5
235032.15464285712
178686.71507936507
274038.8975
182129.04039682544
207041.7925
133310.7
186597.42833333334
310344.34
183040.15
142599.92416666666
174906.81833333336
197172.53916666665
284623.53
175868.4281746032
120668.76019841271
173443.8047619048
172796.3047619048
172796.3047619048
173546.85119047615
173546.85119047615
305203.12916666665
156728.695
191135.9975
259801.12226190476
215665.09
204708.51
200544.50892857142
156484.70833333334
152426.2275
173436.6075
182192.28916666665
174432.99809523814
140753.75
209036.575
232794.68833333335
147647.6935
185541.8075
132516.66941666667
175977.815
149444.25
193333.49083333332
140535.7525
172293.28
177126.72
98708.2045
1
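
The decoded response is plain text, so it usually needs to be turned into numbers before further evaluation. The sketch below assumes the endpoint returns one prediction per line, as the output above suggests; `parse_predictions` is a hypothetical helper, and the sample payload reuses the first three values printed above in place of a real `predictor.predict(...)` call.

```python
def parse_predictions(payload: bytes) -> list:
    """Decode a newline-separated response body into a list of floats."""
    text = payload.decode("utf-8")
    return [float(line) for line in text.splitlines() if line.strip()]


# First three values from the output above, standing in for the endpoint response.
sample_payload = b"201561.77\n179703.24833333335\n242042.8525\n"
predictions = parse_predictions(sample_payload)
print(len(predictions), predictions[0])
```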

Delete the endpoint to avoid further billing.

In [14]:
sess.delete_endpoint(predictor.endpoint)

The endpoint attribute has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
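
Because a running endpoint keeps accruing charges, one option is to guarantee the cleanup even if a prediction call fails. The sketch below wraps the endpoint lifetime in a context manager; `temporary_endpoint` and `FakePredictor` are hypothetical names, and the fake class only stands in for a deployed predictor so the example runs without AWS. It relies on the predictor's own `delete_endpoint()` method.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and delete its endpoint afterwards, even on error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Stand-in for a deployed predictor, so the sketch runs without AWS."""

    def __init__(self):
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
with temporary_endpoint(fake) as p:
    pass  # in the real notebook, call p.predict(...) here

print(fake.deleted)
```

With a real predictor, the `with` block would contain the **predict()** calls, and the endpoint would be removed when the block exits.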
