# SageMaker Built-in Algorithm

- It is a training algorithm provided by SageMaker.
- Users don't need to write their own training code.
- Models can be trained quickly with minimal effort.
- Users only need to choose the algorithm that suits their problem type.

## Example Code
The model used in this example is `LinearLearner`, one of SageMaker's built-in algorithms.

#### Training will be done in the following order:
1. Build development environment
2. Prepare input data
3. Run training job
4. Deploy the saved model

## 1. Build development environment

#### Set up the Notebook environment
- instance spec: ml.t3.medium (2vCPU + 4GB)
- kernel image: Python 3 (Data Science)

## 2. Prepare input data

#### Load the housing dataset
I have prepared the King County house sales data at `./data/kc_house_data.csv`.

```python
import pandas as pd

df = pd.read_csv("./data/kc_house_data.csv")
df.head()
```

|   | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 20141013T000000 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | N | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | N | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | N | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | N | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | N | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


#### Split the data into train, validation, and test sets

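```python
import numpy as np
from sklearn.model_selection import train_test_split

# Target: sale price. Feature: living area only, reshaped to a 2-D (n_samples, 1) matrix.
ys = np.array(df['price']).astype("float32")
xs = np.array(df['sqft_living']).astype("float32").reshape(-1, 1)

# Seed NumPy's global RNG so the splits below are reproducible.
np.random.seed(8675309)
# Hold out 20% of the rows...
train_features, test_features, train_labels, test_labels = train_test_split(xs, ys, test_size=0.2)
# ...then halve the held-out rows into validation and test sets (10% each).
val_features, test_features, val_labels, test_labels = train_test_split(test_features, test_labels, test_size=0.5)
```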
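Splitting in two stages gives an 80/10/10 train/validation/test partition. The sketch below checks those proportions on hypothetical stand-in data (1,000 rows whose value equals the row index, not the housing data). It passes `random_state` to each call instead of seeding NumPy's global RNG, which is an alternative way to get reproducible splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: each row's value equals its index,
# so we can see which rows end up in which split. Not the housing data.
xs = np.arange(1000, dtype="float32").reshape(-1, 1)
ys = np.arange(1000, dtype="float32")

# random_state makes each split reproducible without touching the global RNG.
train_x, rest_x, train_y, rest_y = train_test_split(xs, ys, test_size=0.2, random_state=8675309)
val_x, test_x, val_y, test_y = train_test_split(rest_x, rest_y, test_size=0.5, random_state=8675309)

print(len(train_x), len(val_x), len(test_x))  # 800 100 100

# Every row lands in exactly one of the three sets.
all_rows = np.concatenate([train_y, val_y, test_y])
print(np.array_equal(np.sort(all_rows), ys))  # True
```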

## 3. Run Training Job

#### Define training job spec

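To use a built-in algorithm, import the `LinearLearner` estimator class from the SageMaker Python SDK.  
The training job specification (IAM role, instance count, instance type, and predictor type) is passed to its constructor as parameters.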

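```python
import sagemaker
from sagemaker import LinearLearner

job_name = 'DEMO-built-in'
instance_type = 'ml.m4.xlarge'

# Estimator for the built-in Linear Learner algorithm, configured for regression.
linear_model = LinearLearner(role=sagemaker.get_execution_role(),  # IAM role of this notebook
                             base_job_name=job_name,               # prefix for the training job name
                             instance_count=1,
                             instance_type=instance_type,
                             predictor_type='regressor')
```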

*To pass our data to the built-in algorithm, convert each split into a SageMaker `RecordSet` with `record_set()`, which serializes the arrays as protobuf records and uploads them to S3.*

```python
train_records = linear_model.record_set(train_features, train_labels, channel='train')
val_records = linear_model.record_set(val_features, val_labels, channel='validation')
test_records = linear_model.record_set(test_features, test_labels, channel='test')
```
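`record_set()` expects a 2-D NumPy array of features (one row per sample) and a 1-D array of labels with one entry per row. That is why the data preparation step reshapes `sqft_living` with `reshape(-1, 1)`. The sketch below is a hypothetical pre-flight check (`check_record_set_inputs` is not part of the SageMaker SDK) that enforces the same shapes locally, before anything is uploaded to S3. The sample values are the first three rows of `df.head()`.

```python
import numpy as np

def check_record_set_inputs(features, labels):
    """Hypothetical helper mirroring the shapes record_set() expects:
    a 2-D feature matrix (one row per sample) and a 1-D label vector."""
    if features.ndim != 2:
        raise ValueError(f"features must be 2-D, got shape {features.shape}")
    if labels.ndim != 1 or labels.shape[0] != features.shape[0]:
        raise ValueError(f"labels must be 1-D with {features.shape[0]} entries, got shape {labels.shape}")
    # The number of columns is the feature dimension the algorithm will see.
    return features.shape[1]

# sqft_living and price of the first three rows shown by df.head().
sqft = np.array([1180, 2570, 770], dtype="float32")
price = np.array([221900, 538000, 180000], dtype="float32")

try:
    check_record_set_inputs(sqft, price)  # flat 1-D feature array: rejected
except ValueError as err:
    print("rejected:", err)

print(check_record_set_inputs(sqft.reshape(-1, 1), price))  # 1
```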

#### Run training job
To start training, call the `fit` method of `LinearLearner` and pass the three record sets as its parameter.

The training progress and real-time logs are streamed to the cell output.

```python
linear_model.fit([train_records, val_records, test_records])
```

```
Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: 1.

2021-06-27 12:58:23 Starting - Starting the training job...
2021-06-27 12:58:47 Starting - Launching requested ML instancesProfilerReport-1624798703: InProgress
......
2021-06-27 12:59:47 Starting - Preparing the instances for training.........
2021-06-27 13:01:13 Downloading - Downloading input data
2021-06-27 13:01:13 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
Running default environment configuration script
[06/27/2021 13:01:37 INFO 139729954969408] Reading default configuration from /opt/amazon/lib/python3.7/site-packages/algorithm/resources/default-input.json: {'mini_batch_size': '1000', 'epochs': '15', 'feature_dim': 'auto', 'use_bias': 'true', 'binary_classifier_model_selection_criteria': 'accuracy', 'f_beta': '1.0', 'target_recall': '0.8', 'target_precision': '0.8', 'num_models': 'auto', 'num_calibration_samples': '10000000', 'init_method': 'uniform', 'init_scale': '0.07', 'init_sigma': '0.01', 'init_bias':
```

#### Monitor training

We can monitor the training status and metrics on the [AWS console page](https://ap-northeast-2.console.aws.amazon.com/sagemaker/home?region=ap-northeast-2#/jobs).
Alternatively, we can use the SageMaker SDK to load the training metrics into a pandas DataFrame and analyze them.

```python
sagemaker.analytics.TrainingJobAnalytics(
    linear_model._current_job_name,
    metric_names=['test:mse', 'test:absolute_loss'],
).dataframe()
```

|   | timestamp | metric_name | value |
|---|---|---|---|
| 0 | 0.0 | test:mse | 69602620000.0 |
| 1 | 0.0 | test:absolute_loss | 175449.3 |
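Both metrics in the table are averages over the test channel. `test:mse` is the mean squared error of the predicted prices, and `test:absolute_loss` is the mean absolute error. The NumPy sketch below implements both definitions and applies them to hypothetical toy prices chosen so the arithmetic is easy to follow. It then takes the square root of the reported `test:mse` to get a root mean squared error in the same unit as `price`.

```python
import math
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared residuals, i.e. what test:mse reports.
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(residuals ** 2))

def absolute_loss(y_true, y_pred):
    # Mean of absolute residuals, i.e. what test:absolute_loss reports.
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(residuals)))

# Hypothetical toy prices, not taken from the training job.
y_true = [300000, 400000, 500000, 600000]
y_pred = [300100, 399900, 500200, 600000]
print(mse(y_true, y_pred))            # 15000.0
print(absolute_loss(y_true, y_pred))  # 100.0

# The square root of the reported test:mse is an RMSE in dollars.
print(round(math.sqrt(69602620000.0)))  # 263823
```

RMSE is always at least as large as the mean absolute error. The gap here (about 263,800 vs. 175,400 USD) indicates that some homes have much larger errors than the typical one.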


## 4. Deploy saved model

#### Deploy saved model
After the training job finishes, we can deploy the model to a real-time endpoint with the estimator's `deploy` method.  
We can check the deployed endpoint's details in the [AWS console](https://ap-northeast-2.console.aws.amazon.com/sagemaker/home?region=ap-northeast-2#/endpoints).

```python
predictor = linear_model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
```

```
Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: 1.

---------------!
```

#### Invoke inference API

We can invoke the deployed endpoint directly through the SageMaker Runtime `InvokeEndpoint` API.  
However, the SageMaker SDK makes this easier: the predictor returned by `deploy` provides a `predict` method.
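For clients that do not use the SageMaker Python SDK, the same endpoint can be called with boto3's `sagemaker-runtime` client and its `invoke_endpoint` method. The sketch below is one alternative and not what this notebook runs. It assumes the endpoint accepts `text/csv` input and returns JSON of the form `{"predictions": [{"score": ...}]}`, which is the format the Linear Learner documentation describes for regressors. `build_csv_body`, `parse_scores`, and `invoke` are hypothetical helper names.

```python
import json

def build_csv_body(sqft_values):
    # One CSV row per record; with a single feature each row is just the sqft value.
    return "\n".join(str(float(v)) for v in sqft_values)

def parse_scores(response_body):
    # Assumed Linear Learner regressor JSON response: {"predictions": [{"score": ...}, ...]}
    return [p["score"] for p in json.loads(response_body)["predictions"]]

def invoke(endpoint_name, sqft_values):
    # Imported here so the pure helpers above work without AWS credentials.
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=build_csv_body(sqft_values),
    )
    # response["Body"] is a streaming body; read() returns the raw bytes.
    return parse_scores(response["Body"].read())
```

For example, `invoke(predictor.endpoint_name, [1320, 3020])` would request predictions for two homes. Sending CSV and receiving JSON avoids the protobuf `Record` parsing used in the SDK loop.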

The predicted prices for the first ten test samples are shown below.

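```python
for i in range(0, 10):
    # Send one test sample to the endpoint; the response is a list of Record objects.
    result = predictor.predict(test_features[i])
    score = result[0].label['score'].float32_tensor.values[0]
    # test_features[i] is a 1-element array, so index it to get the scalar sqft value.
    print("A %d sqft home is predicted to cost %f" % (test_features[i][0], score))
```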

```
A 1320 sqft home is predicted to cost 329956.781250
A 3020 sqft home is predicted to cost 801750.187500
A 1680 sqft home is predicted to cost 429865.968750
A 1300 sqft home is predicted to cost 324406.281250
A 1700 sqft home is predicted to cost 435416.500000
A 2740 sqft home is predicted to cost 724043.062500
A 3580 sqft home is predicted to cost 957164.500000
A 1770 sqft home is predicted to cost 454843.281250
A 7440 sqft home is predicted to cost 2028413.000000
A 1400 sqft home is predicted to cost 352158.812500
```
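With a single input feature, a linear regressor's prediction has the form price ≈ w · sqft + b. Fitting a line to the ten predictions printed above recovers both numbers, which makes the model easy to interpret: roughly 277.5 USD per additional square foot, with an intercept of roughly -36,380 USD. The residuals are all well under one dollar, which is consistent with float32 rounding in the endpoint's output. This sketch only reads back the printed predictions. It does not access the trained model artifact.

```python
import numpy as np

# Inputs and scores copied from the prediction output above.
sqft = np.array([1320, 3020, 1680, 1300, 1700, 2740, 3580, 1770, 7440, 1400], dtype=float)
score = np.array([329956.781250, 801750.187500, 429865.968750, 324406.281250,
                  435416.500000, 724043.062500, 957164.500000, 454843.281250,
                  2028413.000000, 352158.812500])

# Least-squares fit of score = w * sqft + b (polyfit returns highest degree first).
w, b = np.polyfit(sqft, score, deg=1)
residuals = score - (w * sqft + b)

print(f"w = {w:.2f} USD per sqft, b = {b:,.0f} USD")
print(f"largest residual: {np.abs(residuals).max():.3f} USD")
```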
