# SageMaker Marketplace Algorithm

- Third-party vendors sell SageMaker algorithms through AWS Marketplace.

## Example Code
This example uses `AutoGluon`, one of the Marketplace algorithms.

#### The example follows these steps:
1. Subscribe to the algorithm in Marketplace
2. Build development environment
3. Prepare input data
4. Run training job
5. Deploy saved model

## 1. Subscribe to the algorithm in Marketplace

Go to [AWS Marketplace](https://aws.amazon.com/marketplace/search/results?page=1&filters=fulfillment_options%2Cresource_type&fulfillment_options=SAGEMAKER&resource_type=ALGORITHM) and subscribe to the `AutoGluon` algorithm.

After subscribing, we can find the algorithm ARN on the [My subscriptions](https://ap-northeast-2.console.aws.amazon.com/sagemaker/home?region=ap-northeast-2#/algorithms/my-subscriptions) page of the SageMaker console.
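
The ARN embeds the region it belongs to (`ap-northeast-2` in this example), so it is worth checking that it matches the region of the notebook session before starting a job. A minimal sketch of such a check, assuming the standard `arn:partition:service:region:account-id:resource` layout; `parse_algorithm_arn` is a hypothetical helper, not part of the SageMaker SDK:

```python
def parse_algorithm_arn(arn: str) -> dict:
    """Split a SageMaker algorithm ARN into named parts (hypothetical helper)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "algorithm" or not name:
        raise ValueError(f"not an algorithm ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "name": name,
    }


# The ARN used later in this example.
arn = "arn:aws:sagemaker:ap-northeast-2:745090734665:algorithm/autogluon-tabular-v3-5-cb7001bd0e8243b50adc3338deb44a48"
info = parse_algorithm_arn(arn)
print(info["region"], info["name"])
```

In the notebook, `info["region"]` can be compared with `sagemaker.Session().boto_region_name`.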

## 2. Build development environment

#### Set up the Notebook environment
- instance spec: ml.t3.medium (2vCPU + 4GB)
- kernel image: Python 3 (Data Science)

## 3. Prepare input data

#### Download the dataset

The dataset was downloaded beforehand to `./data/bank-additional-full.csv`.<br>
The cells below pre-process the data and upload it to object storage (S3).

In [1]:
import pandas as pd

data = pd.read_csv("./data/bank-additional-full.csv", sep=';')

# Split train/test data
train = data.sample(frac=0.7, random_state=42)
test = data.drop(train.index)

train.head()

|  | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32884 | 57 | technician | married | high.school | no | no | yes | cellular | may | mon | ... | 1 | 999 | 1 | failure | -1.8 | 92.893 | -46.2 | 1.299 | 5099.1 | no |
| 3169 | 55 | unknown | married | unknown | unknown | yes | no | telephone | may | thu | ... | 2 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.86 | 5191.0 | no |
| 32206 | 33 | blue-collar | married | basic.9y | no | no | no | cellular | may | fri | ... | 1 | 999 | 1 | failure | -1.8 | 92.893 | -46.2 | 1.313 | 5099.1 | no |
| 9403 | 36 | admin. | married | high.school | no | no | no | telephone | jun | fri | ... | 4 | 999 | 0 | nonexistent | 1.4 | 94.465 | -41.8 | 4.967 | 5228.1 | no |
| 14020 | 27 | housemaid | married | high.school | no | yes | no | cellular | jul | fri | ... | 2 | 999 | 0 | nonexistent | 1.4 | 93.918 | -42.7 | 4.963 | 5228.1 | no |


In [2]:
import sagemaker

sagemaker_session = sagemaker.Session()

train.to_csv("train.csv", index=False)
train_s3_path = sagemaker_session.upload_data("train.csv", 
                                              bucket=sagemaker_session.default_bucket(), 
                                              key_prefix="data/bank")

print(train_s3_path)

s3://sagemaker-ap-northeast-2-834160605896/data/bank/train.csv
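
`upload_data` returns the full S3 URI of the uploaded object. If the bucket and key are needed separately, for example to inspect the object with another tool, they can be split out with the standard library. A minimal sketch; `split_s3_uri` is a hypothetical helper name:

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Return (bucket, key) for an s3://bucket/key URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The URI printed by the cell above.
bucket, key = split_s3_uri("s3://sagemaker-ap-northeast-2-834160605896/data/bank/train.csv")
print(bucket)
print(key)
```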


## 4. Run training job

#### Define training job spec

Import the `AlgorithmEstimator` class from the SageMaker SDK<br>
and pass it the ARN of the subscribed algorithm.

In [3]:
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

# Define job specs
algorithm_arn = "arn:aws:sagemaker:ap-northeast-2:745090734665:algorithm/autogluon-tabular-v3-5-cb7001bd0e8243b50adc3338deb44a48"
job_name = "DEMO-marketplace"
instance_type = "ml.m5.4xlarge"

algo = AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    role=sagemaker.get_execution_role(),
    base_job_name=job_name,
    instance_count=1,
    instance_type=instance_type,
    hyperparameters={
        "init_args": {"label": "y"},
        "fit_args": { "presets": ["optimize_for_deployment"] },
        "feature_importance": True}
)

algo.fit({"training": train_s3_path})

2021-06-27 11:40:38 Starting - Starting the training job...
2021-06-27 11:40:40 Starting - Launching requested ML instancesProfilerReport-1624794038: InProgress
......
2021-06-27 11:41:47 Starting - Preparing the instances for training...
2021-06-27 11:42:35 Downloading - Downloading input data...
2021-06-27 11:42:53 Training - Downloading the training image.........
2021-06-27 11:44:34 Training - Training image download completed. Training in progress..
2021-06-27 11:44:34,720 sagemaker-training-toolkit INFO     Imported framework sagemaker_mxnet_container.training
2021-06-27 11:44:34,723 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-06-27 11:44:34,723 sagemaker-training-toolkit INFO     Failed to parse hyperparameter init_args value {'label': 'y'} to Json.
Returning the value itself
2021-06-27 11:44:34,723 sagemaker-training-toolkit INFO     Failed to parse hyperparameter fit_args value {'presets': ['op
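
The `Failed to parse hyperparameter ... to Json` messages in the log are emitted at INFO level, and the toolkit falls back to "the value itself". They appear to arise because the nested dict reaches the container as its Python `repr` (single quotes), which is not valid JSON. The snippet below reproduces that behavior with the standard library only. How the AutoGluon container ultimately interprets the string is an assumption here, not something the log confirms.

```python
import ast
import json

init_args = {"label": "y"}

as_repr = str(init_args)         # "{'label': 'y'}", the form shown in the log
as_json = json.dumps(init_args)  # '{"label": "y"}', valid JSON

# The single-quoted repr is rejected by the JSON parser.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(repr_is_json)                            # False
print(json.loads(as_json) == init_args)        # True: JSON round-trips
print(ast.literal_eval(as_repr) == init_args)  # True: repr is still a Python literal
```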

## 5. Deploy saved model

#### Deploy saved model
After training, we can host the resulting model on SageMaker.
Once the inference endpoint is created, we can check its API URL.

In [4]:
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import StringDeserializer

predictor = algo.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.4xlarge",
    serializer=CSVSerializer(),
    deserializer=StringDeserializer(),
)

..........
---------------!
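
The endpoint is served through the SageMaker Runtime `InvokeEndpoint` API. A minimal sketch of how its HTTPS URL is formed from the region and the endpoint name; `endpoint_url` is a hypothetical helper and the endpoint name below is a placeholder. In the notebook the real name is available as `predictor.endpoint_name`.

```python
from urllib.parse import quote


def endpoint_url(region: str, endpoint_name: str) -> str:
    """Build the InvokeEndpoint URL for a SageMaker endpoint (hypothetical helper)."""
    return (
        f"https://runtime.sagemaker.{region}.amazonaws.com"
        f"/endpoints/{quote(endpoint_name, safe='')}/invocations"
    )


# "DEMO-marketplace-endpoint" is a placeholder, not the name created above.
url = endpoint_url("ap-northeast-2", "DEMO-marketplace-endpoint")
print(url)
```

Requests sent directly to this URL must be signed with AWS Signature Version 4; `predictor.predict` takes care of that for us.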

#### Invoke inference API

In [9]:
import numpy as np
from collections import Counter
from sklearn.metrics import accuracy_score

results = predictor.predict(test.to_csv(index=False)).splitlines()

# Check output
y_results = np.array([i.split(",")[0] for i in results])

print(Counter(y_results))

print("accuracy: {}".format(accuracy_score(y_true=test["y"], y_pred=y_results)))

Counter({'no': 11318, 'yes': 1038})
accuracy: 0.9187439300744578
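
The cell above splits each response line on commas and takes the first field as the predicted label. Because the predictions are heavily skewed toward `no`, accuracy alone says little about the `yes` class. A minimal sketch of the same parsing plus a per-class recall, using made-up response lines and labels; the response layout is an assumption based on the parsing code above, not a documented format:

```python
from collections import Counter

# Hypothetical response body: one line per row, predicted label in the first field.
response = "no,0.91,0.09\nyes,0.20,0.80\nno,0.75,0.25\nno,0.60,0.40"
y_true = ["no", "yes", "yes", "no"]  # made-up ground truth for illustration

y_pred = [line.split(",")[0] for line in response.splitlines()]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall per class: correct predictions for a class / actual members of that class.
support = Counter(y_true)
hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
recall = {label: hits[label] / support[label] for label in support}

print(accuracy)  # 0.75
print(recall)    # {'no': 1.0, 'yes': 0.5}
```

With the real data, `sklearn.metrics.classification_report(test["y"], y_results)` gives the same kind of per-class breakdown.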
