
#### Amazon SageMaker is a fully managed machine learning service.

* With SageMaker you can build and train machine learning models, then deploy them directly into a production-ready hosted environment.

* It provides an integrated Jupyter notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers.

* It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment.


![img.png](https://docs.aws.amazon.com/sagemaker/latest/dg/images/ml-concepts-10.png)

### What is Boto

Boto (imported in this notebook as `boto3`) is the Amazon Web Services (AWS) SDK for Python. It lets Python developers create, configure, and manage AWS services such as EC2 and S3. Boto provides an easy-to-use, object-oriented API as well as low-level access to AWS services.

In [1]:
import sagemaker
import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.session import s3_input, Session
print('import successful')

import successful


In [2]:
# Change this variable to a globally unique name for your bucket
BUCKET_NAME = 'bank-churn-app2020'
# Read the region the notebook instance is running in
my_region = boto3.session.Session().region_name
print(my_region)

us-east-1


In [3]:
s3 = boto3.resource('s3')
print(s3)

s3.ServiceResource()


In [4]:
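try:
    if my_region == 'us-east-1':
        # us-east-1 is the default location, so no location constraint is passed
        s3.create_bucket(Bucket=BUCKET_NAME)
    else:
        # every other region needs an explicit location constraint
        s3.create_bucket(Bucket=BUCKET_NAME,
                         CreateBucketConfiguration={'LocationConstraint': my_region})
    print('bucket created successfully')
except Exception as e:
    print(f's3 error: {e}')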

bucket created successfully


In [52]:
# set an output path where the trained model will be saved
prefix = 'xgboost-inbulit-algo'
OUTPUT_PATH = f's3://{BUCKET_NAME}/{prefix}'
print(OUTPUT_PATH)

s3://bank-churn-app2020/xgboost-inbulit-algo
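The output path above is a bucket name plus a key prefix joined with `/`. Later cells build object keys with `os.path.join`, which uses the separator of the host operating system; that works on a Linux notebook instance but would not on Windows. A minimal sketch of separator-safe helpers follows. The names `s3_key` and `s3_uri` are hypothetical and not part of boto3 or the SageMaker SDK.

```python
def s3_key(*parts: str) -> str:
    """Join key segments with '/', regardless of the host operating system."""
    return "/".join(p.strip("/") for p in parts if p)


def s3_uri(bucket: str, *parts: str) -> str:
    """Build an s3:// URI from a bucket name and optional key segments."""
    key = s3_key(*parts)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


BUCKET_NAME = "bank-churn-app2020"
prefix = "xgboost-inbulit-algo"

print(s3_uri(BUCKET_NAME, prefix))
print(s3_key(prefix, "train/train.csv"))
```

The same helper can produce both the `OUTPUT_PATH` value and the object keys used for the uploads.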


#### Downloading the dataset and storing it in S3

In [53]:
import pandas as pd
import numpy as np
import urllib.request
import os
import sys

In [54]:
try:
    urllib.request.urlretrieve ("https://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv", "bank_clean.csv")
    print('Success: downloaded bank_clean.csv.')
except Exception as e:
    print('Data download error: ', e)

try:
    model_data = pd.read_csv('./bank_clean.csv',index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ',e)

Success: downloaded bank_clean.csv.
Success: Data loaded into dataframe.


In [55]:
! dir

bank-churn-app2020.ipynb  bank_clean.csv  lost+found  test.csv	train.csv


In [56]:
# Shuffle the rows with a fixed seed, then split 70% train / 30% test
train_data, test_data = np.split(model_data.sample(frac=1, random_state=42), [int(0.7 * len(model_data))])
print(train_data.shape , test_data.shape )

(28831, 61) (12357, 61)
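The split above shuffles all rows and cuts the shuffled table at `int(0.7 * len(model_data))`. The same idea can be sketched with the standard library alone. The function name `shuffle_split` is hypothetical, and the shuffled order will not match what pandas produces for `random_state=42`; only the sizes of the two parts are comparable.

```python
import random


def shuffle_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows, then cut it at int(train_fraction * len(rows))."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# 41188 rows is the total implied by the shapes printed above (28831 + 12357).
rows = list(range(41188))
train_rows, test_rows = shuffle_split(rows)
print(len(train_rows), len(test_rows))
```

Because the cut index is truncated with `int`, the training part gets 28831 rows rather than a rounded 28832.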


In [57]:
# Save the train and test sets to the bucket, starting with the training data
import os
# Build the training file: target column (y_yes) first, no header, no index
pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], 
                                                axis=1)], 
                                                axis=1).to_csv('train.csv', index=False, header=False)

# upload the training dataset to the S3 bucket
boto3.Session().resource('s3').Bucket(BUCKET_NAME).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')

s3_input_train = sagemaker.s3_input(s3_data=f's3://{BUCKET_NAME}/{prefix}/train', content_type='csv')


's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
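The cell above writes the training file with the target column first and with neither a header row nor an index column, which is the CSV layout the built-in XGBoost algorithm expects. A minimal sketch of that layout using only the `csv` module is shown below. The function `to_xgboost_csv` and the feature columns `age` and `campaign` in the sample rows are illustrative assumptions; only `y_yes` and `y_no` are column names taken from the notebook.

```python
import csv
import io


def to_xgboost_csv(records, label="y_yes", drop=("y_no",)):
    """Write records as header-less CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        features = [v for k, v in record.items() if k != label and k not in drop]
        writer.writerow([record[label], *features])
    return buffer.getvalue()


# Made-up rows for illustration only.
sample = [
    {"age": 41, "campaign": 2, "y_no": 1, "y_yes": 0},
    {"age": 29, "campaign": 1, "y_no": 0, "y_yes": 1},
]
print(to_xgboost_csv(sample))
```

Dropping `y_no` matters because it is the complement of `y_yes`; leaving it among the features would leak the label.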


In [58]:
# Test Data Into Buckets
pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('test.csv', index=False, header=False)

boto3.Session().resource('s3').Bucket(BUCKET_NAME).Object(os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')

s3_input_test = sagemaker.s3_input(s3_data=f's3://{BUCKET_NAME}/{prefix}/test', content_type='csv')

's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


#### Building the model: built-in XGBoost algorithm

* Fetch the container for the algorithm you want to use from the list of pre-defined SageMaker algorithms, or supply your own container to support a custom algorithm.

In [59]:
# look up the URI of the built-in XGBoost container image for the current region
# repo_version selects the XGBoost container version; choose the one you need
container = get_image_uri(boto3.Session().region_name,
                          'xgboost', 
                          repo_version='1.0-1')

'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


In [60]:
container

'683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3'
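The URI printed above follows the usual Amazon ECR layout: account ID, region, repository name, and image tag. A small sketch that splits it into those parts can help when checking that the image matches the region of the notebook. The function `parse_image_uri` is hypothetical, and the pattern assumes the standard `amazonaws.com` domain, so it would not match URIs from partitions that use a different domain suffix.

```python
import re

# account.dkr.ecr.region.amazonaws.com/repository:tag
ECR_IMAGE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_image_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_IMAGE.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return match.groupdict()


container = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3"
parts = parse_image_uri(container)
print(parts)
```

Here the tag `1.0-1-cpu-py3` reflects the `repo_version='1.0-1'` argument passed when the URI was looked up.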

In [61]:
# initialize hyperparameters
hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"binary:logistic",
        "num_round":50
        }

In [62]:
hyperparameters

{'max_depth': '5',
 'eta': '0.2',
 'gamma': '4',
 'min_child_weight': '6',
 'subsample': '0.7',
 'objective': 'binary:logistic',
 'num_round': 50}
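The dictionary above mixes string values with an integer `num_round`. The SageMaker `CreateTrainingJob` API defines hyperparameters as a string-to-string map, so values are presumably converted to strings before the job starts, and the container parses them back (the training log line about parsing `objective` "to Json" is consistent with that). A minimal sketch of such a normalization, with the hypothetical helper `as_string_map`, is shown below; it is an illustration, not the SDK's own code.

```python
import json

hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "binary:logistic",
    "num_round": 50,
}


def as_string_map(params):
    """Convert every value to a string, leaving existing strings unchanged."""
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in params.items()
    }


normalized = as_string_map(hyperparameters)
print(normalized["num_round"], type(normalized["num_round"]).__name__)
```

Writing all values in one style, either all strings or all native numbers, makes the dictionary easier to read and compare across runs.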

In [63]:
# construct a SageMaker estimator that calls the xgboost-container
estimator = sagemaker.estimator.Estimator(image_name=container, 
                                          hyperparameters=hyperparameters,
                                          role=sagemaker.get_execution_role(),
                                          train_instance_count=1, 
                                          train_instance_type='ml.m5.2xlarge', 
                                          train_volume_size=5, # 5 GB 
                                          output_path=OUTPUT_PATH,
                                          train_use_spot_instances=True,
                                          train_max_run=300,
                                          train_max_wait=600)
estimator

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.


<sagemaker.estimator.Estimator at 0x7f880edfe160>

In [64]:
estimator.fit({'train': s3_input_train,'validation': s3_input_test})

2020-09-02 06:15:47 Starting - Starting the training job...
2020-09-02 06:15:50 Starting - Launching requested ML instances......
2020-09-02 06:17:04 Starting - Preparing the instances for training......
2020-09-02 06:18:17 Downloading - Downloading input data
2020-09-02 06:18:17 Training - Downloading the training image...
2020-09-02 06:18:44 Uploading - Uploading generated training model
2020-09-02 06:18:44 Completed - Training job completed
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determ

#### Deploy the ML model as an endpoint

* Deploy the trained model using the SageMaker API, providing the instance type and instance count as required. Once the deployment is complete, the test data is used to test the deployed application.

* Once the model is deployed, an HTTPS endpoint is generated that other applications can call, for example a Lambda function that is part of a streaming or synchronous application.
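A caller outside the notebook, such as the Lambda function mentioned above, would typically reach the endpoint through the `sagemaker-runtime` client of boto3 and its `invoke_endpoint` call. The sketch below keeps the client as a parameter so the logic can be run offline; `invoke_csv_endpoint`, `StubRuntimeClient`, and the endpoint name are hypothetical, and the stub only mimics the shape of the real response (a `Body` object with a `read()` method). The parser accepts comma- or newline-separated scores, since the output format is an assumption here.

```python
import io


def invoke_csv_endpoint(runtime_client, endpoint_name, rows):
    """Send feature rows as CSV to an endpoint and return one score per row."""
    payload = "\n".join(",".join(str(v) for v in row) for row in rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    body = response["Body"].read().decode("utf-8")
    return [float(x) for x in body.replace("\n", ",").split(",") if x.strip()]


class StubRuntimeClient:
    """Offline stand-in that mimics the response shape of invoke_endpoint."""

    def __init__(self, body):
        self._body = body
        self.last_request = None

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {"Body": io.BytesIO(self._body)}


# With AWS credentials available, the real client would be created like this:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
stub = StubRuntimeClient(b"0.02587362,0.68751293")
scores = invoke_csv_endpoint(stub, "hypothetical-endpoint", [[1, 0, 3], [0, 1, 5]])
print(scores)
```

Passing the client in, rather than creating it inside the function, also lets a Lambda function create it once outside the handler and reuse it across invocations.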

In [65]:
xgb_predictor = estimator.deploy(initial_instance_count=1,instance_type='ml.m4.xlarge')
xgb_predictor

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


---------------!

<sagemaker.predictor.RealTimePredictor at 0x7f880e9f4d68>

#### Prediction for the test dataset

In [66]:
from sagemaker.predictor import csv_serializer
test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values #load the data into an array
xgb_predictor.content_type = 'text/csv' # set the data type for an inference
xgb_predictor.serializer = csv_serializer # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # and turn the prediction into an array
print(predictions_array.shape)

(12357,)


In [67]:
predictions_array

array([0.02587362, 0.02734977, 0.08641617, ..., 0.68751293, 0.04483907,
       0.10314494])
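The array above holds scores between 0 and 1 from the `binary:logistic` objective, not class labels. The next cell turns them into labels with `np.round`, which amounts to a 0.5 threshold. A minimal sketch with an explicit threshold is shown below, using the six values visible in the output above; the function name `to_labels` is hypothetical.

```python
def to_labels(scores, threshold=0.5):
    """Map each score to 1 if it is strictly above the threshold, else 0."""
    return [1 if score > threshold else 0 for score in scores]


# The six scores visible in the truncated array output above.
visible_scores = [0.02587362, 0.02734977, 0.08641617, 0.68751293, 0.04483907, 0.10314494]
print(to_labels(visible_scores))

# Lowering the threshold marks more customers as likely purchasers.
print(to_labels(visible_scores, threshold=0.1))
```

An explicit threshold makes it possible to trade precision for recall, which rounding does not allow.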

In [68]:
cm = pd.crosstab(index=test_data['y_yes'], columns=np.round(predictions_array), rownames=['Observed'], colnames=['Predicted'])
tn = cm.iloc[0,0]
fn = cm.iloc[1,0]
tp = cm.iloc[1,1]
fp = cm.iloc[0,1]
p = (tp+tn)/(tp+tn+fp+fn)*100
print("\n{0:<20}{1:<4.1f}%\n".format("Overall Classification Rate: ", p))
print("{0:<15}{1:<15}{2:>8}".format("Predicted", "No Purchase", "Purchase"))
print("Observed")
print("{0:<15}{1:<2.0f}% ({2:<}){3:>6.0f}% ({4:<})".format("No Purchase", tn/(tn+fn)*100,tn, fp/(tp+fp)*100, fp))
print("{0:<16}{1:<1.0f}% ({2:<}){3:>7.0f}% ({4:<}) \n".format("Purchase", fn/(tn+fn)*100,fn, tp/(tp+fp)*100, tp))


Overall Classification Rate: 89.8%

Predicted      No Purchase    Purchase
Observed
No Purchase    91% (10809)    34% (152)
Purchase        9% (1103)     66% (293) 
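The percentages in the table above are computed per predicted column, so they describe how often each prediction is right, not how many actual purchasers were found. Recomputing the usual metrics from the four counts in the table gives a fuller picture; this sketch uses only those counts.

```python
# Counts read from the table above (rows: observed, columns: predicted).
tn, fp = 10809, 152   # observed "No Purchase"
fn, tp = 1103, 293    # observed "Purchase"

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted purchases that were real
recall = tp / (tp + fn)      # share of real purchases that were predicted
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.1%}")
print(f"precision: {precision:.1%}")
print(f"recall:    {recall:.1%}")
print(f"f1:        {f1:.3f}")
```

The accuracy matches the 89.8% reported above, but recall is about 21%: at the 0.5 threshold the model misses most actual purchasers, which the overall classification rate hides because the classes are imbalanced.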



#### Deleting the endpoint and the bucket contents

In [69]:
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)

In [70]:
# empty the bucket: this removes the uploaded datasets and the trained model artifact
bucket_to_delete = boto3.resource('s3').Bucket(BUCKET_NAME)
bucket_to_delete.objects.all().delete()

[{'ResponseMetadata': {'RequestId': 'FN7W4G9SFN2TET0Y',
   'HostId': 'WaJo6SkGgrq9DnaYhVABM2F3S12L7L22RP8CH/qIcQ0Twjdud2eUlKGDt6b6mwHnuxpO1B5TXXA=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'WaJo6SkGgrq9DnaYhVABM2F3S12L7L22RP8CH/qIcQ0Twjdud2eUlKGDt6b6mwHnuxpO1B5TXXA=',
    'x-amz-request-id': 'FN7W4G9SFN2TET0Y',
    'date': 'Wed, 02 Sep 2020 06:26:33 GMT',
    'connection': 'close',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'xgboost-inbulit-algo/test/test.csv'},
   {'Key': 'xgboost-inbulit-algo/train/train.csv'},
   {'Key': 'xgboost-inbulit-algo/sagemaker-xgboost-2020-09-02-06-15-45-867/output/model.tar.gz'}]}]