## Invoke a SageMaker Endpoint from outside the AWS environment using the SageMaker SDK

Model used: XGBoost Bike Rental Prediction Trained in the XGBoost Lectures  
  
This example uses the IAM user ml_user_predict, which was set up in the housekeeping lecture of the course.  

Refer to the lecture: Configure IAM Users, Setup Command Line Interface (CLI)

Ensure the xgboost-biketrain-v1 endpoint is deployed before running this example.  
  
To create an endpoint using SageMaker Console:  
1. Select "Models" under "Inference" in navigation pane
2. Search for model using this prefix: xgboost-biketrain-v1
3. Select the latest model and choose create endpoint
4. Specify endpoint name as: xgboost-biketrain-v1
5. Create a new endpoint configuration
6. Create a new endpoint
7. After this lab is completed, delete the endpoint to avoid unnecessary charges
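
Once the endpoint is created, its status can also be checked from code before the cells below are run. The following is a minimal sketch, not part of the original lab. The helper names `endpoint_status` and `endpoint_is_ready` are hypothetical, and `sm_client` is assumed to be a boto3 SageMaker client. The calling IAM user also needs permission for `DescribeEndpoint`, which the limited ml_user_predict user may not have.

```python
def endpoint_status(sm_client, endpoint_name):
    """Return the EndpointStatus string reported by DescribeEndpoint."""
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


def endpoint_is_ready(sm_client, endpoint_name="xgboost-biketrain-v1"):
    """Return True only when the endpoint reports the InService status."""
    return endpoint_status(sm_client, endpoint_name) == "InService"
```

Assuming a boto3 session like the one created later in this notebook, the client could be obtained with `boto_session.client('sagemaker')` and passed to `endpoint_is_ready`.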

In [1]:
# Install SageMaker 2.x version.
!pip install --upgrade sagemaker


In [11]:
import boto3
import sagemaker
import math
import dateutil

# SDK 2 serializers and deserializers
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

In [12]:
# Establish a session with AWS.
# Specify the credentials profile and region to use for this session.
# We use the ml_user_predict credentials, which have limited privileges.
boto_session = boto3.Session(profile_name='ml_user_predict',region_name='us-east-1')

In [13]:
sess = sagemaker.Session(boto_session=boto_session)

In [14]:
# Create a predictor and point to an existing endpoint

# Get Predictor using SageMaker SDK
# Specify Your Endpoint Name
endpoint_name = 'xgboost-biketrain-v1'

predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name,
                                          sagemaker_session=sess)

In [15]:
# We are sending data for inference in CSV format
predictor.serializer = CSVSerializer()

In [16]:
#datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed
# Actual=562
sample_one = '2012-12-19 17:00:00,4,0,1,1,16.4,20.455,50,26.0027'
# Actual=569
sample_two = '2012-12-19 18:00:00,4,0,1,1,15.58,19.695,50,23.9994'
# Actual=4
sample_three = '2012-12-10 01:00:00,4,0,1,2,14.76,18.94,100,0'

In [17]:
# Raw Data Structure: 
# datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count

# Model expects data in this format (it was trained with these features):
# season,holiday,workingday,weather,temp,atemp,humidity,windspeed,year,month,day,dayofweek,hour

def transform_data(data):
    features = data.split(',')
    
    # Extract year, month, day, dayofweek, hour
    dt = dateutil.parser.parse(features[0])

    features.append(str(dt.year))
    features.append(str(dt.month))
    features.append(str(dt.day))
    features.append(str(dt.weekday()))
    features.append(str(dt.hour))
    
    # Return the transformed data. skip datetime field
    return ','.join(features[1:])

In [18]:
print('Raw Data:\n',sample_one)
print('Transformed Data:\n',transform_data(sample_one))

Raw Data:
 2012-12-19 17:00:00,4,0,1,1,16.4,20.455,50,26.0027
Transformed Data:
 4,0,1,1,16.4,20.455,50,26.0027,2012,12,19,2,17
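
The same transformation can be written with only the standard library and a check on the number of input fields, which may help when rows come from an untrusted file. This is a sketch under assumptions: the name `transform_row` and the constant `EXPECTED_RAW_FIELDS` are hypothetical, and the timestamp is assumed to always use the `YYYY-MM-DD HH:MM:SS` layout seen in the samples.

```python
from datetime import datetime

# datetime plus the 8 features listed in the raw data header (assumption
# based on the sample rows in this notebook)
EXPECTED_RAW_FIELDS = 9


def transform_row(row):
    """Drop the datetime field and append year, month, day, dayofweek, hour."""
    fields = row.strip().split(",")
    if len(fields) != EXPECTED_RAW_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_RAW_FIELDS} fields, got {len(fields)}"
        )

    # Parse the timestamp; weekday() returns 0 for Monday
    dt = datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
    date_parts = [dt.year, dt.month, dt.day, dt.weekday(), dt.hour]

    return ",".join(fields[1:] + [str(part) for part in date_parts])
```

For the first sample row this produces the same string as the transformed data shown above.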


In [19]:
# Let's invoke prediction now
predictor.predict(transform_data(sample_one))

b'573.6282958984375'

In [20]:
# Actual count is 562.

# The course model was trained on log1p(count), so a log-scale prediction
# (about 6.3 for this sample) needs the inverse transformation, expm1,
# to recover the count.
# Caution: the endpoint output above (573.63) appears to be on the count
# scale already, so applying expm1 to it yields the meaningless value below.
result = predictor.predict(transform_data(sample_one))
result = result.decode("utf-8")
print('Predicted Count', math.expm1(float(result)))

Predicted Count 1.329240521840346e+249
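
The relationship between `log1p` and `expm1` can be checked without calling the endpoint. The sketch below is illustrative only: the helper `to_count` and its `log_scale` flag are hypothetical names, introduced to show that the inverse transformation should be applied only when the model output is on the log scale.

```python
import math


def to_count(prediction, log_scale=True):
    """Convert a model output to a rental count.

    log_scale=True assumes the model was trained on log1p(count).
    log_scale=False assumes the output is already a count.
    """
    return math.expm1(prediction) if log_scale else prediction


# A log-scale prediction for an actual count of 562
log_prediction = math.log1p(562)

# expm1 undoes log1p, up to floating-point rounding
recovered = to_count(log_prediction)

# A value that is already a count must not be transformed again
already_count = to_count(573.6282958984375, log_scale=False)
```

Applying `expm1` to 573.63 instead gives a value on the order of 1e249, which matches the output printed above.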


In [22]:
# Send multiple samples in a single request
result = predictor.predict([transform_data(sample_one), transform_data(sample_two)])

In [23]:
result.decode("utf-8")

'573.6282958984375,547.5216064453125'
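
As the output shows, the response is a byte string of comma-separated values. A small helper can turn it into a list of floats. This is a sketch, assuming the response keeps this format. The name `parse_predictions` is hypothetical, and treating newlines as separators is a defensive assumption rather than something observed in this notebook.

```python
def parse_predictions(raw):
    """Decode an endpoint response and return the predictions as floats."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw

    # Treat newlines like commas, then skip any empty pieces
    pieces = text.replace("\n", ",").split(",")
    return [float(piece) for piece in pieces if piece.strip()]
```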

In [24]:
# Batch Prediction
# Transform data and invoke prediction in specified batch sizes
def run_predictions(data, batch_size):
    predictions = []
    
    transformed_data = [transform_data(row.strip()) for row in data]
    
    for i in range(0, len(data), batch_size):
        
        print(i,i+batch_size)
        
        result = predictor.predict(transformed_data[i : i + batch_size])
        
        result = result.decode("utf-8")
        result = result.split(',')
        
        predictions += [math.expm1(float(r)) for r in result]
                
    return predictions
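
The slicing used in `run_predictions` handles a final batch that is smaller than `batch_size`, because a slice that runs past the end of a list is truncated. The generator below isolates that batching step so it can be tried without an endpoint. It is a sketch, and the name `batches` is hypothetical.

```python
def batches(rows, batch_size):
    """Yield consecutive slices of rows, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    for start in range(0, len(rows), batch_size):
        # The last slice is shorter when len(rows) is not a multiple of batch_size
        yield rows[start:start + batch_size]
```

With 7 rows and a batch size of 3, this yields batches of 3, 3, and 1 rows.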

In [25]:
run_predictions([sample_one,sample_two,sample_three],10)

0 10


[1.329240521840346e+249, 6.103970162143148e+237, 33650.626876684866]

In [26]:
# Run batch predictions on the test.csv file
# Read the file content
data = []
with open('test.csv','r') as f:
    # skip header
    f.readline()
    # Read remaining lines
    data = f.readlines()

In [27]:
len(data)

6493

In [28]:
%%time
predictions = run_predictions(data,15)

0 15
15 30
30 45
...
1710 17

OverflowError: math range error
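
`math.expm1` raises this error when its argument is too large for the result to fit in a double, which happens a little above 709. That is consistent with the endpoint returning values that are already counts rather than log-scale predictions, although this notebook alone does not confirm it. One way to locate the offending values instead of aborting the whole run is sketched below. The name `invert_log1p` and the choice to record `math.inf` for overflowed entries are assumptions made for illustration.

```python
import math


def invert_log1p(predictions):
    """Apply expm1 to each value and report the indices that overflow."""
    counts = []
    overflowed = []

    for index, value in enumerate(predictions):
        try:
            counts.append(math.expm1(value))
        except OverflowError:
            # Keep list positions aligned with the input and flag the entry
            counts.append(math.inf)
            overflowed.append(index)

    return counts, overflowed
```

If `overflowed` is not empty, the raw predictions at those positions are worth inspecting before deciding whether the inverse transformation applies at all.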

In [20]:
len(predictions),len(data)

NameError: name 'predictions' is not defined

In [None]:
# Don't forget to delete the endpoint
# From SageMaker Console, Select "Endpoints" under Inference and Delete the Endpoint