# Price recommender with airbnb London listings data

The API for the airbnb listings open dataset used in this regression analysis is available here: https://public.opendatasoft.com/explore/dataset/airbnb-reviews/api/. The goal of the notebook is the supervised machine learning task of using listing features and price to predict the review scores ratings in order to construct a price recommendation engine for airbnb hosts. We will use Amazon SageMaker hosting and software to this end and so we begin with the necessary imports...

In [52]:
import sagemaker
from sagemaker import get_execution_role
import os
import json
import boto3
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder

role = get_execution_role()
region = boto3.Session().region_name
smclient = boto3.Session().client('sagemaker')

bucket = sagemaker.Session().default_bucket()
prefix = 'airbnb-recommender-data'

Let's now download and unzip the listings open dataset from http://insideairbnb.com and inspect it...

In [2]:
!wget http://data.insideairbnb.com/united-kingdom/england/london/2020-04-14/data/listings.csv.gz
!gunzip listings.csv.gz

--2020-06-14 16:04:46--  http://data.insideairbnb.com/united-kingdom/england/london/2020-04-14/data/listings.csv.gz
Resolving data.insideairbnb.com (data.insideairbnb.com)... 52.216.165.202
Connecting to data.insideairbnb.com (data.insideairbnb.com)|52.216.165.202|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78560292 (75M) [application/x-gzip]
Saving to: ‘listings.csv.gz’


2020-06-14 16:04:54 (10.2 MB/s) - ‘listings.csv.gz’ saved [78560292/78560292]



## Preprocessing the data

Let's inspect the data before preprocessing it...

In [3]:
listings_dataf = pd.read_csv('listings.csv') 
listings_dataf.head()

  interactivity=interactivity, compiler=compiler, result=result)


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,13913,https://www.airbnb.com/rooms/13913,20200414180850,2020-04-16,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,"Hello Everyone, I'm offering my lovely double ...",My bright double bedroom with a large window h...,business,Finsbury Park is a friendly melting pot commun...,...,f,f,moderate,f,f,2,1,1,0,0.18
1,15400,https://www.airbnb.com/rooms/15400,20200414180850,2020-04-16,Bright Chelsea Apartment. Chelsea!,Lots of windows and light. St Luke's Gardens ...,Bright Chelsea Apartment This is a bright one...,Lots of windows and light. St Luke's Gardens ...,romantic,It is Chelsea.,...,t,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
2,17402,https://www.airbnb.com/rooms/17402,20200414180850,2020-04-15,Superb 3-Bed/2 Bath & Wifi: Trendy W1,You'll have a wonderful stay in this superb mo...,"This is a wonderful very popular beautiful, sp...",You'll have a wonderful stay in this superb mo...,none,"Location, location, location! You won't find b...",...,t,f,strict_14_with_grace_period,f,f,15,15,0,0,0.38
3,17506,https://www.airbnb.com/rooms/17506,20200414180850,2020-04-16,Boutique Chelsea/Fulham Double bed 5-star ensuite,Enjoy a chic stay in this elegant but fully mo...,Enjoy a boutique London townhouse bed and brea...,Enjoy a chic stay in this elegant but fully mo...,business,Fulham is 'villagey' and residential – a real ...,...,f,f,strict_14_with_grace_period,f,f,2,0,2,0,
4,25023,https://www.airbnb.com/rooms/25023,20200414180850,2020-04-15,All-comforts 2-bed flat near Wimbledon tennis,"Large, all comforts, 2-bed flat; first floor; ...",10 mins walk to Southfields tube and Wimbledon...,"Large, all comforts, 2-bed flat; first floor; ...",none,This is a leafy residential area with excellen...,...,t,f,moderate,f,f,1,1,0,0,0.7


In [4]:
for n, c, d in zip(range(len(listings_dataf.columns)),listings_dataf.columns,listings_dataf.dtypes): print(n,c,d)

0 id int64
1 listing_url object
2 scrape_id int64
3 last_scraped object
4 name object
5 summary object
6 space object
7 description object
8 experiences_offered object
9 neighborhood_overview object
10 notes object
11 transit object
12 access object
13 interaction object
14 house_rules object
15 thumbnail_url float64
16 medium_url float64
17 picture_url object
18 xl_picture_url float64
19 host_id int64
20 host_url object
21 host_name object
22 host_since object
23 host_location object
24 host_about object
25 host_response_time object
26 host_response_rate object
27 host_acceptance_rate object
28 host_is_superhost object
29 host_thumbnail_url object
30 host_picture_url object
31 host_neighbourhood object
32 host_listings_count float64
33 host_total_listings_count float64
34 host_verifications object
35 host_has_profile_pic object
36 host_identity_verified object
37 street object
38 neighbourhood object
39 neighbourhood_cleansed object
40 neighbourhood_group_cleansed float64
41 city obje

Now to perform the data preprocessing...

In [5]:
validation_split_ratio = 0.2
test_split_ratio = 0.1

columns_of_interest = ['review_scores_rating','price', 'minimum_nights',
                       'maximum_nights', 'number_of_reviews', 'accommodates', 'guests_included',
                       'bathrooms', 'bedrooms', 'host_total_listings_count', 'host_is_superhost',
                       'host_identity_verified', 'neighbourhood_cleansed',
                       'is_location_exact', 'property_type', 'room_type', 'bed_type',
                       'requires_license', 'instant_bookable', 'cancellation_policy']

# Defined without the label: 'review_scores_rating'
numeric_column_names = ['price', 'minimum_nights',
                        'maximum_nights', 'number_of_reviews', 'accommodates', 'guests_included',
                        'bathrooms', 'bedrooms', 'host_total_listings_count']

categorical_column_names = ['host_is_superhost','host_identity_verified', 'neighbourhood_cleansed',
                            'is_location_exact', 'property_type', 'room_type', 'bed_type',
                            'requires_license', 'instant_bookable', 'cancellation_policy']


print('Data shape: {}'.format(listings_dataf.shape))
    
# Reduce to all listings with review scores
listings_dataf = listings_dataf[listings_dataf['review_scores_rating'].notnull()]

# Reduce further (only a few more) to all listings with a 'bathrooms' record
listings_dataf['bathrooms'] = pd.to_numeric(listings_dataf['bathrooms'],errors='coerce')
listings_dataf['bathrooms'] = listings_dataf['bathrooms'].notnull().astype(int)

# Reduce further (only a few more) to all listings with a 'bedrooms' record
listings_dataf['bedrooms'] = pd.to_numeric(listings_dataf['bedrooms'],errors='coerce')
listings_dataf['bedrooms'] = listings_dataf['bedrooms'].notnull().astype(int)

# Reduce further (only 2 more) to all listings with a 'host_is_superhost' record
listings_dataf = listings_dataf[listings_dataf['host_is_superhost'].notnull()]

# Clean price data by removing dollar signs and commas
listings_dataf['price'] = listings_dataf['price'].str.replace('$','').str.replace(',','').astype(float)

# Convert 'total_listings_count' to more sensible format
listings_dataf['host_total_listings_count'] = listings_dataf['host_total_listings_count'].astype(int)
    
# Remove crazy entries
listings_dataf = listings_dataf[listings_dataf['property_type']!='Minsu (Taiwan)']
listings_dataf = listings_dataf[listings_dataf['property_type']!='Island']
listings_dataf = listings_dataf[listings_dataf['property_type']!='Parking Space']
    
print('Data shape post-cleaning: {}'.format(listings_dataf.shape))
    
print('Splitting data into train, validation and test with ratios {}, {} and {}'.format(1.0-\
      validation_split_ratio-test_split_ratio,validation_split_ratio,test_split_ratio))

# Perform the train, validation and test split
train_data, validation_data, test_data = np.split(listings_dataf.sample(frac=1, random_state=0), \
                                         [int((1.0-validation_split_ratio-test_split_ratio) * len(listings_dataf)), \
                                          int((1.0-test_split_ratio)*len(listings_dataf))]) 

print('Training data shape post-split: {}'.format(train_data.shape))
print('Validation data shape post-split: {}'.format(validation_data.shape))
print('Test data shape post-split: {}'.format(test_data.shape))

print('Running preprocessing and feature engineering transformations')

# Preprocessing each of the relevant columns
ss = StandardScaler()
ohe = OneHotEncoder(handle_unknown='error', sparse=False)

numeric_train_preprocessed = ss.fit_transform(train_data[numeric_column_names])
numeric_train_preprocessed = pd.DataFrame(numeric_train_preprocessed, columns=numeric_column_names)

numeric_validation_preprocessed = ss.transform(validation_data[numeric_column_names])
numeric_validation_preprocessed = pd.DataFrame(numeric_validation_preprocessed, columns=numeric_column_names)

numeric_test_preprocessed = ss.transform(test_data[numeric_column_names])
numeric_test_preprocessed = pd.DataFrame(numeric_test_preprocessed, columns=numeric_column_names)

categorical_train_preprocessed = ohe.fit_transform(train_data[categorical_column_names])
categorical_train_preprocessed = pd.DataFrame(categorical_train_preprocessed, columns=ohe.get_feature_names())

categorical_validation_preprocessed = ohe.transform(validation_data[categorical_column_names])
categorical_validation_preprocessed = pd.DataFrame(categorical_validation_preprocessed, columns=ohe.get_feature_names())

categorical_test_preprocessed = ohe.transform(test_data[categorical_column_names])
categorical_test_preprocessed = pd.DataFrame(categorical_test_preprocessed, columns=ohe.get_feature_names())

# Concatenate the numeric and categorical features
train_preprocessed = pd.concat([numeric_train_preprocessed,categorical_train_preprocessed],axis=1)
validation_preprocessed = pd.concat([numeric_validation_preprocessed,categorical_validation_preprocessed],axis=1)
test_preprocessed = pd.concat([numeric_test_preprocessed,categorical_test_preprocessed],axis=1)
    
print('Training data shape post-preprocessing: {}'.format(train_preprocessed.shape))
print('Validation data shape post-preprocessing: {}'.format(validation_preprocessed.shape))
print('Test data shape post-preprocessing: {}'.format(test_preprocessed.shape))
    
# Write sets to .csv files    
pd.concat([train_data['review_scores_rating'],train_preprocessed],axis=1).to_csv('train.csv',index=False, header=False)
pd.concat([validation_data['review_scores_rating'],validation_preprocessed],axis=1).to_csv('validation.csv',\
                                                                                           index=False, header=False)
pd.concat([test_data['review_scores_rating'],test_preprocessed],axis=1).to_csv('test.csv',index=False, header=False)

# Upload train and validation sets to the S3 bucket
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, \
                                              '/train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, \
                                              '/validation/validation.csv')).upload_file('validation.csv')

Data shape: (86358, 106)
Data shape post-cleaning: (63196, 106)
Splitting data into train, validation and test with ratios 0.7000000000000001, 0.2 and 0.1
Training data shape post-split: (44237, 106)
Validation data shape post-split: (12639, 106)
Test data shape post-split: (6320, 106)
Running preprocessing and feature engineering transformations


  return self.partial_fit(X, y)
  return self.fit(X, **fit_params).transform(X)


Training data shape post-preprocessing: (44237, 101)
Validation data shape post-preprocessing: (12639, 101)
Test data shape post-preprocessing: (6320, 101)


## Training the model and tuning its hyperparameters

We now begin training and tuning the regression model: XGBoost hyperparameters tuned by Bayesian optimisation... 

In [6]:
from time import gmtime, strftime, sleep
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(region, 'xgboost', '0.90-1')
tuning_job_name = 'xgboost-tuningjob-' + strftime("%d-%H-%M-%S", gmtime())

s3_input_train = 's3://{}/{}/train'.format(bucket, prefix)
s3_input_validation ='s3://{}/{}/validation/'.format(bucket, prefix)

print(tuning_job_name)

tuning_job_config = {
    "ParameterRanges": {
      "CategoricalParameterRanges": [],
      "ContinuousParameterRanges": [
        {
          "MaxValue": "1",
          "MinValue": "0",
          "Name": "eta",
        },
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "min_child_weight",
        },
        {
          "MaxValue": "2",
          "MinValue": "0",
          "Name": "alpha",            
        }
      ],
      "IntegerParameterRanges": [
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "max_depth",
        }
      ]
    },
    "ResourceLimits": {
      "MaxNumberOfTrainingJobs": 20,
      "MaxParallelTrainingJobs": 3
    },
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
      "MetricName": "validation:rmse",
      "Type": "Minimize"
    }
  }
    
training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": training_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_train
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_validation
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": "s3://{}/{}/output".format(bucket,prefix)
    },
    "ResourceConfig": {
      "InstanceCount": 1,
      "InstanceType": "ml.m4.xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "rmse",
      "num_round": "100",
      "objective": "reg:linear",
      "rate_drop": "0.3",
      "tweedie_variance_power": "1.4"
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 3600
    }
}


smclient.create_hyper_parameter_tuning_job(HyperParameterTuningJobName = tuning_job_name,
                                            HyperParameterTuningJobConfig = tuning_job_config,
                                            TrainingJobDefinition = training_job_definition)

xgboost-tuningjob-15-15-08-37


{'HyperParameterTuningJobArn': 'arn:aws:sagemaker:eu-west-2:232666250507:hyper-parameter-tuning-job/xgboost-tuningjob-15-15-08-37',
 'ResponseMetadata': {'RequestId': '85a0d4e8-4bd4-41b8-a4a7-bd3fa2962b40',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '85a0d4e8-4bd4-41b8-a4a7-bd3fa2962b40',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '130',
   'date': 'Mon, 15 Jun 2020 15:08:37 GMT'},
  'RetryAttempts': 0}}

Checking status of the job...

In [53]:
tuning_job_result = smclient.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)
print(tuning_job_result['HyperParameterTuningJobStatus'])

status = tuning_job_result['HyperParameterTuningJobStatus']
if status != 'Completed':
    print('Reminder: the tuning job has not been completed.')
    
job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']
print("%d training jobs have completed" % job_count)
    
is_minimize = (tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['Type']!='Minimize')
objective_name = tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['MetricName']

Completed
20 training jobs have completed


In [54]:
from pprint import pprint
if tuning_job_result.get('BestTrainingJob',None):
    print("Best model found so far:")
    pprint(tuning_job_result['BestTrainingJob'])
else:
    print("No training jobs have reported results yet.")

Best model found so far:
{'CreationTime': datetime.datetime(2020, 6, 15, 15, 30, 2, tzinfo=tzlocal()),
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:rmse',
                                                 'Value': 35.304100036621094},
 'ObjectiveStatus': 'Succeeded',
 'TrainingEndTime': datetime.datetime(2020, 6, 15, 15, 33, 27, tzinfo=tzlocal()),
 'TrainingJobArn': 'arn:aws:sagemaker:eu-west-2:232666250507:training-job/xgboost-tuningjob-15-15-08-37-020-e28087c7',
 'TrainingJobName': 'xgboost-tuningjob-15-15-08-37-020-e28087c7',
 'TrainingJobStatus': 'Completed',
 'TrainingStartTime': datetime.datetime(2020, 6, 15, 15, 32, 6, tzinfo=tzlocal()),
 'TunedHyperParameters': {'alpha': '1.2608655936557942',
                          'eta': '0.0191498656440885',
                          'max_depth': '9',
                          'min_child_weight': '5.977332053687228'}}


In [55]:
tuner = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner.dataframe()

if len(full_df) > 0:
    df = full_df[full_df['FinalObjectiveValue'] > -float('inf')]
    if len(df) > 0:
        df = df.sort_values('FinalObjectiveValue', ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest":min(df['FinalObjectiveValue']),"highest": max(df['FinalObjectiveValue'])})
        pd.set_option('display.max_colwidth', -1)  # Don't truncate TrainingJobName        
    else:
        print("No training jobs have reported valid results yet.")
        
df

Number of training jobs with valid objective: 20
{'lowest': 35.304100036621094, 'highest': 68.12300109863281}


Unnamed: 0,FinalObjectiveValue,TrainingElapsedTimeSeconds,TrainingEndTime,TrainingJobName,TrainingJobStatus,TrainingStartTime,alpha,eta,max_depth,min_child_weight
11,68.123001,72.0,2020-06-15 15:19:34+00:00,xgboost-tuningjob-15-15-08-37-009-49363852,Completed,2020-06-15 15:18:22+00:00,0.123576,0.0,6.0,3.475196
12,67.935204,68.0,2020-06-15 15:19:24+00:00,xgboost-tuningjob-15-15-08-37-008-7e10139c,Completed,2020-06-15 15:18:16+00:00,0.123576,3.1e-05,6.0,3.475196
10,67.067101,71.0,2020-06-15 15:22:26+00:00,xgboost-tuningjob-15-15-08-37-010-23cf4ab3,Completed,2020-06-15 15:21:15+00:00,0.13088,0.000173,6.0,3.475196
5,41.874298,79.0,2020-06-15 15:26:55+00:00,xgboost-tuningjob-15-15-08-37-015-2ad46e0d,Completed,2020-06-15 15:25:36+00:00,0.956217,0.652596,9.0,2.611297
19,39.1343,104.0,2020-06-15 15:12:21+00:00,xgboost-tuningjob-15-15-08-37-001-1d9c43c2,Completed,2020-06-15 15:10:37+00:00,0.941671,0.755797,5.0,4.78815
9,38.5597,74.0,2020-06-15 15:22:54+00:00,xgboost-tuningjob-15-15-08-37-011-e511daf0,Completed,2020-06-15 15:21:40+00:00,1.652865,0.576824,5.0,2.84178
18,37.900299,97.0,2020-06-15 15:12:29+00:00,xgboost-tuningjob-15-15-08-37-002-e14b7c88,Completed,2020-06-15 15:10:52+00:00,0.683612,0.442583,3.0,7.691104
6,37.897301,66.0,2020-06-15 15:26:02+00:00,xgboost-tuningjob-15-15-08-37-014-eef17ac4,Completed,2020-06-15 15:24:56+00:00,0.198954,0.796806,2.0,7.140136
13,37.858501,70.0,2020-06-15 15:19:02+00:00,xgboost-tuningjob-15-15-08-37-007-d2781cc0,Completed,2020-06-15 15:17:52+00:00,1.330485,0.124642,1.0,8.942542
7,37.856899,65.0,2020-06-15 15:26:05+00:00,xgboost-tuningjob-15-15-08-37-013-3ddba973,Completed,2020-06-15 15:25:00+00:00,0.313517,0.06672,1.0,8.772979


Plot training and tuning progress...

In [57]:
import bokeh
import bokeh.io
bokeh.io.output_notebook()
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

class HoverHelper():

    def __init__(self, tuning_analytics):
        self.tuner = tuning_analytics

    def hovertool(self):
        tooltips = [
            ("FinalObjectiveValue", "@FinalObjectiveValue"),
            ("TrainingJobName", "@TrainingJobName"),
        ]
        for k in self.tuner.tuning_ranges.keys():
            tooltips.append( (k, "@{%s}" % k) )

        ht = HoverTool(tooltips=tooltips)
        return ht

    def tools(self, standard_tools='pan,crosshair,wheel_zoom,zoom_in,zoom_out,undo,reset'):
        return [self.hovertool(), standard_tools]

hover = HoverHelper(tuner)

p = figure(plot_width=900, plot_height=400, tools=hover.tools(), x_axis_type='datetime')
p.circle(source=df, x='TrainingStartTime', y='FinalObjectiveValue')
show(p)

The best model was...

In [58]:
tuning_job_result.get('BestTrainingJob',None)

{'TrainingJobName': 'xgboost-tuningjob-15-15-08-37-020-e28087c7',
 'TrainingJobArn': 'arn:aws:sagemaker:eu-west-2:232666250507:training-job/xgboost-tuningjob-15-15-08-37-020-e28087c7',
 'CreationTime': datetime.datetime(2020, 6, 15, 15, 30, 2, tzinfo=tzlocal()),
 'TrainingStartTime': datetime.datetime(2020, 6, 15, 15, 32, 6, tzinfo=tzlocal()),
 'TrainingEndTime': datetime.datetime(2020, 6, 15, 15, 33, 27, tzinfo=tzlocal()),
 'TrainingJobStatus': 'Completed',
 'TunedHyperParameters': {'alpha': '1.2608655936557942',
  'eta': '0.0191498656440885',
  'max_depth': '9',
  'min_child_weight': '5.977332053687228'},
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:rmse',
  'Value': 35.304100036621094},
 'ObjectiveStatus': 'Succeeded'}

## Hosting the best model from tuning for predictions

Registering the best model with hosting and creating endpoint configuration...

In [63]:
from time import gmtime, strftime

container = get_image_uri(region, 'xgboost', '0.90-1')
model_name = tuning_job_result.get('BestTrainingJob',None)['TrainingJobName']

info = smclient.describe_training_job(TrainingJobName=model_name)
model_data = info['ModelArtifacts']['S3ModelArtifacts']

primary_container = {
    'Image': container,
    'ModelDataUrl': model_data
}

create_model_response = smclient.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = primary_container)

print(create_model_response['ModelArn'])

endpoint_config_name = 'airbnb-recommender-EndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_config_name)

create_endpoint_config_response = smclient.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType':'ml.m4.xlarge',
        'InitialVariantWeight':1,
        'InitialInstanceCount':1,
        'ModelName':model_name,
        'VariantName':'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

arn:aws:sagemaker:eu-west-2:232666250507:model/xgboost-tuningjob-15-15-08-37-020-e28087c7
airbnb-recommender-EndpointConfig-2020-06-15-15-43-17
Endpoint Config Arn: arn:aws:sagemaker:eu-west-2:232666250507:endpoint-config/airbnb-recommender-endpointconfig-2020-06-15-15-43-17


Now creating the endpoint...

In [None]:
endpoint_name = 'airbnb-recommender-Endpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)
create_endpoint_response = smclient.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

resp = smclient.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

while status=='Creating':
    time.sleep(60)
    resp = smclient.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Status: " + status)

print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

## Testing the endpoint against the test set

First we will generate predictions for a single user-input datapoint...

Following: https://github.com/awslabs/amazon-sagemaker-examples/blob/d5681a07611ae29567355b60b2f22500b561218b/advanced_functionality/xgboost_bring_your_own_model/xgboost_bring_your_own_model.ipynb

In [None]:
runtime_client = boto3.client('runtime.sagemaker', region_name=region)



Delete endpoint because I'm not made of money...

In [None]:
smclient.delete_endpoint(EndpointName=endpoint_name)