## XGBoost example

This notebook uses XGBoost 1.0.

In [1]:
import numpy as np
import pandas as pd

## Create data set

We will create a dataset containing 7 features. The goal is to predict whether the conditions are as expected or an alarm should be raised.

In [2]:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=2000, n_features=7, n_informative=5, n_redundant=2, weights=[0.8, 0.2],
                           scale=[5, 5, 1, 1000, 200, 20, 10], shuffle=False, random_state=42)

In [3]:
X_data = pd.DataFrame(X, columns=['heat', 'heat2', 'dust', 'light', 'humidity', 'pressure', 'nitrogen_concentration'])
y_data = pd.DataFrame(y, columns=['ok'])
X_data.head()

|   | heat | heat2 | dust | light | humidity | pressure | nitrogen_concentration |
|---|------|-------|------|-------|----------|----------|------------------------|
| 0 | 3.21464 | -3.107601 | 2.663515 | -599.783123 | 67.133968 | -40.880974 | 22.175197 |
| 1 | -0.354232 | 14.506926 | -0.832169 | -1501.497188 | -42.529406 | 8.094005 | -13.488557 |
| 2 | 4.380911 | 10.794966 | 1.566156 | -1121.889667 | -30.364312 | -22.922713 | 10.072141 |
| 3 | 6.396967 | 1.283805 | 1.444611 | -827.627508 | 220.112282 | -37.722183 | 14.348605 |
| 4 | 13.324452 | 15.490906 | 1.597028 | -673.973149 | 24.585876 | -29.507437 | 19.428033 |


In [4]:
from sklearn.model_selection import train_test_split
import xgboost as xgb

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=123)

In [6]:
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

## Create model + parameters and train

We will create a model that outputs a class label (1 or 0) rather than a probability. After saving the model and uploading it to the Waylay platform, predict requests sent to the model will also return 1's and 0's.

In [7]:
# set xgboost params
param = {
    'max_depth': 3,
    'learning_rate': 0.1,
    'colsample_bytree': 0.3,
    'objective': 'binary:hinge'
}
num_round = 100  # the number of training iterations

In [8]:
bst = xgb.train(param, dtrain, num_round)

In [9]:
preds = bst.predict(dtest)
preds[:5]

array([0., 0., 0., 0., 1.], dtype=float32)

In [10]:
bst.save_model('model.bst')

In [11]:
from zipfile import ZipFile
from io import BytesIO
import requests
from requests.auth import HTTPBasicAuth


def upload_model(model_name, description, upload_url, api_key, api_secret):
    """Zip the saved 'model.bst' file and upload it to the given URL."""
    zipfile_name = 'model.zip'
    # Create a zip archive containing the saved model file
    with ZipFile(zipfile_name, 'w') as zipfile:
        zipfile.write('model.bst')

    # Read the archive into memory so the file handle is closed before uploading
    with open(zipfile_name, 'rb') as f:
        upload_file = BytesIO(f.read())

    # Upload to Waylay
    resp = requests.post(upload_url,
                         files={"file": (zipfile_name, upload_file)},
                         data={"name": model_name, "framework": "xgboost", "description": description},
                         auth=HTTPBasicAuth(api_key, api_secret))

    return resp.json()
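
As a variation on the zipping step, the archive can also be built directly in memory, which avoids leaving a `model.zip` file on disk. The following is only a sketch: the helper name `build_model_archive` is hypothetical, and the demo writes a placeholder file instead of a real model. It assumes, as in the function above, that the archive should contain the model file at its root.

```python
import os
import tempfile
from io import BytesIO
from zipfile import ZipFile, ZIP_DEFLATED


def build_model_archive(model_path):
    """Return an in-memory zip (BytesIO) holding the file at model_path."""
    buffer = BytesIO()
    with ZipFile(buffer, 'w', compression=ZIP_DEFLATED) as archive:
        # Store the file at the archive root, whatever directory it lives in
        archive.write(model_path, arcname=os.path.basename(model_path))
    buffer.seek(0)  # rewind so the caller reads from the start
    return buffer


# Demo with a placeholder file (hypothetical content, not a trained model)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'model.bst')
    with open(demo_path, 'wb') as f:
        f.write(b'placeholder bytes, not a real model')
    archive = build_model_archive(demo_path)
    with ZipFile(archive) as check:
        archive_names = check.namelist()

print(archive_names)
```

The returned buffer could be passed as the `upload_file` value in the `files` argument of `requests.post`, in place of the buffer read from disk.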

In [12]:
# https://docs.waylay.io/api/rest/#authentication
api_key = 'your api key'
api_secret = 'your api secret'
byom_url = 'https://byoml.waylay.io/models'
model_name = 'xgboost-demo-1'
upload_model(
    model_name, 
    'Model expecting 7 features and outputting the class it belongs to', 
    byom_url, 
    api_key, 
    api_secret
)

{'message': 'Model successfully uploaded'}

In [13]:
requests.post(byom_url + '/' + model_name + '/predict',
              json = {"instances": X_test[:5].to_dict('records')},
              auth=HTTPBasicAuth(api_key, api_secret)).json()

{'predictions': [0.0, 0.0, 0.0, 0.0, 1.0]}

And indeed we get 1's and 0's as expected. If we need the probabilities, we can simply change the objective during training.
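
To illustrate why the objective changes the kind of output, here is a minimal sketch of the two prediction transforms, as far as we understand XGBoost's behaviour: `binary:hinge` reports a hard decision on the sign of the raw margin, while `binary:logistic` applies a sigmoid to it. The function names and the margin values are hypothetical. Models trained with different objectives also learn different margins, so this shows only the final transform, not a way to convert one model into the other.

```python
import math


def hinge_transform(margin):
    # binary:hinge style output: hard 0/1 decision on the sign of the margin
    return 1.0 if margin > 0.0 else 0.0


def logistic_transform(margin):
    # binary:logistic style output: sigmoid of the margin, a value in (0, 1)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical raw margins, chosen only for illustration
raw_margins = [-3.0, -0.4, 0.0, 0.4, 3.0]

hinge_preds = [hinge_transform(m) for m in raw_margins]
logistic_preds = [round(logistic_transform(m), 4) for m in raw_margins]

print(hinge_preds)
print(logistic_preds)
```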

## Create model that outputs probabilities

In [14]:
# set xgboost params
param = {
    'max_depth': 3,
    'learning_rate': 0.1,
    'colsample_bytree': 0.3,
    'objective': 'binary:logistic'
}
num_round = 100  # the number of training iterations

In [15]:
bst = xgb.train(param, dtrain, num_round)

In [16]:
preds = bst.predict(dtest)
preds[:5]

array([0.03388389, 0.00636836, 0.07393802, 0.41373524, 0.59628487],
      dtype=float32)

In [17]:
bst.save_model('model.bst')

In [18]:
model_name = 'xgboost-demo-2'
upload_model(
    model_name, 
    'Model expecting 7 features and outputting the probability it belongs to class 1', 
    byom_url, 
    api_key, 
    api_secret
)

{'message': 'Model successfully uploaded'}

In [19]:
requests.post(byom_url + '/' + model_name + '/predict',
              json = {"instances": X_test[:5].to_dict('records')},
              auth=HTTPBasicAuth(api_key, api_secret)).json()

{'predictions': [0.03388388827443123,
  0.0063683632761240005,
  0.07393801957368851,
  0.4137352406978607,
  0.5962848663330078]}

And indeed we get the same probabilities as our local model outputs.

## Using the XGBClassifier
Instead of using a Booster directly, we can also use the XGBClassifier, as shown in the example below. Keep in mind that although the `predict` method returns 1's and 0's, the uploaded model will output probabilities, because the underlying Booster object is used. If class labels are needed, this can again be "solved" by using the `binary:hinge` objective.

In [20]:
from xgboost import XGBClassifier

In [21]:
model = XGBClassifier(
    max_depth=3,
    learning_rate=0.1,
    colsample_bytree=0.3,
)

In [22]:
model.fit(X_train, y_train.values.ravel())

XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.3, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints=None,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method=None,
              validate_parameters=False, verbosity=None)

In [23]:
model.predict(X_test[:5])

array([0, 0, 0, 0, 1])

In [24]:
model.save_model('model.bst')

In [25]:
model_name = 'xgboost-demo-3'

upload_model(
    model_name, 
    'Model expecting 7 features and outputting the probability it belongs to class 1', 
    byom_url, 
    api_key, 
    api_secret
)

{'message': 'Model successfully uploaded'}

In [26]:
requests.post(byom_url + '/' + model_name + '/predict',
              json = {"instances": X_test[:5].to_dict('records')},
              auth=HTTPBasicAuth(api_key, api_secret)).json()

{'predictions': [0.03388388827443123,
  0.0063683632761240005,
  0.07393801957368851,
  0.4137352406978607,
  0.5962848663330078]}

And indeed, in this case the local `predict` output (class labels) and the uploaded model's output (probabilities) are not the same.
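
If class labels are needed on the client side, the served probabilities can be thresholded after the request. The sketch below assumes a cut-off of 0.5, which to our understanding is what `XGBClassifier.predict` applies for a binary logistic objective. The helper name `to_class_labels` is hypothetical, and the probabilities are copied from the response above.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Map class-1 probabilities to 0/1 labels using a fixed threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


# Probabilities returned by the uploaded model in the response above
served_probabilities = [
    0.03388388827443123,
    0.0063683632761240005,
    0.07393801957368851,
    0.4137352406978607,
    0.5962848663330078,
]

labels = to_class_labels(served_probabilities)
print(labels)
```

For these five rows the result matches the labels returned by `model.predict(X_test[:5])`.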

## JSON payload

All examples above use named input. This is possible if you correctly name your features in the pandas.DataFrame or DMatrix.
The value of the `instances` key in the JSON payload is a list of dicts, one per row, mapping feature names to values. This is a readable and clear way to represent the data you are sending to the model.

In [27]:
{"instances": X_test[:5].to_dict('records')}

{'instances': [{'heat': 5.183476657027435,
   'heat2': 6.198644531071017,
   'dust': -1.219976895450299,
   'light': 107.67054117584141,
   'humidity': -465.49147434265274,
   'pressure': 38.75732300448353,
   'nitrogen_concentration': 4.022103340261581},
  {'heat': 3.5709824385447964,
   'heat2': 8.125604685184385,
   'dust': -2.478955463265079,
   'light': -2604.0885807693817,
   'humidity': -373.7320669125462,
   'pressure': 36.35173333809967,
   'nitrogen_concentration': -7.037420725073713},
  {'heat': 0.7194705291230102,
   'heat2': 2.1881363868111894,
   'dust': 0.5056000888381222,
   'light': -511.6016770257128,
   'humidity': 379.57152848272784,
   'pressure': -29.93469296853077,
   'nitrogen_concentration': -3.2944090003344955},
  {'heat': -0.7485073820496291,
   'heat2': 0.5923026689001221,
   'dust': -0.5468724982274361,
   'light': 1515.0726572135725,
   'humidity': -52.18054740864142,
   'pressure': 17.549078297445277,
   'nitrogen_concentration': -4.908612515506896},
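  {'heat': 1.834596974045903,
   'heat2': -13.159726910751811,
   'dust': -0.006244944028804933,
   'light': -1507.0812759276143,
   'humidity': 61.24565650426958,
   'pressure': -15.100126382881927,
   'nitrogen_concentration': 11.40206238086906}]}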

Sending data to the model this way is convenient, as you do not have to know the order of the features. If you trained on plain NumPy arrays without naming the features, you can also call your model with positional values, listed in the same column order as the training data:

In [28]:
instances = X_test[:5].values.tolist()
instances

[[5.183476657027435,
  6.198644531071017,
  -1.219976895450299,
  107.67054117584141,
  -465.49147434265274,
  38.75732300448353,
  4.022103340261581],
 [3.5709824385447964,
  8.125604685184385,
  -2.478955463265079,
  -2604.0885807693817,
  -373.7320669125462,
  36.35173333809967,
  -7.037420725073713],
 [0.7194705291230102,
  2.1881363868111894,
  0.5056000888381222,
  -511.6016770257128,
  379.57152848272784,
  -29.93469296853077,
  -3.2944090003344955],
 [-0.7485073820496291,
  0.5923026689001221,
  -0.5468724982274361,
  1515.0726572135725,
  -52.18054740864142,
  17.549078297445277,
  -4.908612515506896],
 [1.834596974045903,
  -13.159726910751811,
  -0.006244944028804933,
  -1507.0812759276143,
  61.24565650426958,
  -15.100126382881927,
  11.40206238086906]]

In [29]:
requests.post(byom_url + '/' + model_name + '/predict',
              json = {"instances": instances},
              auth=HTTPBasicAuth(api_key, api_secret)).json()

{'predictions': [0.03388388827443123,
  0.0063683632761240005,
  0.07393801957368851,
  0.4137352406978607,
  0.5962848663330078]}
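
Because the positional form depends entirely on column order, it can help to derive it from named records with an explicit feature list. The following is a minimal sketch: the helper `records_to_rows` is hypothetical, the feature order is taken from the DataFrame columns defined earlier, and the record is the first instance shown above with its keys deliberately shuffled.

```python
import json

# Column order used when the training DataFrame was built
FEATURE_ORDER = ['heat', 'heat2', 'dust', 'light', 'humidity',
                 'pressure', 'nitrogen_concentration']


def records_to_rows(records, feature_order):
    """Convert a list of {feature: value} dicts to rows in feature_order."""
    rows = []
    for record in records:
        missing = [name for name in feature_order if name not in record]
        if missing:
            raise KeyError(f'missing features: {missing}')
        rows.append([record[name] for name in feature_order])
    return rows


# First instance from the named payload above, keys in a different order
record = {
    'pressure': 38.75732300448353,
    'heat': 5.183476657027435,
    'nitrogen_concentration': 4.022103340261581,
    'dust': -1.219976895450299,
    'heat2': 6.198644531071017,
    'humidity': -465.49147434265274,
    'light': 107.67054117584141,
}

payload = {'instances': records_to_rows([record], FEATURE_ORDER)}
print(json.dumps(payload))
```

Raising on a missing feature is a design choice in this sketch: silently filling a gap would shift or distort the row the model receives.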

## Delete created models

In [30]:
models = ['xgboost-demo-1', 'xgboost-demo-2', 'xgboost-demo-3']
# Use a distinct loop variable so the XGBClassifier stored in `model` is not overwritten
for name in models:
    print(
        requests.delete(byom_url + '/' + name,
                        auth=HTTPBasicAuth(api_key, api_secret)).json()
    )

{'message': 'Model successfully deleted'}
{'message': 'Model successfully deleted'}
{'message': 'Model successfully deleted'}
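
The predict and delete requests in this notebook build their URLs by string concatenation. A small helper keeps that in one place and percent-encodes the model name. This is only a sketch: the function name `model_url` is hypothetical, and the path layout (`<base>/<model_name>` and `<base>/<model_name>/predict`) is the one used in the cells above.

```python
from urllib.parse import quote


def model_url(base_url, model_name, action=None):
    """Build the URL of a model, optionally followed by an action segment."""
    parts = [base_url.rstrip('/'), quote(model_name, safe='')]
    if action:
        parts.append(action)
    return '/'.join(parts)


byom_url = 'https://byoml.waylay.io/models'
predict_url = model_url(byom_url, 'xgboost-demo-1', 'predict')
delete_url = model_url(byom_url, 'xgboost-demo-1')

print(predict_url)
print(delete_url)
```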
