# Deploy a Serverless Model Server with Nuclio-KFServing
  --------------------------------------------------------------------

The following notebook demonstrates how to deploy an XGBoost model using Nuclio + KFServing (a.k.a. <b>Nuclio-serving</b>).

#### **notebook how-to's**
* Write and test a model serving (KFServing) class in a notebook.
* Deploy the model server as a Nuclio-serving function.
* Invoke and test the serving function.

<a id='top'></a>
#### **steps**
**[define a new function and its dependencies](#define-function)**<br>
**[test the model serving class locally](#test-locally)**<br>
**[deploy our serving class as a serverless function](#deploy)**<br>
**[test our model server using an HTTP request](#test-model-server)**<br>

In [1]:
# nuclio: ignore
# if the nuclio-jupyter package is not installed run !pip install nuclio-jupyter
import nuclio 

<a id='define-function'></a>
### **define a new function and its dependencies**

In [3]:
%%nuclio cmd
pip install kfserving
pip install numpy
pip install dask-xgboost
pip install git+https://github.com/mlrun/mlrun@development

Requirement already up-to-date: kfserving in /User/.pythonlibs/lib/python3.6/site-packages (0.2.2.1)
Collecting git+https://github.com/mlrun/mlrun@development
  Cloning https://github.com/mlrun/mlrun (to revision development) to /tmp/pip-req-build-zq43fqef
Branch development set up to track remote branch development from origin.
Switched to a new branch 'development'
Collecting kubernetes<=9.0.0,>=8.0.0 (from kfp>=0.1.29->mlrun==0.4.1)
  Using cached https://files.pythonhosted.org/packages/00/f7/4f196c55f1c2713d3edc8252c4b45326306eef4dc10048f13916fe446e2b/kubernetes-9.0.0-py2.py3-none-any.whl
Building wheels for collected packages: mlrun
  Running setup.py bdist_wheel for mlrun ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-rrtkbiub/wheels/4a/07/73/40a96ccddaf2d81ec84f70e16d94d6c0116915da784d6c6022
Successfully built mlrun
kfserving 0.2.2.1 has requirement kubernetes>=10.0.1, but you'll have kubernetes 9.0.0 which is incompatible.
Installing collected p

In [2]:
import kfserving
import os
import numpy as np
import xgboost as xgb

In [3]:
BOOSTER_FILE = "model.bst"

class XGBoostModel(kfserving.KFModel):
    def __init__(self, name: str, model_dir: str, booster: xgb.Booster = None):
        super().__init__(name)
        self.name = name
        self.model_dir = model_dir
        # A pre-loaded booster can be passed in, in which case load() is not needed
        if booster is not None:
            self._booster = booster
            self.ready = True

    def load(self):
        # Fetch the model dir to local storage, then load the booster file from it
        model_file = os.path.join(
            kfserving.Storage.download(self.model_dir), BOOSTER_FILE)
        self._booster = xgb.Booster(model_file=model_file)
        self.ready = True

    def predict(self, body):
        try:
            # Use of list as input is deprecated see https://github.com/dmlc/xgboost/pull/3970
            dmatrix = xgb.DMatrix(body['instances'])
            # Booster.predict returns a NumPy array; convert it to a JSON-friendly list
            result: np.ndarray = self._booster.predict(dmatrix)
            return result.tolist()
        except Exception as e:
            raise Exception("Failed to predict %s" % e) from e


The following `end-code` annotation tells `nuclio` to stop parsing the notebook at this cell. _**Please do not remove this cell**_:

In [4]:
# nuclio: end-code
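
To make the effect of the `# nuclio: ignore` and `# nuclio: end-code` annotations concrete, here is a minimal sketch of how an exporter could select cells. It is a simplified illustration, not the actual nuclio-jupyter implementation: the function name `exported_cells` is hypothetical, cells are modelled as plain strings, and `%%nuclio` magic cells are not handled.

```python
def exported_cells(cells):
    """Return the code cells a simplified exporter would keep.

    A cell whose first line is `# nuclio: ignore` is skipped, and
    everything from the `# nuclio: end-code` cell onward is dropped.
    """
    kept = []
    for source in cells:
        stripped = source.strip()
        first_line = stripped.splitlines()[0].strip() if stripped else ""
        if first_line == "# nuclio: end-code":
            break  # stop parsing: later cells are notebook-only tests
        if first_line == "# nuclio: ignore":
            continue  # this cell is for notebook use only
        kept.append(source)
    return kept


# Cell sources are plain strings here; they are never executed
cells = [
    "# nuclio: ignore\nimport nuclio",
    "import kfserving\nimport os",
    "class XGBoostModel(kfserving.KFModel):\n    pass",
    "# nuclio: end-code",
    "my_server = XGBoostModel('netops-xgb-v1', model_dir='/User/netops/models/')",
]
print(len(exported_cells(cells)))  # 2
```

Under these assumptions only the import cell and the class definition end up in the function code, which is why the local test cells further down do not affect the deployed function.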

______________________________________________

<a id='test-locally'></a>
### **test the model serving class locally**
The class above can be tested locally: instantiate it and call `.load()`, which copies the model to a local directory and loads it.

> **Verify there is a `model.bst` file in the model_dir path (generated by the training notebook)**
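
One way to run that check from Python is sketched below. The helper name `has_booster_file` is hypothetical, and the check only applies when the model dir is a local or mounted path (not a remote URL). The demonstration uses a temporary directory with placeholder bytes rather than a real model.

```python
import os
import tempfile

BOOSTER_FILE = "model.bst"  # same file name the serving class expects


def has_booster_file(model_dir, booster_file=BOOSTER_FILE):
    """Return True if model_dir contains a non-empty booster file."""
    path = os.path.join(model_dir, booster_file)
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Demonstration on a throwaway directory instead of the real model path
with tempfile.TemporaryDirectory() as tmp_dir:
    print(has_booster_file(tmp_dir))  # False: nothing written yet
    with open(os.path.join(tmp_dir, BOOSTER_FILE), "wb") as f:
        f.write(b"placeholder bytes, not a real model")
    print(has_booster_file(tmp_dir))  # True
```

In the notebook you would call `has_booster_file(model_dir)` once `model_dir` is defined. Note that it only confirms the file exists and is non-empty, not that it is a loadable booster.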

In [5]:
model_dir = '/User/netops/models/'

In [22]:
my_server = XGBoostModel('netops-xgb-v1', model_dir=model_dir)
my_server.load()

[I 200109 11:57:57 storage:35] Copying contents of /User/netops/models/ to local



We can use the `.predict(body)` method to test the model.

In [23]:
my_server.predict({"instances":np.array(
       [[100., 100. , 50., 0., 89.53725756, 176.34950703, 33.57995254, 17.38931609, 75.67427177, 214.66368902, 21.13675177, 8.16638713], 
    [ 73.78647124, 6.56632202, 0.78634142, 299.04336301, 91.26215708, 192.80521924, 35.69171398, 16.92878047, 75.71293304, 215.07670863, 21.17323133, 8.17075569]])})

[0.5442115664482117, 0.4523093104362488]
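
The `body` passed to `.predict()` is a dict whose `instances` key holds one row of feature values per sample. A small pre-check such as the following sketch can catch empty or ragged input before it reaches XGBoost. The helper `check_instances` and its `n_features` parameter are hypothetical and not part of KFServing.

```python
def check_instances(body, n_features=None):
    """Validate a KFServing-style request body and return (rows, columns).

    Raises ValueError when 'instances' is missing, empty, ragged,
    or (if n_features is given) has the wrong number of columns.
    """
    instances = body.get("instances")
    if instances is None or len(instances) == 0:
        raise ValueError("body must contain a non-empty 'instances' list")
    widths = {len(row) for row in instances}
    if len(widths) != 1:
        raise ValueError("all rows must have the same length, got %s" % sorted(widths))
    width = widths.pop()
    if n_features is not None and width != n_features:
        raise ValueError("expected %d features per row, got %d" % (n_features, width))
    return len(instances), width


print(check_instances({"instances": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}))  # (2, 3)
```

The test vectors used in this notebook have 12 values per row, so `check_instances(body, n_features=12)` would be the matching call for them.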

<a id='deploy'></a>
### **deploy our serving class as a serverless function**
In the following section we create a new model serving function that wraps our class, and specify the model and other resources.

The `models` dict stores model names and the associated model **dir** URLs (a URL can start with `S3://` or another blob store scheme). The faster way is to use a shared file volume: we use `.apply(mount_v3io())` to attach a v3io (Iguazio data fabric) volume to our function. By default, v3io mounts the current user's home directory at the `/User` path in the function.

**Verify that the model dir contains a valid `model.bst` file**
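
As a rough illustration of the two kinds of model dir URL mentioned above, the sketch below tells a shared-volume path apart from a blob store URL by its scheme. The helper `model_source_kind` and the S3 bucket name are made up for this example; only the `/User/netops/models/` path comes from this notebook.

```python
from urllib.parse import urlparse


def model_source_kind(model_url):
    """Return 'local' for a plain file path, otherwise the lowercased URL scheme."""
    scheme = urlparse(model_url).scheme.lower()
    return scheme if scheme else "local"


models = {
    "netops-xgb-v1": "/User/netops/models/",            # shared volume path from this notebook
    "hypothetical-s3-model": "S3://my-bucket/models/",  # made-up bucket, for illustration only
}
for name, url in models.items():
    print(name, "->", model_source_kind(url))
```

A `local` result here means the function can only read the model if the same path is mounted inside it, which is what attaching the v3io volume provides.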

In [24]:
from mlrun import new_model_server, mount_v3io
import requests

In [25]:
fn = new_model_server('netops-srv', 
                      models={'netops-xgb-v1': model_dir}, 
                      model_class='XGBoostModel')

fn.apply(mount_v3io()) 

<mlrun.runtimes.function.RemoteRuntime at 0x7f033eb13438>

In [26]:
addr = fn.deploy()

[mlrun] 2020-01-09 11:58:37,619 deploy started
[nuclio] 2020-01-09 11:58:38,704 (info) Building processor image
[nuclio] 2020-01-09 12:00:28,587 (info) Build complete
[nuclio] 2020-01-09 12:00:33,633 done creating iris-srv, function address: 3.18.11.15:30757


<a id="test-model-server"></a>
### **test our model server using an HTTP request**


We invoke our model serving function using test data. The data vectors are specified in the `instances` attribute.

In [27]:
# KFServing protocol event
event_data = {"instances":[[100., 100. , 50., 0., 89.53725756, 176.34950703, 33.57995254, 17.38931609, 75.67427177, 214.66368902, 21.13675177, 8.16638713], 
    [ 73.78647124, 6.56632202, 0.78634142, 299.04336301, 91.26215708, 192.80521924, 35.69171398, 16.92878047, 75.71293304, 215.07670863, 21.17323133, 8.17075569]]}
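
The request is made of two parts: a URL of the form `<address>/<model-name>/predict` (the pattern used in this notebook's request cell) and a JSON body with an `instances` list. The sketch below builds both without sending anything. The helper names `predict_url` and `encode_event` are hypothetical, the `http://` prefix on the address is an assumption, and the rows are shortened for readability.

```python
import json


def predict_url(addr, model_name):
    """Build '<addr>/<model-name>/predict', tolerating a trailing slash on addr."""
    return "%s/%s/predict" % (addr.rstrip("/"), model_name)


def encode_event(instances):
    """Serialize rows into a KFServing-style JSON body: {"instances": [...]}."""
    return json.dumps({"instances": [[float(v) for v in row] for row in instances]})


url = predict_url("http://3.18.11.15:30757", "netops-xgb-v1")
body = encode_event([[100, 100, 50, 0], [73.78647124, 6.56632202, 0.78634142, 299.04336301]])
print(url)  # http://3.18.11.15:30757/netops-xgb-v1/predict
print(json.loads(body)["instances"][0])  # [100.0, 100.0, 50.0, 0.0]
```

Converting every value with `float()` also makes NumPy scalars serializable, since `json.dumps` does not accept NumPy types directly.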

In [None]:
import json
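# Send the serialized event as the request body; passing an already-serialized
# string through `json=` would JSON-encode it a second time
resp = requests.put(addr + '/netops-xgb-v1/predict', data=json.dumps(event_data))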
print(resp.text)
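
Once the response text is available, it can be turned back into Python values. The sketch below assumes the reply is either a bare JSON list (matching what `.predict()` returned locally) or an object with a `predictions` key. Both shapes are assumptions about the server's reply, and the helper `parse_predictions` and the 0.5 cutoff are illustrative only.

```python
import json


def parse_predictions(response_text):
    """Extract a list of floats from a serving response body.

    Accepts a bare JSON list or an object with a 'predictions' key
    (both shapes are assumptions about the server's reply).
    """
    payload = json.loads(response_text)
    if isinstance(payload, dict):
        payload = payload["predictions"]
    return [float(p) for p in payload]


# Sample text taken from the local .predict() output earlier in this notebook
scores = parse_predictions("[0.5442115664482117, 0.4523093104362488]")
labels = [int(s >= 0.5) for s in scores]  # 0.5 cutoff is an illustrative choice
print(labels)  # [1, 0]
```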

**[back to top](#top)**