# HSEngine Tutorial
HSEngine, or **H**orizontal **S**erving **Engine** or HSE, is a Python package (`hsengine`) that provides a set of classes/APIs for the end-to-end workflow of setting up a [KFServing](https://github.com/kubeflow/kfserving) service from a horizontally-trained model in [FATE](https://github.com/FederatedAI/FATE).

## Prerequisites
* HSEngine by default expects that a KFServing installation is present in the target Kubernetes cluster.
* The example below runs in a Jupyter Notebook pod in the target Kubernetes cluster. Note that HSEngine is not limited to this setup, as will be discussed later.

## High-level API: HSEngine
The high-level class is called `HSEngine`. It encapsulates the complexity of all the steps. In the simplest, most typical case, a user only needs to provide the desired serving service name, the service version string and the trained model version string recorded in FATE.

In [2]:
# un-comment this to get more detailed logs
# import logging
# logger = logging.getLogger()
# logger.setLevel(logging.DEBUG)

SERVICE_NAME = "sklearn-lr"
VERSION = "1.0"

from hsengine import HSEngine

h = HSEngine(SERVICE_NAME,
             VERSION, 
             model_version="2021040701544681092419")

[I 210413 06:53:45 fate_client:93] Downloading FATE model with role: guest, party_id: 3333, model_id: arbiter-3333#guest-3333#host-3333#model, model_version: 2021040701544681092419
[I 210413 06:53:45 fate_client:113] FATE model is saved to: /tmp/tmpc3hw44qc
[I 210413 06:53:46 fate_model:133] FATEModel initialized with component type: HomoLR and target framework: sklearn


Then call the `run` method to launch everything. By default, a KFServing InferenceService will be set up in the default Kubernetes cluster, that is, the one that the current Jupyter Notebook is running in. If you are not running this code in a Kubernetes cluster, you need to pass additional parameters to HSEngine, such as the target cluster configuration file. Refer to the later sections about the low-level APIs and the complete HSEngine class definition for details.

In [3]:
h.run()

[I 210413 06:53:46 base:113] Preparing model storage and InferenceService spec...
[I 210413 06:53:46 minio:158] Uploaded model objects into path: s3://models/sklearn-lr/1.0
[I 210413 06:53:46 base:102] Prepared model with uri: s3://models/sklearn-lr/1.0
[I 210413 06:53:46 base:145] InferenceService spec ready
[I 210413 06:53:46 base:121] Creating InferenceService sklearn-lr...
[I 210413 06:53:48 base:131] InferenceService: sklearn-lr created. To check service readiness, call this deployer's status(), wait() methods, or use KFServing query APIs


{'apiVersion': 'serving.kubeflow.org/v1beta1',
 'kind': 'InferenceService',
 'metadata': {'annotations': {'hsengine.dev/uuid': 'bf63fab5-742e-4394-9027-3ff487b1df7e'},
  'creationTimestamp': '2021-04-13T06:53:47Z',
  'generation': 1,
  'managedFields': [{'apiVersion': 'serving.kubeflow.org/v1beta1',
    'fieldsType': 'FieldsV1',
    'fieldsV1': {'f:metadata': {'f:annotations': {'.': {},
       'f:hsengine.dev/uuid': {}}},
     'f:spec': {'.': {},
      'f:predictor': {'.': {},
       'f:sklearn': {'.': {}, 'f:protocolVersion': {}, 'f:storageUri': {}}}}},
    'manager': 'OpenAPI-Generator',
    'operation': 'Update',
    'time': '2021-04-13T06:53:46Z'}],
  'name': 'sklearn-lr',
  'namespace': 'fate-3333',
  'resourceVersion': '5138823',
  'uid': '8daf8ece-cbcd-4093-b4e7-e1f20eaab59b'},
 'spec': {'predictor': {'sklearn': {'name': 'kfserving-container',
    'protocolVersion': 'v1',
    'resources': {'limits': {'cpu': '1', 'memory': '2Gi'},
     'requests': {'cpu': '1', 'memory': '2Gi'}},


We can then call the `wait` method of the embedded deployer (explained later) to monitor the status of the KFServing service.

In [4]:
h.deployer.wait()

NAME                 READY      PREV                      LATEST                    URL                                                              
sklearn-lr           Unknown                                                                                                                         
sklearn-lr           Unknown                                                                                                                         
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                       
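The table above is the result of repeatedly querying the service until its `READY` column changes. As a rough illustration of that polling pattern (not HSEngine's actual implementation), the sketch below polls a caller-supplied status function until it reports `"True"` or a timeout expires. The function name `wait_until_ready` and all of its parameters are hypothetical and exist only for this example.

```python
import time


def wait_until_ready(get_status, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "True" or the timeout expires.

    get_status is a hypothetical callable returning the READY value of a
    service as a string ("True", "False" or "Unknown").
    Returns True if the service became ready, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "True":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated status sequence: two "Unknown" readings, then ready.
statuses = iter(["Unknown", "Unknown", "True"])
became_ready = wait_until_ready(lambda: next(statuses),
                                timeout=5.0, interval=0.0,
                                sleep=lambda seconds: None)

# A service that never becomes ready hits the timeout instead.
timed_out = wait_until_ready(lambda: "Unknown",
                             timeout=0.0, interval=0.0,
                             sleep=lambda seconds: None)
print(became_ready, timed_out)
```

Injecting `sleep` and `clock` keeps the loop easy to test without real waiting; in a notebook the defaults are usually sufficient.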

An `HSEngine` instance holds a `trained_model`, a `converter` and a `deployer` object that work together to streamline the workflow. Again, refer to the low-level APIs discussed below for details.

In [5]:
vars(h)

{'service_name': 'sklearn-lr',
 'version': '1.0',
 'converter_kwargs': {},
 'deployer_kwargs': {},
 'converter': <hsengine.converters.fate.fate_converter.FATEConverter at 0x7f8f9be8ffd0>,
 'deployer': <hsengine.deployers.kfserving.sklearn.SKLearnV1KFDeployer at 0x7f8f9be8f290>,
 'trained_model': <hsengine.trained_model.fate_model.FATEModel at 0x7f909c5a6f10>,
 'converted_model': <hsengine.converters.converter_base.ConvertedModel at 0x7f8f9be8fb10>}

### Test the service

The next few cells are not related to the HSEngine implementation. They compose an HTTP request from test data and send it to the serving service.

In [6]:
# read data for prediction

TEST_DATA_FILE = "/data/projects/fate/examples/data/breast_homo_guest.csv"
import pandas
data = pandas.read_csv(TEST_DATA_FILE).sort_values(by=['id'])
X = data.values[:,2:]
X.shape

from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit(X).transform(X)
X.shape

(227, 30)

Please change the values of `SERVICE_HOSTNAME`, `INGRESS_HOST`, `INGRESS_PORT` based on your cluster setup.
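To make the role of these three values explicit, here is a small sketch of how a KFServing V1 predict request is assembled when it goes through an ingress gateway: the URL targets the gateway address, the path names the model, the `Host` header carries the service hostname used for routing, and the body wraps the rows in an `instances` list. The helper name `build_v1_predict_request` and its parameters are illustrative assumptions, not part of HSEngine.

```python
import json


def build_v1_predict_request(service_name, namespace, domain,
                             ingress_host, ingress_port, instances):
    """Return (url, headers, body) for a KFServing V1 predict call.

    Hypothetical helper for illustration only; nothing is sent over
    the network here.
    """
    url = (f"http://{ingress_host}:{ingress_port}"
           f"/v1/models/{service_name}:predict")
    headers = {
        # The gateway routes on the Host header, not on the URL host.
        "Host": f"{service_name}.{namespace}.{domain}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"instances": instances})
    return url, headers, body


url, headers, body = build_v1_predict_request(
    "sklearn-lr", "fate-3333", "example.com",
    "10.182.130.98", "30273",
    [[0.1, 0.2], [0.3, 0.4]],
)
print(url)
print(headers["Host"])
```

The returned pieces could then be passed to an HTTP client, for example `requests.post(url, data=body, headers=headers)`.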

In [7]:
# verify the serving service
import requests

SERVICE_HOSTNAME = '{}.fate-3333.example.com'.format(SERVICE_NAME)
INGRESS_HOST="10.182.130.98"
INGRESS_PORT="30273"

URL = f'http://{INGRESS_HOST}:{INGRESS_PORT}/v1/models/{SERVICE_NAME}:predict'
    
# Use the X
INPUT={'instances':X.tolist()}
res = requests.post(URL, json=INPUT, headers={'Host': SERVICE_HOSTNAME})
res

<Response [200]>

The classification results are returned in the `predictions` field of the predict response.

In [8]:
res.json()

{'predictions': [0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  1.0,
  0.0,
  1.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  1.0,
  1.0,
  0.0,
  1.0,
  1.0,
  0.0,
  1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  1.0,
  0.0,
  1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  1.0,
  0.0,
  0.0,
  1.0,
  1.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  1.0,
  0.0,
  1.0,
  0.0,
  1.0,
  1.0,
  0.0
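A long list of labels is hard to read at a glance. The sketch below summarizes a response of this shape by counting the predictions per class and, if ground-truth labels are supplied, computing the accuracy. The helper `summarize_predictions` is hypothetical, and the toy response and labels are made up for the example; with the real data, the labels would have to come from whichever column of the test file holds them.

```python
from collections import Counter


def summarize_predictions(response_json, labels=None):
    """Summarize a V1 predict response of the form {"predictions": [...]}.

    Hypothetical helper for illustration; labels, if given, must have
    one entry per prediction.
    """
    preds = [int(p) for p in response_json["predictions"]]
    summary = {"count": len(preds), "per_class": dict(Counter(preds))}
    if labels is not None:
        if len(labels) != len(preds):
            raise ValueError("labels and predictions differ in length")
        correct = sum(1 for p, y in zip(preds, labels) if p == int(y))
        summary["accuracy"] = correct / len(preds) if preds else 0.0
    return summary


# Toy response and labels, invented for this example.
example_response = {"predictions": [0.0, 1.0, 1.0, 0.0]}
summary = summarize_predictions(example_response, labels=[0, 1, 0, 0])
print(summary)
```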

Now, if the service is no longer needed, we can use the deployer's `destroy` method to shut it down.

In [9]:
h.deployer.destroy()

[I 210413 06:54:23 base:155] InferenceService sklearn-lr is deleted


An end-to-end workflow of `HSEngine` is relatively straightforward. As the log lines in the example above indicate, the "engine" fetches the FATE model files from the FATE server (the FATE flow server), then converts the model to a model object of a common framework, and finally uses the Kubernetes and KFServing APIs to set up a serving service in a cluster.

Next, let's dig into the classes that perform the actual work above, known as the **low-level APIs**.

## Low-level APIs

Within HSEngine, there are three main classes: `FATEModel`, `FATEConverter` and `KFServingDeployer`. If more complicated deployment scenarios or configurations are needed, these classes can be used directly instead of the high-level API.

All of these configurations are also exposed via the high-level API. Please refer to the docs of the HSEngine class (or the appendix at the end of this tutorial) for how to work with them.

### Low-level APIs: FATEModel

Below is a `FATEModel` example. People can specify a local model path, an in-memory model dict, or simply the model version together with the FATE flow server access info. In the last case, the model will be downloaded from the FATE flow server.

In the end, a model dict instance is created and stored inside the `FATEModel` object. Any further workflow is based on this "model dict".

In [10]:
from hsengine.trained_model.fate_model import FATEModel

fm = FATEModel(model_path=None,
               model_dict=None,
               model_version="2021040701544681092419",
               fate_flow_host=None,
               fate_flow_port=None,
               api_version="v1",
               role="guest",
               party_id=None,
               model_id=None)

[I 210413 06:54:39 fate_client:93] Downloading FATE model with role: guest, party_id: 3333, model_id: arbiter-3333#guest-3333#host-3333#model, model_version: 2021040701544681092419
[I 210413 06:54:40 fate_client:113] FATE model is saved to: /tmp/tmpw9c3jaax
[I 210413 06:54:40 fate_model:133] FATEModel initialized with component type: HomoLR and target framework: sklearn


### Low-level APIs: FATEConverter

A `FATEConverter` accepts a `FATEModel` as input. Its `convert` method returns a model object of a common framework, based on the type of the original `FATEModel`.

Currently, a `HomoLR` model is converted to scikit-learn's `LogisticRegression` model. For `HomoNN`, a `tensorflow.keras` model or a `torch.nn` model is returned, depending on the original trained model within FATE.

`model_storage` related topics will be covered later.

In [11]:
from hsengine.converters.fate.fate_converter import FATEConverter
from hsengine.integration.model_storage import ModelStorageType

fc = FATEConverter(fm,
                   model_storage_type=ModelStorageType.LOCAL_FILE,
                   model_storage_kwargs=None)

converted_model = fc.convert()

`converted_model` is of type `ConvertedModel`, which contains two attributes:
* `framework`: a string indicating the ML framework of the model.  
* `model`: the converted model object.

In [12]:
vars(converted_model)

{'framework': 'sklearn',
 'model': LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
           intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
           penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
           verbose=0, warm_start=False)}
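To give a feel for what such a conversion involves, the sketch below builds a scikit-learn `LogisticRegression` from a plain weight vector and intercept by assigning the fitted attributes directly, without calling `fit`. This is one common way to wrap externally trained linear weights and is not necessarily how `FATEConverter` does it; the helper `lr_from_weights` and the toy weights are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def lr_from_weights(weights, intercept):
    """Wrap externally trained binary LR parameters in a sklearn model.

    Hypothetical helper: sets the attributes that predict() relies on
    instead of training the model.
    """
    model = LogisticRegression()
    model.coef_ = np.asarray(weights, dtype=float).reshape(1, -1)
    model.intercept_ = np.asarray([intercept], dtype=float)
    model.classes_ = np.array([0, 1])
    return model


# Toy parameters, invented for this example.
model = lr_from_weights([2.0, -1.0], 0.5)
X_demo = np.array([[1.0, 1.0],    # score =  2 - 1 + 0.5 =  1.5
                   [-1.0, 1.0]])  # score = -2 - 1 + 0.5 = -2.5
scores = model.decision_function(X_demo)
labels = model.predict(X_demo)
print(scores, labels)
```

A positive decision score maps to class 1 and a negative one to class 0, which matches the 0/1 labels returned by the serving service earlier.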

`FATEConverter` also has a `save_model` method that by default serializes the converted model (for example, using joblib to dump the sklearn model) and saves it to the local filesystem for future use.

In [13]:
fc.save_model("/tmp/saved_model.joblib")

[I 210413 06:54:51 base:41] Saved model of framework: sklearn into /tmp/saved_model.joblib


'/tmp/saved_model.joblib'

Sometimes, we can stop here. `FATEConverter` is all we need - we only need the converted model object and can then do whatever we want with it: save it to a local path, run local predictions directly, etc.

If we want to use KFServing to serve our model, then go on to the next section about the deployer classes.

### Low-level APIs: KFServingDeployer & sub-classes

A `KFServingDeployer` is in charge of taking the converted model and setting up a serving service from it. It is an abstract class and should not be used directly. The convenience function `get_kfserving_deployer` returns an instance of the appropriate sub-class according to the framework of the converted model.

In [14]:
from hsengine.deployers.kfserving import get_kfserving_deployer

deployer = get_kfserving_deployer(service_name=SERVICE_NAME, 
                                  version=VERSION, 
                                  converted_model=converted_model,
                                  protocol_version="v1",
                                  model_storage_type=ModelStorageType.MINIO,
                                  model_storage_kwargs=None,
                                  storage_uri=None,
                                  isvc=None,
                                  kfserving_config=None,
                                  replace=True,
                                  framework_kwargs=None) # for specific server config
type(deployer)

hsengine.deployers.kfserving.sklearn.SKLearnV1KFDeployer

The `KFServingDeployer` contains several methods to do the setup. The most common ones are `deploy`, `status`, `wait` and `destroy` methods, performing the tasks as their names indicate.

In [15]:
deployer.deploy()
deployer.wait()
# deployer.status()
# deployer.destroy()

[I 210413 07:01:52 base:113] Preparing model storage and InferenceService spec...
[I 210413 07:01:52 minio:158] Uploaded model objects into path: s3://models/sklearn-lr/1.0
[I 210413 07:01:52 base:102] Prepared model with uri: s3://models/sklearn-lr/1.0
[I 210413 07:01:52 base:145] InferenceService spec ready
[I 210413 07:01:52 base:121] Creating InferenceService sklearn-lr...
[I 210413 07:01:54 base:131] InferenceService: sklearn-lr created. To check service readiness, call this deployer's status(), wait() methods, or use KFServing query APIs


NAME                 READY      PREV                      LATEST                    URL                                                              
sklearn-lr           Unknown                                                                                                                         
sklearn-lr           Unknown                                                                                                                         
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                                                                        
sklearn-lr           Unknown    0                         100                                       

Besides these methods, people can further customize the service by directly editing the InferenceService (isvc) object embedded in the deployer.

* First, call the `prepare_model` method to prepare the model files needed for the service. This step is required by KFServing - it needs a URI to fetch the model files from. Different frameworks require different model file layouts, and our deployer classes take care of that.
  * Sometimes this is enough - it returns a URI that can be used to set up an InferenceService. People who are familiar with KFServing can use this URI to manually create the service in different Kubernetes clusters.
* Then, call the `prepare_isvc` method to generate the isvc object based on the framework and the model storage URI we just prepared.
* Now we can get the isvc object and make whatever changes to it we need.
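As a rough picture of what these steps produce, the sketch below builds a storage URI and a minimal InferenceService manifest as a plain Python dict, then customizes it. The field names mirror the `deploy` output shown in this tutorial, and the URI layout mirrors the log lines (`s3://models/sklearn-lr/1.0`). The helper names are hypothetical, and the real `deployer.isvc` is an object edited through attributes rather than a dict, so treat this only as an illustration of the structure.

```python
def build_storage_uri(bucket, service_name, version):
    """Compose an S3-style model URI (layout assumed from the log output)."""
    return f"s3://{bucket}/{service_name}/{version}"


def build_sklearn_isvc(service_name, namespace, storage_uri,
                       protocol_version="v1"):
    """Return a minimal v1beta1 InferenceService manifest as a dict.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": service_name, "namespace": namespace},
        "spec": {
            "predictor": {
                "sklearn": {
                    "storageUri": storage_uri,
                    "protocolVersion": protocol_version,
                }
            }
        },
    }


uri = build_storage_uri("models", "sklearn-lr", "1.0")
manifest = build_sklearn_isvc("sklearn-lr", "fate-3333", uri)

# Example customization: set explicit resource requests and limits.
manifest["spec"]["predictor"]["sklearn"]["resources"] = {
    "limits": {"cpu": "1", "memory": "2Gi"},
    "requests": {"cpu": "1", "memory": "2Gi"},
}
print(uri)
```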

In [16]:
deployer.destroy()
deployer = get_kfserving_deployer(service_name=SERVICE_NAME, 
                                  version=VERSION, 
                                  converted_model=converted_model,
                                  protocol_version="v1",
                                  model_storage_type=ModelStorageType.MINIO,
                                  model_storage_kwargs=None,
                                  storage_uri=None,
                                  isvc=None,
                                  kfserving_config=None,
                                  replace=True,
                                  framework_kwargs=None) # for specific server config

deployer.prepare_model()
deployer.prepare_isvc()
isvc = deployer.isvc
# isvc.some_attribute = some_value

[I 210413 07:02:21 base:155] InferenceService sklearn-lr is deleted
[I 210413 07:02:21 minio:158] Uploaded model objects into path: s3://models/sklearn-lr/1.0
[I 210413 07:02:21 base:102] Prepared model with uri: s3://models/sklearn-lr/1.0
[I 210413 07:02:21 base:145] InferenceService spec ready


* Finally, call the `deploy` method to deploy our customized isvc object.

In [17]:
deployer.deploy()

[I 210413 07:16:47 base:121] Creating InferenceService sklearn-lr...
[I 210413 07:16:49 base:131] InferenceService: sklearn-lr created. To check service readiness, call this deployer's status(), wait() methods, or use KFServing query APIs


{'apiVersion': 'serving.kubeflow.org/v1beta1',
 'kind': 'InferenceService',
 'metadata': {'annotations': {'hsengine.dev/uuid': '11152016-e850-4ad3-b498-68f4b2282e43'},
  'creationTimestamp': '2021-04-13T07:16:49Z',
  'generation': 1,
  'managedFields': [{'apiVersion': 'serving.kubeflow.org/v1beta1',
    'fieldsType': 'FieldsV1',
    'fieldsV1': {'f:metadata': {'f:annotations': {'.': {},
       'f:hsengine.dev/uuid': {}}},
     'f:spec': {'.': {},
      'f:predictor': {'.': {},
       'f:sklearn': {'.': {}, 'f:protocolVersion': {}, 'f:storageUri': {}}}}},
    'manager': 'OpenAPI-Generator',
    'operation': 'Update',
    'time': '2021-04-13T07:16:47Z'}],
  'name': 'sklearn-lr',
  'namespace': 'fate-3333',
  'resourceVersion': '5154815',
  'uid': '9433c0b8-475d-488b-9714-0a8e2c5ce23a'},
 'spec': {'predictor': {'sklearn': {'name': 'kfserving-container',
    'protocolVersion': 'v1',
    'resources': {'limits': {'cpu': '1', 'memory': '2Gi'},
     'requests': {'cpu': '1', 'memory': '2Gi'}},


### Low-level APIs: BaseModelStorage & MinIOModelStorage

Model storage represents an interface for saving model files. By default, `BaseModelStorage` is used by the `FATEConverter` to save the model to a local path, and `MinIOModelStorage` is used by `KFServingDeployer` to upload model files for future serving use.

Refer to the doc of each class for detailed explanation of the parameters. For example, the `KFServingDeployer` uses code similar to the following cell to initialize its inner model storage.

In [18]:
# import only the class used below instead of a wildcard import
from hsengine.integration.model_storage import MinIOModelStorage
minio_storage = MinIOModelStorage(sub_path="",
                                  framework="",
                                  bucket="models",
                                  endpoint=None,
                                  access_key=None,
                                  secret_key=None,
                                  region=None,
                                  secure=True)

# Appendix A: HSEngine signature
As illustrated above, the low-level APIs expose lots of parameters. When using the HSEngine class, all these parameters can be provided too, though less directly. Here is what the HSEngine initialization function looks like:

In [19]:
help(HSEngine)

Help on class HSEngine in module hsengine.hsengine:

class HSEngine(builtins.object)
 |  HSEngine(service_name, version='1.0', converter_kwargs=None, deployer_kwargs=None, **kwargs)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, service_name, version='1.0', converter_kwargs=None, deployer_kwargs=None, **kwargs)
 |      :param service_name: the service name
 |      :param version: the service version, default to "1.0"
 |      :param converter_kwargs: a dict of keyword arguments that will be used to infer
 |                               and initialize the converter. See the doc of get_converter
 |                               and subclasses of ConverterBase for supported arguments.
 |      :param deployer_kwargs: a dict of keyword arguments that will be used to infer
 |                              the deployer type and initialize it. See the doc of
 |                              get_deployer() and subclasses of the ConverterBase class
 |                              for suppo