## Dynamic Inference (Serving Models in Production via REST)
Inference is the term used to describe the process of using a pre-trained model to make predictions for unseen data.
Dynamic Inference is the term used to describe making predictions on demand, using a server. 

This notebook is a walk through for how to serve a machine learning model using a low latency prediction servering system called **[clipper.ai](http://clipper.ai/)**. 
clipper can be hosted on your favorite cloud provider or on-premise.

Overview 
+ Model training
+ Clipper cluster creation
+ App creation & model deployment
+ Model query (single row, multiple rows) via Python requests & curl
+ Model versioning update 
+ Model versioning rollback
+ Model replication

References:
+ [clipper.ai documentation](http://clipper.ai/)

+ [clipper @github](https://github.com/ucbrise/clipper)

### Model Training
Before we can serve a model, we must first train it. Model training is an iterative process often persisting artifacts to disk. After model has been trained, we can load the model back into memory (RAM) when serving.

In [1]:
import logging, xgboost as xgb, numpy as np
from sklearn.metrics import mean_absolute_error
import joblib
import pandas as pd
from datetime import datetime
import pickle
import time
import matplotlib.pyplot as plt
plt.show(block=True)

training_examples = pd.read_pickle("../data/processed/airlines_training_examples.pkl")
f1=open("../data/processed/airlines_training_targets.pkl",'rb')
training_targets = pickle.load(f1) 
f1.close()
test_examples = pd.read_pickle("../data/processed/airlines_test_examples.pkl")

def get_train_points():
     return training_examples.values.tolist()

def get_test_points(start_row_index,end_row_index):
    return test_examples.iloc[start_row_index:end_row_index].values.tolist()

def get_test_point(row_index):
     return test_examples.iloc[row_index].tolist()
    
# Create a training matrix.
dtrain = xgb.DMatrix(get_train_points(), label=training_targets)
# We then create parameters, watchlist, and specify the number of rounds
# This is code that we use to build our XGBoost Model, and your code may differ.
param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
watchlist = [(dtrain, 'train')]
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist)

[0]	train-error:0.378671
[1]	train-error:0.375975


In [2]:
def predict(xs):
    result = bst.predict(xgb.DMatrix(xs))
    return result 
# make predictions
predictions = predict(test_examples.values)
print("Predict instances in test set using custom defined scoring function...")
predictions

Predict instances in test set using custom defined scoring function...


array([0.9041093 , 0.87798643, 0.9511411 , ..., 0.86522025, 0.87798643,
       0.87798643], dtype=float32)

### Clipper Cluster Creation

In [3]:
from clipper_admin import ClipperConnection, DockerContainerManager
clipper_conn = ClipperConnection(DockerContainerManager())
print("Start Clipper...")
clipper_conn.start_clipper()

18-08-22:00:19:15 INFO     [docker_container_manager.py:151] [default-cluster] Starting managed Redis instance in Docker


Start Clipper...


18-08-22:00:19:17 INFO     [docker_container_manager.py:229] [default-cluster] Metric Configuration Saved at /private/var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmpnagwxgkh.yml
18-08-22:00:19:18 INFO     [clipper_admin.py:138] [default-cluster] Clipper is running


### App Creation & Model Deployment

In [4]:
from clipper_admin.deployers import python as python_deployer
print("Register Clipper application...")
clipper_conn.register_application('xgboost-airlines', 'doubles', 'default_pred', 100000)

# We specify which packages to install in the pkgs_to_install arg.
# For example, if we wanted to install xgboost and psycopg2, we would use
# pkgs_to_install = ['xgboost', 'psycopg2']
print("Deploy predict function closure using Clipper...")
python_deployer.deploy_python_closure(clipper_conn, name='xgboost-model', version=1,
    input_type="doubles", func=predict, pkgs_to_install=['xgboost'])

18-08-22:00:19:21 INFO     [clipper_admin.py:215] [default-cluster] Application xgboost-airlines was successfully registered
18-08-22:00:19:21 INFO     [deployer_utils.py:41] Saving function to /var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmpnonwe561clipper
18-08-22:00:19:21 INFO     [deployer_utils.py:51] Serialized and supplied predict function
18-08-22:00:19:21 INFO     [python.py:192] Python closure saved
18-08-22:00:19:21 INFO     [python.py:206] Using Python 3.6 base image
18-08-22:00:19:21 INFO     [clipper_admin.py:467] [default-cluster] Building model Docker image with model data from /var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmpnonwe561clipper


Register Clipper application...
Deploy predict function closure using Clipper...


18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster] Step 1/3 : FROM clipper/python36-closure-container:develop
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster]  ---> 0fac6e6e8242
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster] Step 2/3 : RUN apt-get -y install build-essential && pip install xgboost
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster]  ---> Using cache
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster]  ---> 761b4e2e5cea
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster] Step 3/3 : COPY /var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmpnonwe561clipper /model/
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster]  ---> 791bea6320e2
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster] Successfully built 791bea6320e2
18-08-22:00:19:22 INFO     [clipper_admin.py:472] [default-cluster] Successfully tagged default-cluster-xgboost-model:1
18-08

In [5]:
print("Link Clipper connection to model application...")
clipper_conn.link_model_to_app('xgboost-airlines', 'xgboost-model')

18-08-22:00:19:34 INFO     [clipper_admin.py:277] [default-cluster] Model xgboost-model is now linked to application xgboost-airlines


Link Clipper connection to model application...


### Query Model via Python requests module

In [6]:
import requests, json
# Get Address
addr = clipper_conn.get_query_addr()
print("Model predict for a single instance via Python requests POST request & parse response...")

# Post Query
response = requests.post(
     "http://%s/%s/predict" % (addr, 'xgboost-airlines'),
     headers={"Content-type": "application/json"},
     data=json.dumps({
         'input': get_test_point(0)
     }))
result = response.json() 
result

Model predict for a single instance via Python requests POST request & parse response...


{'query_id': 0, 'output': 0.9041093, 'default': False}

### Query Model (single row)

In [7]:
import requests, json, numpy as np
print("Model predict for a single instance via Python requests POST request...")
headers = {"Content-type": "application/json"}
requests.post("http://localhost:1337/xgboost-airlines/predict", headers=headers, data=json.dumps({"input": get_test_point(0)})).json()

Model predict for a single instance via Python requests POST request...


{'query_id': 1, 'output': 0.9041093, 'default': False}

### Query Model (multiple rows)

In [8]:
import requests, json, numpy as np
print("Model predict for a batch of instances via Python requests POST request...")
headers = {"Content-type": "application/json"}
requests.post("http://localhost:1337/xgboost-airlines/predict", headers=headers, 
              data=json.dumps({"input_batch": get_test_points(0,2)})).json()

Model predict for a batch of instances via Python requests POST request...


{'batch_predictions': [{'query_id': 2, 'output': 0.9041093, 'default': False},
  {'query_id': 3, 'output': 0.87798643, 'default': False}]}

### Query Model via curl

In [9]:
get_test_point(0)
print("Model predict for a single instance via curl...")
!curl -X POST --header "Content-Type:application/json" -d '{"input": [1987.0, 10.0, 1.0, 4.0, 1250.0, 1340.0, 1509.0, 50.0, 226.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}' 127.0.0.1:1337/xgboost-airlines/predict

Model predict for a single instance via curl...
{"query_id":4,"output":0.9041093,"default":false}

### Model Update
Suppose you found a new set of hyper-parameters which increase the predictive power of your model. You found that it yields better prediction results than the first model you deployed. We decide to deploy version 2 of our XGBoost model.

Let's first retrain v2...

In [13]:
# Create a training matrix.
dtrain = xgb.DMatrix(get_train_points(), label=training_targets)
# We then create parameters, watchlist, and specify the number of rounds
# This is code that we use to build our XGBoost Model, and your code may differ.
param = {'max_depth': 3, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
watchlist = [(dtrain, 'train')]
num_round = 2
bst_v2 = xgb.train(param, dtrain, num_round, watchlist)

[0]	train-error:0.369608
[1]	train-error:0.351091


In [14]:
def predict(xs):
    result = bst_v2.predict(xgb.DMatrix(xs))
    return result 
# make predictions
predictions = predict(test_examples.values)
print("Predict instances in test set using custom defined scoring function...")
predictions

Predict instances in test set using custom defined scoring function...


array([0.8924916, 0.7657896, 0.8924916, ..., 0.9179656, 0.7657896,
       0.7657896], dtype=float32)

In [15]:
# Deploy the 'predict' function as a model (to a new container)... observe the name change
python_deployer.deploy_python_closure(clipper_conn, name='xgboostv2-model', version=2,
    input_type="doubles", func=predict, pkgs_to_install=['xgboost'])

18-08-22:00:26:38 INFO     [deployer_utils.py:41] Saving function to /var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmplfrzglliclipper
18-08-22:00:26:38 INFO     [deployer_utils.py:51] Serialized and supplied predict function
18-08-22:00:26:38 INFO     [python.py:192] Python closure saved
18-08-22:00:26:38 INFO     [python.py:206] Using Python 3.6 base image
18-08-22:00:26:38 INFO     [clipper_admin.py:467] [default-cluster] Building model Docker image with model data from /var/folders/kv/w56d6z9j4c79zvw8c8jsn6hw0000gn/T/tmplfrzglliclipper
18-08-22:00:26:39 INFO     [clipper_admin.py:472] [default-cluster] Step 1/3 : FROM clipper/python36-closure-container:develop
18-08-22:00:26:39 INFO     [clipper_admin.py:472] [default-cluster]  ---> 0fac6e6e8242
18-08-22:00:26:39 INFO     [clipper_admin.py:472] [default-cluster] Step 2/3 : RUN apt-get -y install build-essential && pip install xgboost
18-08-22:00:26:39 INFO     [clipper_admin.py:472] [default-cluster]  ---> Using cache
18-08-22:00

### Query Model (single row)

In [17]:
import requests, json, numpy as np
print("Model predict for a single instance via Python requests POST request...")
headers = {"Content-type": "application/json"}
requests.post("http://localhost:1337/xgboost-airlines/predict", headers=headers, 
              data=json.dumps({"input": get_test_point(0)})).json()

Model predict for a single instance via Python requests POST request...


{'query_id': 5, 'output': 0.9041093, 'default': False}

### Model Rollback
Suppose you find out that model v2 is overfitting, here's how you can roll it back to v1...

In [19]:
# rollback
clipper_conn.set_model_version(name='xgboostv2-model', version='2')

In [20]:
import requests, json, numpy as np
print("Model predict for a single instance via Python requests POST request...")
headers = {"Content-type": "application/json"}
requests.post("http://localhost:1337/xgboost-airlines/predict", headers=headers, data=json.dumps({"input": get_test_point(0)})).json()

Model predict for a single instance via Python requests POST request...


{'query_id': 6, 'output': 0.9041093, 'default': False}

### Model Replication

Machine learning models can be computationally expensive. A single instance of the model hosting machine may not meet the throughput requirements of a serving workload. In order to increase the prediction throughput you can add additional replicas...

In [21]:
clipper_conn.set_num_replicas('xgboost-model', num_replicas=10, version='1')

18-08-22:00:33:13 INFO     [docker_container_manager.py:353] [default-cluster] Found 1 replicas for xgboost-model:1. Adding 9


In [22]:
!docker ps

CONTAINER ID        IMAGE                                 COMMAND                  CREATED              STATUS                        PORTS                                            NAMES
2e14fab54dde        default-cluster-xgboost-model:1       "/container/containe…"   57 seconds ago       Up 56 seconds (healthy)                                                        xgboost-model_1-50045
cedd0a671508        default-cluster-xgboost-model:1       "/container/containe…"   58 seconds ago       Up 57 seconds (healthy)                                                        xgboost-model_1-58540
325a91f5e787        default-cluster-xgboost-model:1       "/container/containe…"   59 seconds ago       Up 58 seconds (healthy)                                                        xgboost-model_1-55105
529cf633fc3d        default-cluster-xgboost-model:1       "/container/containe…"   About a minute ago   Up 59 seconds (healthy)                                                        xgboost-m

### Clipper Troubleshooting Guide
[This](http://clipper.ai/tutorials/troubleshooting/) is a helpful guide for troubleshooting common issues with clipper.ai.

In [None]:
# clipper_conn.inspect_instance()
# clipper_conn.get_clipper_logs()

### Shut down clipper

In [23]:
print("Shutting down Clipper connection.")
clipper_conn.stop_all()

Shutting down Clipper connection.


18-08-22:00:45:22 INFO     [clipper_admin.py:1278] [default-cluster] Stopped all Clipper cluster and all model containers


In [24]:
# stop all containers:
!docker rm $(docker ps -a -q)

2e14fab54dde
cedd0a671508
325a91f5e787
529cf633fc3d
7048b26b9eca
bdfeeba04421
d2d15c45e696
b41933a7c398
967b5e3e2b6a
c12ae436a22b
def6b8ed204e
0f2029271e62
9e4b79f49149
d0cce2095a2b
f744d1ad7d2f
da5efdc9dc58


In [25]:
!docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES


### Some other useful system commands

In [None]:
# stop all containers:
# docker kill $(docker ps -q)

# remove all containers
# !docker rm $(docker ps -a -q)

# remove all docker images
# docker rmi $(docker images -q)