<a href="https://colab.research.google.com/github/timsetsfire/odsc-ml-drum/blob/main/Colab%20-%20DRUM%20Model%20Serving.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DRUM - Automated Model Serving Made Easy

We'll get our hands dirty by

* Building a simple regression model using scikit-learn
* Using DRUM for batch scoring
* Using DRUM to get a REST API endpoint
* Showing a simple example app connected to the REST API
* Scoring models built with H2O, Keras, XGBoost, and DataRobot
* Adding a DataRobot remote agent, if you are interested in further model monitoring


# Build a Model

In [1]:
!git clone https://github.com/timsetsfire/odsc-ml-drum.git

Cloning into 'odsc-ml-drum'...
remote: Enumerating objects: 598, done.
remote: Counting objects: 100% (139/139), done.
remote: Compressing objects: 100% (133/133), done.
remote: Total 598 (delta 75), reused 0 (delta 0), pack-reused 459
Receiving objects: 100% (598/598), 83.45 MiB | 33.30 MiB/s, done.
Resolving deltas: 100% (284/284), done.


In [2]:
!pip install -r /content/odsc-ml-drum/colab-requirements.txt -q

  Building wheel for PyYAML (setup.py) ... done
  Building wheel for strictyaml (setup.py) ... done
  Building wheel for memory-profiler (setup.py) ... done
  Building wheel for progress (setup.py) 

In [39]:
%%sh
pip install datarobot-drum -U -q

ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.


In [4]:
!sudo apt install nginx -q

Reading package lists...
Building dependency tree...
Reading state information...
nginx is already the newest version (1.14.0-0ubuntu1.7).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.


In [5]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
import pickle
import datetime

## load data

df = pd.read_csv('/content/odsc-ml-drum/data/boston_housing.csv')
df.head()

## set features and target

X = df.drop('MEDV', axis=1)
y = df['MEDV']

## train the model
rf = RandomForestRegressor(n_estimators = 20)
rf.fit(X,y)

## serialize the model

with open('/content/odsc-ml-drum/src/custom_model/rf.pkl', 'wb') as pkl:
    pickle.dump(rf, pkl)

# Testing the Model

In [43]:
%%sh 
drum perf-test --code-dir /content/odsc-ml-drum/src/custom_model \
--input /content/odsc-ml-drum/data/boston_housing_inference.csv \
--target-type regression

DRUM performance test
Model:      /content/odsc-ml-drum/src/custom_model
Data:       /content/odsc-ml-drum/data/boston_housing_inference.csv
# Features: 13
Preparing test data...



Running test case with timeout: 600
Running test case: 72 bytes - 1 samples, 100 iterations
Running test case with timeout: 600
Running test case: 0.1MB - 1449 samples, 50 iterations
Running test case with timeout: 600
Running test case: 10MB - 144964 samples, 5 iterations
Running test case with timeout: 600
Running test case: 50MB - 724823 samples, 1 iterations
Test is done stopping drum server

  size     samples   iters    min     avg     max    used (MB)   total physical 
                                                                      (MB)      
72 bytes         1     100   0.007   0.008   0.011     554.668         12993.484
0.1MB         1449      50   0.014   0.018   0.122     560.395         12993.484
10MB        144964       5   0.604   0.614   0.637     648.484         12993.484
50MB        7

tput: terminal attributes: No such device or address



# Validation

In [41]:
%%sh 
drum validation --code-dir /content/odsc-ml-drum/src/custom_model \
--input /content/odsc-ml-drum/data/boston_housing_inference.csv \
--target-type regression > drum_validation.log

2021-05-27 01:34:33.385143: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)
2021-05-27 01:34:36.870478: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)
2021-05-27 01:34:40.348295: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)
2021-05-27 01:34:43.839968: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)
2021-05-27 01:34:47.334422: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)
2021-05-27 01:34:50.853418: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so

In [42]:
%%sh
tail drum_validation.log



Validation checks results
      Test case          Status   Details
Basic batch prediction   PASSED          
Null value imputation    PASSED          


# Batch Scoring with DRUM
<a id="setup_complete"></a>

At this point our model has been written to disk and we want to start making predictions with it.  To do this, we'll leverage DRUM and its ability to natively handle our scikit-learn model.  All we need to do is tell DRUM where the model resides and which data we wish to score.  

DRUM supports many frameworks natively.  For those it doesn't support off the shelf, we just need to write some custom hooks so DRUM knows how to work with the model.  In this example, we'll highlight some very simple custom hooks, and will provide links to more complex examples.  
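To make the idea of custom hooks concrete, here is a minimal sketch of what a `custom.py` in the code directory could look like. It is not the file shipped in the repository. It assumes DRUM's `load_model(code_dir)` and `score(data, model, **kwargs)` hooks, reuses the `rf.pkl` file name from the training cell, and assumes that a regression model should return a single `Predictions` column.

```python
import os
import pickle

import pandas as pd


def load_model(code_dir):
    """Hook: load the serialized model from the code directory."""
    # Assumption: the model was pickled as "rf.pkl", as in the training cell.
    with open(os.path.join(code_dir, "rf.pkl"), "rb") as f:
        return pickle.load(f)


def score(data, model, **kwargs):
    """Hook: return one prediction per input row."""
    # Assumption: regression output is a DataFrame with one "Predictions" column.
    return pd.DataFrame({"Predictions": model.predict(data)})
```

With hooks like these in place, the `drum` commands stay the same; only the contents of the code directory change.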

In [6]:
%%sh 
drum score --code-dir /content/odsc-ml-drum/src/custom_model \
--input /content/odsc-ml-drum/data/boston_housing_inference.csv \
--output /content/odsc-ml-drum/data/predictions.csv --target-type regression

2021-05-27 01:21:56.349475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
  defaults = yaml.load(f)


In [7]:
pd.read_csv("/content/odsc-ml-drum/data/predictions.csv").head()

Unnamed: 0,Predictions
0,26.5
1,21.88
2,35.015
3,34.91
4,35.69
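The `Unnamed: 0` column above is most likely the row index that was written to the file with an empty header cell. If it is not needed as a regular column, `index_col=0` tells pandas to treat it as the index. The sketch below uses an inline string as a stand-in for `predictions.csv` (the values are the five rows shown above), so it runs without the file.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for predictions.csv; assumes the first header cell is empty.
csv_text = ",Predictions\n0,26.5\n1,21.88\n2,35.015\n3,34.91\n4,35.69\n"

# index_col=0 uses the first (unnamed) column as the row index.
preds = pd.read_csv(StringIO(csv_text), index_col=0)
print(preds.columns.tolist())  # ['Predictions']
```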


# Start the inference server locally

Batch scoring can be very useful, but the utility DRUM offers does not stop there.  We can also leverage DRUM to serve our model as a RESTful API endpoint.  The only thing that changes is the way we structure the command: we use `server` mode instead of `score` mode.  We'll also need to provide an address which is NOT in use.  

When starting the server, we'll use `subprocess.Popen` so that we can interact with the server from this notebook.

In [8]:
import subprocess
import requests
import pandas as pd
from io import BytesIO
import yaml
import time
import os
import datarobot as dr
from pprint import pprint

In [9]:
run_inference_server = ["drum",
              "server",
              "--code-dir","/content/odsc-ml-drum/src/custom_model", 
              "--address", "0.0.0.0:6789", 
              "--show-perf",
              "--target-type", "regression",
              "--logging-level", "info",
              "--show-stacktrace",
              "--verbose",
              "--production", 
              "--max-workers", "5"
              ]

In [10]:
inference_server = subprocess.Popen(run_inference_server, stdout=subprocess.PIPE)

In [11]:
!sudo service nginx status

 * nginx is not running


## Ping the Server to make sure it is running

In [12]:
## confirm the server is running
time.sleep(5) ## snoozing before pinging the server to give it time to actually start
print('check status')
requests.request("GET", "http://0.0.0.0:6789/").content

check status


b'{"message": "OK"}'
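The fixed five-second sleep is a guess about startup time. A polling loop is one possible alternative: it returns as soon as the server answers and gives up after a deadline. The helper below is a hypothetical sketch (the name `wait_until_ready` and its parameters are not part of DRUM) that uses only the standard library.

```python
import time
import urllib.request


def wait_until_ready(url, timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    # Bypass any configured proxy, since the server runs on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused or HTTP error: not ready yet
        time.sleep(interval)
    return False
```

For the server started above, the call would be `wait_until_ready("http://localhost:6789/")`, which returns `False` if the server never comes up.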

## Send data to server for inference

The request must provide our dataset as form data.  To do so, we'll create a simple Python function that passes the data over appropriately.  We'll reuse the same function in our simple Flask app a little later.  

In [13]:
def score(data, port = "6789"):
    # serialize the DataFrame to CSV in an in-memory buffer
    b_buf = BytesIO()
    b_buf.write(data.to_csv(index=False).encode("utf-8"))
    b_buf.seek(0)
  
    url = "http://localhost:{}/predict/".format(port)
    # send the CSV as a form-data file under the key 'X'
    files = [
        ('X', b_buf)
    ]
    response = requests.request("POST", url, files = files, timeout=None, verify=False)
    return response

In [14]:
# %%timeit
scoring_data = pd.read_csv("/content/odsc-ml-drum/data/boston_housing_inference.csv")
predictions = score(scoring_data).json() ## score the entire dataset and print every prediction
pprint(predictions)

{'predictions': [26.5,
                 21.88,
                 35.015,
                 34.91,
                 35.69,
                 27.19,
                 21.56,
                 23.47,
                 16.455]}


In [15]:
requests.request("GET", "http://0.0.0.0:6789/").content

b'{"message": "OK"}'

In [16]:
inference_server.terminate()
inference_server.stdout.readlines()

[b'Name: uWSGI\n',
 b'Version: 2.0.19.1\n',
 b'Summary: The uWSGI server\n',
 b'Home-page: https://uwsgi-docs.readthedocs.io/en/latest/\n',
 b'Author: Unbit\n',
 b'Author-email: info@unbit.it\n',
 b'License: GPLv2+\n',
 b'Location: /usr/local/lib/python3.7/dist-packages\n',
 b'Requires: \n',
 b'Required-by: mlpiper\n',
 b'Detected REST server mode - this is an advanced option\n',
 b'\x1b[32m \x1b[0m\n',
 b'\x1b[32m \x1b[0m\n',
 b'\x1b[32mComponent: uwsgi_serving\x1b[0m\n',
 b'\x1b[32mLanguage:  Python\x1b[0m\n',
 b'\x1b[32mOutput:\x1b[0m\n',
 b'\x1b[32m------------------------------------------------------------\x1b[0m\n']
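Calling `terminate()` only sends the signal; it does not wait for the process to exit. A slightly more defensive shutdown waits with a timeout, escalates to `kill()` if needed, and collects whatever the process wrote to its pipe. The helper below is a hypothetical sketch (the name `stop_process` is not from DRUM) and assumes the process was started with `stdout=subprocess.PIPE`, as above.

```python
import subprocess


def stop_process(proc, timeout=10.0):
    """Terminate `proc`, kill it if it does not exit in time, return its stdout."""
    proc.terminate()
    try:
        # communicate() waits for exit and drains the stdout pipe.
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()
    return out
```

In this notebook the call would be `stop_process(inference_server)`, which returns the captured output as bytes.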

In [17]:
%%sh
nginx -s stop
sudo service nginx status

 * nginx is not running


In [None]:
# requests.request("POST", "http://0.0.0.0:6789/shutdown/").content

# Value Prop

One may ask, what is the benefit to be had here?  Well, first off, there is no need for me to write an API to get the model up and running.  Second, DRUM allows me to abstract the framework away, provided I'm using one that is natively supported or I can write enough Python for DRUM to understand how to hook up to the model.  

For example, I could hot swap models as I see fit (see examples in `./src/other_models`).

While we will run through several other frameworks in `score` mode, you can bet they are supported in `server` mode as well!
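Since only the code directory changes between frameworks, the commands can be generated rather than typed out. The sketch below builds one `drum score` command per model directory, using only the flags that appear in this notebook. The helper `build_score_command` is hypothetical; the directory names are the ones under `src/other_models` used in this notebook.

```python
def build_score_command(code_dir, input_path, target_type="regression", output_path=None):
    """Return the argument list for a `drum score` call."""
    cmd = ["drum", "score",
           "--code-dir", code_dir,
           "--input", input_path,
           "--target-type", target_type]
    if output_path is not None:
        cmd += ["--output", output_path]
    return cmd


BASE = "/content/odsc-ml-drum"
MODEL_DIRS = ["h2o_mojo/regression", "python3_keras_joblib", "python3_xgboost", "dr_codegen"]

commands = [
    build_score_command(f"{BASE}/src/other_models/{m}",
                        f"{BASE}/data/boston_housing_inference.csv")
    for m in MODEL_DIRS
]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` to execute it; the block itself only prints the commands.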

#### H2O Mojo

In [18]:
!drum score --code-dir /content/odsc-ml-drum/src/other_models/h2o_mojo/regression --input /content/odsc-ml-drum/data/boston_housing_inference.csv --target-type regression


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
   Predictions
0    24.504000
1    22.492000
2    34.554001
3    34.420001
4    35.289001
5    28.394001
6    21.936000
7    23.451000
8    17.065000


#### Keras

In [19]:
!drum score --code-dir /content/odsc-ml-drum/src/other_models/python3_keras_joblib --input /content/odsc-ml-drum/data/boston_housing_inference.csv --target-type regression


2021-05-27 01:22:42.837377: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-27 01:22:44.372695: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-27 01:22:44.433723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-27 01:22:44.434365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-05-27 01:22:44.434414: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-27 01:22:44.553453: I tensorflow/stream_executor/platform/default

#### XGBoost

Requires XGBoost

In [20]:
!drum score --code-dir /content/odsc-ml-drum/src/other_models/python3_xgboost --input /content/odsc-ml-drum/data/boston_housing_inference.csv --target-type regression


  defaults = yaml.load(f)
2021-05-27 01:22:55.123433: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
   Predictions
0    24.541843
1    21.260277
2    34.018497
3    32.569200
4    34.248066
5    27.282364
6    20.803959
7    19.645220
8    16.968880


#### DataRobot Codegen

In [21]:
!drum score --code-dir /content/odsc-ml-drum/src/other_models/dr_codegen --input /content/odsc-ml-drum/data/boston_housing_inference.csv --target-type regression


   Predictions
0    24.258228
1    24.258228
2    32.451515
3    32.451515
4    32.451515
5    24.258228
6    21.078378
7    13.107812
8    13.107812
