# DSU ONNX - Example #1: scikit-learn

Please create a Python environment with Python 3.7 or 3.8. ONNX Runtime currently supports Python 3.8 or lower.





In [None]:
!pip install -r requirements.txt

# 1. scikit-learn: Train the model (Optional)

Pre-trained models are also available here:

        https://storage.googleapis.com/dsu-models-20020301/example-1-sklearn/rfr_regressor.joblib

        https://storage.googleapis.com/dsu-models-20020301/example-1-sklearn/rfr_regressor.onnx

This is the usual scikit-learn workflow: load the data, split it into training and test sets, and fit a model.

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('dsu-data.csv')

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('price', axis=1), df['price'], test_size=0.3, random_state=32
)

In [None]:
from sklearn.ensemble import RandomForestRegressor

hyper_params = {
    'n_jobs':4, # uses 4 threads -> faster training
    'n_estimators':3000
}

rfr = RandomForestRegressor()
rfr.set_params(**hyper_params)

rfr.fit(X_train, y_train)
# R^2 on the held-out test split; built-in round() also works when score() returns a plain float
score = round(rfr.score(X_test, y_test), 6)
print("Model score", score)

### Run inference on scikit-learn model


In [None]:
rfr.predict([[10]])

### Saving/Loading model in joblib format

scikit-learn models can be saved natively with joblib. This is a viable way to save and load scikit-learn models, but ONNX offers several benefits over joblib:

1. Smaller module footprint
2. Faster inference
3. Smaller model size

In [None]:
from joblib import dump, load
dump(rfr, 'rfr_regressor.joblib')

In [None]:
# loading joblib models
from joblib import load
load('rfr_regressor.joblib')

# 2. ONNX: Convert to ONNX

ONNX documentation

sklearn-onnx docs: http://onnx.ai/sklearn-onnx/introduction.html <br/>
supported scikit-learn models: https://onnx.ai/sklearn-onnx/supported.html

In [None]:
# import
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import Int64TensorType

In [None]:
# The converter needs to know the name, type, and shape of the model input
# Int64TensorType([1]) declares a 1-D tensor holding a single int64 value
# The ONNX model will expect an input like 'model_input = [10]'

initial_type = [("input_array", Int64TensorType([1]))]


### Advanced input cases

Declaring the input can be the tricky part of converting a model to ONNX, because the input type and shape must be defined up front.

#### Multiple features
Let's consider the following example: 

```initial_type = [("input_array", Int64TensorType([6]))]```

In this case the input should be 6 integers:

```model_input = [1,10,14,2,5,7]```

#### Batch inference
```initial_type = [("input_array", Int64TensorType([None ,6]))]```

Notice the 'None' in the Int64TensorType shape. It leaves the first dimension unspecified, so the ONNX model can run inference on batches of any size:

 ```model_input = [[1,78,90,10,77,6], [1,10,14,2,5,7], [10, ..]]```
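
The shapes described above can be sketched with NumPy. This is a minimal illustration, assuming a model converted with `Int64TensorType([None, 6])`; the helper name `to_batch` is hypothetical and is not part of skl2onnx or ONNX Runtime.

```python
import numpy as np

def to_batch(rows, n_features=6):
    """Hypothetical helper: turn rows of integers into an (N, n_features) int64 array."""
    batch = np.asarray(rows, dtype=np.int64)
    if batch.ndim == 1:
        # A single sample becomes a batch with one row.
        batch = batch.reshape(1, -1)
    if batch.ndim != 2 or batch.shape[1] != n_features:
        raise ValueError(f"expected shape (N, {n_features}), got {batch.shape}")
    return batch

single = to_batch([1, 10, 14, 2, 5, 7])
batch = to_batch([[1, 78, 90, 10, 77, 6], [1, 10, 14, 2, 5, 7]])
print(single.shape, batch.shape)  # (1, 6) (2, 6)
```

Checking the shape and dtype before calling the model tends to give clearer errors than letting the runtime reject the input.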


### Convert scikit-learn model to ONNX

In [None]:
# Conversion to ONNX
onx = convert_sklearn(rfr, initial_types=initial_type)

# Serialization
with open("rfr_regressor.onnx", "wb") as f:
    f.write(onx.SerializeToString())
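
With both files on disk, the "smaller model size" point from the joblib section can be checked directly. The following is a small sketch using only the standard library; the helper names `size_mb` and `report_sizes` are made up for illustration, and the file names assume the two models were saved as shown in this notebook.

```python
import os

def size_mb(path):
    """Return the size of a file in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)

def report_sizes(paths):
    """Return {path: size in MB} for the files that exist; missing files are skipped."""
    return {p: size_mb(p) for p in paths if os.path.exists(p)}

for path, mb in report_sizes(["rfr_regressor.joblib", "rfr_regressor.onnx"]).items():
    print(f"{path}: {mb:.2f} MB")
```

The actual numbers depend on the model and its hyperparameters, so treat the comparison as specific to this example.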

### Run ONNX inference

In [None]:
# import ONNX Runtime
import onnxruntime as rt

# create an ONNX inference session
sess = rt.InferenceSession('rfr_regressor.onnx')

# get input and output names - the input name was defined at the model conversion step
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name

# run inference - ONNX Runtime expects a NumPy array whose dtype matches the declared int64 input
model_input = np.array([10], dtype=np.int64)
result = sess.run([label_name], {input_name: model_input})
result
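
To get a rough feel for the "faster inference" point, both predict calls can be timed. Below is a minimal sketch with a hypothetical helper, `mean_latency`, built on `time.perf_counter`. It uses a stand-in workload so it runs on its own, and results will vary by machine and model.

```python
import time

def mean_latency(fn, repeats=100):
    """Call fn() `repeats` times and return the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Stand-in workload so the sketch is self-contained. In this notebook you could pass
#   lambda: rfr.predict([[10]])
#   lambda: sess.run([label_name], {input_name: model_input})
latency = mean_latency(lambda: sum(range(1000)), repeats=50)
print(f"mean latency: {latency * 1e3:.3f} ms")
```

Averaging over many calls smooths out one-off costs such as the first call's warm-up.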

# Fin

Compare the output from scikit-learn and ONNX. They should be approximately the same.
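
One way to make "approximately the same" concrete is a tolerance-based comparison. This is a sketch under assumptions: the helper name `outputs_match` and the tolerances are illustrative choices, not values from the ONNX tooling. Small differences are plausible because the converted model typically computes in 32-bit floats.

```python
import numpy as np

def outputs_match(sk_pred, onnx_pred, rtol=1e-4, atol=1e-5):
    """Return True if both predictions have the same length and agree within tolerance."""
    sk = np.asarray(sk_pred, dtype=np.float64).ravel()
    ox = np.asarray(onnx_pred, dtype=np.float64).ravel()
    return sk.shape == ox.shape and bool(np.allclose(sk, ox, rtol=rtol, atol=atol))

# Made-up numbers for illustration only. In this notebook, compare
# rfr.predict([[10]]) with result[0] instead.
print(outputs_match([123.456789], [[123.45679]]))
```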