# Getting Started with BentoML

[BentoML](http://bentoml.ai) is an open-source platform for __machine learning model serving__.

What does BentoML do?

* Turn your ML model into a production API endpoint with just a few lines of code
* Support all major machine learning training frameworks
* High-performance API serving system with adaptive micro-batching support
* DevOps best practices baked in, simplifying the transition from model development to production
* Model management for teams, providing CLI and Web UI dashboard
* Flexible model deployment orchestration with support for AWS Lambda, SageMaker, EC2, Docker, Kubernetes, KNative and more

This is a quick tutorial on how to use BentoML to serve a scikit-learn model via a REST API server and deploy it to [AWS Lambda](https://aws.amazon.com/lambda/) as a serverless endpoint.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

BentoML requires Python 3.6 or above. Install it via `pip`:

In [None]:
# Install BentoML
!pip install bentoml

# Also install scikit-learn; we will use a scikit-learn model as an example
!pip install pandas scikit-learn

## Creating a Prediction Service with BentoML


A minimal prediction service in BentoML looks something like this:

In [3]:
%%writefile iris_classifier.py
from bentoml import BentoService, api, env, artifacts
from bentoml.artifact import SklearnModelArtifact
from bentoml.handlers import DataframeHandler

@artifacts([SklearnModelArtifact('model')])
@env(auto_pip_dependencies=True)
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


The `bentoml.api` decorator defines a service API, which is the entry point for sending prediction requests. The decorated function is user-defined code for processing those requests. The `DataframeHandler` here tells BentoML that this service API expects a `pandas.DataFrame` object as its input.
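
To make the decorator-plus-handler idea concrete, here is a toy sketch of the general pattern: a decorator tags a method as an API and records how its input should be parsed, and a dispatcher parses the raw request body before calling the method. This is not BentoML's implementation; `toy_api`, `parse_json_rows`, `ToyIrisService`, and `dispatch` are hypothetical names made up for illustration.

```python
import json


def toy_api(input_parser):
    """Toy decorator: tag a method as an API and remember its input parser."""
    def decorator(func):
        func.toy_api_info = {"name": func.__name__, "input_parser": input_parser}
        return func
    return decorator


def parse_json_rows(raw_body):
    """Hypothetical input parser: decode a JSON array of rows."""
    return json.loads(raw_body)


class ToyIrisService:
    @toy_api(parse_json_rows)
    def predict(self, rows):
        # Stand-in for model inference: report the feature count of each row.
        return [len(row) for row in rows]


def dispatch(service, api_name, raw_body):
    """Look up a tagged method, parse the raw request body, then call it."""
    method = getattr(service, api_name)
    info = method.toy_api_info  # attribute set by the decorator
    return method(info["input_parser"](raw_body))


print(dispatch(ToyIrisService(), "predict", "[[5.1, 3.5, 1.4, 0.2]]"))  # prints [4]
```

In the real service, the handler plays the role of the input parser and the `predict` body calls the packed model instead of counting features.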

The `bentoml.env` decorator specifies the dependencies and environment settings for this prediction service. Here we use BentoML's `auto_pip_dependencies` feature, which automatically extracts and bundles all pip packages required by your prediction service and pins their versions.
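
As a rough illustration of what "pinning" means here, the sketch below turns a list of package names into `name==version` requirement lines based on what is installed in the current environment. It is an assumption-laden sketch, not how BentoML discovers dependencies: `pin_requirements` is a hypothetical helper, and it relies on `importlib.metadata`, which is only available in Python 3.8 and later.

```python
from importlib import metadata


def pin_requirements(package_names):
    """Return pip requirement lines, pinned to the installed version when known."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment: leave it unpinned.
            lines.append(name)
    return lines


# The output depends on what is installed locally.
print("\n".join(pin_requirements(["scikit-learn", "pandas"])))
```

Pinned versions make the serving environment reproducible: the bundle installs the same package versions the model was trained with.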


Lastly, the `bentoml.artifacts` decorator defines the trained models to be
bundled with this prediction service. Here it uses the built-in `SklearnModelArtifact` and simply names it 'model'. BentoML also provides model artifact classes for other frameworks, such as `PytorchModelArtifact`, `KerasModelArtifact`, `FastaiModelArtifact`, and `XgboostModelArtifact`.


## Creating a BentoService saved bundle

Nothing needs to change in your regular model training and evaluation code:

In [4]:
from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

After the model training code, use the `IrisClassifier` BentoService class defined above to package this model for serving:

In [5]:
# import the custom BentoService defined above
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
svc = IrisClassifier()

# Pack the newly trained model artifact
svc.pack('model', clf)

# Save the BentoService to a BentoML bundle
saved_path = svc.save()
print("saved_path:", saved_path)

# Check the auto-generated service version
# Which can also be set manually with svc.set_version() before `save`
print("version:", svc.version)

[2020-04-03 01:53:17,971] INFO - BentoService bundle 'IrisClassifier:20200403015304_3FC8C9' saved to: /Users/chaoyu/bentoml/repository/IrisClassifier/20200403015304_3FC8C9
saved_path: /Users/chaoyu/bentoml/repository/IrisClassifier/20200403015304_3FC8C9
version: 20200403015304_3FC8C9


_That's it._ You've just created a BentoService SavedBundle: a versioned file archive that is
ready for production deployment. It contains the BentoService class you defined, all of its
Python code dependencies and PyPI dependencies, and the trained scikit-learn model. By
default, BentoML saves those files and related metadata under the `~/bentoml` directory, but
this can be changed to a different directory or to cloud storage such as
[Amazon S3](https://aws.amazon.com/s3/).
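
To see what a saved bundle actually contains, one option is simply to list its files. The helper below is a minimal sketch using only the standard library; `list_bundle_files` and `max_depth` are hypothetical names, not part of BentoML, and the exact files inside a bundle depend on your BentoML version.

```python
from pathlib import Path


def list_bundle_files(bundle_dir, max_depth=2):
    """Return relative paths of files under bundle_dir, at most max_depth levels deep."""
    root = Path(bundle_dir)
    files = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Keep regular files only, and skip anything nested deeper than max_depth.
        if path.is_file() and len(relative.parts) <= max_depth:
            files.append(relative.as_posix())
    return sorted(files)
```

In this notebook you could call `list_bundle_files(saved_path)` right after `svc.save()` to inspect the bundle that was just written.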

## Model Serving via REST API

From a BentoService SavedBundle, you can start a REST API server by providing either the bundle's `name:version` tag or the file path to the saved bundle:

In [6]:
# Note that REST API serving **does not work in Google Colab** because the Colab VM is not accessible
#!bentoml serve IrisClassifier:latest

# Alternatively:
#!bentoml serve {saved_path}

[2020-04-03 01:54:21,534] INFO - Getting latest version IrisClassifier:20200403015304_3FC8C9
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [03/Apr/2020 01:54:33] "POST /predict HTTP/1.1" 200 -
^C


#### View documentation for REST APIs

The REST API server provides a simple web UI for testing and debugging. If you are running this command on your local machine, visit http://127.0.0.1:5000 in your browser and try sending API requests to the server.

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

#### Send prediction requests to the REST API server

You can also send prediction requests with `curl` from the command line:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with Python and the `requests` library:
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```
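
If you would rather not add a dependency, the same request can be sketched with the standard library. The snippet below is a hedged example, not an official client: `build_predict_request` and `predict` are hypothetical helper names, the default URL assumes the local server started above, and `predict` assumes the response body is JSON.

```python
import json
import urllib.request


def build_predict_request(rows, base_url="http://127.0.0.1:5000"):
    """Build a POST request whose body is the JSON-encoded list of feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, base_url="http://127.0.0.1:5000", timeout=5.0):
    """Send the request and decode the response, assuming the body is JSON."""
    request = build_predict_request(rows, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict([[5.1, 3.5, 1.4, 0.2]])` sends the same payload as the `curl` command. Building the request separately from sending it makes the payload easy to inspect without a live server.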

In [1]:
!bentoml get IrisClassifier:latest

[2020-04-06 10:26:57,288] INFO - Getting latest version IrisClassifier:20200305171229_0A1411
{
  "name": "IrisClassifier",
  "version": "20200305171229_0A1411",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/bozhaoyu/bentoml/repository/IrisClassifier/20200305171229_0A1411"
  },
  "bentoServiceMetadata": {
    "name": "IrisClassifier",
    "version": "20200305171229_0A1411",
    "createdAt": "2020-03-06T01:12:49.431011Z",
    "env": {
      "condaEnv": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n",
      "pipDependencies": "bentoml==0.6.2\nscikit-learn",
      "pythonVersion": "3.7.3"
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "SklearnModelArtifact"
      }
    ],
    "apis": [
      {
        "name": "predict",
        "handlerType": "DataframeHandler",
        "docs": "BentoService API",
        "handlerConfig": {
          "orient": "records",
          "typ": "frame",
          "input_dt