## Instructions

```
                                         1.2 Create an archive

                                       ┌──────────────────────┐
                                       │                      │
                                       │   Torchserve Model   │
                                       │       Archive        │
                                       │                      │
                                       │   ┌──────────────┐   │   ┌──────────────┐
                                       │   │   model.py   │   │   │              │              1.3
      1.1.1 from lab3   ───────────────┼─► ├──────────────┤   │   │  Torchserve  │              # workers
                                       │   │ one_layer.pt │   │   │              │ ◄──────────  # batchsize
                                       │   └──────────────┘   │   │    config    │              max batch delay
                                       │                      │   │              │              etc.
        preprocess  code               │   ┌──────────────┐   │   └──────┬───────┘
1.1.2      call model     ─────────────┼─► │  handler.py  │   │          │
        postprocess code               │   └──────────────┘   │          │
                                       │                      │          │
                                       └──────────┬───────────┘          │
                                                  │                      │
                                                  │                      │
                                                  │                      │
                                       ┌──────────▼──────────────────────▼───────┐
                                       │                                         │    1.4 Upload to storage
                                       │   Storage   ( MinIO / S3 / Url / PVC )  │
                                       │                                         │
                                       └────────────────────┬────────────────────┘
                                                            │
                                       ┌────────────────────▼────────────────────┐
                                       │                                         │    2 Define KServe Yaml
                                       │             KServe Predictor            │
                                       │                                         │    3 Do some basic testing
                                       │             ( scaling pods )            │
                                       │                                         │    4 Autoscaling
                                       └─────────────────────────────────────────┘
                                                                                      5 Canary Rollout
     
 ```
 
The lab mainly covers:
- TorchServe: packaging a PyTorch model with custom preprocess/postprocess functions
- MinIO storage usage
- KServe: basic serving, autoscaling, canary rollout

## 1 TorchServe

#### 1.1 Preparation for the Model Archiver

Prepare three files:
- `pytorch_one_layer.pt`: the serialized model file (`.pt` or `.pth`). For TorchScript models this is the scripted checkpoint; for eager-mode models, such as this one, it is the `state_dict`.
- `model.py`: the model architecture definition. This file is mandatory for eager-mode models.
- `handler.py`: code for model initialization, preprocessing, postprocessing, etc.
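Before running the archiver, it can help to confirm that the three inputs are in place. A minimal sketch, assuming the files live in `torchserve/` under the names used in this lab; `missing_archive_inputs` is a hypothetical helper, not part of TorchServe:

```python
import os

# File names used in this lab (an assumption; adjust if yours differ).
REQUIRED_FILES = ("pytorch_one_layer.pt", "model.py", "handler.py")


def missing_archive_inputs(base_dir="torchserve"):
    """Return the required archiver inputs that are not present in base_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(base_dir, name))]


print("missing:", missing_archive_inputs() or "none")
```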


##### 1.1.1 pytorch_one_layer.pt

The file has already been uploaded to https://github.com/vmware/ml-ops-platform-for-vsphere/tree/main/website/content/en/docs/kubeflow-tutorial/lab4_files/pytorch_one_layer.pt. It comes from [Lab3](../lab3.md), which saves the model's `state_dict`:

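```python
import os
import torch

# Excerpt from Lab3: RANK, args and model are defined earlier in that script.
# Only rank 0 saves, so the file is written once during distributed training.
if RANK == 0:
    print("saving model to", args.dir)
    os.makedirs(args.dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(args.dir, "pytorch_one_layer.pt"))
```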

<span style="color:red">If you are using JupyterLab in Kubeflow, remember to upload it to `torchserve/pytorch_one_layer.pt`</span>

##### 1.1.2 model.py

`pytorch_one_layer.pt` stores only the weights (`state_dict`), not the model architecture, so we also need to give TorchServe the architecture definition.

Learn more about eager mode vs. TorchScript here:
https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html

Copy the model architecture class `class Net(nn.Module)` from Lab3 into the cell below.

Run the cell and its contents will be saved to `torchserve/model.py`.

```python
!mkdir -p torchserve
```

```python
%%writefile torchserve/model.py
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(5, 2)

    def forward(self, x):
        x = self.linear(x)
        return x
```

```
Overwriting torchserve/model.py
```


##### 1.1.3 handler.py

What can `handler.py` do? According to the TorchServe custom service documentation (https://pytorch.org/serve/custom_service.html), a handler can:

- Initialize the model instance
- Pre-process input data before it is sent to the model for inference or Captum explanations
- Customize how the model is invoked for inference or explanations
- Post-process output from the model before sending back a response
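TorchServe's `BaseHandler.handle` calls these hooks in a fixed order: `preprocess`, then `inference`, then `postprocess`, and the list it returns must hold one response per request in the batch. The toy class below mimics that chaining in plain Python so the order is easy to see; `ToyHandler` and its stand-in model are hypothetical and skip everything else `BaseHandler` does (model loading, device placement, metrics).

```python
class ToyHandler:
    """Hypothetical stand-in mirroring the preprocess -> inference -> postprocess chain."""

    def __init__(self, model):
        self.model = model  # any callable mapping one input row to one output

    def preprocess(self, batch):
        # Turn each raw request into a model input (here: its length as a one-element row).
        return [[len(item)] for item in batch]

    def inference(self, inputs):
        # Run the model once per row of the batch.
        return [self.model(row) for row in inputs]

    def postprocess(self, outputs):
        # Wrap each raw output in a JSON-serializable dict, one per request.
        return [{"prediction": out} for out in outputs]

    def handle(self, batch):
        # Same order BaseHandler.handle uses: preprocess, inference, postprocess.
        return self.postprocess(self.inference(self.preprocess(batch)))


handler = ToyHandler(model=lambda row: "long" if row[0] > 10 else "short")
print(handler.handle(["hi", "a much longer message"]))
```

Keeping one output per input at every stage is what lets the web server match each response back to the HTTP request it came from.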

Implement the `preprocess` and `postprocess` functions, using Lab2 and Lab3 as references:
- Preprocess: the feature extraction from Lab2
- Postprocess: the PyTorch code from Lab3

Run the cell and its contents will be saved to `torchserve/handler.py`.

The data flow:
   ```json
   {"instances": ["This is the first email"]}

   {"instances": ["This is the second email"]}
   ```
   ↓↓↓  TorchServe web server: combines multiple HTTP requests into a batch and forwards it to `handler.py`
   ```python
   [
      "This is the first email",
      "This is the second email",
   ]
   ```
   ↓↓↓  handler.py preprocess: converts the list of strings into a `torch` tensor for model inference
   ```python
   [
      [0, 0, 0, 0, 0],
      [1, 1, 1, 1, 1],
   ]
   ```
   ↓↓↓  PyTorch model inference
   ```python
   [
      [0.5, -0.3],
      [0.3, 0.8],
   ]
   ```
   ↓↓↓  handler.py postprocess: maps the argmax of each row to a label
   ```python
   [
      {'version': '2', 'prediction': 'ham'},
      {'version': '2', 'prediction': 'spam'},
   ]
   ```
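The preprocess and postprocess rules in the `handler.py` cell below can be checked without TorchServe or PyTorch. This sketch re-implements the same feature rules (shorter than 500 characters, then the four keywords) and the argmax-to-label mapping in plain Python; the function names are hypothetical and the logits are made-up inputs, not real model output.

```python
HIGH_FREQUENCY_WORDS = ["body", "business", "html", "money"]


def extract_features(email):
    """Mirror handler.py: [short_text, body, business, html, money] as 0/1 ints."""
    feature = [int(len(email) < 500)]
    feature.extend(int(word in email) for word in HIGH_FREQUENCY_WORDS)
    return feature


def label_from_logits(logits, version="2"):
    """Mirror handler.py: argmax index 1 means spam, anything else ham."""
    pred = max(range(len(logits)), key=lambda i: logits[i])
    return {"version": version, "prediction": "spam" if pred == 1 else "ham"}


batch = ["This is the first email", "money talk about business in html"]
print([extract_features(e) for e in batch])
print([label_from_logits(l) for l in ([0.5, -0.3], [0.3, 0.8])])
```

Under these rules, a short email with none of the keywords maps to `[1, 0, 0, 0, 0]`, because the first feature is set whenever the text is shorter than 500 characters.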


```python
%%writefile torchserve/handler.py
# custom handler file

# model_handler.py

"""
ModelHandler defines a custom model handler.
"""

import logging
import torch
from ts.torch_handler.base_handler import BaseHandler

# BaseHandler:
# https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def preprocess(self, batch):
        """
        Transform raw input into model input data.
        :param batch: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        feature_list = []
        logging.info("[preprocess] batch received:")
        logging.info(batch)
        for email in batch:
            # extract features from email
            feature = []
            # short text
            short_text = len(email) < 500
            feature.append(int(short_text))
            # high frequency words
            high_frequency_words = ["body", "business", "html", "money"]
            for word in high_frequency_words:
                contain_bool = word in email
                feature.append(int(contain_bool))

            feature_list.append(feature)

        logging.info("Preprocess result:")
        logging.info(feature_list)
        return torch.as_tensor(feature_list, dtype=torch.float32, device=self.device)

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        logging.info("Logits from model:")
        logging.info(inference_output)

        pred = inference_output.max(1)[1]
        positive_dict = {"version": "2", "prediction": "spam"}
        negative_dict = {"version": "2", "prediction": "ham"}
        postprocess_result = list(map(
                lambda x: positive_dict if x == 1 else negative_dict, 
                pred))

        logging.info("Postprocess result:")
        logging.info(postprocess_result)
        return postprocess_result
```

```
Overwriting torchserve/handler.py
```


#### 1.2 TorchServe Model Archiver

The archiver packages the `model-file`, the `serialized-file` (`*.pt`) and the `handler` into a single archive named `{model-name}.mar`.

```bash
%%bash
cd $(dirname $0)/torchserve
base_path=$(pwd)

mkdir -p $base_path/model-store && cd $base_path/model-store &&
if [ -f $base_path/model-store/spam_email.mar ]; then
    rm $base_path/model-store/spam_email.mar
fi

pip install torch-model-archiver -i https://pypi.tuna.tsinghua.edu.cn/simple

torch-model-archiver --model-name spam_email --version 1.0 \
--model-file $base_path/model.py \
--serialized-file $base_path/pytorch_one_layer.pt \
--handler $base_path/handler.py

echo "create successfully"
```

```
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
create successfully
```
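To check what the archiver packaged, you can list the archive's members. A minimal sketch, assuming the default `--archive-format`, which writes the `.mar` as a zip file; `list_mar_contents` is a hypothetical helper. For this lab you would expect the three input files plus a `MAR-INF/MANIFEST.json` entry.

```python
import zipfile


def list_mar_contents(mar_path):
    """Return the sorted member names of a .mar archive (a zip in the default format)."""
    with zipfile.ZipFile(mar_path) as archive:
        return sorted(archive.namelist())


# Usage (path from this lab):
# for name in list_mar_contents("torchserve/model-store/spam_email.mar"):
#     print(name)
```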


#### 1.3 Create the TorchServe config

Feel free to change the parameters in `model_snapshot`:

- minWorkers: the minimum number of workers for the model
- maxWorkers: the maximum number of workers for the model
- batchSize: the batch size for the model
- maxBatchDelay: the maximum time in msec TorchServe waits to fill a batch
- responseTimeout: the timeout in msec for the model's response
- defaultVersion: whether this is the default version of the model
- marName: the `.mar` file name of the model
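The `model_snapshot` value in `config.properties` is a single line of JSON, which is easy to break when editing by hand. A minimal sketch that builds it from the parameters above with `json.dumps`; `build_model_snapshot` is a hypothetical helper, and its defaults simply mirror the config used in this lab:

```python
import json


def build_model_snapshot(model_name="spam_email", version="1.0", mar_name="spam_email.mar",
                         min_workers=1, max_workers=5, batch_size=4,
                         max_batch_delay=100, response_timeout=120):
    """Return the value for model_snapshot= as compact single-line JSON."""
    snapshot = {
        "name": "startup.cfg",
        "modelCount": 1,
        "models": {
            model_name: {
                version: {
                    "defaultVersion": True,
                    "marName": mar_name,
                    "minWorkers": min_workers,
                    "maxWorkers": max_workers,
                    "batchSize": batch_size,
                    "maxBatchDelay": max_batch_delay,
                    "responseTimeout": response_timeout,
                }
            }
        },
    }
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(snapshot, separators=(",", ":"))


print("model_snapshot=" + build_model_snapshot())
```

With the defaults, the printed line matches the `model_snapshot` entry in the config below, so you can change one parameter (say `batch_size=8`) and paste the result.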


```python
!mkdir -p torchserve/config
```

```
%%writefile torchserve/config/config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/home/model-server/torchserve_mar/spam_email/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"spam_email":{"1.0":{"defaultVersion":true,"marName":"spam_email.mar","minWorkers":1,"maxWorkers":5,"batchSize":4,"maxBatchDelay":100,"responseTimeout":120}}}}
```

```
Overwriting torchserve/config/config.properties
```


#### 1.4 Upload to MinIO

If you already have MinIO storage, you can go straight to the upload steps below. If not, we also provide manifests for a standalone MinIO deployment on a Kubernetes cluster.

Download the files from https://github.com/vmware/ml-ops-platform-for-vsphere/tree/main/website/content/en/docs/kubeflow-tutorial/lab4_minio_deploy and apply them to your cluster:

`kubectl apply -f minio-standalone-pvc.yml` 

`kubectl apply -f minio-standalone-service.yml`

`kubectl apply -f minio-standalone-deployment.yml`

This step uploads `torchserve/model-store` and `torchserve/config` to a MinIO bucket.

You need the following MinIO connection details:
- the endpoint URL (`AWS_ENDPOINT_URL`)
- the access key ID (`AWS_ACCESS_KEY_ID`)
- the secret access key (`AWS_SECRET_ACCESS_KEY`)

```python
!pip install boto3 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
```

```python
import os
import boto3

# Replace with your own MinIO endpoint and credentials.
os.environ["AWS_ENDPOINT_URL"] = "http://10.117.233.16:9000"
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
s3 = boto3.resource('s3',
                    endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
                    verify=True)
```

```python
print("current buckets in s3:")
print(list(s3.buckets.all()))
```

```
current buckets in s3:
[s3.Bucket(name='xujinheng-bucket')]
```


```python
bucket_name='spam-bucket'
s3.create_bucket(Bucket=bucket_name)
```

Upload the files to `bucket_name`; you can also choose a key prefix with `bucket_path`.

```python
curr_path = os.getcwd()
base_path = os.path.join(curr_path, "torchserve")

bucket_path = "spam_email"

bucket = s3.Bucket(bucket_name)

# upload
bucket.upload_file(os.path.join(base_path, "model-store", "spam_email.mar"),
                   os.path.join(bucket_path, "model-store/spam_email.mar"))
bucket.upload_file(os.path.join(base_path, "config", "config.properties"), 
                   os.path.join(bucket_path, "config/config.properties"))

# check files 
for obj in bucket.objects.filter(Prefix=bucket_path):
    print(obj.key)
```

```
spam_email/config/config.properties
spam_email/model-store/spam_email.mar
```


## 2 KServe

#### 2.1 Create MinIO service account and secret

- You also need to specify the `s3-endpoint`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` here.
- If you are using the default user `user@example.com` / `12341234`, also give every <span style="color:red">metadata: name</span> in the YAML a unique name so it does not clash with other people using the same account.
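The Secret below uses `stringData`, so the credentials can be written as plain text. If you use the `data` field instead, Kubernetes expects each value base64-encoded. A short sketch of producing (and checking) those values, using MinIO's default `minioadmin` credentials as a stand-in for your own; the helper names are hypothetical:

```python
import base64


def to_secret_data(value):
    """Encode a plain-text credential for a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def from_secret_data(encoded):
    """Decode a `data` value back to plain text (handy when inspecting a Secret)."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = to_secret_data("minioadmin")
print(encoded)  # bWluaW9hZG1pbg==
print(from_secret_data(encoded))  # minioadmin
```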

```bash
%%bash

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-secret-user
  annotations:
     serving.kserve.io/s3-endpoint: "10.117.233.16:9000" # replace with your s3 endpoint e.g minio-service.kubeflow:9000
     serving.kserve.io/s3-usehttps: "0" # by default 1, if testing with minio you can set to 0
     serving.kserve.io/s3-region: "us-east-2"
     serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
type: Opaque
stringData: # use "stringData" for raw credential string or "data" for base64 encoded string
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-service-account-user
secrets:
- name: minio-s3-secret-user
EOF
```

```
secret/minio-s3-secret-user configured
serviceaccount/minio-service-account-user configured
```


#### 2.2 Create InferenceService from MinIO

- Set `storageUri` to `s3://<bucket_name>/<bucket_path>` (here `s3://spam-bucket/spam_email`).
- You may also need to change `metadata: name` and `serviceAccountName`.
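In this lab, `storageUri` points at the bucket prefix that contains `model-store/` and `config/` (the layout the upload step creates), not at individual files. A small sketch, with hypothetical helper names, that derives both the URI and the object keys from the same `bucket_name` and `bucket_path`; it uses `posixpath` because S3 keys always use `/`, whereas `os.path.join` would use `\` on Windows:

```python
import posixpath


def object_key(bucket_path, *parts):
    """Build an S3 object key; always '/'-separated regardless of the local OS."""
    return posixpath.join(bucket_path, *parts)


def storage_uri(bucket_name, bucket_path):
    """Build the storageUri for KServe: bucket plus prefix, no trailing slash."""
    return "s3://" + posixpath.join(bucket_name, bucket_path.strip("/"))


print(storage_uri("spam-bucket", "spam_email"))                   # s3://spam-bucket/spam_email
print(object_key("spam_email", "model-store", "spam_email.mar"))  # spam_email/model-store/spam_email.mar
print(object_key("spam_email", "config", "config.properties"))    # spam_email/config/config.properties
```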

```bash
%%bash

cat << EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "spam-email-serving"
spec:
  predictor:
    serviceAccountName: minio-service-account-user
    model:
      modelFormat:
        name: pytorch
      storageUri: "s3://spam-bucket/spam_email"
      resources:
        requests:
          cpu: 50m
          memory: 200Mi
        limits:
          cpu: 100m
          memory: 500Mi
        # limits:
        #   nvidia.com/gpu: "1"   # for inference service on GPU
EOF
```

```
inferenceservice.serving.kserve.io/spam-email-serving created
```


#### 2.3 Kubeflow UI

Check model logs at [Kubeflow UI -> Models](/models/)


## 3 Test 

#### 3.1 Define a Test_bot for convenience

```python
!pip install multiprocess -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
```


```python
import requests
import json
import multiprocess as mp

class Test_bot():
    def __init__(self, uri, model, host, session):
        self.uri = uri
        self.model = model
        self.host = host
        self.session = session
        self.headers = {'Host': self.host, 'Content-Type': "application/json", 'Cookie': "authservice_session=" + self.session}
        self.email = [
        # features: shorter_text, body, business, html, money
        "[0, 0, 0, 0, 0] email longer than 500 character" + "a" * 500,                                     # ham
        "[1, 0, 0, 0, 0] email shorter than 500 character",                                                # ham
        "[1, 0, 1, 1, 1] email shorter than 500 character + business + html + money",                      # spam
        "[0, 1, 0, 0, 1] email longer than 500 character + body + money" + "a" * 500,                      # spam
        "[0, 1, 1, 1, 1] email longer than 500 character + body + business + html + money" + "a" * 500,    # spam
        "[1, 1, 1, 1, 1] email shorter than 500 character body + business + html + money",                 # spam
        ]
    
    def update_uri(self, uri):
        self.uri = uri
        
    def update_model(self, model):
        self.model = model
        
    def update_host(self, host):
        self.host = host
        self.update_headers()
        
    def update_session(self, session):
        self.session = session
        self.update_headers()
        
    def update_headers(self):
        self.headers = {'Host': self.host, 'Content-Type': "application/json", 'Cookie': "authservice_session=" + self.session}
        
    def get_data(self, x):
        if isinstance(x, str):
            email = x
        elif isinstance(x, int):
            email = self.email[x % 6]
        else:
            email = self.email[0]
        json_data = json.dumps({
            "instances": [
                email,
            ]
        })
        return json_data
        
    def readiness(self):
        uri = self.uri + '/v1/models/' + self.model
        response = requests.get(uri, headers = self.headers, timeout=5)
        return response.text

    def predict(self, x=None):
        uri = self.uri + '/v1/models/' + self.model + ':predict'
        response = requests.post(uri, data=self.get_data(x), headers = self.headers, timeout=10)
        return response.text
    
    def explain(self, x=None):
        uri = self.uri + '/v1/models/' + self.model + ':explain'
        response = requests.post(uri, data=self.get_data(x), headers = self.headers, timeout=10)
        return response.text
    
    def concurrent_predict(self, num=10):
        print("fire " + str(num) + " requests to " + self.host)
        with mp.Pool() as pool:
            responses = pool.map(self.predict, range(num))
        return responses
```

#### 3.2 Determine host and session

Run the following cell to get the `host`, which is sent as the `Host` header in each request:

```python
!kubectl get inferenceservice spam-email-serving -o jsonpath='{.status.url}' | cut -d "/" -f 3
```

```
spam-email-serving.kubeflow-user-example-com.example.com
```
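The `cut -d "/" -f 3` above keeps the third `/`-separated field of the URL, i.e. the host (plus port, if any). An equivalent in Python using `urllib.parse`, handy if you want to set `host` programmatically; `host_from_url` is a hypothetical helper:

```python
from urllib.parse import urlparse


def host_from_url(url):
    """Return the network location (host[:port]) of a URL, like `cut -d "/" -f 3`."""
    return urlparse(url).netloc


print(host_from_url("http://spam-email-serving.kubeflow-user-example-com.example.com"))
```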


Log in to Kubeflow in your web browser and copy the value of the `authservice_session` cookie (Chrome: Developer Tools -> Application -> Cookies).

```python
bot = Test_bot(uri='http://10.117.233.8',  # replace with the URL you use to access Kubeflow
               model='spam_email',
               # replace with the host printed above
               host='spam-email-serving.kubeflow-user-example-com.example.com',
               # replace with your own authservice_session cookie
               session='MTY2NjE2MDYyMHxOd3dBTkZZelVqVkdOVkJIVUVGR1IweEVTbG95VVRZMU5WaEVXbE5GTlV0WlVrWk1WRk5FTkU5WVIxZFJRelpLVFZoWVVFOVdSa0U9fMj0VhQPme_rORhhdy0mtBJk-yGWdzibFfPMdU3TztbJ')

print(bot.readiness())
print(bot.predict(0))
# We didn't implement a model explainer, so this call returns 500: Internal Server Error
# https://kserve.github.io/website/0.8/modelserving/explainer/explainer/
# print(bot.explain(0))
```

```
{"name": "spam_email", "ready": true}
{"predictions": [{"version": "2", "prediction": "ham"}]}
```


## 4 Autoscaling

- Knative Pod Autoscaler (KPA)
  - Part of the Knative Serving core and enabled by default once Knative Serving is installed.
  - Supports scale to zero functionality.
  - Does not support CPU-based autoscaling.
  
- Horizontal Pod Autoscaler (HPA)
  - Not part of the Knative Serving core, and must be enabled after Knative Serving installation.
  - Does not support scale to zero functionality.
  - Supports CPU-based autoscaling.

<span style="color:red">If you use CPU-based autoscaling, make sure HPA is installed before moving on</span> (check with `kubectl get deploy autoscaler-hpa -n knative-serving`); if it is missing, install it from https://github.com/knative/serving/releases/.
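With the HPA class, the replica count follows the Kubernetes HPA rule: desired = ceil(current replicas × current metric / target), clamped to the min/max scale. A simplified sketch of that rule; it ignores the HPA's tolerance band and stabilization windows, and `hpa_desired_replicas` is a hypothetical name, not a Knative or Kubernetes API. CPU utilization is measured against the pod's CPU *request*, so with the 50m request and 200m limit used below a pod can report well above 100%.

```python
import math


def hpa_desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                         min_replicas=1, max_replicas=3):
    """Simplified HPA rule: scale proportionally to metric/target, then clamp."""
    desired = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, desired))


# One pod at 200% of its CPU request vs. an 80% target wants 3 pods (= max-scale "3").
print(hpa_desired_replicas(1, 200, 80))
# Two pods averaging 40% vs. an 80% target can shrink to 1.
print(hpa_desired_replicas(2, 40, 80))
```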

Add autoscaling annotations to the InferenceService and apply it:

```bash
%%bash

cat << EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spam-email-serving
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    # see available tags: https://knative.dev/docs/serving/autoscaling/autoscaling-targets/
    autoscaling.knative.dev/max-scale: "3"
    # HPA: specifies the CPU percentage target (default "80"). 
    # KPA: Target x requests in-flight per pod.
    autoscaling.knative.dev/target: "80"  
spec:
  predictor:
    pytorch:
      # an HTTP(S) URI can also be used as storage
      storageUri: https://github.com/vmware/ml-ops-platform-for-vsphere/tree/main/website/content/en/docs/kubeflow-tutorial/lab4_files/v1.zip
      resources:
        requests:
          cpu: 50m
          memory: 200Mi
        limits:
          cpu: 200m
          memory: 500Mi
EOF
```

```
inferenceservice.serving.kserve.io/spam-email-serving configured
```


Check the number of pods. It takes a while for the old deployment to be replaced.

```python
!kubectl get pod
```

```
NAME                                                              READY   STATUS        RESTARTS   AGE
bitfusion-notebook-01-0                                           2/2     Running       0          46d
ml-pipeline-ui-artifact-7cd897c59f-kzlfs                          2/2     Running       0          49d
ml-pipeline-visualizationserver-795f7db965-gzjsm                  2/2     Running       0          49d
model-serving-test-0                                              2/2     Running       0          22h
sklearn-iris-predictor-default-00001-deployment-5484f4d57-ld2fr   3/3     Running       0          18h
spam-email-jhx-predictor-default-00001-deployment-6c857d65p2vs5   1/3     Terminating   0          171m
spam-email-serving-predictor-default-00001-deployment-96949br62   3/3     Running       0          2m32s
spam-email-serving-predictor-default-00002-deployment-6fc9r98xx   3/3     Running       0          37s
spam-email-serving-predictor-default-00002-deployment-cdd4lz26s   1/3 
```

Adjust the number of concurrent prediction requests, fire them, and watch the number of pods scale up.

```python
responses = bot.concurrent_predict(num=1000)
```

```
fire 1000 requests to spam-email-serving.kubeflow-user-example-com.example.com
```


Check the number of pods again

```python
!kubectl get pod
```

```
NAME                                                              READY   STATUS        RESTARTS   AGE
bitfusion-notebook-01-0                                           2/2     Running       0          46d
ml-pipeline-ui-artifact-7cd897c59f-kzlfs                          2/2     Running       0          49d
ml-pipeline-visualizationserver-795f7db965-gzjsm                  2/2     Running       0          49d
model-serving-test-0                                              2/2     Running       0          22h
sklearn-iris-predictor-default-00001-deployment-5484f4d57-ld2fr   3/3     Running       0          18h
spam-email-serving-predictor-default-00001-deployment-96949br62   1/3     Terminating   0          3m55s
spam-email-serving-predictor-default-00002-deployment-6fc9ghcv7   2/3     Running       0          15s
spam-email-serving-predictor-default-00002-deployment-6fc9jjlnt   2/3     Running       0          12s
spam-email-serving-predictor-default-00002-deployment-6fc9pz29r   2/3  
```

## 5 Canary Rollout

```bash
%%bash

cat << EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spam-email-serving
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    autoscaling.knative.dev/target: "80"
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 20
    pytorch:
      storageUri: https://github.com/vmware/ml-ops-platform-for-vsphere/tree/main/website/content/en/docs/kubeflow-tutorial/lab4_files/v2.zip
      resources:
        requests:
          cpu: 50m
          memory: 200Mi
        limits:
          cpu: 200m
          memory: 500Mi
EOF
```

```
inferenceservice.serving.kserve.io/spam-email-serving configured
```


```python
!kubectl get revisions -l serving.kserve.io/inferenceservice=spam-email-serving
```

```
NAME                                         CONFIG NAME                            K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
spam-email-serving-predictor-default-00001   spam-email-serving-predictor-default                      1            True             0                 1
spam-email-serving-predictor-default-00002   spam-email-serving-predictor-default                      2            True             4                 4
spam-email-serving-predictor-default-00003   spam-email-serving-predictor-default                      3            True             1                 0
```


```python
!kubectl get pods -l serving.kserve.io/inferenceservice=spam-email-serving
```

```
NAME                                                              READY   STATUS        RESTARTS   AGE
spam-email-serving-predictor-default-00001-deployment-96949br62   1/3     Terminating   0          4m50s
spam-email-serving-predictor-default-00001-deployment-9694gwd89   2/3     Running       0          33s
spam-email-serving-predictor-default-00002-deployment-6fc9ghcv7   3/3     Running       0          70s
spam-email-serving-predictor-default-00002-deployment-6fc9jjlnt   3/3     Running       0          67s
spam-email-serving-predictor-default-00002-deployment-6fc9pz29r   3/3     Running       0          67s
spam-email-serving-predictor-default-00002-deployment-6fc9r98xx   3/3     Running       0          2m55s
spam-email-serving-predictor-default-00003-deployment-fb5drzk4m   3/3     Running       0          30s
```


Check the traffic status:

```python
!kubectl get isvc spam-email-serving
!kubectl get isvc spam-email-serving -o yaml
```

```
NAME                 URL                                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION                        LATESTREADYREVISION                          AGE
spam-email-serving   http://spam-email-serving.kubeflow-user-example-com.example.com   True    80     20       spam-email-serving-predictor-default-00002   spam-email-serving-predictor-default-00003   5m5s
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    autoscaling.knative.dev/target: "80"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kserve.io/v1beta1","kind":"InferenceService","metadata":{"annotations":{"autoscaling.knative.dev/class":"hpa.autoscaling.knative.dev","autoscaling.knative.dev/target":"80","serving.kserve.io/enable-tag-routing":"true"},"name":"spam-email-serving","namespace":"kubeflow-user-example-com"},"spec":{"predictor":{"cana
```

Send concurrent prediction requests to the model. Most responses should have `version: 1`, and roughly 20% should have `version: 2`.
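With `canaryTrafficPercent: 20`, each request has roughly a 20% chance of reaching the new revision, so small samples vary. A sketch that models the routing as an independent weighted choice per request (an assumption; the real split is done by the serving layer, not by this code) and reports the expected canary count and its standard deviation from the binomial distribution; both function names are hypothetical:

```python
import math
import random


def simulate_canary(num_requests, canary_percent, seed=0):
    """Route each request to 'latest' with probability canary_percent/100, else 'prev'."""
    rng = random.Random(seed)
    p = canary_percent / 100
    latest = sum(1 for _ in range(num_requests) if rng.random() < p)
    return {"prev": num_requests - latest, "latest": latest}


def binomial_expectation(num_requests, canary_percent):
    """Expected canary count and its standard deviation: n*p and sqrt(n*p*(1-p))."""
    p = canary_percent / 100
    return num_requests * p, math.sqrt(num_requests * p * (1 - p))


print(simulate_canary(100, 20))
print(binomial_expectation(100, 20))  # (20.0, 4.0)
```

For 100 requests at 20%, a count anywhere from about 12 to 28 is within two standard deviations, so a few requests above or below 20 is expected noise.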

```python
responses = bot.concurrent_predict(100)
print("Number of Version 1: ", len(list(filter(lambda x: '"version": "1"' in x, responses))))
print("Number of Version 2: ", len(list(filter(lambda x: '"version": "2"' in x, responses))))
print(responses)
```

```
fire 100 requests to spam-email-serving.kubeflow-user-example-com.example.com
Number of Version 1:  59
Number of Version 2:  14
['{"predictions": [{"version": "1", "prediction": "ham"}]}', '<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>', '{"predictions": [{"version": "1", "prediction": "spam"}]}', '{"predictions": [{"version": "1", "prediction": "ham"}]}', '<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>', '{"predictions": [{"version": "1", "prediction": "spam"}]}', '{"predictions": [{"version": "1", "prediction": "ham"}]}', '{"predictions": [{"version": "2", "prediction": "ham"}]}', '{"predictions": [{"version": "1", "prediction": "spam"}]}', '<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>', '{"predictions": [{"version": "1", "prediction": "spam"}]}', '<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>',
```
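Counting with substring matches silently skips error pages such as 500 responses, so the two counts need not add up to the number of requests. A sketch that parses each response as JSON and tallies versions, putting anything that is not the expected JSON under an `error` bucket; `tally_versions` is a hypothetical helper and assumes the `{"predictions": [{"version": ...}]}` shape returned by `handler.py`:

```python
import json
from collections import Counter


def tally_versions(responses):
    """Count model versions in raw response bodies; unparseable bodies count as 'error'."""
    counts = Counter()
    for body in responses:
        try:
            predictions = json.loads(body)["predictions"]
        except (ValueError, KeyError, TypeError):
            counts["error"] += 1
            continue
        for prediction in predictions:
            counts[prediction.get("version", "unknown")] += 1
    return counts


sample = [
    '{"predictions": [{"version": "1", "prediction": "ham"}]}',
    '<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>',
    '{"predictions": [{"version": "2", "prediction": "spam"}]}',
]
print(tally_versions(sample))
```

Usage with the test bot: `tally_versions(bot.concurrent_predict(100))`.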

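Next, target only the previous revision. Replace the host below with the tagged URL of the previous revision (the one starting with `prev-`), which appears in the `status` section of the `kubectl get isvc spam-email-serving -o yaml` output.

All responses should then have `version: 1`.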


```python
# Replace with the prev- URL from your InferenceService status; it typically looks like
# prev-spam-email-serving-predictor-default.<namespace>.example.com
bot.update_host('prev-spam-email-serving-predictor-default.user.example.com')
responses = bot.concurrent_predict(20)
print("Number of Version 1: ", len(list(filter(lambda x: '"version": "1"' in x, responses))))
print("Number of Version 2: ", len(list(filter(lambda x: '"version": "2"' in x, responses))))
print(responses)
```

```
fire 20 requests to prev-spam-email-serving-predictor-default.user.example.com
Number of Version 1:  0
Number of Version 2:  0
['<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /v1/models/spam_email:predict</pre>\n</body>\n</html>\n', '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /v1/models/spam_email:predict</pre>\n</body>\n</html>\n', '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /v1/models/spam_email:predict</pre>\n</body>\n</html>\n', '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /v1/models/spam_email:predict</pre>\n</body>\n</html>\n', '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /v1/models/spam_emai
```

Adjust the traffic percentage of the new model:

```python
!kubectl patch isvc spam-email-serving --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 50}]'
```

```
inferenceservice.serving.kserve.io/spam-email-serving patched
```
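The `-p` argument is a JSON Patch (RFC 6902) document. If you script a rollout (for example 20 → 50 → 100), building it with `json.dumps` avoids shell-quoting mistakes. A minimal sketch with a hypothetical `canary_patch` helper; its output can be passed as `kubectl patch isvc spam-email-serving --type='json' -p '<output>'`:

```python
import json


def canary_patch(percent):
    """Return a JSON Patch that sets spec.predictor.canaryTrafficPercent."""
    if not 0 <= percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    return json.dumps([{
        "op": "replace",
        "path": "/spec/predictor/canaryTrafficPercent",
        "value": percent,
    }])


print(canary_patch(50))
```

The printed patch is identical to the one in the `kubectl patch` command above.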


Set the traffic of the new model to 100%:

```python
!kubectl patch isvc spam-email-serving --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 100}]'
!kubectl get isvc spam-email-serving
!kubectl get isvc spam-email-serving -o yaml
```

```
inferenceservice.serving.kserve.io/spam-email-serving patched
NAME                 URL                                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                          AGE
spam-email-serving   http://spam-email-serving.kubeflow-user-example-com.example.com   True           100                              spam-email-serving-predictor-default-00003   5m41s
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    autoscaling.knative.dev/target: "80"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kserve.io/v1beta1","kind":"InferenceService","metadata":{"annotations":{"autoscaling.knative.dev/class":"hpa.autoscaling.knative.dev","autoscaling.knative.dev/target":"80","serving.kserve.io/enable-tag-routing":"true"},"name":"spam-email-serving","namespace":"kubeflow-user-example-com"},"spec
```

Roll back the new model:

```python
!kubectl patch isvc spam-email-serving --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 0}]'
!kubectl get isvc spam-email-serving
!kubectl get isvc spam-email-serving -o yaml
```

```
inferenceservice.serving.kserve.io/spam-email-serving patched
NAME                 URL                                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION                        LATESTREADYREVISION                          AGE
spam-email-serving   http://spam-email-serving.kubeflow-user-example-com.example.com   True    100    0        spam-email-serving-predictor-default-00002   spam-email-serving-predictor-default-00003   5m52s
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    autoscaling.knative.dev/target: "80"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kserve.io/v1beta1","kind":"InferenceService","metadata":{"annotations":{"autoscaling.knative.dev/class":"hpa.autoscaling.knative.dev","autoscaling.knative.dev/target":"80","serving.kserve.io/enable-tag-routing":"true"},"name":"spam-email-serving","nam
```

## 6 More
Explore the KServe 0.8 docs at https://kserve.github.io/website/0.8/modelserving/control_plane/ for more features:

- Multi Model Serving
- Transformers
- Model Explainability
- Model Monitoring
- Payload Logging
- etc.
