# Week4 Assignments
In this week's assignments, you'll gain more hands-on experience with deploying ML models, especially using KServe. 

### Prerequisite: 
To do this week's assignments, we assume you have previously trained a LightGBM regression model for bike sharing demand prediction and have already uploaded the model artifact to your MLflow service (week1 assignments). If you haven't completed this step, uncomment and run the next code cell before proceeding to the assignments for this week.

### Guidelines for submitting the assignments
- As usual, please submit this assignment notebook with code cell outputs. It's important that these outputs are current and reflect the latest state of your code, as your grades may depend on them.
- Unlike previous weeks, where you wrote Python code in the assignment notebooks, most of this week's assignments ask you to fill in configurations in JSON/YAML files. In other words, put your answers in the required JSON/YAML files, not in this notebook; use the commands in the notebook to check that you're progressing correctly.
- Please also include these JSON/YAML files in your submission. More precisely, these files include `model-settings.json` in the "assignment1" directory and `*.yaml` in the "manifests" directory. 

In [7]:
# Prerequisite

# from train_helper import train
# params = {
#     "num_leaves": 63,
#     "learning_rate": 0.05,
#     "random_state": 42
# }
# train(params)

# HAD TO RUN THIS DUE TO SAME ERROR AS MENTIONED IN "Assignment 1 - Incompatible input types" IN MOODLE

Registered model 'Week1LgbmBikeDemand' already exists. Creating a new version of this model...
2023/12/03 00:57:20 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week1LgbmBikeDemand, version 2


The trained model is located at s3://mlflow/4/afbbeba3112a487398c5593768348525/artifacts/lgbm-bike


Created version '2' of model 'Week1LgbmBikeDemand'.


# Assignment 1: Use MLServer to deploy a model locally (2 points)
[MLServer](https://mlserver.readthedocs.io/en/latest/index.html) is an open-source inference server implementation for ML models. It provides an easy way to expose a model through an HTTP or gRPC endpoint. 

You already trained a LightGBM model for predicting bike sharing demand and uploaded it to the MLflow service in the first week. In this assignment, your task is to configure MLServer to serve your LightGBM model as an inference service locally. Detailed instructions are given below. 

**Hints**:
- Reading the following MLServer documentation may be enough to complete the assignment:
    - [Getting started with MLServer](https://mlserver.readthedocs.io/en/latest/getting-started/index.html#). This documentation shows an example of using the MLServer SDK to implement a custom model server. You don't need to implement your own model server to complete this assignment, because MLServer has an out-of-the-box inference server implementation for models registered to MLflow (see the second link). 
    - [Serving MLflow models](https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html).

First, let's get the dependency requirements of running the LightGBM model. 

In [1]:
# Replace this with your own LightGBM model's S3 URI
model_s3_uri = "s3://mlflow/4/afbbeba3112a487398c5593768348525/artifacts/lgbm-bike"

In [2]:
import mlflow
import os 

# Configure MLflow
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://mlflow-minio.local"

# Configure the credentials needed for accessing the MinIO storage service
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

# Download the requirements.txt file of the model and print the file's location
file_path = mlflow.pyfunc.get_model_dependencies(model_uri=model_s3_uri)
print(file_path)

2023/12/03 01:30:27 INFO mlflow.pyfunc: To install the dependencies that were used to train the model, run the following command: '%pip install -r /tmp/tmpx4q3fvfa/lgbm-bike/requirements.txt'.


/tmp/tmpx4q3fvfa/lgbm-bike/requirements.txt


In [3]:
# Move requirements.txt to the "assignment1" directory
!mv {file_path} ./assignment1

To avoid messing up the "mlops_eng" environment, let's create a new Python environment named "try_mlserver" and install the dependencies. Open a new terminal and run the following commands:
```bash
conda create -n try_mlserver -yf python==3.10 ipykernel
conda activate try_mlserver
```

**Switch the Python environment of this notebook to the new "try_mlserver" environment** by clicking the current environment name at the upper right corner and install the dependencies by running the following two code cells. 

If you can't find the environment name, close VS Code and open it again, then reopen the notebook. This may force VS Code to detect all available Python environments.


In [4]:
# Make sure your notebook environment is try_mlserver
%pip install -r assignment1/requirements.txt

Collecting argparse==1.4.0 (from -r assignment1/requirements.txt (line 2))
  Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Installing collected packages: argparse
Successfully installed argparse-1.4.0
Note: you may need to restart the kernel to use updated packages.



Let's also install the MLServer packages. `mlserver-mlflow` is the out-of-the-box server implementation for MLflow models, and `boto3` is required by MLServer to load the model from the MLflow service. 

In [5]:
%pip install mlserver==1.3.5 mlserver-mlflow==1.3.5 boto3~=1.28.80

Note: you may need to restart the kernel to use updated packages.


## Assignment1 instructions
Now you need to 
1. Add configurations to the empty [assignment1/model-settings.json](./assignment1/model-settings.json) to use the MLServer's MLflow runtime to serve your LightGBM model. The inference service name should be ***bike-demand-predictor***. The configuration can be adapted from the [one provided by this MLServer doc.](https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html#serving)
1. In a separate terminal (where the "try_mlserver" conda environment is activated), start an MLServer inference service to serve the LightGBM model.
1. Keep the inference service running and use the following code cell to check whether or not your configuration works. **Please keep the output of the following code cells.** The reviewer will use the output to check your MLServer configuration. 

Notes:
- When starting an MLServer inference service in a terminal, you need to change the conda environment of that terminal session to "try_mlserver" by running `conda activate try_mlserver` so that the mlserver command can be found. 

- MLServer will load the model from your MinIO storage service so you need to specify the following environment variables to allow MLServer to use the correct credentials to load the model from the correct MinIO service endpoint:
```bash
# Run the following command in a terminal
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export MLFLOW_S3_ENDPOINT_URL=http://mlflow-minio.local
```
These environment variables are only available in the terminal session where you defined them, so you need to start your MLServer inference service in the same terminal session where you defined the above environment variables. 
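Before sending inference requests, it can help to confirm that the server has actually loaded the model. The sketch below probes the per-model readiness endpoint defined by the V2 inference protocol. It assumes the service runs on `localhost:8080` (the address used in the request cell of this notebook); the helper names `ready_url` and `is_model_ready` are illustrative and not part of MLServer.

```python
import urllib.error
import urllib.request

# Assumption: MLServer listens on localhost:8080, as used elsewhere in this notebook.
BASE_URL = "http://localhost:8080"


def ready_url(model_name: str, base_url: str = BASE_URL) -> str:
    """Build the V2 inference protocol readiness URL for one model."""
    return f"{base_url}/v2/models/{model_name}/ready"


def is_model_ready(model_name: str, base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the server answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(model_name, base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as "not ready".
        return False
```

Calling `is_model_ready("bike-demand-predictor")` while the service is running should return `True`; if it returns `False`, check the terminal where MLServer was started for loading errors.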

In [3]:
import requests
from mlserver.codecs import PandasCodec

import sys
from pathlib import Path

# We need to use some functions in ../train_helper.py so we just append it to the Python path at runtime. 
# The Python path is a list of directory locations where Python looks for modules and packages when you try to import them in your code.
parent_dir = str(Path.cwd().parent)
sys.path.append(parent_dir)

from train_helper import pull_data, preprocess


In [4]:

dataset_url = "https://raw.githubusercontent.com/yumoL/mlops_eng_course_datasets/master/intro/bike-demanding/train_full.csv"

# Prepare some data to send in requests
df = pull_data(dataset_url)
_, test = preprocess(df)
test_x = test.drop(["count"], axis=1)

request_data = test_x.head()

# Encode the request data following the V2 inference protocol
encoded_request_data = PandasCodec.encode_request(request_data).dict()
print(encoded_request_data)

{'parameters': {'content_type': 'pd'}, 'inputs': [{'name': 'season', 'shape': [5, 1], 'datatype': 'INT64', 'data': [4, 4, 4, 4, 4]}, {'name': 'holiday', 'shape': [5, 1], 'datatype': 'INT64', 'data': [0, 0, 0, 0, 0]}, {'name': 'workingday', 'shape': [5, 1], 'datatype': 'INT64', 'data': [1, 1, 1, 1, 1]}, {'name': 'weather', 'shape': [5, 1], 'datatype': 'INT64', 'data': [2, 2, 2, 2, 2]}, {'name': 'temp', 'shape': [5, 1], 'datatype': 'FP64', 'data': [11.48, 11.48, 10.66, 10.66, 10.66]}, {'name': 'atemp', 'shape': [5, 1], 'datatype': 'FP64', 'data': [13.635, 12.88, 12.12, 12.12, 12.88]}, {'name': 'humidity', 'shape': [5, 1], 'datatype': 'INT64', 'data': [52, 52, 56, 56, 56]}, {'name': 'windspeed', 'shape': [5, 1], 'datatype': 'FP64', 'data': [15.0013, 19.0012, 16.9979, 19.0012, 12.998]}, {'name': 'hour', 'shape': [5, 1], 'datatype': 'INT64', 'data': [0, 1, 2, 3, 4]}, {'name': 'day', 'shape': [5, 1], 'datatype': 'INT64', 'data': [13, 13, 13, 13, 13]}, {'name': 'month', 'shape': [5, 1], 'data
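To make the structure of the encoded request easier to read, here is a simplified, pure-Python sketch that builds the same kind of payload from a dictionary of columns. It is only an illustration of the format printed above, not how `PandasCodec` is implemented; `encode_columns` and `v2_datatype` are hypothetical helpers, and only the two datatypes seen in the output (`INT64`, `FP64`) are handled.

```python
from typing import Any, Dict, List


def v2_datatype(values: List[Any]) -> str:
    """Map a column of Python numbers to a V2 datatype name (INT64 or FP64 only)."""
    def is_int(v: Any) -> bool:
        return isinstance(v, int) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "INT64"
    if all(is_int(v) or isinstance(v, float) for v in values):
        return "FP64"
    raise ValueError("only integer and float columns are handled in this sketch")


def encode_columns(columns: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Build a V2-style request with one input tensor of shape [n_rows, 1] per column."""
    inputs = [
        {
            "name": name,
            "shape": [len(values), 1],
            "datatype": v2_datatype(values),
            "data": list(values),
        }
        for name, values in columns.items()
    ]
    # content_type "pd" tells the server to decode the inputs as a pandas DataFrame.
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}


example = encode_columns({"season": [4, 4], "temp": [11.48, 10.66]})
```

Each DataFrame column becomes one named input tensor, which is why the printed request contains one entry per feature.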

In [5]:
# Send a request
response = requests.post("http://localhost:8080/v2/models/bike-demand-predictor/infer", json=encoded_request_data)
response.json()

{'model_name': 'bike-demand-predictor',
 'id': 'f596c11f-35ee-4d06-bbce-24ecd3ccda8a',
 'parameters': {'content_type': 'np'},
 'outputs': [{'name': 'output-1',
   'shape': [5, 1],
   'datatype': 'FP64',
   'parameters': {'content_type': 'np'},
   'data': [37.289116222680455,
    19.406971833185164,
    10.248384070712056,
    9.602077884278172,
    9.602077884278172]}]}

Expected output:
```text
{'model_name': 'bike-demand-predictor',
 'id': 'f021577e-16fb-4686-8f1e-70f3ae2a7b76',
 'parameters': {'content_type': 'np'},
 'outputs': [{'name': 'output-1',
   'shape': [5, 1],
   'datatype': 'FP64',
   'parameters': {'content_type': 'np'},
   'data': [42.09539399666931,
    23.974666238188583,
    13.463013296846174,
    8.769204532023744,
    8.769204532023744]}]}
```
The id may vary. The output data ([42.09..., 23.97..., ...]) may also vary depending on how you trained the model in the first week. The key point is that the response should follow the same format as the expected output. 
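Since the check is about the response format rather than the exact numbers, a small helper can pull the predictions out of a response and verify that `shape` and `data` agree. This is a minimal sketch; `extract_predictions` is a hypothetical helper, and the sample response is an abbreviated copy of the structure shown above.

```python
import math
from typing import Any, Dict, List, Optional


def extract_predictions(response: Dict[str, Any], output_name: Optional[str] = None) -> List[float]:
    """Return the flat list of predictions from a V2 inference response."""
    outputs = response.get("outputs", [])
    if not outputs:
        raise ValueError("response contains no outputs")
    if output_name is None:
        chosen = outputs[0]
    else:
        matches = [o for o in outputs if o.get("name") == output_name]
        if not matches:
            raise ValueError(f"no output named {output_name!r}")
        chosen = matches[0]
    data = list(chosen["data"])
    # The number of values must equal the product of the declared shape.
    if math.prod(chosen["shape"]) != len(data):
        raise ValueError("shape and data length disagree")
    return data


# Abbreviated sample with the same fields as the response above.
sample_response = {
    "model_name": "bike-demand-predictor",
    "outputs": [
        {"name": "output-1", "shape": [2, 1], "datatype": "FP64",
         "data": [37.289116222680455, 19.406971833185164]}
    ],
}
predictions = extract_predictions(sample_response)
```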

---

#### Now, you need to **switch the notebook's environment back to "mlops_eng"** for the rest of the assignments. 

### Guidelines for doing Assignments 2-5
- In 2a), you'll need to write some Python code, so please put your code between the `### START CODE HERE` and `### END CODE HERE` comments. 
- In other assignments, you'll need to complete some configurations in given YAML files. Please write your configurations between the `### START CONF HERE` and `### END CONF HERE` comments in each YAML file.
- You will run `kubectl -n kserve-inference get isvc <name-of-inference-service>` (or `kubectl -n kserve-inference get ig <name-of-inference-graph>`) several times in this notebook. This command checks whether the inference service (or inference graph) you deployed to KServe is ready. It can take up to a few minutes for an inference service/graph to become ready, so you may need to rerun the command to follow its status. You can also add the `-w` option to watch the status continuously (`kubectl get isvc <name-of-inference-service> -n kserve-inference -w`) and then interrupt the code cell once the inference service/graph is ready.
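As an alternative to rerunning the command by hand, the readiness check can be scripted. The sketch below asks `kubectl` for the resource as JSON and looks for a `Ready` condition with status `"True"` under `status.conditions`. It assumes `kubectl` is on your `PATH` and that the resource reports its state through such a condition list; `is_ready` and `wait_until_ready` are hypothetical helper names.

```python
import json
import subprocess
import time
from typing import Any, Dict


def is_ready(resource: Dict[str, Any]) -> bool:
    """Return True if the resource has a condition of type Ready with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def wait_until_ready(name: str, kind: str = "isvc", namespace: str = "kserve-inference",
                     timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll `kubectl get <kind> <name> -o json` until the resource is ready or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "-n", namespace, "get", kind, name, "-o", "json"],
            capture_output=True, text=True,
        )
        # A non-zero return code usually means the resource does not exist yet.
        if result.returncode == 0 and is_ready(json.loads(result.stdout)):
            return True
        time.sleep(poll_s)
    return False
```

For example, `wait_until_ready("bike-lgbm")` would block until the "bike-lgbm" inference service reports ready, and `wait_until_ready("my-graph1", kind="ig")` would do the same for an inference graph.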

# Assignment 2: Deploy a model to KServe (3 points)
In this assignment, you need to deploy your LightGBM model for predicting bike sharing demand as an inference service to KServe. You can use the model trained in the first week. 

As in the tutorial, the deployed inference service should run in the "kserve-inference" namespace and use the "kserve-sa" service account, which provides the credentials for accessing the MinIO storage service. 

In [1]:
from kubernetes import client
from kserve import KServeClient
from kserve import constants
from kserve import V1beta1InferenceService
from kserve import V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec
from kserve import V1beta1ModelSpec
from kserve import V1beta1ModelFormat
import logging

from send_request import send_request

## 2a) Use Python SDK to deploy your LightGBM model
Complete the `deploy_model` function that uses the KServe SDK to deploy your LightGBM model. 

**Hint**: Using the LightGBM server provided by KServe doesn't work because the model saved by MLflow is in the pickled format, which is different from the format supported by KServe's LightGBM server. You can check [here](https://github.com/kserve/kserve/issues/2483) on how to use KServe SDK to deploy a model uploaded to MLflow.

In [2]:
def deploy_model(model_name: str, model_uri: str):
    """
    Args:
        model_name: the name of the deployed inference service
        model_uri: the S3 URI of the model saved in MLflow
    """

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    
    namespace = "kserve-inference"
    service_account_name="kserve-sa"
    kserve_version="v1beta1"
    api_version = constants.KSERVE_GROUP + "/" + kserve_version
    
    logger.info(f"MODEL URI: {model_uri}")
    
    modelspec = V1beta1ModelSpec(
        storage_uri=model_uri,
        model_format=V1beta1ModelFormat(name="mlflow"),
        protocol_version="v2"
    )
    
    isvc = V1beta1InferenceService(
        ### START CODE HERE
        # define api_version, kind, and metadata
        api_version=api_version,
        kind=constants.KSERVE_KIND,
        metadata=client.V1ObjectMeta(name=model_name, namespace=namespace),
        ### END CODE HERE

        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                ### START CODE HERE
                service_account_name=service_account_name,
                model=modelspec
                ### END CODE HERE
            )
        )
    )
    kserve = KServeClient()

    ### START CODE HERE
    # Create or update an inference service
    try:
        kserve.create(inferenceservice=isvc)
    except RuntimeError:
        kserve.patch(name=model_name, inferenceservice=isvc, namespace=namespace)
    ### END CODE HERE
    

In [3]:
model_name = "bike-lgbm-2a"

# Replace the model URI with your own
model_uri = "s3://mlflow/4/afbbeba3112a487398c5593768348525/artifacts/lgbm-bike"

# Test the deploy_model function
deploy_model(model_name, model_uri)

2023-12-03 01:31:19.112 9574 __main__ INFO [deploy_model():16] MODEL URI: s3://mlflow/4/afbbeba3112a487398c5593768348525/artifacts/lgbm-bike


In [4]:
# Check if the "bike-lgbm-2a" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm-2a -w

NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
bike-lgbm-2a   http://bike-lgbm-2a.kserve-inference.example.com   True           100                              bike-lgbm-2a-predictor-default-00001   3m47s
^C


Expected output:

```text
NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
bike-lgbm-2a   http://bike-lgbm-2a.kserve-inference.example.com   True           100                              bike-lgbm-2a-predictor-default-00001   72s
```

In [5]:
# Make sure there is one pod running for the "bike-lgbm-2a" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm-2a

NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-2a-predictor-default-00001-deployment-5ff8687fd6jswjq   2/2     Running   0          3m54s


Expected output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-2a-predictor-default-00001-deployment-6499598b7-wc28j   2/2     Running   0          65s
```

In [6]:
# Send a request to the inference service
send_request(model_name="bike-lgbm-2a")

{'model_name': 'bike-lgbm-2a', 'model_version': None, 'id': 'bf659546-6c43-48e3-ade9-e1b1f17a48e8', 'parameters': None, 'outputs': [{'name': 'predict', 'shape': [2], 'datatype': 'FP64', 'parameters': None, 'data': [51.00457318737209, 35.13687405851507]}]}


Expected output:
```text
{'model_name': 'bike-lgbm-2a', 
'model_version': None, 
'id': '4a91cc4c-3a04-4aa1-95ee-6bbbf04207b7', 
'parameters': None, 
'outputs': [{'name': 'predict', 'shape': [2], 'datatype': 'FP64', 'parameters': None, 'data': [51.00457318737209, 35.13687405851507]}]
}
```
**Note**: The id varies. The output data ([51.0..., 35.1...]) may also vary depending on how your model was trained. The important point is that the response has the correct fields as shown in the above expected output. 

*P.S.* KServe also uses MLServer to serve models uploaded to the MLflow service, which means your inference service uses the V2 inference protocol. If you take a look at the `send_request` function in [send_request.py](./send_request.py), you'll see that the input data format and the URL format match what was used in Assignment 1. 


In [7]:
# Clean up by removing the "bike-lgbm-2a" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm-2a

inferenceservice.serving.kserve.io "bike-lgbm-2a" deleted


Expected output:
```text
inferenceservice.serving.kserve.io "bike-lgbm-2a" deleted
```

## 2b) Use a YAML file to deploy the model
Instead of using the KServe SDK, now you need to use a YAML file to deploy your LightGBM model again. Please complete the configuration in [manifests/bike-lgbm-basic.yaml](./manifests/bike-lgbm-basic.yaml).

**Hint**: See [this KServe doc](https://kserve.github.io/website/0.10/modelserving/v1beta1/mlflow/v2/#deploy-with-inferenceservice) for how to use a YAML manifest to deploy a model stored in MLflow.
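To see how the YAML fields relate to the SDK objects used in 2a), the sketch below builds the same kind of resource as a plain Python dictionary. The field names follow the `spec.predictor` JSON printed by `kubectl ... -o json` later in this notebook; `build_isvc_manifest` is a hypothetical helper, and the storage URI in the example is a placeholder rather than a real model location.

```python
import json
from typing import Any, Dict


def build_isvc_manifest(name: str, storage_uri: str,
                        namespace: str = "kserve-inference",
                        service_account: str = "kserve-sa") -> Dict[str, Any]:
    """Return an InferenceService resource for an MLflow model served over the V2 protocol."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "serviceAccountName": service_account,
                "model": {
                    "modelFormat": {"name": "mlflow"},
                    "protocolVersion": "v2",
                    "storageUri": storage_uri,
                },
            }
        },
    }


# Placeholder URI for illustration only.
manifest = build_isvc_manifest("bike-lgbm", "s3://mlflow/<experiment>/<run>/artifacts/lgbm-bike")
manifest_json = json.dumps(manifest, indent=2)
```

Each key in the dictionary corresponds to one key in the YAML manifest, so the nesting here mirrors the indentation you need in the YAML file.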

In [8]:
# Deploy the LightGBM model for bike demand prediction as an inference service named "bike-lgbm"
!kubectl apply -f manifests/bike-lgbm-basic.yaml

inferenceservice.serving.kserve.io/bike-lgbm configured


Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm created
```

In [9]:
# Make sure that the "bike-lgbm" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm -w

NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   15m
^C


Expected output:
```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   2m24s
```

In [10]:
# Make sure there is one pod running for the "bike-lgbm" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm

NAME                                                           READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-fc5cb7d97-bs2jl   2/2     Running   0          15m


Expected output: 
```text
NAME                                                           READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-9d7b87595-k9kpk   2/2     Running   0          70s
```

In [11]:
# Send some requests to the "bike-lgbm" inference service
send_request(model_name="bike-lgbm")

{'model_name': 'bike-lgbm', 'model_version': None, 'id': '450cb948-b040-4eca-bb04-b76ed1450815', 'parameters': None, 'outputs': [{'name': 'predict', 'shape': [2], 'datatype': 'FP64', 'parameters': None, 'data': [51.00457318737209, 35.13687405851507]}]}


Expected output:
```text
{'model_name': 'bike-lgbm', 
'model_version': None, 
'id': '6783fc56-a759-41b0-9d01-94be32238b01', 
'parameters': None, 
'outputs': [{'name': 'predict', 'shape': [2], 'datatype': 'FP64', 'parameters': None, 'data': [51.00457318737209, 35.13687405851507]}]
}
```

**Note**: Please don't delete the "bike-lgbm" inference service. You will need it in Assignment 3. 

# Assignment 3: Canary deployment in KServe (2 points)
You'll train a new LightGBM model for predicting the bike sharing demand. In this assignment, your task is to deploy the new model to KServe using the canary deployment strategy. 

First, make sure the "bike-lgbm" inference service is already running in KServe.

In [12]:
!kubectl -n kserve-inference get isvc bike-lgbm

NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   15m


Expected output: 
```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   47m
```

Now let's train a new LightGBM model and upload it to MLflow. 

In [13]:
# Train a new LightGBM model for predicting bike sharing demand
# This is the same as the assignment in Week1, we just change the hyperparameters of the model and skip the part that uses Deepchecks to perform offline model evaluation
# The model's S3 URI will be printed at the end
from train_helper import train

params = {
    "num_leaves": 127, # Before it was 63
    "learning_rate": 0.1, # Before it was 0.05
    "random_state": 42 
}
train(params)

2023-12-03 01:33:15.269 9574 botocore.credentials INFO [load():1124] Found credentials in environment variables.
Registered model 'Week1LgbmBikeDemand' already exists. Creating a new version of this model...
2023/12/03 01:33:16 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: Week1LgbmBikeDemand, version 4
Created version '4' of model 'Week1LgbmBikeDemand'.


The trained model is located at s3://mlflow/4/eb049d6ac84b4da195e25f77ad90c464/artifacts/lgbm-bike


Now, your task is to complete the configuration in [manifests/bike-lgbm-canary.yaml](./manifests/bike-lgbm-canary.yaml) to deploy the new LightGBM model using canary deployment. Your new model should receive **30%** of the user traffic. 
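The idea of a canary rollout can be sketched on the predictor spec alone: point the spec at the new model and state what share of traffic the new revision should get, leaving the rest with the previous revision. The sketch below assumes KServe's `canaryTrafficPercent` field on the predictor; `with_canary` and `traffic_split` are hypothetical helpers, and the URIs are placeholders.

```python
import copy
from typing import Any, Dict


def with_canary(predictor: Dict[str, Any], new_storage_uri: str, percent: int) -> Dict[str, Any]:
    """Return a copy of the predictor spec that serves the new model to `percent` of traffic."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    updated = copy.deepcopy(predictor)  # leave the caller's spec untouched
    updated["model"]["storageUri"] = new_storage_uri
    updated["canaryTrafficPercent"] = percent
    return updated


def traffic_split(percent: int) -> Dict[str, int]:
    """Share of traffic for the previous and the latest revision."""
    return {"previous": 100 - percent, "latest": percent}


old_predictor = {
    "serviceAccountName": "kserve-sa",
    "model": {
        "modelFormat": {"name": "mlflow"},
        "protocolVersion": "v2",
        "storageUri": "s3://<old-model-uri>",  # placeholder
    },
}
canary_predictor = with_canary(old_predictor, "s3://<new-model-uri>", 30)
split = traffic_split(30)
```

With 30% assigned to the new model, the remaining 70% stays on the previous revision, which is what the PREV and LATEST columns of `kubectl get isvc` report.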

In [15]:
# Update the "bike-lgbm" inference service to use the new model
!kubectl apply -f manifests/bike-lgbm-canary.yaml

inferenceservice.serving.kserve.io/bike-lgbm configured


Expected output:
```text
inferenceservice.serving.kserve.io/bike-lgbm configured
```

In [20]:
# Check that the traffic is split between the old and the new version
!kubectl -n kserve-inference get isvc bike-lgbm

NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION               LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True    70     30       bike-lgbm-predictor-default-00001   bike-lgbm-predictor-default-00002   17m


Expected output:
```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION               LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True    70     30       bike-lgbm-predictor-default-00001   bike-lgbm-predictor-default-00002   61m
```

In [21]:
# Check that there are two pods (one old and one new) running for the "bike-lgbm" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm

NAME                                                            READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-fc5cb7d97-bs2jl    2/2     Running   0          17m
bike-lgbm-predictor-default-00002-deployment-5864f87b6f-pjn8g   2/2     Running   0          33s


Expected output:

```text
NAME                                                           READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-cc96598f-rr2xz    2/2     Running   0          63m
bike-lgbm-predictor-default-00002-deployment-6d9f5bbff-8mhn8   2/2     Running   0          3m36s
```

In [22]:
# Clean up by removing the "bike-lgbm" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm

inferenceservice.serving.kserve.io "bike-lgbm" deleted


Expected output:
```text
inferenceservice.serving.kserve.io "bike-lgbm" deleted
```

# Assignment 4: Horizontal autoscaling (2 points)

In this assignment, your task is to complete the configuration in [manifests/bike-lgbm-scale.yaml](./manifests/bike-lgbm-scale.yaml) to deploy your LightGBM model to KServe and configure the horizontal autoscaling feature for the deployed inference service. Specifically, the horizontal autoscaling of the inference service should satisfy the following requirements:
1. The inference service should have at least **2** pods running;
2. The inference service can have at most **8** pods running when it's being scaled up;
3. The inference service should be scaled up when each pod is receiving no less than 5 requests per second.

You can use whichever LightGBM model you trained for predicting bike sharing demand in this assignment. 

**Hint**: "rps" should be used as the scaling metric. 

*rps (requests per second) vs. concurrency: these two metrics may look similar at first glance, and both describe the load on a service. rps counts how many requests a service receives per second, whereas concurrency counts how many requests the service is handling at the same moment.*
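The three requirements combine into a simple rule of thumb for how many pods to expect at a given load. The sketch below is a deliberately simplified model: it divides the total request rate by the per-pod target and clamps the result to the allowed range. The real autoscaler averages metrics over a time window and applies further rules, so treat `desired_replicas` (a hypothetical helper) as an approximation only.

```python
import math


def desired_replicas(total_rps: float, target_rps_per_pod: float = 5.0,
                     min_replicas: int = 2, max_replicas: int = 8) -> int:
    """Approximate pod count: enough pods to keep each at or below the target, within bounds."""
    if target_rps_per_pod <= 0:
        raise ValueError("target_rps_per_pod must be positive")
    wanted = math.ceil(total_rps / target_rps_per_pod)
    return max(min_replicas, min(max_replicas, wanted))
```

Under this model, an idle service stays at 2 pods, a load of 23 requests per second calls for 5 pods, and anything above 40 requests per second is capped at 8 pods.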

In [23]:
# Deploy an inference service named "bike-lgbm-scale"
!kubectl apply -f manifests/bike-lgbm-scale.yaml

inferenceservice.serving.kserve.io/bike-lgbm-scale created


Expected output:
```text
inferenceservice.serving.kserve.io/bike-lgbm-scale created
```

In [25]:
# Make sure the "bike-lgbm-scale" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm-scale -w

NAME              URL   READY     PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
bike-lgbm-scale         Unknown                                                                 10s
bike-lgbm-scale         Unknown                                                                 25s
bike-lgbm-scale         Unknown          100                              bike-lgbm-scale-predictor-default-00001   25s
bike-lgbm-scale   http://bike-lgbm-scale.kserve-inference.example.com   True             100                              bike-lgbm-scale-predictor-default-00001   26s
bike-lgbm-scale   http://bike-lgbm-scale.kserve-inference.example.com   True             100                              bike-lgbm-scale-predictor-default-00001   26s
^C


Expected output:
```text
NAME              URL                                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                       AGE
bike-lgbm-scale   http://bike-lgbm-scale.kserve-inference.example.com   True           100                              bike-lgbm-scale-predictor-default-00001   90s
```


In [26]:
# Make sure there are two pods (replicas) running for the "bike-lgbm-scale" inference service
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferenceservice=bike-lgbm-scale

NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-scale-predictor-default-00001-deployment-747d455scg95   2/2     Running   0          56s
bike-lgbm-scale-predictor-default-00001-deployment-747d455svj46   2/2     Running   0          55s


Expected output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-scale-predictor-default-00001-deployment-66df7bcd67mr   2/2     Running   0          7m26s
bike-lgbm-scale-predictor-default-00001-deployment-66df7bcjb6qx   2/2     Running   0          7m27s
```

Now please use the command below to print the configuration of your "bike-lgbm-scale" inference service. The output will be used to check if your configuration is correct. 

In [27]:
%%bash

kubectl -n kserve-inference get isvc bike-lgbm-scale -o json|jq .spec.predictor

{
  "maxReplicas": 8,
  "minReplicas": 2,
  "model": {
    "modelFormat": {
      "name": "mlflow"
    },
    "name": "",
    "protocolVersion": "v2",
    "resources": {},
    "storageUri": "s3://mlflow/4/eb049d6ac84b4da195e25f77ad90c464/artifacts/lgbm-bike"
  },
  "scaleMetric": "rps",
  "scaleTarget": 5,
  "serviceAccountName": "kserve-sa"
}


In [28]:
# Clean up by removing the "bike-lgbm-scale" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm-scale

inferenceservice.serving.kserve.io "bike-lgbm-scale" deleted


Expected output:
```text
inferenceservice.serving.kserve.io "bike-lgbm-scale" deleted
```

# Assignment 5: Inference graph in KServe (3 points)

## 5a) Inference graph for ensemble
So far you have two LightGBM models for predicting bike sharing demand. One was trained with the hyperparameters {learning_rate=0.05, num_leaves=63} (denoted Model A) and the other with {learning_rate=0.1, num_leaves=127} (denoted Model B). 

You need to first complete the configuration in [manifests/bike-lgbm-graph1.yaml](./manifests/bike-lgbm-graph1.yaml) to deploy two inference services named "bike-lgbm-1" and "bike-lgbm-2". The "bike-lgbm-1" and "bike-lgbm-2" inference services should serve Model A and B, respectively.

Next, you need to complete [manifests/inference-graph1.yaml](./manifests/inference-graph1.yaml) to deploy an inference graph that includes one ensemble routing node. With this inference graph, a user will receive two predictions (one from each inference service) when they send a request.  
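Conceptually, an ensemble node forwards the same request to every step and returns the responses keyed by step name. The sketch below simulates that behaviour in plain Python with stand-in services; it is not KServe code. The step names "bike-lgbm-v1" and "bike-lgbm-v2" are taken from the example output in this section, while `ensemble_node`, `fake_service`, and the prediction values are made up for illustration.

```python
from typing import Callable, Dict, List

Service = Callable[[dict], dict]


def ensemble_node(steps: Dict[str, Service], request: dict) -> Dict[str, dict]:
    """Send the same request to every step and collect the responses by step name."""
    # A real router may call the steps in parallel; this sketch calls them one by one.
    return {step_name: call_service(request) for step_name, call_service in steps.items()}


def fake_service(model_name: str, predictions: List[float]) -> Service:
    """Stand-in for an inference service that always returns the given predictions."""
    def handle(request: dict) -> dict:
        return {
            "model_name": model_name,
            "outputs": [{"name": "predict", "shape": [len(predictions)],
                         "datatype": "FP64", "data": predictions}],
        }
    return handle


steps = {
    "bike-lgbm-v1": fake_service("bike-lgbm-1", [51.0, 35.1]),  # made-up values
    "bike-lgbm-v2": fake_service("bike-lgbm-2", [34.9, 32.7]),  # made-up values
}
combined = ensemble_node(steps, {"inputs": []})
```

The combined response has one entry per step, which is why the response of the inference graph contains two predictions for a single request.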


In [None]:
# Deploy the "bike-lgbm-1" and "bike-lgbm-2" inference services
!kubectl apply -f manifests/bike-lgbm-graph1.yaml

Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm-1 created
inferenceservice.serving.kserve.io/bike-lgbm-2 created
```

In [None]:
# Make sure the two inference services are ready
!kubectl -n kserve-inference get isvc bike-lgbm-1 bike-lgbm-2

Expected output:

```text
NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                   AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-default-00001   105m
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-default-00001   105m
```

In [None]:
# Make sure there is one pod running for each of the two inference services
!kubectl -n kserve-inference get pods -l "serving.kserve.io/inferenceservice in (bike-lgbm-1,bike-lgbm-2)"

Expected output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-1-predictor-default-00001-deployment-794547df56-48dhk   2/2     Running   0          109m
bike-lgbm-2-predictor-default-00001-deployment-cf7b449b5-rjc2q    2/2     Running   0          109m
```

In [None]:
# Deploy the inference graph named "my-graph1"
!kubectl apply -f manifests/inference-graph1.yaml

Expected output:

```text
inferencegraph.serving.kserve.io/my-graph1 created
```

In [None]:
# Make sure the "my-graph1" inference graph is ready
!kubectl -n kserve-inference get ig my-graph1

Expected output:

```text
NAME        URL                                             READY   AGE
my-graph1   http://my-graph1.kserve-inference.example.com   True    102s
```

In [None]:
# Also make sure there is one pod running for the "ensemble" inference graph
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferencegraph=my-graph1

Expected output:
```text
NAME                                          READY   STATUS    RESTARTS   AGE
my-graph1-00001-deployment-7c4d7cfbf9-tr5tz   2/2     Running   0          2m7s
```

Now, let's send a request to the "my-graph1" inference graph.

In [None]:
# Send a request
from send_request import send_request

send_request(to_ig=True, ig_name="my-graph1")

The output (i.e., the response) is expected to contain a prediction from the "bike-lgbm-1" inference service and another prediction from the "bike-lgbm-2" inference service. 

Example output:
```text
{'bike-lgbm-v1': {'id': 'ec476473-c736-478d-9830-e2b6f53548db', 'model_name': 'bike-lgbm-1', 'model_version': None, 'outputs': [{'data': [51.00457318737209, 35.13687405851507], 'datatype': 'FP64', 'name': 'predict', 'parameters': None, 'shape': [2]}], 'parameters': None}, 'bike-lgbm-v2': {'id': 'fd17e0c6-d17a-42e0-8cbe-d452e0107d34', 'model_name': 'bike-lgbm-2', 'model_version': None, 'outputs': [{'data': [34.87125805456099, 32.68341881111533], 'datatype': 'FP64', 'name': 'predict', 'parameters': None, 'shape': [2]}], 'parameters': None}}
```
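
If you want to compare the two models side by side, the ensemble response can be flattened into a small mapping. The sketch below is only an illustration: it assumes you already hold the response as a Python dict (how `send_request` hands it back is not shown here), and `predictions_by_model` is a hypothetical helper name, not part of the course code. The sample dict is an abbreviated copy of the example output above.

```python
# Abbreviated copy of the example ensemble response shown above
# (only the fields used below are kept).
ensemble_response = {
    "bike-lgbm-v1": {
        "model_name": "bike-lgbm-1",
        "outputs": [{"name": "predict", "shape": [2], "datatype": "FP64",
                     "data": [51.00457318737209, 35.13687405851507]}],
    },
    "bike-lgbm-v2": {
        "model_name": "bike-lgbm-2",
        "outputs": [{"name": "predict", "shape": [2], "datatype": "FP64",
                     "data": [34.87125805456099, 32.68341881111533]}],
    },
}


def predictions_by_model(response):
    """Map each model name to the data of its first output tensor.

    Hypothetical helper: assumes every top-level value is one model's
    response with at least one entry in "outputs".
    """
    return {
        body["model_name"]: body["outputs"][0]["data"]
        for body in response.values()
    }


predictions = predictions_by_model(ensemble_response)
for model_name, values in predictions.items():
    print(model_name, values)
```

Each top-level key in the response corresponds to one entry of the ensemble, so the number of keys should match the number of inference services the ensemble node calls.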

**Note**: Do not delete the "bike-lgbm-1" and "bike-lgbm-2" inference services. They're still needed in the next assignment. 

## 5b) More complicated inference graph
In this assignment, you need to deploy a more complex inference graph containing more than one routing node. 

First, let's train a third model, denoted by Model C. 

In [None]:
from train_helper import train

params = {
    "num_leaves": 127, 
    "learning_rate": 0.17,
    "min_child_samples": 50,
    "random_state": 42 
}

train(params)

Similar to Assignment 5a, your tasks are to:
1. Complete [manifests/bike-lgbm-graph2.yaml](./manifests/bike-lgbm-graph2.yaml) to deploy a third inference service named "bike-lgbm-3" that serves the Model C you just trained. 
2. Complete [manifests/inference-graph2.yaml](./manifests/inference-graph2.yaml) to deploy an inference graph containing a Switch and an Ensemble routing node. 
    The requests that will be sent to the inference graph look like:
    ```python
    {
      'inputs': ...,
      'userType': 'basic'
    }
    ```
    The inference graph should satisfy the following requirements:
    - If there is a field named "userType" in the request and its value is "basic", the request should be forwarded to an ensemble consisting of the "bike-lgbm-1" and "bike-lgbm-2" inference services. At this time, the "basic" user should receive one prediction from the "bike-lgbm-1" inference service and another prediction from the "bike-lgbm-2" inference service, just like in the previous assignment. 
    - If the value of "userType" is "advanced", the request should be forwarded to the "bike-lgbm-3" inference service. At this time, the "advanced" user should receive one prediction from the "bike-lgbm-3" inference service.
    - Otherwise, the request should be returned unchanged. 
    
    The behavior of the inference graph is illustrated in the figure below:

    <img src="./images/complex-inference-graph.jpg" width=600/>

**Hints**:
You may notice that you need to route requests from one routing node to another (instead of from a routing node to an inference service). Below is an example of configuring a routing node to forward requests to another routing node:
```yaml
...
spec: 
  nodes: 
    # The first routing node
    root: 
      routerType: ...
      steps: 
      # This routing node forwards requests to the second routing node named "ensembleNode"
      - nodeName: ensembleNode

    # The second routing node
    ensembleNode:
      routerType: ...
      steps:
      ...
```
You can use `"[@this].#(userType==\"...\")"` as the condition that determines whether a request should be routed to the ensemble of Model A and Model B or to the standalone Model C.
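
To make the intended routing concrete, here is a plain-Python simulation of the decision the Switch node has to make. It is a sketch of the requirements listed above, not KServe's implementation and not the YAML you need to write: the function `route` is hypothetical, and the target name `ensembleNode` is simply borrowed from the hint's example.

```python
def route(request):
    """Return the name of the target a request should be forwarded to.

    Hypothetical simulation of the required Switch behaviour:
    - userType == "basic"    -> the ensemble routing node
    - userType == "advanced" -> the "bike-lgbm-3" inference service
    - anything else          -> None (the request is returned as-is)
    """
    user_type = request.get("userType")
    if user_type == "basic":
        return "ensembleNode"
    if user_type == "advanced":
        return "bike-lgbm-3"
    return None


# The "inputs" value is elided here, as in the request shape shown above.
for user_type in ("basic", "advanced", "random"):
    print(user_type, "->", route({"inputs": ..., "userType": user_type}))
```

In the real graph, each `if` branch corresponds to one conditional step of the Switch node, and the final fallthrough corresponds to no condition matching.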


In [None]:
# Deploy the third inference service named "bike-lgbm-3"
!kubectl apply -f manifests/bike-lgbm-graph2.yaml

Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm-3 created
```

In [None]:
# Make sure all three inference services are ready
!kubectl -n kserve-inference get isvc bike-lgbm-1 bike-lgbm-2 bike-lgbm-3

Expected output:

```text
NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                   AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-default-00001   132m
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-default-00001   132m
bike-lgbm-3   http://bike-lgbm-3.kserve-inference.example.com   True           100                              bike-lgbm-3-predictor-default-00001   22s
```


In [None]:
# Make sure there are three pods running for the three inference services, respectively
!kubectl -n kserve-inference get pods -l "serving.kserve.io/inferenceservice in (bike-lgbm-1,bike-lgbm-2,bike-lgbm-3)"

Expected output:

```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-1-predictor-default-00001-deployment-794547df56-48dhk   2/2     Running   0          132m
bike-lgbm-2-predictor-default-00001-deployment-cf7b449b5-rjc2q    2/2     Running   0          132m
bike-lgbm-3-predictor-default-00001-deployment-7684784979-zmmgw   2/2     Running   0          42s
```

In [None]:
# Deploy the second inference graph ("my-graph2")
!kubectl apply -f manifests/inference-graph2.yaml

Expected output:

```text
inferencegraph.serving.kserve.io/my-graph2 created
```

In [None]:
# Make sure the inference graph named "my-graph2" is ready
!kubectl -n kserve-inference get ig my-graph2

Expected output:

```text
NAME        URL                                             READY   AGE
my-graph2   http://my-graph2.kserve-inference.example.com   True    35s
```

In [None]:
# Also make sure there is one pod running for the "my-graph2" inference graph
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferencegraph=my-graph2

Expected output:

```text
NAME                                          READY   STATUS    RESTARTS   AGE
my-graph2-00001-deployment-557678dfbd-zglcg   2/2     Running   0          49s
```

In [None]:
# Send some requests
from send_request import send_request
send_request(to_ig=True, ig_name="my-graph2", user_type="basic")

The response is expected to contain predictions from both the "bike-lgbm-1" and "bike-lgbm-2" inference services.

Example output:

```text
{'bike-lgbm-v1': {'id': '03f3bc30-a628-418f-9539-2372f66d353f', 'model_name': 'bike-lgbm-1', 'model_version': None, 'outputs': [{'data': [51.00457318737209, 35.13687405851507], 'datatype': 'FP64', 'name': 'predict', 'parameters': None, 'shape': [2]}], 'parameters': None}, 'bike-lgbm-v2': {'id': 'a8a68d72-03b4-4b2b-91a9-3e5a08cbefbd', 'model_name': 'bike-lgbm-2', 'model_version': None, 'outputs': [{'data': [34.87125805456099, 32.68341881111533], 'datatype': 'FP64', 'name': 'predict', 'parameters': None, 'shape': [2]}], 'parameters': None}}
```

In [None]:
send_request(to_ig=True, ig_name="my-graph2", user_type="advanced")

The response should only have predictions from the "bike-lgbm-3" inference service.

Example output:
```text
{'model_name': 'bike-lgbm-3', 'model_version': None, 'id': '6bfde99b-e23a-47cd-8192-a1cdc0773c4a', 'parameters': None, 'outputs': [{'name': 'predict', 'shape': [2], 'datatype': 'FP64', 'parameters': None, 'data': [43.59313309843417, 32.17377957904267]}]}
```

In [None]:
send_request(to_ig=True, ig_name="my-graph2", user_type="random")

The request should be returned unchanged, because "random" matches neither the "basic" nor the "advanced" condition.

Example output:

```text
{'parameters': {'content_type': 'pd'}, 'inputs': [{'name': 'season', 'shape': [2], 'datatype': 'UINT64', 'data': [1, 1]}, {'name': 'holiday', 'shape': [2], 'datatype': 'UINT64', 'data': [0, 0]}, {'name': 'workingday', 'shape': [2], 'datatype': 'UINT64', 'data': [0, 0]}, {'name': 'weather', 'shape': [2], 'datatype': 'UINT64', 'data': [1, 1]}, {'name': 'temp', 'shape': [2], 'datatype': 'FP64', 'data': [9.84, 9.02]}, {'name': 'atemp', 'shape': [2], 'datatype': 'FP64', 'data': [14.395, 13.635]}, {'name': 'humidity', 'shape': [2], 'datatype': 'UINT64', 'data': [81, 80]}, {'name': 'windspeed', 'shape': [2], 'datatype': 'FP64', 'data': [0.0, 0.0]}, {'name': 'hour', 'shape': [2], 'datatype': 'UINT64', 'data': [0, 1]}, {'name': 'day', 'shape': [2], 'datatype': 'UINT64', 'data': [1, 1]}, {'name': 'month', 'shape': [2], 'datatype': 'UINT64', 'data': [1, 1]}], 'userType': 'random'}
```
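
The three example outputs above have different shapes, which makes it easy to check programmatically which branch of the graph handled a request. The following is a minimal sketch under that assumption: `describe_response` is a hypothetical helper, and the sample dicts are heavily abbreviated versions of the example outputs.

```python
def describe_response(response):
    """Guess which branch of the graph produced a response dict.

    Hypothetical helper based only on the example outputs above:
    - an echoed request still has an "inputs" field
    - a single-model response has "outputs" at the top level
    - an ensemble response nests one model response per key
    """
    if "inputs" in response:
        return "request returned unchanged"
    if "outputs" in response:
        return "single model: " + response["model_name"]
    models = sorted(body["model_name"] for body in response.values())
    return "ensemble: " + ", ".join(models)


# Abbreviated samples of the three example outputs
basic = {
    "bike-lgbm-v1": {"model_name": "bike-lgbm-1", "outputs": []},
    "bike-lgbm-v2": {"model_name": "bike-lgbm-2", "outputs": []},
}
advanced = {"model_name": "bike-lgbm-3", "outputs": []}
other = {"inputs": [], "userType": "random"}

for sample in (basic, advanced, other):
    print(describe_response(sample))
```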

Clean up

In [None]:
# Delete all three inference services
!kubectl -n kserve-inference delete isvc bike-lgbm-1 bike-lgbm-2 bike-lgbm-3

Expected output:
```text
inferenceservice.serving.kserve.io "bike-lgbm-1" deleted
inferenceservice.serving.kserve.io "bike-lgbm-2" deleted
inferenceservice.serving.kserve.io "bike-lgbm-3" deleted
```

In [None]:
# Delete all inference graphs
!kubectl -n kserve-inference delete ig my-graph1 my-graph2

Expected output:
```text
inferencegraph.serving.kserve.io "my-graph1" deleted
inferencegraph.serving.kserve.io "my-graph2" deleted
```

---
Please make sure you have the following files in your submission:
- This notebook (with up-to-date outputs of the code cells)
- model-settings.json 
- all YAML files listed in the "manifests" directory