# Hosting LightGBM Models using SageMaker Inference

In this notebook, we train a binary classifier for breast cancer detection, then deploy the model with a SageMaker real-time endpoint and with SageMaker Serverless Inference.

> We highly recommend creating a Python virtual environment to run this notebook and installing the Python packages listed in `requirements.txt`.

## Train a binary classifier

In [1]:
import numpy as np 
import pandas as pd

import warnings

from sklearn.model_selection import train_test_split


warnings.filterwarnings("ignore")

### Dataset

We will use the [Breast Cancer Prediction Dataset](https://www.kaggle.com/datasets/merishnasuwal/breast-cancer-prediction-dataset). You may need to sign in to Kaggle to download the dataset locally. By default, the dataset CSV file is provided in the `data` folder: [./data/Breast_cancer_data.csv](./data/Breast_cancer_data.csv).

In [2]:
# read data
df = pd.read_csv('./data/Breast_cancer_data.csv')
df.head()

|   | mean_radius | mean_texture | mean_perimeter | mean_area | mean_smoothness | diagnosis |
|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0 |


In [3]:
df['diagnosis'].value_counts()

diagnosis
1    357
0    212
Name: count, dtype: int64

* The target variable is `diagnosis`. It takes two values: 0 and 1.
* 0 is the negative class and 1 is the positive class.

Split the data into training and test sets.

In [4]:
X = df[['mean_radius','mean_texture','mean_perimeter','mean_area','mean_smoothness']]
y = df['diagnosis']

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
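
As a quick sanity check on the split sizes, the arithmetic below reproduces the 455 training rows that LightGBM reports later in its log. It is a minimal sketch that assumes scikit-learn rounds the test share up and gives the remaining rows to the training set when only `test_size` is passed; the variable names are illustrative.

```python
import math

n_samples = 357 + 212   # class counts from value_counts() above
test_size = 0.2

# Assumption: with a float test_size and no train_size, the test set gets
# ceil(test_size * n_samples) rows and the training set gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)
```

Passing `stratify=y` to `train_test_split` is an option if you want both sets to keep the same class ratio.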

### Model training

Train a classifier using LightGBM with its default settings.

In [6]:
import lightgbm as lgb
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

[LightGBM] [Info] Number of positive: 290, number of negative: 165
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000173 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 760
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 5
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.637363 -> initscore=0.563935
[LightGBM] [Info] Start training from score 0.563935
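
The `pavg` and `initscore` values in the log can be checked by hand. The sketch below assumes the binary objective starts boosting from the log-odds of the positive-class rate, which is consistent with the numbers printed above. It is an illustration, not LightGBM's internal code.

```python
import math

n_pos, n_neg = 290, 165                    # from the first log line
pavg = n_pos / (n_pos + n_neg)             # share of positive labels
init_score = math.log(pavg / (1 - pavg))   # log-odds of the positive class

print(f"pavg={pavg:.6f} initscore={init_score:.6f}")
```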


### Model Prediction

Verify how the model performs on the test dataset.

In [7]:
y_pred = clf.predict(X_test)

Accuracy is used here for simplicity. In practice, you may prefer the F1 score or ROC AUC, depending on the business requirement.
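
To make the F1 alternative concrete, here is a minimal hand-rolled sketch of precision, recall, and F1 for binary labels. The helper name and the demo labels are hypothetical and are not taken from the dataset. In the notebook itself, `sklearn.metrics.f1_score` and `sklearn.metrics.roc_auc_score` would be the usual choices.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels for illustration only.
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```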

In [8]:
from sklearn.metrics import accuracy_score

def meansure_accuracy_score(y_pred, y_test):
    accuracy = accuracy_score(y_test, y_pred)  # scikit-learn's signature is (y_true, y_pred)
    print(f"LightGBM model accuracy score: {accuracy:0.4f}")

meansure_accuracy_score(y_pred, y_test)

LightGBM model accuracy score: 0.9298


### Save the trained model

In [9]:
!mkdir -p model


In [10]:
model_file_name = "demo-breast-cancer-classifier-model"
model_path = f'./model/{model_file_name}'
clf.booster_.save_model(model_path, num_iteration=clf.best_iteration_)

# [alternative option] save model using joblib.

# import joblib
# joblib.dump(clf, model_file_name)

<lightgbm.basic.Booster at 0x169fe2550>

Load the saved model and rerun the evaluation on the test set.

Note that the loaded `Booster` returns probabilities rather than class labels, so we use NumPy to convert each result to `1` or `0`.

In [11]:
loaded_model = lgb.Booster(model_file=model_path)

result = loaded_model.predict(X_test)
y_pred2 = np.where(result > 0.5, 1, 0)

meansure_accuracy_score(y_pred2, y_test)


LightGBM model accuracy score: 0.9298


Create the model artifact in `tar.gz` format.

In [12]:
!tar czvf ./model/model.tar.gz -C ./model $model_file_name 

a demo-breast-cancer-classifier-model
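
If a shell is not available, the same archive layout can be produced with Python's standard `tarfile` module. This is a minimal sketch: the helper name `make_model_archive` is hypothetical, and the demonstration uses a placeholder file in a temporary directory rather than the real model.

```python
import os
import tarfile
import tempfile

def make_model_archive(model_file, archive_path):
    """Write model_file into a gzip-compressed tar, stored at the archive root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the directory part, like `tar -C ./model <name>`
        tar.add(model_file, arcname=os.path.basename(model_file))
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()

# Demonstration with a placeholder file, not a real LightGBM model.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = os.path.join(tmp, "demo-breast-cancer-classifier-model")
    with open(placeholder, "w") as f:
        f.write("placeholder\n")
    members = make_model_archive(placeholder, os.path.join(tmp, "model.tar.gz"))

print(members)
```

Keeping the model file at the root of the archive mirrors what the `tar` command above does with `-C ./model`.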


## Model deployment

In [13]:
from dotenv import load_dotenv
import os

# using override so as to load changed variables
load_dotenv(override=True)

True

In [None]:
import sagemaker

session = sagemaker.Session()
bucket = session.default_bucket()
prefix = "lightgbm-demo"

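# S3Uploader.upload treats the second argument as a key prefix and appends the local file name,
# so pass the prefix only; the returned URI ends with /model.tar.gz.
model_uri = sagemaker.s3.S3Uploader.upload("./model/model.tar.gz", f"s3://{bucket}/{prefix}")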

model_uri

### Using SageMaker Pre-built Container

In [15]:
from time import gmtime, strftime
from sagemaker.model import Model
from sagemaker.predictor import Predictor
import os

from sagemaker import image_uris, script_uris

model_id, model_version = "lightgbm-classification-model", "*"
inference_instance_type = "ml.m5.xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

print(f"deploy_image_uri: {deploy_image_uri}")
print(f"deploy_source_uri: {deploy_source_uri}")

Using model 'lightgbm-classification-model' with wildcard version identifier '*'. You can pin to version '2.1.0' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


deploy_image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.0.1-cpu-py310
deploy_source_uri: s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/lightgbm/inference/classification/v1.2.2/sourcedir.tar.gz


In [17]:
model_name = 'js-prebuilt-lgb-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
role = os.environ['IAM_ROLE']

# entry_point and source_dir are arguments of the Model constructor;
# Model.deploy() does not use them to package the inference script.
js_prebuilt_one_model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    entry_point="inference.py",
    model_data=model_uri,
    role=role,
    name=model_name,
    sagemaker_session=session,
    predictor_cls=Predictor,
)

endpoint_name = "js-distr-lgb-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

js_prebuilt_one_predictor = js_prebuilt_one_model.deploy(
    instance_type=inference_instance_type,
    initial_instance_count=1,
    endpoint_name=endpoint_name,
    wait=True,
)

-------!

Test the deployment

In [18]:
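import boto3

# "sagemaker-runtime" is the boto3 service name for invoking endpoints
runtime_client = boto3.client("sagemaker-runtime")

def predict(endpoint_name, payload):
    """Send a CSV payload to the endpoint and return 0/1 predictions."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=payload,
    )
    result = response["Body"].read().decode()

    # Assumes the response body is a bracketed, comma-separated list of
    # probabilities; strip the brackets before parsing.
    result = np.fromstring(result[1:-1], dtype=np.float64, sep=',')

    # Threshold the probabilities at 0.5 to get class labels
    predictions = np.where(result > 0.5, 1, 0)

    return predictions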
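
The `result[1:-1]` slice above relies on the response being a single bracketed list. As an alternative, the sketch below parses the body with the standard `json` module. It assumes the container returns a JSON array of probabilities, which depends on the inference code in your container; the function name and the sample body are hypothetical.

```python
import json

def parse_probabilities(body, threshold=0.5):
    """Parse a JSON array of probabilities and threshold it into 0/1 labels."""
    probabilities = [float(p) for p in json.loads(body)]
    return [1 if p > threshold else 0 for p in probabilities]

# Hypothetical response body, not captured from a real endpoint.
sample_body = "[0.91, 0.12, 0.5]"
labels = parse_probabilities(sample_body)
print(labels)
```

A probability exactly equal to the threshold maps to `0`, matching the strict `>` comparison used in `predict`.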



In [21]:
# realtime endpoint inference
payload = X_test.iloc[0:1].to_csv(header=False, index=False, sep=",")

print(f'Using {endpoint_name}')
y_pred3 = predict(endpoint_name, payload)
print(y_pred3)
# meansure_accuracy_score(y_pred3, y_test)

Using js-distr-lgb-2024-09-10-20-28-19


ReadTimeoutError: Read timeout on endpoint URL: "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/js-distr-lgb-2024-09-10-20-28-19/invocations"

### Using 'Bring Your Own Container'

Please refer to [README.md](./README.md) to build the Docker container and push it to an ECR repository. Then follow the steps below to deploy the model.

Create a SageMaker `Model` object to deploy the model.

In [None]:

model_name = model_file_name + "-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
role = os.environ['IAM_ROLE']
image_uri = os.environ['INFERENCE_IMAGE_URI']

model = Model(
    image_uri=image_uri,
    model_data=model_uri,
    role=role,
    name=model_name,
    sagemaker_session=session,
)


#### SageMaker Realtime Inference Deployment

In [None]:
endpoint_name = "demo-breast-cancer-classifier-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

inference_instance_type = 'ml.m5.xlarge'

model.deploy(
    instance_type=inference_instance_type,
    initial_instance_count=1,
    endpoint_name=endpoint_name,
    wait=True
)
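
The notebook builds model and endpoint names by appending a timestamp to a prefix. The sketch below wraps that pattern in a hypothetical helper, `timestamped_name`, which also validates the result. The 63-character limit and the allowed-character pattern are assumptions about SageMaker naming rules and should be checked against the current service documentation.

```python
import re
from time import gmtime, strftime

MAX_NAME_LENGTH = 63  # assumption: verify against current SageMaker limits
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")  # assumption

def timestamped_name(prefix, when=None):
    """Return '<prefix>-<UTC timestamp>' or raise ValueError if it looks invalid."""
    stamp = strftime("%Y-%m-%d-%H-%M-%S", when if when is not None else gmtime())
    name = f"{prefix.rstrip('-')}-{stamp}"
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name

# A fixed time keeps the example deterministic.
example_name = timestamped_name("demo-breast-cancer-classifier-serverless", gmtime(0))
print(example_name, len(example_name))
```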

#### SageMaker Serverless Inference Deployment

In [None]:
from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=10,
)

serverless_endpoint_name = "demo-breast-cancer-classifier-serverless-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model.deploy(
    endpoint_name=serverless_endpoint_name,
    serverless_inference_config=serverless_config,
    wait=True
)

### Invoke endpoint and evaluate

In [None]:
payload = X_test.to_csv(header=False, index=False, sep=",")

In [None]:
# realtime endpoint inference
print(f'Using {endpoint_name}')
y_pred3 = predict(endpoint_name, payload)
meansure_accuracy_score(y_pred3, y_test)


In [None]:
# serverless inference
print(f'Using {serverless_endpoint_name}')
y_pred3 = predict(serverless_endpoint_name, payload)
meansure_accuracy_score(y_pred3, y_test)


#### Clean-up

In [None]:
import boto3 

sagemaker_client = boto3.client('sagemaker')

sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)


In [None]:
sagemaker_client.delete_endpoint_config(EndpointConfigName=serverless_endpoint_name)
sagemaker_client.delete_endpoint(EndpointName=serverless_endpoint_name)

### Reference

For reference, the following shows how the inference code parses request data sent with the CSV content type.

In [None]:
ba = bytearray(b'13.4,20.52,88.64,556.7,0.1106\n13.21,25.25,84.1,537.9,0.08791\n')
arr = ba.decode().split()
rows = list()
for row in arr:
    rows.append(np.fromstring(row, dtype=np.float64, sep=',' ))

# rows
data = np.vstack(rows)

data.shape
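
The same parsing can be sketched with only the standard library, which may be handy when NumPy is not available in a test harness. This is an illustrative alternative and not the code used in the container; `parse_csv_payload` is a hypothetical name.

```python
import csv
import io

def parse_csv_payload(body):
    """Decode a CSV request body into a list of float rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(body.decode("utf-8")))
    return [[float(value) for value in row] for row in reader if row]

payload_bytes = b"13.4,20.52,88.64,556.7,0.1106\n13.21,25.25,84.1,537.9,0.08791\n"
parsed = parse_csv_payload(payload_bytes)
print(len(parsed), len(parsed[0]))
```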