# XGBoost on SageMaker Serverless Inference

This walkthrough is inspired by the following notebook:

https://github.com/aws/amazon-sagemaker-examples/blob/master/serverless-inference/Serverless-Inference-Walkthrough.ipynb


In [19]:
!pip install sagemaker botocore boto3 awscli --upgrade --quiet

  from cryptography.utils import int_from_bytes
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.0.1 requires botocore<1.22.9,>=1.22.8, but you have botocore 1.23.48 which is incompatible.
You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.


In [20]:
# Setup clients
import boto3

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

### SageMaker Setup
To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data.

In [21]:
import boto3
import sagemaker
from sagemaker.estimator import Estimator

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix

training_instance_type = "ml.m5.xlarge"

eu-west-1
arn:aws:iam::077590795309:role/service-role/AmazonSageMaker-ExecutionRole-20191008T190827


Retrieve the Abalone dataset from a publicly hosted S3 bucket.

In [22]:
# retrieve data
! curl https://sagemaker-sample-files.s3.amazonaws.com/datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv > abalone_dataset1_train.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  131k  100  131k    0     0   228k      0 --:--:-- --:--:-- --:--:--  228k


Upload the Abalone dataset to the default S3 bucket.

In [23]:
# upload data to S3
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv

upload: ./abalone_dataset1_train.csv to s3://sagemaker-eu-west-1-077590795309/xgboost-regression/train.csv
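
The same upload can also be done from Python instead of shelling out to the AWS CLI. The following is a minimal sketch, assuming a boto3 S3 client is passed in; `split_s3_uri` and `upload_training_csv` are hypothetical helper names that are not part of the original notebook.

```python
def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    return bucket, key


def upload_training_csv(s3_client, local_path, s3_uri):
    """Upload a local file to the given S3 URI with a boto3 S3 client."""
    bucket, key = split_s3_uri(s3_uri)
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client upload call.
    s3_client.upload_file(local_path, bucket, key)
    return bucket, key


# Example call (needs AWS credentials, so it is left commented out):
# import boto3
# upload_training_csv(
#     boto3.client("s3"),
#     "abalone_dataset1_train.csv",
#     f"s3://{default_bucket}/xgboost-regression/train.csv",
# )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real bucket.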


## Model Training

In [24]:
from sagemaker.inputs import TrainingInput

training_path = f"s3://{default_bucket}/xgboost-regression/train.csv"
train_input = TrainingInput(training_path, content_type="text/csv")

In [25]:
model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"

# retrieve xgboost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)

# Configure Training Estimator
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)

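# Set hyperparameters.
# Note: "reg:linear" is a deprecated alias for "reg:squarederror" in recent
# XGBoost releases, and "silent" has been superseded by "verbosity".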
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)

Train the model on the Abalone dataset.

In [26]:
# Fit model
xgb_train.fit({"train": train_input})

2022-02-03 20:28:39 Starting - Starting the training job...
2022-02-03 20:28:41 Starting - Launching requested ML instances
ProfilerReport-1643920119: InProgress
...
2022-02-03 20:29:37 Starting - Preparing the instances for training.........
2022-02-03 20:30:55 Downloading - Downloading input data
2022-02-03 20:30:55 Training - Downloading the training image...
2022-02-03 20:31:37 Uploading - Uploading generated training model
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value reg:linear to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[20:31:33] 2923x8 matrix with 23384 ent

In [27]:
xgb_model = xgb_train.create_model()

In [28]:
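# Register the model with SageMaker so that it can be referenced by name in the
# CloudFormation resources below. _create_sagemaker_model is a private SDK
# method, so it may change between sagemaker versions.
xgb_model._create_sagemaker_model(instance_type="ml.m5.xlarge", accelerator_type=None, tags=None)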

In [29]:
sagemaker_model = xgb_model.name

## Deployment

With the model trained and registered, we deploy it to a serverless endpoint. The Lambda function below receives requests from API Gateway and forwards them to the SageMaker endpoint.

In [30]:
%%writefile lambda_handler.py
import json
import os

import boto3

# "sagemaker-runtime" is the boto3 service name for invoking endpoints.
runtime_client = boto3.client("sagemaker-runtime")
sagemaker_endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"]


def handler(event, context):
    # API Gateway delivers the POST body as a JSON string.
    body = json.loads(event["body"])
    data = body["data"]
    content_type = body["content_type"]

    print(f"making a prediction on the data: {data}")

    response = runtime_client.invoke_endpoint(
        EndpointName=sagemaker_endpoint_name,
        Body=data,
        ContentType=content_type,
    )
    # The response body is a byte stream; decode it so the returned body is a string.
    prediction = response["Body"].read().decode("utf-8")

    print(f"prediction: {prediction}")
    return {
        "statusCode": 200,
        "body": prediction,
    }


Overwriting lambda_handler.py
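
Before deploying, the request handling can be exercised locally without AWS. The sketch below mirrors the handler above, returns the prediction as text, and injects its dependencies so that a stub can replace the real client. `make_handler` and `StubRuntime` are hypothetical names introduced for illustration, and the value returned by the stub is made up.

```python
import io
import json


def make_handler(runtime_client, endpoint_name):
    """Build a handler with the same request logic, with dependencies injected."""

    def handler(event, context):
        body = json.loads(event["body"])
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            Body=body["data"],
            ContentType=body["content_type"],
        )
        return {"statusCode": 200, "body": response["Body"].read().decode("utf-8")}

    return handler


class StubRuntime:
    """Stand-in for the sagemaker-runtime client; always returns a fixed payload."""

    def invoke_endpoint(self, EndpointName, Body, ContentType):
        return {"Body": io.BytesIO(b"4.5")}


# With the proxy integration, API Gateway passes the POST body as a JSON string under "body".
event = {"body": json.dumps({"data": "0.1,0.2", "content_type": "text/csv"})}
result = make_handler(StubRuntime(), "dummy-endpoint")(event, None)
print(result)
```

This only checks the event parsing and response shape; it says nothing about the deployed endpoint itself.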


In [31]:
# Cell magic that writes the cell body to a file, filling {name} placeholders from the notebook's global variables
# https://github.com/ipython/ipython/issues/6701#issuecomment-382640776
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
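
The magic relies on `str.format`, so every `{name}` in the cell is replaced by the notebook variable of that name. A small sketch of that substitution follows; `render_template` and the sample values are hypothetical and only illustrate the behaviour.

```python
def render_template(template, namespace):
    """Fill {name} placeholders from a dict, as the magic does with globals()."""
    return template.format(**namespace)


# Doubled braces survive formatting as literal single braces.
template = "ModelName: {sagemaker_model}\nLiteral: {{kept}}\n"
rendered = render_template(template, {"sagemaker_model": "example-model-name"})
print(rendered)

# A placeholder without a matching variable raises KeyError instead of
# silently writing a broken file.
try:
    render_template("ModelName: {missing_name}", {})
    missing_raises = False
except KeyError:
    missing_raises = True
```

One consequence is that a template containing literal curly braces needs them doubled (`{{` and `}}`) to pass through unchanged.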

In [39]:
%%writetemplate serverless.yml
service: huggingface-on-serverless-sagemaker

provider:
  name: aws
  region: eu-west-1 
  runtime: python3.8
  iam:
    role:
      managedPolicies:
        - arn:aws:iam::aws:policy/AdministratorAccess


functions:
  huggingface:
    handler: lambda_handler.handler
    timeout: 120
    memorySize: 128 
    events:
      - http:
          path: prediction
          method: post
    environment:
      SAGEMAKER_ENDPOINT_NAME: !GetAtt SageMakerEndpoint.EndpointName

resources:
  Resources:
    SageMakerEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ModelName: {sagemaker_model}
            InitialVariantWeight: 1.0
            VariantName: SageMakerModel
            ServerlessConfig:
              MaxConcurrency: 50
              MemorySizeInMB: 4096

    SageMakerEndpoint:
      Type: AWS::SageMaker::Endpoint
      Properties:
        EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
        EndpointName: huggingface-serverless-sagemaker-endpoint
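
For comparison, the same serverless endpoint configuration can be expressed as arguments to the SageMaker `create_endpoint_config` API in boto3. This is a sketch under assumptions: `serverless_endpoint_config` is a hypothetical helper, the names passed to it are placeholders, and the AWS calls are left commented out because they need credentials.

```python
def serverless_endpoint_config(config_name, model_name, memory_mb=4096, max_concurrency=50):
    """Build keyword arguments for the SageMaker CreateEndpointConfig API."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "SageMakerModel",
                "ModelName": model_name,
                # ServerlessConfig replaces the instance type and count of a
                # regular, instance-backed variant.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


kwargs = serverless_endpoint_config("example-endpoint-config", "example-model-name")
print(kwargs)

# With credentials available, the dict could be passed to the boto3 client:
# client.create_endpoint_config(**kwargs)
# client.create_endpoint(
#     EndpointName="example-endpoint",
#     EndpointConfigName="example-endpoint-config",
# )
```

The defaults match the `MemorySizeInMB` and `MaxConcurrency` values used in the template above.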


#### 2.4 Open AWS CloudShell and deploy the application

```
git clone https://github.com/vincentclaes/xgboost-on-serverless-sagemaker.git

cd xgboost-on-serverless-sagemaker/

npm install serverless

/home/cloudshell-user/node_modules/serverless/bin/serverless.js deploy
```

# 3. Call Endpoint
## First call

In [33]:
%%time
!curl -d '{"data":".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0", "content_type":"text/csv"}' -H "Content-Type: application/json" -X POST  https://bpaxi49zq2.execute-api.eu-west-1.amazonaws.com/dev/prediction


4.566554546356201
CPU times: user 10.6 ms, sys: 3.27 ms, total: 13.9 ms
Wall time: 481 ms
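
The same request can be built from Python with the standard library. This is a minimal sketch: `build_prediction_request` is a hypothetical helper, and the URL is a placeholder rather than the deployed API Gateway address.

```python
import json
import urllib.request


def build_prediction_request(url, features, content_type="text/csv"):
    """Build the POST request that the Lambda handler expects."""
    payload = {
        "data": ",".join(str(x) for x in features),
        "content_type": content_type,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request("https://example.invalid/dev/prediction", [0.345, 0.224414])
print(request.data)

# To send it against a real endpoint:
# prediction = urllib.request.urlopen(request).read().decode("utf-8")
```

The JSON body carries both the CSV row and its content type, because the handler reads `data` and `content_type` from the request body.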


## Burst of 10 000 (SageMaker Serverless max concurrency = 1)
The minimum latency was 57 ms, the maximum 4651 ms, and the average 139 ms, with 0 errors.

In [None]:
%%time
import shlex
import subprocess

from joblib import Parallel, delayed


def process(i):
    # Print a progress marker, then send the same request as in the first call.
    print(".")
    cmd = """curl -d '{"data":".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0", "content_type":"text/csv"}' -H "Content-Type: application/json" -X POST  https://bpaxi49zq2.execute-api.eu-west-1.amazonaws.com/dev/prediction"""
    # check_call raises if curl exits with a non-zero status.
    subprocess.check_call(shlex.split(cmd))


# Send 10,000 requests using 100 parallel workers.
results = Parallel(n_jobs=100)(delayed(process)(i) for i in range(10_000))
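
The notebook reports minimum, maximum, and average latency without showing how they were collected. One way such figures could be gathered is to time each call and summarize the results. This is a sketch, not the method used for the numbers in this notebook: `timed_call` and `summarize_latencies` are hypothetical helpers, and the sample latencies are made up.

```python
import statistics
import time


def timed_call(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def summarize_latencies(latencies_ms):
    """Return the min, max, and mean of a list of latencies in milliseconds."""
    return {
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "avg_ms": statistics.mean(latencies_ms),
    }


# Illustrative values only, not measurements from the endpoint.
summary = summarize_latencies([50.0, 100.0, 150.0])
print(summary)

# timed_call wraps any callable, for example a function that sends one request.
value, elapsed_ms = timed_call(sum, [1, 2, 3])
```

Returning the elapsed time from each worker would let `Parallel` collect the latencies into `results`, which could then be passed to `summarize_latencies`.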

__The max concurrency was set to 1 for this run, which probably caused these poor results.__


![Screen Shot 2022-02-03 at 21.49.04.png](attachment:Screen Shot 2022-02-03 at 21.49.04.png)

## Burst of 10 000 (SageMaker Serverless max concurrency = 50)

- The default concurrency limit for Lambda is 1000 concurrent executions.
- The max concurrency of the SageMaker serverless endpoint is set to 50.

In [None]:
results = Parallel(n_jobs=100)(delayed(process)(i) for i in range(10_000))

# 4. Remove the stack

```
/home/cloudshell-user/node_modules/serverless/bin/serverless.js remove
```