# Deploy a Hugging Face Model to Serverless SageMaker

- set the notebook kernel to a PyTorch kernel
- load a model from the Hugging Face Hub
- register the model with SageMaker

### Links

- https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the-hub
- https://www.youtube.com/watch?v=l9QZuazbzWM


In [5]:
!pip install sagemaker --upgrade --quiet



In [8]:
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

## 1. Using the SageMaker SDK
#### 1.1 Create a model

In [53]:
# Hub model configuration <https://huggingface.co/models>
hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-finetuned-sst-2-english', # model_id from hf.co/models
  'HF_TASK':'sentiment-analysis'                           # NLP task you want to use for predictions
}

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
   env=hub,                                                # configuration for loading model from Hub
   role=role,                                              # IAM role with permissions to create an endpoint
   transformers_version="4.6",                             # Transformers version used
   pytorch_version="1.7",                                  # PyTorch version used
   py_version='py36',                                      # Python version used
)

#### 1.2 Deploy the model and call it

In [54]:
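from sagemaker.serverless import ServerlessInferenceConfig

# Serverless endpoints take no instance type or count; capacity is described
# by memory size and maximum concurrency instead.
predictor = huggingface_model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=1024,
        max_concurrency=10,
    )
)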

----!

In [65]:
%%time

data = {
   "inputs": "I really like this place!"
}
predictor.predict(data)

CPU times: user 7.53 ms, sys: 10.6 ms, total: 18.1 ms
Wall time: 113 ms


[{'label': 'POSITIVE', 'score': 0.9998599290847778}]

#### 1.3 Remove the serverless sagemaker endpoint

In [67]:
!aws sagemaker delete-endpoint --endpoint-name $predictor.endpoint_name

## 2. Hugging Face on Serverless SageMaker, deployed with the Serverless Framework (serverless.com)

- With the SageMaker SDK you can deploy an endpoint, but invoking it requires AWS credentials, so only callers with access to your AWS account can reach it.
- Typically you want to expose the model to the outside world, for example to clients on another network.
- You also want to deploy your models with infrastructure as code (IaC).
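To make the first point concrete, here is a minimal sketch of calling an endpoint directly with boto3. The helper name `invoke_sentiment_endpoint` is hypothetical, and the client is passed in as an argument so the function can be tried without AWS access. A real call needs a `sagemaker-runtime` client backed by valid credentials, which is what the Lambda function and API Gateway in this section take care of for outside callers.

```python
import json


def invoke_sentiment_endpoint(runtime_client, endpoint_name, text):
    """Send one text to a SageMaker endpoint and return the decoded JSON prediction."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"inputs": text}),
    )
    # response["Body"] is a streaming object; read() returns the raw JSON bytes.
    return json.loads(response["Body"].read())


# Usage inside an AWS account (needs credentials, so it is left commented out):
# import boto3
# client = boto3.client("sagemaker-runtime")
# invoke_sentiment_endpoint(client, predictor.endpoint_name, "I really like this place!")
```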

#### 2.1 create a sagemaker model

In [25]:
# create a sagemaker model
# https://github.com/aws/sagemaker-python-sdk/blob/d635faff4ac54f80465f7bc7f3181f67336e249a/src/sagemaker/model.py#L261
# _create_sagemaker_model is a private SDK method, so it may change between releases; I didn't find a better way.

huggingface_model._create_sagemaker_model(instance_type="ml.m5.xlarge", accelerator_type=None, tags=None)
sagemaker_model_name = huggingface_model.name
sagemaker_model_name

#### 2.2 create a lambda to call the sagemaker endpoint

In [63]:
%%writefile lambda_handler.py
import json
import boto3
import os

runtime_client = boto3.client("runtime.sagemaker")
sagemaker_endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"]

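def handler(event, context):
    # API Gateway passes the POST payload as a JSON string in event["body"].
    print(f"making a prediction on the text: {event['body']}")

    response = runtime_client.invoke_endpoint(
        Body=event["body"],
        EndpointName=sagemaker_endpoint_name,
        Accept="application/json",
        ContentType="application/json",
    )

    # The endpoint returns a streaming body; decode it into a Python object.
    prediction = json.loads(response["Body"].read())
    print(f"prediction: {prediction}")

    return {
        'statusCode': 200,
        'body': json.dumps(prediction)
    }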


Overwriting lambda_handler.py
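The handler logic can be tried locally before deploying. The sketch below is not the deployed file: `StubRuntimeClient` and `make_handler` are hypothetical names, the stub returns a fixed prediction instead of calling AWS, and the handler is assumed to decode the endpoint's response body as JSON before returning it.

```python
import io
import json


class StubRuntimeClient:
    """Hypothetical stand-in for boto3's sagemaker-runtime client (no AWS calls)."""

    def invoke_endpoint(self, **kwargs):
        # Return a fixed prediction shaped like the SageMaker output in section 1.2.
        payload = json.dumps([{"label": "POSITIVE", "score": 0.99}]).encode("utf-8")
        return {"Body": io.BytesIO(payload)}


def make_handler(runtime_client, endpoint_name):
    """Build a Lambda-style handler around an injected runtime client."""

    def handler(event, context):
        response = runtime_client.invoke_endpoint(
            Body=event["body"],
            EndpointName=endpoint_name,
            Accept="application/json",
            ContentType="application/json",
        )
        prediction = json.loads(response["Body"].read())
        return {"statusCode": 200, "body": json.dumps(prediction)}

    return handler


# An API Gateway proxy event delivers the POST payload as a string under "body".
event = {"body": json.dumps({"inputs": "I really like this place!"})}
result = make_handler(StubRuntimeClient(), "hypothetical-endpoint")(event, None)
print(result)
```

Injecting the client keeps the module importable without the `SAGEMAKER_ENDPOINT_NAME` environment variable, which the real `lambda_handler.py` reads at import time.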


#### 2.3 create a serverless.yml file

In [43]:
# cell magic that writes the cell body to a file, filling {name} placeholders from the notebook's global variables
# https://github.com/ipython/ipython/issues/6701#issuecomment-382640776
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
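The magic relies on plain `str.format`, so its behaviour can be checked without IPython. A small sketch, where `render_template`, the `ExampleLiteral` key, and the model name are made up for illustration: placeholders are filled from a dict, literal braces have to be doubled, and an unknown placeholder raises `KeyError`.

```python
def render_template(template, variables):
    """Fill {name} placeholders the same way the cell magic does, via str.format."""
    return template.format(**variables)


template = (
    "ProductionVariants:\n"
    "  - ModelName: {sagemaker_model_name}\n"
    "    ExampleLiteral: {{}}\n"  # doubled braces survive as a literal {}
)

rendered = render_template(template, {"sagemaker_model_name": "hypothetical-model-name"})
print(rendered)

# A placeholder with no matching variable fails loudly instead of writing a broken file.
try:
    render_template("ModelName: {missing_name}", {})
except KeyError as error:
    print(f"missing template variable: {error}")
```

This matters for `serverless.yml` because any YAML flow mapping or `${...}` Serverless variable in the cell would be interpreted by `format` unless its braces are doubled.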

In [70]:
%%writetemplate serverless.yml
service: huggingface-on-serverless-sagemaker

provider:
  name: aws
  region: eu-west-1 
  runtime: python3.8
  iam:
    role:
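      # managedPolicies expects a list; AdministratorAccess is far broader than needed, scope it down outside a demo
      managedPolicies:
        - arn:aws:iam::aws:policy/AdministratorAccess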


functions:
  huggingface:
    handler: lambda_handler.handler
    timeout: 120
    memorySize: 128 
    events:
      - http:
          path: prediction
          method: post
    environment:
      SAGEMAKER_ENDPOINT_NAME: !GetAtt SageMakerEndpoint.EndpointName

resources:
  Resources:
    SageMakerEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ModelName: {sagemaker_model_name}
            InitialVariantWeight: 1.0
            VariantName: SageMakerModel
            ServerlessConfig:
              MaxConcurrency: 1
              MemorySizeInMB: 1024

    SageMakerEndpoint:
      Type: AWS::SageMaker::Endpoint
      Properties:
        EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
        EndpointName: huggingface-serverless-sagemaker-endpoint


#### 2.4 open AWS CloudShell and deploy the application

```
git clone https://github.com/vincentclaes/huggingface-on-serverless-sagemaker.git

cd huggingface-on-serverless-sagemaker/

npm install serverless

/home/cloudshell-user/node_modules/serverless/bin/serverless.js deploy
```

#### 2.5 call the endpoint

In [66]:
%%time
!curl -d '{"inputs":"some very much wow positive text!"}' -H "Content-Type: application/json" -X POST  https://2kqraqs038.execute-api.eu-west-1.amazonaws.com/dev/prediction


{"message": "Endpoint request timed out"}CPU times: user 451 ms, sys: 96.2 ms, total: 547 ms
Wall time: 29.3 s


#### 2.6 remove the stack

```
/home/cloudshell-user/node_modules/serverless/bin/serverless.js remove
```