# Deploy a Hugging Face Model to Serverless SageMaker

- Set the kernel to a PyTorch kernel
- Load a model from the Hugging Face Hub
- Register the model

### Links

- https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the-hub
- https://www.youtube.com/watch?v=l9QZuazbzWM


In [2]:
!pip install sagemaker transformers --upgrade --quiet



In [3]:
import sagemaker
from transformers import pipeline
import torch
from sagemaker.huggingface.model import HuggingFaceModel

## 1. Using Sagemaker SDK
#### 1.1 Create a Model

In [4]:
pretrained_classifier = pipeline("sentiment-analysis")
pretrained_classifier.save_pretrained("./model/")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [5]:
! cd model/ && tar zcvf model.tar.gz *

config.json
model.tar.gz
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt
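The listing above includes `model.tar.gz` itself, which suggests an archive from an earlier run was matched by `*`. A minimal sketch of the same packaging step with Python's `tarfile`, assuming the model files should sit at the archive root as in the shell command; `package_model` is a hypothetical helper name, not part of any SDK:

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, archive_name: str = "model.tar.gz") -> Path:
    """Pack every file in model_dir at the archive root, skipping the archive itself."""
    model_dir = Path(model_dir)
    archive_path = model_dir / archive_name
    # List the members before opening the archive so a stale archive is never packed.
    members = sorted(p for p in model_dir.iterdir() if p.name != archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in members:
            # arcname drops the directory prefix so files land at the archive root.
            tar.add(path, arcname=path.name)
    return archive_path
```

Calling `package_model("./model/")` would write `model/model.tar.gz`, the same path the upload step expects.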


In [6]:
sess = sagemaker.Session()
default_bucket = sess.default_bucket()
model_path = f"s3://{default_bucket}/sagemaker-studio/huggingface-on-serverless-sagemaker/distilbert-base-uncased-finetuned-sst-2-english/model.tar.gz"

In [7]:
!aws s3 cp model/model.tar.gz $model_path

upload: model/model.tar.gz to s3://sagemaker-eu-west-1-077590795309/sagemaker-studio/huggingface-on-serverless-sagemaker/distilbert-base-uncased-finetuned-sst-2-english/model.tar.gz
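The same upload can be done from Python instead of the AWS CLI. This is a sketch under assumptions: `split_s3_uri` and `upload_model` are hypothetical helpers, and `boto3` is imported lazily so the URI parsing works without it installed.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_model(local_path: str, s3_uri: str) -> None:
    """Upload a local file to the bucket and key named by s3_uri."""
    import boto3  # lazy import: only needed when an upload actually happens

    bucket, key = split_s3_uri(s3_uri)
    boto3.client("s3").upload_file(local_path, bucket, key)
```

In this notebook the call would be `upload_model("model/model.tar.gz", model_path)`.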


In [8]:
# Model configuration: the archive uploaded to S3 above

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    model_data=model_path,        # S3 path to the model.tar.gz uploaded above
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.12",  # Transformers version used
    pytorch_version="1.9",        # PyTorch version used
    py_version="py38",            # Python version used
)

#### 1.2 Deploy a Model and call it.

In [9]:
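from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

# Serverless endpoints scale on demand, so no initial_instance_count is passed.
predictor = huggingface_model.deploy(
    # Serverless provisions no instances; the type is presumably only used to select the container image.
    instance_type="ml.m5.xlarge",
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=1024, max_concurrency=5)
)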

-----!
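Serverless endpoints accept memory sizes only in 1 GB steps from 1024 MB to 6144 MB. A small sketch that checks the settings before they are passed to `ServerlessInferenceConfig`; `check_serverless_settings` is a hypothetical helper, and the upper bound on concurrency is deliberately not checked because it depends on account quotas.

```python
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def check_serverless_settings(memory_size_in_mb: int, max_concurrency: int) -> dict:
    """Return the settings as keyword arguments, or raise if they are clearly invalid."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {VALID_MEMORY_SIZES_MB}, got {memory_size_in_mb}"
        )
    if max_concurrency < 1:
        raise ValueError(f"max_concurrency must be at least 1, got {max_concurrency}")
    return {"memory_size_in_mb": memory_size_in_mb, "max_concurrency": max_concurrency}
```

The result can be unpacked directly: `ServerlessInferenceConfig(**check_serverless_settings(1024, 5))`.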

In [10]:
%%time

data = {
   "inputs": "I really like this place!"
}
predictor.predict(data)

CPU times: user 17.4 ms, sys: 0 ns, total: 17.4 ms
Wall time: 10.7 s


[{'label': 'POSITIVE', 'score': 0.9998599290847778}]
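The 10.7 s wall time above is for the first request, which on a serverless endpoint probably includes a cold start. A minimal sketch for comparing the first call with later ones; `time_calls` is a hypothetical helper that accepts any callable, so it can be tried without a live endpoint.

```python
import time
from typing import Any, Callable, List


def time_calls(predict: Callable[[Any], Any], payload: Any, repeats: int = 3) -> List[float]:
    """Call predict(payload) `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(payload)
        durations.append(time.perf_counter() - start)
    return durations
```

Against the endpoint this would be `time_calls(predictor.predict, data)`; if a cold start is involved, the first entry should be noticeably larger than the rest.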

In [11]:
predictor.endpoint_name

'huggingface-pytorch-inference-2022-04-25-14-11-24-594'

#### 1.3 Remove the serverless sagemaker endpoint

In [12]:
!aws sagemaker delete-endpoint --endpoint-name $predictor.endpoint_name
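The CLI call above deletes only the endpoint; the endpoint configuration and the model stay behind. A sketch of a fuller cleanup using the SDK predictor methods `delete_model` and `delete_endpoint`; the `cleanup` wrapper is hypothetical. The model is deleted first because, as far as I know, the SDK looks the model up through the endpoint configuration.

```python
def cleanup(predictor) -> None:
    """Delete the model, then the endpoint together with its configuration."""
    # Order matters: the model lookup goes through the endpoint configuration,
    # which delete_endpoint removes by default.
    predictor.delete_model()
    predictor.delete_endpoint()
```

In this notebook that would be `cleanup(predictor)`.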

## 2. Huggingface on Serverless Sagemaker deployed by Serverless.com

- With the SageMaker SDK you can deploy an endpoint, but invoking it requires AWS credentials, so it is only reachable from within an AWS account.
- Typically you want to expose your model to the outside world or to another network.
- You want to deploy your models with infrastructure as code (IaC).

#### 2.1 Create a SageMaker model

In [13]:
# Create a SageMaker model.
# https://github.com/aws/sagemaker-python-sdk/blob/d635faff4ac54f80465f7bc7f3181f67336e249a/src/sagemaker/model.py#L261
# Note: _create_sagemaker_model is a private SDK method; I didn't find a better way to create the model.

huggingface_model._create_sagemaker_model(instance_type="ml.m5.xlarge", accelerator_type=None, tags=None)
sagemaker_model_name = huggingface_model.name
sagemaker_model_name

'huggingface-pytorch-inference-2022-04-25-14-18-11-101'

#### 2.2 Create a Lambda to call the SageMaker endpoint

In [17]:
%%writefile lambda_handler.py
import json
import boto3
import os

runtime_client = boto3.client("runtime.sagemaker")
sagemaker_endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"]

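def handler(event, context):
    # API Gateway passes the raw request body through as a JSON string.
    print(f"making a prediction on the text: {event['body']}")

    response = runtime_client.invoke_endpoint(
        Body=event["body"],
        EndpointName=sagemaker_endpoint_name,
        Accept="application/json",
        ContentType="application/json",
    )

    # The response body is a stream; read and decode it before returning it.
    prediction = json.loads(response["Body"].read().decode("utf-8"))
    print(f"prediction: {prediction}")

    return {
        'statusCode': 200,
        'body': json.dumps(prediction)
    }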


Writing lambda_handler.py
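The handler logic can be exercised locally before deploying. The sketch below mirrors the handler with an injectable client; `FakeRuntimeClient` and `make_handler` are hypothetical names, and the canned prediction is made up purely for the test. It also shows that the `Body` returned by `invoke_endpoint` must be read and decoded before it can be serialized.

```python
import io
import json


class FakeRuntimeClient:
    """Hypothetical stand-in for the boto3 SageMaker runtime client."""

    def invoke_endpoint(self, **kwargs):
        self.last_call = kwargs
        # Canned payload, wrapped in a stream like the real response body.
        payload = json.dumps([{"label": "POSITIVE", "score": 0.99}]).encode("utf-8")
        return {"Body": io.BytesIO(payload)}


def make_handler(client, endpoint_name):
    """Build a Lambda-style handler bound to the given client and endpoint name."""

    def handler(event, context):
        response = client.invoke_endpoint(
            Body=event["body"],
            EndpointName=endpoint_name,
            Accept="application/json",
            ContentType="application/json",
        )
        prediction = json.loads(response["Body"].read().decode("utf-8"))
        return {"statusCode": 200, "body": json.dumps(prediction)}

    return handler


fake_client = FakeRuntimeClient()
handler = make_handler(fake_client, "my-endpoint")
result = handler({"body": '{"inputs": "I really like this place!"}'}, None)
print(result)
```

Injecting the client keeps the test free of AWS calls; the deployed handler can keep its module-level `boto3` client.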


#### 2.3 Create a serverless.yml file

In [18]:
# function to write variables to a textfile
# https://github.com/ipython/ipython/issues/6701#issuecomment-382640776
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
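One caveat with `cell.format(**globals())`: any other braces in the template, such as Serverless Framework variables like `${opt:stage}`, are also treated as format fields and will raise. A sketch of a more forgiving renderer that replaces only `{name}` placeholders it knows about and leaves everything else untouched; `render_template` is a hypothetical helper.

```python
import re

# Matches {identifier} only; anything containing ':' or '.' is left alone.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(text: str, variables: dict) -> str:
    """Substitute {name} for names found in variables; keep unknown braces as-is."""

    def replace(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(replace, text)
```

Inside the magic, `f.write(render_template(cell, globals()))` would be the drop-in change.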

In [20]:
%%writetemplate serverless.yml
service: huggingface-on-serverless-sagemaker

provider:
  name: aws
  region: eu-west-1 
  runtime: python3.8
  iam:
    role:
      # managedPolicies expects a list; consider a narrower policy than AdministratorAccess.
      managedPolicies:
        - arn:aws:iam::aws:policy/AdministratorAccess


functions:
  huggingface:
    handler: lambda_handler.handler
    timeout: 120
    memorySize: 128 
    events:
      - http:
          path: prediction
          method: post
    environment:
      SAGEMAKER_ENDPOINT_NAME: !GetAtt SageMakerEndpoint.EndpointName

resources:
  Resources:
    SageMakerEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ModelName: {sagemaker_model_name}
#             InitialInstanceCount: 1
            InitialVariantWeight: 1.0
            VariantName: SageMakerModel
            ServerlessConfig:
              MaxConcurrency: 1
              MemorySizeInMB: 4096

    SageMakerEndpoint:
      Type: AWS::SageMaker::Endpoint
      Properties:
        EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
        EndpointName: huggingface-serverless-sagemaker-endpoint


#### 2.4 Open AWS CloudShell and deploy the application

```
git clone https://github.com/vincentclaes/huggingface-on-serverless-sagemaker.git

cd huggingface-on-serverless-sagemaker/

npm install serverless

/home/cloudshell-user/node_modules/serverless/bin/serverless.js deploy
```

#### 2.5 Call the Endpoint

In [71]:
%%time
!curl -d '{"inputs":"some very much wow positive text!"}' -H "Content-Type: application/json" -X POST  https://2kqraqs038.execute-api.eu-west-1.amazonaws.com/dev/prediction


{"message": "Endpoint request timed out"}
CPU times: user 449 ms, sys: 111 ms, total: 560 ms
Wall time: 29.3 s

The request timed out after roughly 29 s, which matches API Gateway's default integration timeout; the Lambda `timeout: 120` setting does not extend it.


#### 2.6 Remove the stack

```
/home/cloudshell-user/node_modules/serverless/bin/serverless.js remove
```

In [51]:
import boto3
data = '{"inputs":"some very much wow positive text!"}'

response = boto3.client("runtime.sagemaker").invoke_endpoint(
        Body=data,
#         EndpointName="huggingface-serverless-sagemaker-endpoint",
        EndpointName="huggingface-pytorch-inference-2022-02-01-13-58-48-937",
        Accept="application/json",
        ContentType="application/json",
    )

InternalDependencyException: An error occurred (InternalDependencyException) when calling the InvokeEndpoint operation: An exception occurred from internal dependency. Please contact customer support regarding request 4bfc2335-f7b9-450b-ac5f-fe09c4866de4.

In [None]:
response