# Deploy a Hugging Face Model to Serverless SageMaker

- Set the notebook kernel to a PyTorch kernel.
- Load a model from the Hugging Face Hub.
- Register the model.

### Links

- https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the-hub
- https://www.youtube.com/watch?v=l9QZuazbzWM


In [5]:
!pip install sagemaker --upgrade --quiet



In [8]:
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

## 1. Using Sagemaker SDK
#### 1.1 Create a Model

In [12]:
# Hub model configuration <https://huggingface.co/models>
hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-finetuned-sst-2-english', # model_id from hf.co/models
  'HF_TASK':'sentiment-analysis'                           # NLP task you want to use for predictions
}

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
   env=hub,                                                # configuration for loading model from Hub
   role=role,                                              # IAM role with permissions to create an endpoint
   transformers_version="4.6",                             # Transformers version used
   pytorch_version="1.7",                                  # PyTorch version used
   py_version='py36',                                      # Python version used
)

#### 1.2 Deploy a Model and call it.

In [13]:
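from sagemaker.serverless import ServerlessInferenceConfig

# Recent SDK versions expect a ServerlessInferenceConfig object here rather than a plain dict.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=1024,  # memory allocated to the serverless endpoint
        max_concurrency=10,      # maximum number of concurrent invocations
    ),
)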

-----!

In [14]:
%%time

data = {
   "inputs": "I really like this place!"
}
predictor.predict(data)

CPU times: user 8.67 ms, sys: 3.47 ms, total: 12.1 ms
Wall time: 166 ms


[{'label': 'POSITIVE', 'score': 0.9998599290847778}]

#### 1.3 Remove the serverless sagemaker endpoint

In [21]:
!aws sagemaker delete-endpoint --endpoint-name $predictor.endpoint_name
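
The CLI call above deletes only the endpoint; the endpoint configuration and the model are separate resources and stay behind. As a rough sketch, the same cleanup can go through the SDK's predictor object. `cleanup` is a hypothetical helper name introduced here, while `delete_model()` and `delete_endpoint()` are methods of the SDK `Predictor` returned by `deploy()`.

```python
def cleanup(predictor):
    """Delete the model first, then the endpoint and its configuration."""
    # delete_model() resolves the model name through the endpoint configuration,
    # so it should run while that configuration still exists.
    predictor.delete_model()
    # By default, delete_endpoint() also removes the endpoint configuration.
    predictor.delete_endpoint()


# In the notebook, with a live endpoint:
# cleanup(predictor)
```

The helper takes the predictor as an argument so the order of the two calls can be checked without touching AWS.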

## 2. Huggingface on Serverless Sagemaker deployed by Serverless.com

- The SageMaker SDK can deploy an endpoint, but invoking a SageMaker endpoint requires AWS credentials, so it can only be reached from within an AWS account.
- Typically you want to expose your model to the outside world or to another network.
- You want to deploy your models with infrastructure as code (IaC).

#### 2.1 create a sagemaker model

In [25]:
# create a sagemaker model
# https://github.com/aws/sagemaker-python-sdk/blob/d635faff4ac54f80465f7bc7f3181f67336e249a/src/sagemaker/model.py#L261
# _create_sagemaker_model is a private SDK method, so this may not be the best way
# to create a sagemaker model, but I didn't find a better one.

huggingface_model._create_sagemaker_model(instance_type="ml.m5.xlarge", accelerator_type=None, tags=None)
sagemaker_model_name = huggingface_model.name
sagemaker_model_name
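
Since `_create_sagemaker_model` is private, one possible alternative is to call the SageMaker `CreateModel` API directly through boto3. The sketch below only assembles the request; `build_create_model_request`, the model name, and the role ARN are hypothetical, and the container definition mirrors what `huggingface_model.prepare_container_def()` printed in this notebook.

```python
def build_create_model_request(model_name, role_arn, container_def):
    """Assemble keyword arguments for the SageMaker CreateModel API."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": container_def,
    }


example_request = build_create_model_request(
    model_name="huggingface-serverless-demo",  # hypothetical name
    role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",  # hypothetical role
    container_def={
        # Image and environment copied from the prepare_container_def() output in this notebook.
        "Image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference:1.8.1-transformers4.6-cpu-py36-ubuntu18.04",
        "Environment": {
            "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
            "HF_TASK": "sentiment-analysis",
        },
    },
)

# With AWS credentials available, the request could be sent like this:
# import boto3
# boto3.client("sagemaker").create_model(**example_request)
```

This was not run as part of the notebook, so treat it as a starting point rather than a verified replacement for the cell above.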

#### 2.2 create a lambda to call the sagemaker endpoint

In [36]:
%%writefile lambda_handler.py
import json
import boto3
import os

runtime_client = boto3.client("sagemaker-runtime")
sagemaker_endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"]

def handler(event, context):

    data = json.loads(event["body"])["data"]
    print(f"making a prediction on the text: {data}")
    
    response = runtime_client.invoke_endpoint(
        Body=json.dumps(data),
        EndpointName=sagemaker_endpoint_name,
        Accept="application/json",
        ContentType="application/json",
    )
    
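    # The response body is a stream; read and decode it to get the model output.
    prediction = json.loads(response["Body"].read().decode("utf-8"))
    print(f"prediction: {prediction}")

    return {
        'statusCode': 200,
        'body': json.dumps(prediction)
    }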

Overwriting lambda_handler.py
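
To check the request and response shapes without deploying anything, the handler logic can be exercised locally against a stand-in client. This is a minimal sketch: `make_handler` and `FakeRuntimeClient` are hypothetical names, and the handler is a variant of the one above that receives the client as a parameter instead of creating it at import time.

```python
import io
import json


def make_handler(runtime_client, endpoint_name):
    """Build a Lambda-style handler bound to the given client and endpoint."""
    def handler(event, context):
        # API Gateway passes the HTTP body as a JSON string.
        data = json.loads(event["body"])["data"]
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            Body=json.dumps(data),
            ContentType="application/json",
            Accept="application/json",
        )
        # The endpoint returns a stream that must be read and decoded.
        prediction = json.loads(response["Body"].read().decode("utf-8"))
        return {"statusCode": 200, "body": json.dumps(prediction)}
    return handler


class FakeRuntimeClient:
    """Stand-in for the boto3 runtime client; returns a canned prediction."""
    def invoke_endpoint(self, **kwargs):
        self.last_call = kwargs
        payload = json.dumps([{"label": "POSITIVE", "score": 0.99}]).encode("utf-8")
        return {"Body": io.BytesIO(payload)}


event = {"body": json.dumps({"data": {"inputs": "I really like this place!"}})}
fake_client = FakeRuntimeClient()
result = make_handler(fake_client, "example-endpoint")(event, None)
print(result)
```

The event shows the body the Lambda expects: the payload for the model sits under a top-level `data` key and is forwarded to the endpoint unchanged.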


#### 2.3 create a serverless.yml file

In [43]:
# cell magic that writes the cell to a file, filling {placeholders} from the notebook's global variables
# https://github.com/ipython/ipython/issues/6701#issuecomment-382640776
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
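
Because the magic relies on `str.format`, every `{name}` in the cell is treated as a placeholder. The short sketch below illustrates the two consequences: literal braces have to be doubled, and an undefined name raises a `KeyError`. The template text and the `namespace` dictionary are made-up stand-ins for a cell body and `globals()`.

```python
template = """\
ModelName: {sagemaker_model_name}
Literal: {{not_a_placeholder}}
"""

# Stand-in for globals(); the value is a made-up model name.
namespace = {"sagemaker_model_name": "huggingface-pytorch-inference-example"}

# Doubled braces come out as single literal braces.
rendered = template.format(**namespace)
print(rendered)

# A placeholder with no matching variable raises KeyError.
try:
    "{undefined_name}".format(**namespace)
except KeyError as err:
    missing = err.args[0]
    print(f"missing variable: {missing}")
```

This matters for templates that contain braces of their own, such as JSON snippets or Serverless `${...}` variables.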

In [44]:
%%writetemplate serverless.yml
service: huggingface-on-serverless-sagemaker

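provider:
  name: aws
  region: eu-west-1
  runtime: python3.9  # assumed runtime; the handler is Python and Serverless does not default to it
  iam:
    role:
      statements:
        # the function needs permission to invoke the SageMaker endpoint
        - Effect: Allow
          Action: sagemaker:InvokeEndpoint
          Resource: !Ref SageMakerEndpoint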

functions:
  huggingface:
    handler: lambda_handler.handler
    timeout: 120
    provisionedConcurrency: 1
    memorySize: 128 
    events:
      - http:
          path: prediction
          method: post
    environment:
      SAGEMAKER_ENDPOINT_NAME: !GetAtt SageMakerEndpoint.EndpointName

resources:
  Resources:
    SageMakerEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ModelName: {sagemaker_model_name}
            VariantName: SageMakerModel
            ServerlessConfig:
              MaxConcurrency: 1
              MemorySizeInMB: 1024

    SageMakerEndpoint:
      Type: AWS::SageMaker::Endpoint
      Properties:
        EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
        EndpointName: huggingface-serverless-sagemaker-endpoint
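
Once the stack is deployed, the `prediction` route can be called with a plain HTTP POST. The sketch below only builds the request; the URL is a hypothetical placeholder (the real one is printed by the Serverless deploy), and `build_prediction_request` is a helper name introduced for illustration.

```python
import json
import urllib.request


def build_prediction_request(url, text):
    """Build a POST request whose body matches what the Lambda handler expects."""
    # The handler reads event["body"]["data"] and forwards it to the endpoint,
    # so the model input goes under a top-level "data" key.
    body = json.dumps({"data": {"inputs": text}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    "https://example.execute-api.eu-west-1.amazonaws.com/dev/prediction",  # hypothetical URL
    "I really like this place!",
)

# With a deployed stack, send it like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```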


In [24]:
# huggingface_model.register(content_types=["application/json"], response_types=["application/json"], inference_instances=["ml.m5.xlarge"], transform_instances=["ml.m5.xlarge"])

ValueError: SageMaker Model Package cannot be created without model data.

In [18]:
#  huggingface_model._create_sagemaker_model()

INFO:sagemaker:Creating model with name: huggingface-pytorch-inference-2022-01-30-13-16-40-093


In [19]:
#  huggingface_model.prepare_container_def()

{'Image': '763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference:1.8.1-transformers4.6-cpu-py36-ubuntu18.04',
 'Environment': {'HF_MODEL_ID': 'distilbert-base-uncased-finetuned-sst-2-english',
  'HF_TASK': 'sentiment-analysis',
  'SAGEMAKER_PROGRAM': '',
  'SAGEMAKER_SUBMIT_DIRECTORY': '',
  'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',
  'SAGEMAKER_REGION': 'eu-west-1'}}