> **_NOTE:_**  **This notebook is meant to be run on a SageMaker Notebook instance!**

## Prerequisites
- You have set up a **SageMaker Notebook** instance and an **S3 bucket** to store the bundle, and configured their permissions.

## Step 1
Clone this branch to your SageMaker Notebook instance with git, then open run.ipynb in that notebook.

## Step 2
Package the handler folder as a tarball and upload it to your S3 bucket.

In handler/neural_sparse_handler.py, we define model loading, pre-processing, inference, and post-processing. We use mixed precision to accelerate inference.
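The exact output format is defined by handler/neural_sparse_handler.py. Purely as an illustration, the sketch below assumes a SPLADE-style post-processing step: for each vocabulary token, take the maximum logit over the input positions, apply log(1 + relu(x)), and keep only the non-zero weights as a `{token: weight}` map. The function names, the toy vocabulary, and the threshold are hypothetical; the real handler works on model tensors rather than Python lists.

```python
import math
from typing import Dict, List, Sequence


def to_sparse_vector(
    logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    threshold: float = 0.0,
) -> Dict[str, float]:
    """Hypothetical SPLADE-style post-processing for one input text.

    logits: one row per input position, one column per vocabulary token.
    Returns {token: weight} for tokens whose weight exceeds `threshold`.
    """
    sparse: Dict[str, float] = {}
    for token_id, token in enumerate(vocab):
        # Max-pool over positions, then apply log(1 + relu(x)).
        best = max(row[token_id] for row in logits)
        weight = math.log1p(max(best, 0.0))
        if weight > threshold:
            sparse[token] = weight
    return sparse


def postprocess(
    batch_logits: List[Sequence[Sequence[float]]], vocab: Sequence[str]
) -> List[Dict[str, float]]:
    """Map a batch of model outputs to one sparse vector per input text."""
    return [to_sparse_vector(logits, vocab) for logits in batch_logits]


if __name__ == "__main__":
    toy_vocab = ["hello", "world", "the"]
    # One input text, two positions, three vocabulary entries (made-up numbers).
    toy_logits = [[1.0, -2.0, -0.5], [0.2, 3.0, -1.0]]
    print(postprocess([toy_logits], toy_vocab))
```

Only tokens with a positive pooled logit survive, which is what keeps the resulting vectors sparse.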

In handler/neural_sparse_config.yaml, we define the TorchServe configuration (including dynamic micro-batching).
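TorchServe's micro-batching splits each server-side batch into smaller micro-batches and overlaps the pre-processing, inference, and post-processing stages across them. The sketch below only imitates that idea in plain Python with a thread pool; `split_into_micro_batches`, `run_micro_batched`, the stage functions, and the micro-batch size are hypothetical and are neither the TorchServe API nor the settings in neural_sparse_config.yaml.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_micro_batches(batch: Sequence[T], size: int) -> List[List[T]]:
    """Split a batch into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("micro-batch size must be positive")
    return [list(batch[i:i + size]) for i in range(0, len(batch), size)]


def run_micro_batched(
    batch: Sequence[T],
    size: int,
    preprocess: Callable[[List[T]], list],
    inference: Callable[[list], list],
    postprocess: Callable[[list], List[R]],
    workers: int = 2,
) -> List[R]:
    """Run the three stages per micro-batch, with micro-batches in parallel.

    Executor.map yields results in input order, so the output lines up with
    the original batch even if micro-batches finish out of order.
    """
    def pipeline(micro: List[T]) -> List[R]:
        return postprocess(inference(preprocess(micro)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(pipeline, split_into_micro_batches(batch, size))
        return [item for chunk in results for item in chunk]


if __name__ == "__main__":
    texts = ["a", "bb", "ccc", "dddd", "eeeee"]
    out = run_micro_batched(
        texts,
        size=2,
        preprocess=lambda xs: [x.upper() for x in xs],
        inference=lambda xs: [len(x) for x in xs],
        postprocess=lambda xs: [{"length": n} for n in xs],
    )
    print(out)
```

Smaller micro-batches let the GPU start on one chunk while another is still being tokenized, at the cost of more per-call overhead.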

> **_NOTE:_**  By default we deploy the opensearch-project/opensearch-neural-sparse-encoding-v1 model. To deploy a different model, change the model_id parameter in handler/neural_sparse_handler.py.

```python
# Run this cell. Replace {YOUR_BUCKET_PREFIX} with your S3 bucket (and optional key prefix).
!tar -czvf neural-sparse-handler.tar.gz -C handler/ .
!aws s3 cp neural-sparse-handler.tar.gz s3://{YOUR_BUCKET_PREFIX}/neural-sparse-handler.tar.gz
```
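If you prefer to build the archive from Python, the sketch below mirrors `tar -czvf neural-sparse-handler.tar.gz -C handler/ .`: every file under handler/ is stored relative to that folder (as `./<name>`), so the handler files sit at the root of the archive rather than under a `handler/` prefix. The function name `build_handler_tarball` is hypothetical; upload the result with the `aws s3 cp` command above.

```python
import tarfile
from pathlib import Path
from typing import List


def build_handler_tarball(src_dir: str, out_path: str) -> List[str]:
    """Create a gzip-compressed tarball of src_dir's contents.

    Similar to `tar -czf out_path -C src_dir .`: entries are stored relative
    to src_dir, not under src_dir's own name. Returns the member names.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src_dir} is not a directory")
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(str(src), arcname=".")  # recursive by default
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

Usage from the notebook: `build_handler_tarball("handler", "neural-sparse-handler.tar.gz")`. Keep the output path outside the handler folder so the archive does not try to include itself.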

## Step 3
Use the SageMaker Python SDK to deploy the tarball to a real-time inference endpoint.

Here we use ml.g5.xlarge, a GPU instance with good price-performance.

Please adjust the region and the S3 path to match your setup.

```python
# Run this cell
import boto3
import sagemaker
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Replace with your S3 bucket (and optional key prefix)
bucket_prefix = "YOUR_BUCKET_PREFIX"
instance_type = "ml.g5.xlarge"

role = sagemaker.get_execution_role()
sess = boto3.Session()
sm = sess.client("sagemaker")
region = sess.region_name  # region of the notebook session; change it if needed
account = boto3.client("sts").get_caller_identity().get("Account")
smsess = sagemaker.Session(boto_session=sess)

# TorchServe settings passed to the container as environment variables
envs = {
    "TS_ASYNC_LOGGING": "true",
    "TS_JOB_QUEUE_SIZE": "1000",
}

# PyTorch inference image; use the same instance type as the deployment below
baseimage = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    py_version="py312",
    image_scope="inference",
    version="2.6",
    instance_type=instance_type,
)

model = Model(
    model_data=f"s3://{bucket_prefix}/neural-sparse-handler.tar.gz",
    image_uri=baseimage,
    role=role,
    predictor_cls=Predictor,
    name="ns-handler",
    sagemaker_session=smsess,
    env=envs,
)

endpoint_name = "ns-handler"
predictor = model.deploy(
    instance_type=instance_type,
    initial_instance_count=1,
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    model_data_download_timeout=3600,  # seconds
    container_startup_health_check_timeout=3600,  # seconds
    volume_size=64,  # GB
)
predictor.endpoint_name
```
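By default `model.deploy()` blocks until the endpoint is ready. If you deploy with `wait=False`, or come back from a fresh kernel, a small polling loop over the SageMaker `DescribeEndpoint` API tells you when the endpoint is usable. The helper name, poll interval, and timeout below are assumptions; the `sm` client created above provides `describe_endpoint`, and boto3 also ships an `endpoint_in_service` waiter for the same purpose.

```python
import time
from typing import Any


def wait_for_endpoint(
    sm_client: Any,
    endpoint_name: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll DescribeEndpoint until the endpoint reports InService.

    sm_client is expected to be a boto3 SageMaker client (or any object with a
    compatible describe_endpoint method). Raises RuntimeError if the endpoint
    is Failed and TimeoutError if it is not ready before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Usage: `wait_for_endpoint(sm, endpoint_name)`.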

## Step 4

Once the endpoint is created, send a sample request to see how it works.

```python
# Run this cell
import json

body = ["hello world"]
amz = boto3.client("sagemaker-runtime")

response = amz.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    Body=json.dumps(body),
    ContentType="application/json",
)

res = response["Body"].read()
results = json.loads(res.decode("utf8"))
results
```
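The same call can be wrapped in a small helper for batches of texts. This is a sketch: `encode_texts` is a hypothetical name, and it assumes the handler accepts a JSON list of strings (as in the sample above) and returns one result per input text. It relies only on the `invoke_endpoint` call and the `Body` stream already used in the cell above.

```python
import json
from typing import Any, List


def encode_texts(runtime_client: Any, endpoint_name: str, texts: List[str]) -> List[Any]:
    """Send texts to the endpoint as a JSON list and return the decoded result.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    Assumes the endpoint returns one result per input text.
    """
    if not texts:
        return []
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=json.dumps(texts),
        ContentType="application/json",
    )
    results = json.loads(response["Body"].read().decode("utf-8"))
    if len(results) != len(texts):
        raise ValueError(f"expected {len(texts)} results, got {len(results)}")
    return results
```

Usage: `encode_texts(boto3.client("sagemaker-runtime"), predictor.endpoint_name, ["hello world", "neural sparse search"])`.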

## Step 5
> **_NOTE:_**  **This step must be run from a machine that has access to your OpenSearch cluster!**

Register this SageMaker endpoint with your OpenSearch cluster.

See the OpenSearch documentation for more information. The demo request body below authenticates with an access_key and secret_key; choose the authentication method that fits your use case.

### Create connector

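Replace `{region}` and `{predictor.endpoint_name}` with your AWS region and endpoint name, then send the body to `POST /_plugins/_ml/connectors/_create`.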
```json
{
  "name": "test",
  "description": "Test connector for SageMaker model",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "your access key",
    "secret_key": "your secret key"
  },
  "parameters": {
    "region": "{region}",
    "service_name": "sagemaker",
    "input_docs_processed_step_size": 2
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json"
      },
      "url": "https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{predictor.endpoint_name}/invocations",
      "request_body": "${parameters.input}"
    }
  ],
  "client_config":{
      "max_retry_times": -1,
      "max_connection": 60,
      "retry_backoff_millis": 10
  }
}
```
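Filling the placeholders by hand is error-prone, so you can also generate the body in Python. The sketch below rebuilds the connector request above with the region and endpoint name substituted; `build_connector_body` is a hypothetical helper and the other fields are copied from the demo body. Prefer a credential source suited to your setup over hard-coded keys.

```python
import json
from typing import Any, Dict


def build_connector_body(region: str, endpoint_name: str, access_key: str, secret_key: str) -> Dict[str, Any]:
    """Return the demo SageMaker connector body with placeholders filled in."""
    url = f"https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{endpoint_name}/invocations"
    return {
        "name": "test",
        "description": "Test connector for SageMaker model",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {"access_key": access_key, "secret_key": secret_key},
        "parameters": {
            "region": region,
            "service_name": "sagemaker",
            "input_docs_processed_step_size": 2,
        },
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "headers": {"content-type": "application/json"},
                "url": url,
                # Literal ML Commons template variable, not a Python placeholder.
                "request_body": "${parameters.input}",
            }
        ],
        "client_config": {
            "max_retry_times": -1,
            "max_connection": 60,
            "retry_backoff_millis": 10,
        },
    }


if __name__ == "__main__":
    body = build_connector_body("us-east-1", "ns-handler", "<access key>", "<secret key>")
    print(json.dumps(body, indent=2))
```

The printed JSON is what you send to `POST /_plugins/_ml/connectors/_create`.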

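### Register model

Use the connector ID returned by the previous request and send this body to `POST /_plugins/_ml/models/_register`.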
```json
{
  "name": "test",
  "function_name": "remote",
  "version": "1.0.0",
  "connector_id": "{connector id}",
  "description": "Test model for SageMaker endpoint"
}
```
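A registered model still has to be deployed before it can serve predictions. The sketch below only assembles the ML Commons requests for that sequence as (method, path, body) tuples and sends nothing; the helper names are hypothetical, and the `model_id` must come from your cluster (in the register response, or from `GET /_plugins/_ml/tasks/<task_id>` if only a task ID is returned).

```python
from typing import Any, Dict, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def register_request(connector_id: str) -> Request:
    """Register a remote model backed by the SageMaker connector."""
    body = {
        "name": "test",
        "function_name": "remote",
        "version": "1.0.0",
        "connector_id": connector_id,
        "description": "Test model for SageMaker endpoint",
    }
    return ("POST", "/_plugins/_ml/models/_register", body)


def deploy_request(model_id: str) -> Request:
    """Deploy the registered model so it can serve predictions."""
    return ("POST", f"/_plugins/_ml/models/{model_id}/_deploy", {})
```

Send each tuple with the OpenSearch client or HTTP tool you already use; the deploy request can only be built after the register step has produced a model ID.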