# SageMaker Example

## 1. Create your container repository

Open the AWS console and create an ECR repository for your container: https://us-west-2.console.aws.amazon.com/ecr/create-repository?region=us-west-2

For example: `236995464743.dkr.ecr.us-west-2.amazonaws.com/sagemaker_endpoint/vllm`
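
The repository can also be created from code instead of the console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; `image_uri` and `ensure_repository` are hypothetical helper names introduced here for illustration and are not part of this example's code.

```python
def image_uri(account_id: str, region: str, repo_name: str, tag: str) -> str:
    """Build the full ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


def ensure_repository(repo_name: str, region: str) -> None:
    """Create the ECR repository if it does not exist yet (needs AWS credentials)."""
    import boto3  # imported lazily so image_uri works without boto3 installed

    ecr = boto3.client("ecr", region_name=region)
    try:
        ecr.create_repository(repositoryName=repo_name)
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # the repository is already there, nothing to do


# Building the URI needs no AWS access.
print(image_uri("236995464743", "us-west-2", "sagemaker_endpoint/vllm", "v0.6.0"))
```

Calling `ensure_repository("sagemaker_endpoint/vllm", "us-west-2")` before the push would make the notebook re-runnable in a fresh account.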

In [1]:
# login
!aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 236995464743.dkr.ecr.us-west-2.amazonaws.com

VLLM_VERSION = "v0.6.0"
REPO_NAME = "sagemaker_endpoint/vllm"
CONTAINER = f"236995464743.dkr.ecr.us-west-2.amazonaws.com/{REPO_NAME}:{VLLM_VERSION}"

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded


## 2. Build the container

The demo code is in `app/`.
Build and push the Docker image with the following commands:

In [2]:
!docker build --build-arg VLLM_VERSION={VLLM_VERSION} -t {REPO_NAME}:{VLLM_VERSION} .
!docker tag {REPO_NAME}:{VLLM_VERSION} {CONTAINER}
!docker push {CONTAINER}

Sending build context to Docker daemon  109.1kB
Step 1/9 : ARG VLLM_VERSION
Step 2/9 : FROM vllm/vllm-openai:$VLLM_VERSION
 ---> 714424fc682c
Step 3/9 : WORKDIR /app
 ---> Using cache
 ---> cdbc00b24d70
Step 4/9 : COPY app/ /app
 ---> Using cache
 ---> 161d806cbc9c
Step 5/9 : RUN export PYTHON_SITEPACKAGES=`python3 -c "import site; print(site.getsitepackages()[0])"`; sed -i '/if __name__ == "__main__":/i\@router.get("/ping")\nasync def ping() -> Response:\n\    return await health()\n\nfrom typing import Union\n@router.post("/invocations")\nasync def invocations(request: Union[ChatCompletionRequest, CompletionRequest],\n\                                 raw_request: Request):\n\    if isinstance(request, ChatCompletionRequest):\n\        return await create_chat_completion(request, raw_request)\n\    elif isinstance(request, CompletionRequest):\n\        return await create_completion(request, raw_request)\n\    else:\n\        return JSONResponse("unknow request paras",\n\            
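
The `RUN` step in the build log patches vLLM's API server with two extra routes, `/ping` and `/invocations`, and `/invocations` forwards each request to either the chat-completion or the completion handler. In the patched server that choice is made by validating the request against `Union[ChatCompletionRequest, CompletionRequest]`. The sketch below only approximates the idea by inspecting the payload keys; `pick_handler` is a hypothetical name and the key-based check is an assumption for illustration, not the code that runs in the container.

```python
def pick_handler(payload: dict) -> str:
    """Return which OpenAI-style handler a request body would be sent to."""
    if "messages" in payload:
        return "chat_completion"  # chat requests carry a list of messages
    if "prompt" in payload:
        return "completion"  # plain completion requests carry a prompt string
    raise ValueError("unknown request parameters")


print(pick_handler({"messages": [{"role": "user", "content": "hi"}]}))
print(pick_handler({"prompt": "def quick_sort(arr):"}))
```

This is why section 4 can send both `messages` payloads and `prompt` payloads to the same endpoint.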

## 3. Deploy on SageMaker

Define the model and deploy it on SageMaker.


### 3.1 Init SageMaker session

In [3]:
!pip install boto3 sagemaker transformers
import re
import json

import boto3
import sagemaker
from sagemaker import Model

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### 3.2 Prepare the model file: deploy vLLM with scripts

In [4]:
!echo the entrypoint of the endpoint is "start.sh"
!echo ====================================================
!cat vllm_by_scripts/start.sh
!echo ====================================================

!rm -f vllm_by_scripts.tar.gz
!tar czvf vllm_by_scripts.tar.gz vllm_by_scripts/

model_name = "deepseek-coder-6-7b-instruct"

s3_code_prefix = f"sagemaker_endpoint/vllm/{model_name}"
bucket = sess.default_bucket() 
code_artifact = sess.upload_data("vllm_by_scripts.tar.gz", bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {code_artifact}")

the entrypoint of the endpoint is start.sh
#!/bin/bash

# port needs to be $SAGEMAKER_BIND_TO_PORT

python3 -m vllm.entrypoints.openai.api_server \
    --port $SAGEMAKER_BIND_TO_PORT \
    --trust-remote-code \
    --max-model-len 8192 \
    --model deepseek-ai/deepseek-coder-6.7b-instruct
vllm_by_scripts/
vllm_by_scripts/start.sh
vllm_by_scripts/.ipynb_checkpoints/
vllm_by_scripts/.ipynb_checkpoints/start-checkpoint.sh
S3 Code or Model tar ball uploaded to --- > s3://sagemaker-us-west-2-236995464743/sagemaker_endpoint/vllm/deepseek-coder-6-7b-instruct/vllm_by_scripts.tar.gz
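
The `tar` listing above shows that the `.ipynb_checkpoints/` folder ends up in the archive. A possible alternative is to build the tarball with Python's `tarfile` module and filter those entries out. This is a sketch under the assumption that only the checkpoint folders should be skipped; `make_tarball` is a hypothetical helper, and the demo runs against a temporary directory rather than the real `vllm_by_scripts/`.

```python
import os
import tarfile
import tempfile


def make_tarball(src_dir: str, out_path: str) -> list:
    """Pack src_dir into a gzip tarball, skipping Jupyter checkpoint folders."""

    def skip_checkpoints(info: tarfile.TarInfo):
        # Returning None leaves this entry (and everything below it) out.
        if ".ipynb_checkpoints" in info.name.split("/"):
            return None
        return info

    arcname = os.path.basename(src_dir.rstrip("/"))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname, filter=skip_checkpoints)
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a throwaway copy of the folder layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "vllm_by_scripts")
    os.makedirs(os.path.join(src, ".ipynb_checkpoints"))
    for rel in ("start.sh", ".ipynb_checkpoints/start-checkpoint.sh"):
        with open(os.path.join(src, rel), "w") as f:
            f.write("#!/bin/bash\n")
    names = make_tarball(src, os.path.join(tmp, "vllm_by_scripts.tar.gz"))
    print(names)
```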


### 3.3 Deploy model

In [5]:
model = Model(
    name=model_name,
    model_data=code_artifact,
    image_uri=CONTAINER,
    role=role,
)

# Deploy the model to an endpoint
endpoint_name = sagemaker.utils.name_from_base(model_name)
print(f"endpoint_name: {endpoint_name}")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',
    endpoint_name=endpoint_name,
)

endpoint_name: deepseek-coder-6-7b-instruct-2024-09-06-04-25-09-527


Using already existing model: deepseek-coder-6-7b-instruct


--------------!
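
`model.deploy` blocks until the endpoint is ready, which is what the progress line above reflects. If the notebook is restarted in the meantime, the endpoint status can be checked separately. The sketch below polls `describe_endpoint` on a SageMaker client; `wait_until_in_service` and `StubClient` are hypothetical names, and the stub stands in for `boto3.client("sagemaker")` so that the example runs offline.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30.0, max_polls=60):
    """Poll describe_endpoint until the endpoint is InService or Failed."""
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status in ("InService", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} did not settle after {max_polls} polls")


class StubClient:
    """Stand-in for boto3.client("sagemaker"); replays a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def describe_endpoint(self, EndpointName):
        return {"EndpointStatus": self._statuses.pop(0)}


print(wait_until_in_service(StubClient(["Creating", "Creating", "InService"]), "demo", poll_seconds=0))
```

With real credentials, the call would be `wait_until_in_service(boto3.client("sagemaker"), endpoint_name)`.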

## 4. Test

You can invoke the endpoint with the boto3 SageMaker runtime client.

### 4.1 Message API non-stream mode

In [6]:
runtime = boto3.client('runtime.sagemaker')

payload = {
    # "model": "deepseek-ai/deepseek-coder-6.7b-base",
    "messages": [
    {
        "role": "user",
        "content": "Write a quick sort in python"
    }
    ],
    "max_tokens": 1024,
    "stream": False
}
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload)
)

print(json.loads(response['Body'].read())["choices"][0]["message"]["content"])

Sure, here is a basic implementation of the Quick Sort algorithm in Python:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        less_than_pivot = [x for x in arr[1:] if x <= pivot]
        greater_than_pivot = [x for x in arr[1:] if x > pivot]
        return quick_sort(less_than_pivot) + [pivot] + quick_sort(greater_than_pivot)

# Test the function
arr = [7, 2, 1, 6, 8, 5, 3, 4]
print(quick_sort(arr))  # Output: [1, 2, 3, 4, 5, 6, 7, 8]
```

Quick sort is a divide and conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted.

This implementation uses list comprehension to create the less_than_pivot and greater_than_pivot arrays, which can be more efficient than using the append method.
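
The non-stream responses in this notebook come in two shapes: chat responses carry the text under `choices[0].message.content`, while completion responses (section 4.3) carry it under `choices[0].text`. A small helper can hide the difference. This is a sketch; `extract_text` is a hypothetical name, and the sample bodies are trimmed-down stand-ins for real responses.

```python
import json


def extract_text(body: bytes) -> str:
    """Return the generated text from a non-stream chat or completion response."""
    choice = json.loads(body)["choices"][0]
    if "message" in choice:
        return choice["message"]["content"]  # chat completion response
    return choice["text"]  # plain completion response


chat_body = json.dumps({"choices": [{"message": {"role": "assistant", "content": "hello"}}]}).encode()
completion_body = json.dumps({"choices": [{"text": "world"}]}).encode()
print(extract_text(chat_body), extract_text(completion_body))
```

With the endpoint running, `extract_text(response['Body'].read())` would replace the indexing in the `print` call.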



### 4.2 Message API stream mode

In [7]:
payload = {
    "model": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "messages": [
    {
        "role": "user",
        "content": "Write a quick sort in python"
    }
    ],
    "max_tokens": 1024,
    "stream": True
}

response = runtime.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload)
)

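buffer = ""
for t in response['Body']:
    buffer += t["PayloadPart"]["Bytes"].decode()
    last_idx = 0
    # re.MULTILINE lets ^ match at every line start, so each complete event in the buffer is handled
    for match in re.finditer(r'^data:\s*(.+?)(\n\n)', buffer, flags=re.MULTILINE):
        last_idx = match.end()  # consume the event even if it cannot be parsed
        try:
            data = json.loads(match.group(1).strip())
            print(data["choices"][0]["delta"]["content"], end="")
        except (json.JSONDecodeError, KeyError, IndexError):
            pass  # e.g. the final "data: [DONE]" event or a chunk without content
    buffer = buffer[last_idx:]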

Sure, here is a simple implementation of Quick Sort in Python:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr)//2]
        left = [x for x in arr if x < pivot]
        middle = [x for x in arr if x == pivot]
        right = [x for x in arr if x > pivot]
        return quick_sort(left) + middle + quick_sort(right)

# Test the function
print(quick_sort([3,6,8,10,1,2,1]))
```

In this code, we first check if the array has one or no elements, in which case it is already sorted. If the array has more than one element, we select a pivot (in this case, the middle element of the array), and create three lists: one for elements less than the pivot, one for elements equal to the pivot, and one for elements greater than the pivot. We then recursively sort the "less than" and "greater than" lists. The sorted lists are then concatenated with the middle list (which contains all the pivot elements), resulting in a fully sorted array.
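
The streaming loop has to cope with events that are split across payload parts and with several events arriving in one part. One way to isolate that logic is a generator that takes raw byte chunks and yields parsed JSON events. The sketch below assumes events are separated by a blank line (`\n\n`) and that the stream ends with `data: [DONE]`, as the loop above does; `iter_sse_json` is a hypothetical helper, and the sample chunks are made up to exercise the split-event case.

```python
import codecs
import json


def iter_sse_json(byte_chunks):
    """Yield decoded JSON objects from an iterable of raw SSE byte chunks."""
    # An incremental decoder tolerates multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in byte_chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last blank line is complete; keep the remainder.
        *events, buffer = buffer.split("\n\n")
        for event in events:
            for line in event.splitlines():
                if not line.startswith("data:"):
                    continue
                data = line[len("data:"):].strip()
                if data == "[DONE]":
                    return
                yield json.loads(data)


chunks = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\ndata: {"choi',
    b'ces": [{"delta": {"content": "lo"}}]}\n\n',
    b"data: [DONE]\n\n",
]
text = "".join(obj["choices"][0]["delta"].get("content", "") for obj in iter_sse_json(chunks))
print(text)
```

Against the live endpoint, the chunks would come from `(t["PayloadPart"]["Bytes"] for t in response['Body'])`.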


### 4.3 Completion API non-stream mode

In [8]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

payload = {
    # "model": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "prompt": prompt,
    "max_tokens": 1024,
    "stream": False
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload)
)

print(json.loads(response['Body'].read())["choices"][0]["text"])

Here is a simple implementation of the Quick Sort algorithm in Python:

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    
    return quicksort(left) + middle + quicksort(right)

# Demonstration
arr = [3,6,8,10,1,2,1]
print(quicksort(arr))  # Outputs: [1, 1, 2, 3, 6, 8, 10]
```

This is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The avarage time complexity of this algorithm is O(n log n), but in the worst case, it can be O(n^2).



### 4.4 Completion API stream mode

In [9]:
payload = {
    "prompt": prompt,
    "max_tokens": 1024,
    "stream": True
}

response = runtime.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload)
)

buffer = ""
for t in response['Body']:
    buffer += t["PayloadPart"]["Bytes"].decode()
    last_idx = 0
    # re.MULTILINE lets ^ match at every line start, so each complete event in the buffer is handled
    for match in re.finditer(r'^data:\s*(.+?)(\n\n)', buffer, flags=re.MULTILINE):
        last_idx = match.end()  # consume the event even if it cannot be parsed
        try:
            data = json.loads(match.group(1).strip())
            print(data["choices"][0]["text"], end="")
        except (json.JSONDecodeError, KeyError, IndexError):
            pass  # e.g. the final "data: [DONE]" event
    buffer = buffer[last_idx:]

Sure, here's a simple implementation of the Quick Sort algorithm in Python:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# Example usage:
print(quick_sort([3, 6, 8, 10, 1, 2, 1]))
# Output: [1, 1, 2, 3, 6, 8, 10]
```

This implementation uses a divide-and-conquer strategy. The pivot chosen for this implementation is the middle element of the array, but there are many other ways to choose the pivot. The algorithm is then recursively applied to the sub-arrays on the left and right of the pivot. 

It's important to note that while Quick Sort is typically faster in practice than other O(n log n) algorithms, it is not in-place, meaning it uses extra space proportional to the depth of the recursion, and therefore in the worst-case is slower than other 