# Deploy Writer Palmyra-Med-70B-32K Model Package from AWS Marketplace

Palmyra-Med-70B-32K, developed by Writer, is a large language model (LLM) designed specifically for healthcare applications. It offers an extended context length of 32,768 tokens and is trained on high-quality biomedical data. According to Writer, it outperforms larger models such as GPT-4, Claude Opus, Gemini, and Med-PaLM-2 on a range of biomedical benchmarks, with an average score of 85.87%.

Palmyra-Med-70B-32K is designed to meet the linguistic and knowledge demands of the medical and life sciences sectors. Fine-tuning on an extensive collection of high-quality biomedical data lets it understand and generate domain-specific text accurately and fluently. It is particularly suited to analyzing and summarizing complex clinical notes, electronic health record (EHR) data, and discharge summaries, extracting key information into concise, structured summaries.
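The prompt and the generated tokens share the 32,768-token context window, so the room left for the prompt shrinks as the requested output length grows. A minimal sketch of that arithmetic follows; `prompt_token_budget` is a hypothetical helper, and it assumes (as is usual for decoder-only LLMs) that the window covers input plus output tokens. The actual prompt size also depends on the tokenizer and chat template, which this sketch does not model.

```python
# Context window stated for Palmyra-Med-70B-32K.
CONTEXT_LENGTH = 32_768


def prompt_token_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for the prompt (system + user messages).

    Hypothetical helper: assumes prompt and completion share one context window.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


# The inference code in this notebook requests up to 512 new tokens per call.
print(prompt_token_budget(512))  # 32256
```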

> **Note**: This is a reference notebook; it will not run until you make the changes it describes (for example, adding the model package ARN for your region).

## Prerequisites:
1. Before running this notebook, make sure you obtained it from the model catalog in the Amazon SageMaker console.
2. **Note**: This notebook contains elements that render correctly only in a Jupyter interface. Open it from an Amazon SageMaker notebook instance or Amazon SageMaker Studio.
3. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Create input payload](#B.-Create-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Visualize output](#D.-Visualize-output)
   5. [Delete the endpoint](#E.-Delete-the-endpoint)
3. [Clean-up](#4.-Clean-up)
   1. [Delete the model](#A.-Delete-the-model)

## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Subscribe to the model package

To subscribe to the model package:

1. Open the model package listing page <link>
2. On the AWS Marketplace listing, choose **Continue to subscribe**.
3. On the **Subscribe to this software** page, review the offer and choose **Accept Offer**. (If you are already subscribed, skip this step.)
4. Choose **Continue to configuration**, then select a region. A **Product Arn** is displayed: this is the model package ARN you need to specify when creating a deployable model with the SageMaker Python SDK. Copy the ARN for your region and add it to `model_package_map` in the following cell.
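Model package ARNs are region-specific, so a quick sanity check can catch a value copied for the wrong region before deployment fails. The sketch below is hypothetical (neither helper is part of the SageMaker SDK) and assumes the standard ARN layout `arn:partition:service:region:account-id:resource`, with a resource of the form `model-package/<name>`.

```python
def parse_model_package_arn(arn: str) -> dict:
    """Split a SageMaker model package ARN into its components (hypothetical helper)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    if service != "sagemaker" or not resource.startswith("model-package/"):
        raise ValueError(f"Not a SageMaker model package ARN: {arn!r}")
    return {
        "partition": partition,
        "region": region,
        "account_id": account_id,
        "name": resource.split("/", 1)[1],
    }


def check_arn_region(arn: str, expected_region: str) -> None:
    """Raise if the ARN was copied for a different region than the deployment region."""
    actual = parse_model_package_arn(arn)["region"]
    if actual != expected_region:
        raise ValueError(f"ARN is for {actual}, but the session region is {expected_region}")


arn = "arn:aws:sagemaker:us-west-2:767397687672:model-package/Palmyra-Med-70B-32K"
check_arn_region(arn, "us-west-2")
print(parse_model_package_arn(arn)["name"])  # Palmyra-Med-70B-32K
```

In the notebook you could call `check_arn_region(model_package_arn, region)` right after looking up the ARN.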

```python
# Map each AWS region to the model package ARN copied from the Marketplace listing.
model_package_map = {
    "us-west-2": "arn:aws:sagemaker:us-west-2:767397687672:model-package/Palmyra-Med-70B-32K",
}
```

Import the required libraries:

```python
import json
import re

import boto3
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role
```

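```python
# Look up the model package ARN for the region this notebook runs in.
region = boto3.Session().region_name
if region not in model_package_map:
    raise ValueError(
        f"Unsupported region: {region}. Supported regions: {sorted(model_package_map)}"
    )

model_package_arn = model_package_map[region]
```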

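```python
role = get_execution_role()
sagemaker_session = sage.Session()

# Low-level client for invoking the endpoint ("sagemaker-runtime" is the canonical service name).
runtime_sm_client = boto3.client("sagemaker-runtime")
```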

## 2. Create an endpoint and perform real-time inference

```python
model_name = "palmyra-med-70b-32k"

content_type = "application/json"

real_time_inference_instance_type = "ml.p4d.24xlarge"
```
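As a rough sanity check on the instance choice: an ml.p4d.24xlarge provides 8 NVIDIA A100 GPUs with 40 GB of memory each (320 GB in total). Assuming the container loads the nominal 70B parameters in 16-bit precision (an assumption; the container's actual precision and any quantization are not documented here), the weights alone need about 140 GB, leaving the remainder for the KV cache and activations. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9


NUM_PARAMS = 70_000_000_000  # nominal size from the model name
BYTES_PER_PARAM = 2          # assumption: fp16/bf16 weights
GPU_MEMORY_GB = 8 * 40       # ml.p4d.24xlarge: 8 x A100 40 GB

weights = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM)
headroom = GPU_MEMORY_GB - weights
print(f"weights ~ {weights:.0f} GB, headroom ~ {headroom:.0f} GB")
# weights ~ 140 GB, headroom ~ 180 GB
```

The headroom figure ignores framework overhead, so treat it as an upper bound rather than usable KV-cache space.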

### A. Create an endpoint

```python
# Create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model to a real-time endpoint. The timeouts (in seconds) give the
# instance time to download the model artifacts and start the container.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type,
    endpoint_name=model_name,
    model_data_download_timeout=3600,
    container_startup_health_check_timeout=600,
)
```

Once the endpoint is in service (by default, `deploy` waits for this), you can perform real-time inference.
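If you deploy without waiting (for example with `wait=False`), you can poll the endpoint status yourself. The sketch below is hypothetical: `wait_for_endpoint` takes any zero-argument callable that returns a status string, so against a real endpoint you might pass `lambda: sm_client.describe_endpoint(EndpointName=model_name)["EndpointStatus"]` with `sm_client = boto3.client("sagemaker")`. Boto3 also provides a built-in `endpoint_in_service` waiter (`sm_client.get_waiter("endpoint_in_service")`) that serves the same purpose.

```python
import time
from typing import Callable


def wait_for_endpoint(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "InService"; raise on "Failed" or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed; check FailureReason in describe_endpoint.")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint still {status!r} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demo: a fake status source stands in for a real endpoint, and sleep is a no-op.
statuses = iter(["Creating", "Creating", "InService"])
print(wait_for_endpoint(lambda: next(statuses), sleep=lambda s: None))  # InService
```

Injecting `sleep` keeps the helper testable without real delays.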

### B. Create input payload

```python
payload = "Does danzhi Xiaoyao San ameliorate depressive-like behavior by shifting toward serotonin via the downregulation of hippocampal indoleamine 2,3-dioxygenase?"
```
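The question is wrapped in an OpenAI-style chat-completions request before it is sent. To inspect exactly what goes over the wire without calling the endpoint, here is a standalone sketch; `build_request_body` is a hypothetical helper that mirrors the fields used by the inference code in this notebook (`messages`, `max_tokens`, `do_sample`, `subpath`). Which other fields the container accepts is not documented here.

```python
import json

SYSTEM_PROMPT = (
    "You are a highly knowledgeable and experienced expert in the healthcare and "
    "biomedical field, possessing extensive medical knowledge and practical expertise."
)


def build_request_body(question: str, max_tokens: int = 512,
                       subpath: str = "/v1/chat/completions") -> str:
    """Serialize a chat request with the same fields the notebook sends (hypothetical helper)."""
    body = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "do_sample": False,  # disables sampling, as in the notebook
        "subpath": subpath,  # container-specific routing field
    }
    return json.dumps(body)


print(build_request_body("What is metformin used for?", max_tokens=128))
```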

### C. Perform real-time inference

```python
def run_inference(question, endpoint_name, subpath="/v1/chat/completions"):
    """Send a chat-completion request to the endpoint and return the parsed JSON response."""

    def clean_output(text):
        # Drop everything after the first end-of-turn token, if the model emits one.
        return re.split(r"<\|eot_id\|>", text)[0]

    messages = [
        {
            "role": "system",
            "content": "You are a highly knowledgeable and experienced expert in the healthcare and biomedical field, possessing extensive medical knowledge and practical expertise.",
        },
        {
            "role": "user",
            "content": question,
        },
    ]

    # `subpath` is passed in the body; the container presumably uses it for routing.
    request_body = {
        "messages": messages,
        "max_tokens": 512,
        "do_sample": False,
        "subpath": subpath,
    }

    response_model = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=json.dumps(request_body),
        ContentType=content_type,
    )

    try:
        response = json.loads(response_model["Body"].read().decode("utf-8"))

        if "choices" in response and response["choices"]:
            if (
                "message" in response["choices"][0]
                and "content" in response["choices"][0]["message"]
            ):
                response["choices"][0]["message"]["content"] = clean_output(
                    response["choices"][0]["message"]["content"]
                )
            else:
                print("Warning: Unexpected response structure")
        else:
            print("Warning: No choices in response")

        return response
    except json.JSONDecodeError:
        print("Error: Failed to decode JSON response")
        return None
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")
        return None


output = run_inference(payload, model_name)
```
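`run_inference` returns the whole JSON response, so you usually still need to pull out the generated text. A hedged sketch, assuming the OpenAI-style `choices[0].message.content` layout that `run_inference` already checks for; `extract_answer` is a hypothetical helper, and the sample response below is made up for illustration, not real model output.

```python
import re
from typing import Optional


def extract_answer(response: Optional[dict]) -> Optional[str]:
    """Return the assistant text from a chat-completions response, or None if absent."""
    if not response:
        return None
    choices = response.get("choices") or []
    if not choices:
        return None
    content = choices[0].get("message", {}).get("content")
    if content is None:
        return None
    # Same cleanup as run_inference: cut at the first <|eot_id|> token, then trim whitespace.
    return re.split(r"<\|eot_id\|>", content)[0].strip()


# Made-up response shaped like an OpenAI-compatible reply (illustration only).
fake_response = {"choices": [{"message": {"role": "assistant", "content": "Yes. <|eot_id|>extra"}}]}
print(extract_answer(fake_response))  # Yes.
```

In the notebook, `extract_answer(output)` would give just the answer string, or `None` if the call failed.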

### D. Visualize output

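```python
# Pretty-print the full JSON response (prints "null" if the request failed).
print(json.dumps(output, indent=2))
```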
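For a more readable view than the raw JSON, the sketch below wraps the answer text and, if the response carries an OpenAI-style `usage` block (`prompt_tokens`, `completion_tokens`), reports token counts. Whether this container returns `usage` is an assumption; the hypothetical `format_output` helper simply skips it when absent. The sample response is made up.

```python
import textwrap
from typing import Optional


def format_output(response: Optional[dict], width: int = 80) -> str:
    """Render the first choice's text wrapped to `width`, plus token usage if present."""
    if not response or not response.get("choices"):
        return "(no output)"
    text = response["choices"][0].get("message", {}).get("content", "")
    lines = [textwrap.fill(text, width=width)]
    usage = response.get("usage")  # assumption: OpenAI-compatible usage block
    if usage:
        lines.append(
            f"[prompt tokens: {usage.get('prompt_tokens')}, "
            f"completion tokens: {usage.get('completion_tokens')}]"
        )
    return "\n".join(lines)


sample = {
    "choices": [{"message": {"content": "Short made-up answer for illustration."}}],
    "usage": {"prompt_tokens": 60, "completion_tokens": 7},
}
print(format_output(sample))
```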

### E. Delete the endpoint

Now that you have performed real-time inference, you no longer need the endpoint. Delete it to avoid incurring further charges.

```python
# Delete the endpoint, then its endpoint configuration (both named after model_name).
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
```

## 4. Clean-up

### A. Delete the model

```python
model.delete_model()
```