# Deploy <font color='#6957FF'>Widn Tower Anthill</font> Model Package from AWS Marketplace 


---

**Widn Tower Anthill** is a multilingual LLM based on Unbabel's powerful Tower LLMs, optimized for high-quality translation use cases across multiple domains. It is the smallest and fastest model offered by Widn.

This sample notebook shows you how to deploy <font color='#C9FF33'>[Widn Tower Anthill](https://aws.amazon.com/marketplace/pp/prodview-xskn6yectpscq)</font> using Amazon SageMaker.

> **Note**: This model package only supports SageMaker real-time inference. SageMaker doesn't yet support Batch Transform for this type of model package; this is a limitation of the SageMaker Batch Transform service.


## Pre-requisites:
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. Or your AWS account already has a subscription to <font color='#C9FF33'>[Widn Tower Anthill](https://aws.amazon.com/marketplace/pp/prodview-xskn6yectpscq)</font>. If so, skip the step [Subscribe to the model package](#1.-Subscribe-to-the-model-package).

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Create input payload](#B.-Create-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Visualize output](#D.-Visualize-output)
   5. [Streaming output](#E.-Streaming-output)
3. [Example use cases](#3.-Example-use-cases) 
4. [Clean-up](#4.-Clean-up)
    1. [Delete endpoint and model](#A.-Delete-endpoint-and-model)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))
    

## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page <font color='#C9FF33'>[Widn Tower Anthill](https://aws.amazon.com/marketplace/pp/prodview-xskn6yectpscq)</font> in AWS Marketplace.
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click on **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click on the **Continue to configuration** button and then choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
# Set the region to which you subscribed to the model package
region = "us-east-1"

model_package_arn = f"arn:aws:sagemaker:{region}:571600842260:model-package/widn-tower-3-0-anthill-241001"
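Because the model package ARN embeds the region, it can help to confirm that the ARN you copied targets the region you intend to deploy in. The following is a minimal sketch: `arn_region` is a hypothetical helper name, not part of the SageMaker SDK, and it relies only on the standard `arn:partition:service:region:account:resource` layout.

```python
def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[3]


example_arn = "arn:aws:sagemaker:us-east-1:571600842260:model-package/widn-tower-3-0-anthill-241001"
print(arn_region(example_arn))
```

Comparing this value with the region of the SageMaker session before deploying catches a mismatch early.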

If you need to load environment variables from a .env file, you can use the following code:

In [None]:
!pip install python-dotenv
import dotenv

dotenv.load_dotenv()

In [None]:
import json

from sagemaker import ModelPackage, Predictor, Session
from sagemaker import deserializers, get_execution_role, serializers

In [None]:
session = Session()
region = session.boto_region_name
role = get_execution_role(session)  # Or directly specify a suitable role
role

In [None]:
# Choose an instance type to run the model
instance_type = "ml.g5.xlarge"

In [None]:
model_name = model_package_arn.split("/")[-1]

# Create a deployable model from the model package
model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session
)

print(f"{model_name=}")

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, [see their documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Create an endpoint

In [None]:
# Deploy the model
model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=model_name,
    container_startup_health_check_timeout=500,  # Change if necessary; about 8 minutes is usually enough
)

Once the endpoint has been created, you should be able to perform real-time inference.

### B. Create input payload

In [None]:
# Load the input samples from a jsonlines file in data/input/real-time/source_sentences.jsonl
file_name = "data/input/real-time/source_sentences.jsonl"

with open(file_name, "r") as f:
    inputs = [json.loads(line) for line in f]
inputs
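Each line of the file is parsed as one JSON object. As a hedged sketch, a slightly more defensive loader skips blank lines, which would otherwise make `json.loads` fail. The helper name `load_jsonl` and the sample record are illustrative assumptions, not the contents of the actual file.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Parse a JSON Lines file into a list of objects, ignoring blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical record whose keys match the placeholders of the prompt template
record = {
    "source_language": "English",
    "target_language": "German",
    "source_text": "Good morning.",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n\n")  # trailing blank line on purpose
    demo_inputs = load_jsonl(demo_path)

print(demo_inputs)
```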

In [None]:
# Use the following template to create a suitable prompt for translation
prompt_template = (
    """
    Translate the following text from {source_language} into {target_language}.
    {source_language}: {source_text}
    {target_language}:
    """
)
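To see what the model actually receives, the template can be filled with one sample. The sketch below repeats the template so that it runs on its own; the sample values and the `build_payload` helper are illustrative assumptions.

```python
prompt_template = (
    """
    Translate the following text from {source_language} into {target_language}.
    {source_language}: {source_text}
    {target_language}:
    """
)

# Hypothetical sample; the keys must match the template placeholders
sample = {
    "source_language": "English",
    "target_language": "German",
    "source_text": "Good morning.",
}


def build_payload(sample, max_tokens=256, temperature=0.0):
    """Wrap a formatted prompt in the chat-style request body used in this notebook."""
    return {
        "messages": [{"role": "user", "content": prompt_template.format(**sample)}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_payload(sample)
print(payload["messages"][0]["content"])
```

A sample that lacks one of the three keys raises a `KeyError` in `str.format`, which is a quick way to spot malformed input records.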

### C. Perform real-time inference

In [None]:
predictor = Predictor(
    endpoint_name=model_name,
    sagemaker_session=session,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)

In [None]:
outputs = (
    predictor.predict(
        {
            "messages": [
                {"role": "user", "content": prompt_template.format(**sample)},
            ],
            "max_tokens": 256,
            "temperature": 0.0,
            # Check available parameters here: https://docs.djl.ai/master/docs/serving/serving/docs/lmi/user_guides/chat_input_output_schema.html#request-schema
        }
    )
    for sample in inputs
)

### D. Visualize output

In [None]:
for output in outputs:
    # output is a dictionary with the final response
    print(output["choices"][0]["message"]["content"])
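The response follows a chat-completion layout with the translation nested under `choices`. A small helper keeps that lookup in one place. This is only a sketch: `extract_translation` is a hypothetical name, and `mock_response` is a hand-written stand-in containing just the fields accessed above, not real model output.

```python
def extract_translation(response: dict) -> str:
    """Return the text of the first choice in a non-streaming response."""
    return response["choices"][0]["message"]["content"].strip()


# Hand-written stand-in with only the fields the notebook reads
mock_response = {
    "choices": [{"message": {"role": "assistant", "content": " Guten Morgen. "}}]
}
print(extract_translation(mock_response))
```

Note that `outputs` is a generator expression, so the endpoint is called lazily while the loop runs and the results can be iterated only once; wrap it in `list(...)` if you need to reuse them.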

### E. Streaming output

In [None]:
# Call the model and handle the streaming response
# There are three important differences to note:
# 1. The method is `predict_stream` and the "stream" parameter is set to True;
# 2. The response is a generator that yields partial JSON objects;
# 3. The response schema is different from the non-streaming case: use the `delta` field instead of `message`.

outputs = (
    predictor.predict_stream(
        {
            "messages": [
                {"role": "user", "content": prompt_template.format(**sample)},
            ],
            "max_tokens": 256,
            "temperature": 0.0,
            "stream": True,
        }
    )
    for sample in inputs
)

for output in outputs:
    bytes_accumulated = bytearray()
    for payload in output:
        bytes_accumulated.extend(payload)
        try:
            item = json.loads(bytes_accumulated)
        except json.decoder.JSONDecodeError:
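            # payload contained only a partial JSON object; keep accumulating
            continue

        # A complete JSON object was parsed, so reset the buffer
        bytes_accumulated = bytearray()
        if item is not None:
            # Some chunks may carry no text, so fall back to an empty string
            print(item["choices"][0]["delta"].get("content") or "", end="")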
    print()
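The loop above assumes that each accumulated buffer holds exactly one JSON object. If a payload happens to contain the end of one object and the start of the next, `json.loads` keeps failing on the combined buffer. A more tolerant variant, sketched here under the assumption that the stream is a sequence of JSON objects separated by optional whitespace, decodes objects one at a time with `json.JSONDecoder.raw_decode`. The helper name `iter_json_objects` and the sample chunks are hypothetical and hand-written for illustration.

```python
import codecs
import json


def iter_json_objects(chunks):
    """Yield JSON objects from byte chunks that may split or concatenate objects."""
    decoder = json.JSONDecoder()
    # Incremental decoding copes with multi-byte characters split across chunks
    utf8 = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += utf8.decode(bytes(chunk))
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete object; wait for more bytes
            yield item
            buffer = buffer[end:]


# Hand-written chunks: the first object is split, the second shares a chunk with it
chunks = [
    b'{"choices": [{"delta": {"content": "Gu',
    b'ten"}}]}\n{"choices": [{"delta": {"content": " Morgen"}}]}',
    b"\n",
]
text = "".join(
    item["choices"][0]["delta"].get("content") or ""
    for item in iter_json_objects(chunks)
)
print(text)
```

With a real endpoint, the iterator returned by `predict_stream` would take the place of `chunks`.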

## 3. Example use cases

You can experiment with your own prompts. In this section, we show some examples of how to use the model for more complex translation tasks.

### Document-level translation

The model is trained with a context length of 4096 tokens, so we recommend translating documents of fewer than 4k tokens (roughly 14k characters).
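A quick pre-flight check can flag documents that are likely too long. The sketch below uses only the rough ratio implied by the figures above (about 14k characters per 4k tokens, i.e. 3.5 characters per token). This is a heuristic, not a tokenizer: the real ratio varies by language and script, and the names `estimate_tokens` and `fits_context` are hypothetical.

```python
import math

CONTEXT_TOKENS = 4096
CHARS_PER_TOKEN = 14_000 / 4_000  # rough ratio implied by the recommendation above


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a text from its character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(text: str, max_tokens: int = CONTEXT_TOKENS) -> bool:
    """Return True if the estimated token count is below the given budget."""
    return estimate_tokens(text) < max_tokens


short_document = "a" * 3_500
long_document = "a" * 20_000
print(estimate_tokens(short_document), fits_context(short_document))
print(estimate_tokens(long_document), fits_context(long_document))
```

Documents that fail the check are candidates for the paragraph-by-paragraph approach described under in-context translation.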

Example prompt:

In [None]:
prompt = """
You are an expert translation large language model tasked with translating texts from English to German.

Translate the following text into German:
### Early Life and Career Beginnings
Born on February 5, 1985, in Madeira, Portugal, Cristiano Ronaldo began his football journey at Sporting CP before catching the eye of major European clubs.

### Professional Career
- **Manchester United** (2003-2009, 2021-2022): Rose to fame, winning multiple Premier League titles and the 2008 Champions League.
- **Real Madrid** (2009-2018): Cemented his legacy, becoming the club’s all-time top scorer and winning four Champions League titles.
- **Juventus** (2018-2021): Continued success in Italy, securing two Serie A titles.
- **Al-Nassr** (2023-Present): Currently playing in Saudi Arabia, continuing to break records.

### Playing Style
Renowned for his versatility, speed, and goal-scoring prowess, Ronaldo is known for his athleticism and precision both in the air and on the ground.

### Achievements
- **Ballon d'Or**: 5 wins
- **Champions League Titles**: 5
- **UEFA Euro Champion**: 2016 with Portugal

### Personal Life and Influence
Beyond football, Ronaldo is a global icon with massive influence, known for his philanthropy and endorsements, making him one of the world’s most marketable athletes.
"""

In [None]:
# Call the model and show the output
output = predictor.predict(
    {
        "messages": [
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 1000,
        "temperature": 0.0,
    }
)

print(output["choices"][0]["message"]["content"])

### Glossaries, tone, and more

You can give the model more details about the expected translation, such as tone, glossaries, and few-shot examples. These can be passed individually; for simplicity, this example combines several of them in one prompt.

Example prompt:

In [None]:
prompt = """
Translation Task:
- From: English
- To: Portuguese (Portugal)
- Style: formal
Reference Translations:
Example 1:
  English: To keep your trip on track, we've got you a place to stay:
  Portuguese (Portugal): Para que possa continuar a sua viagem, encontrámos um local para a sua estadia:
Example 2:
  English: To keep your trip on track, we've found you a similar place to stay:
  Portuguese (Portugal): Para que possa continuar a sua viagem, encontrámos um local semelhante para a sua estadia:
Example 3:
  English: To keep your trip on track, we've found you a similar place to stay:
  Portuguese (Portugal): Para o ajudarmos a reorganizar a sua viagem, encontrámos um alojamento semelhante para si:
Example 4:
  English: To keep your trip on track, we've found you a similar place to stay:
  Portuguese (Portugal): Para poder continuar com a sua viagem, encontrámos um local semelhante para a sua estadia:
Example 5:
  English: To keep your trip on track, we've found you a similar place to stay:
  Portuguese (Portugal): Para poder continuar com a sua viagem, encontrámos um local semelhante onde poderá ficar hospedado:

Text to Translate:
To keep your trip on track, we've found you an alternative place to stay:
"""

In [None]:
# Call the model and show the output
output = predictor.predict(
    {
        "messages": [
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 256,
        "temperature": 0.0,
    }
)

print(output["choices"][0]["message"]["content"])
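Prompts like the one above can be assembled programmatically when the style and reference translations come from your own data. The following sketch mirrors the layout of the example prompt; `build_translation_prompt` is a hypothetical helper, and it covers only the fields shown in that example (style and reference translations).

```python
def build_translation_prompt(source_lang, target_lang, text, style=None, examples=()):
    """Assemble a prompt with an optional style and few-shot reference translations."""
    lines = ["Translation Task:", f"- From: {source_lang}", f"- To: {target_lang}"]
    if style:
        lines.append(f"- Style: {style}")
    if examples:
        lines.append("Reference Translations:")
        for i, (source, target) in enumerate(examples, start=1):
            lines.append(f"Example {i}:")
            lines.append(f"  {source_lang}: {source}")
            lines.append(f"  {target_lang}: {target}")
    lines += ["", "Text to Translate:", text]
    return "\n" + "\n".join(lines) + "\n"


few_shot_prompt = build_translation_prompt(
    source_lang="English",
    target_lang="Portuguese (Portugal)",
    text="To keep your trip on track, we've found you an alternative place to stay:",
    style="formal",
    examples=[
        (
            "To keep your trip on track, we've got you a place to stay:",
            "Para que possa continuar a sua viagem, encontrámos um local para a sua estadia:",
        ),
    ],
)
print(few_shot_prompt)
```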

### In-context translation

When a document is too large to translate in a single request, you can split it into paragraphs and translate each one with the surrounding text supplied as context (in-context translation).

Example prompt:

In [None]:
prompt = """
Consider the following source side context:

A giant Buddha Amitabha statue, 27 metres tall and weighing 3,000 tonnes, was installed outside on the Lan Kha Mountain in 2010. It was adapted from a similar structure from the Ly Dynasty.
The Phat Tich Pagoda is associated with Tu Thuc’s meeting with a fairy. As the legend goes, there were endless peonies on Lan Kha Mountain and in the pagoda, leading a young woman to visit the pagoda one day to see the flowers. She carelessly broke a tree branch and was fined by the monks, but a local scholar, Tu Thuc, was also visiting the pagoda and offered his coat to compensate for the broken branch. They became friends and continued to meet at the pagoda. The woman ultimately invited Tu Thuc to visit her house, leading him to a peony forest and into a cave on the mountainside with an imperial palace with high walls and stone footsteps. She revealed that she was a fairy and they got married.
Every year, people visit the pagoda to take part in the peony festival, where they enjoy looking at the flowers, listening to quan ho (love duets) and poem recitations, and playing traditional games. The festival usually lasts two days.
The pagoda was recognised as a national relic site in 1962 and a special national relic site in 2014.

Using the source context, translate the sentence below into Chinese (Simplified):
The Phat Tich pagoda, a special national relic site located just 25 kilometres northeast of Hanoi, was built in 1057 on a mountain called Lan Kha during the reign of King Ly Thanh Tong (1054-72). It was reduced to ashes by French colonialists in 1948 and restored in 1987.
"""

In [None]:
# Call the model and show the output
output = predictor.predict(
    {
        "messages": [
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 256,
        "temperature": 0.0,
    }
)

print(output["choices"][0]["message"]["content"])
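For a long document, one prompt of this shape can be generated per paragraph. The sketch below is one possible arrangement, not a prescribed method: it uses all other paragraphs as context, which is an assumption, and a window of neighbouring paragraphs may be preferable when the full context would exceed the model's context length. The helper names are hypothetical.

```python
def build_in_context_prompt(context_paragraphs, text, target_language):
    """Build a prompt that supplies source-side context for the text to translate."""
    context = "\n".join(context_paragraphs)
    return (
        "\nConsider the following source side context:\n\n"
        f"{context}\n\n"
        f"Using the source context, translate the sentence below into {target_language}:\n"
        f"{text}\n"
    )


def in_context_prompts(paragraphs, target_language):
    """Yield one prompt per paragraph, using all other paragraphs as context."""
    for i, paragraph in enumerate(paragraphs):
        context = paragraphs[:i] + paragraphs[i + 1:]
        yield build_in_context_prompt(context, paragraph, target_language)


# Placeholder paragraphs standing in for a real document
paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]
prompts = list(in_context_prompts(paragraphs, "Chinese (Simplified)"))
print(prompts[1])
```

Each prompt can then be sent with `predictor.predict` as in the cell above, and the translated paragraphs joined in their original order.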

## 4. Clean-up

### A. Delete endpoint and model

Once you are done using the endpoint, you can avoid unnecessary costs by deleting it, its configuration, and the model with the following commands:

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
# Also remove the SageMaker model created during deployment
model.delete_model()

### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product in AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

