<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Sagemaker_Ranking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logo/logoFullBlack.svg" width="240px"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://withpi.ai"><font size="4">Copilot</font></a>

# Ranking

Pi has published its Pi Ranking model for deployment on Amazon SageMaker.

Deploying the model to SageMaker lets you run inference in your own AWS account. This notebook shows how to call a deployed endpoint.

You will need appropriate secrets in your notebook to access your account, such as `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`. When running locally, authenticate to AWS as you normally would.

Start by installing the required packages and setting the environment variables.

In [None]:
%pip install boto3 tqdm


import os
from google.colab import userdata

os.environ["AWS_ACCESS_KEY_ID"] = userdata.get("AWS_ACCESS_KEY_ID")
os.environ["AWS_SECRET_ACCESS_KEY"] = userdata.get("AWS_SECRET_ACCESS_KEY")
os.environ["AWS_SESSION_TOKEN"] = userdata.get("AWS_SESSION_TOKEN")
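
Before moving on, it can be useful to confirm that the credentials named above are present in the environment, particularly when running locally, where the Colab cell above does not apply. The following is a minimal sketch: `missing_aws_vars` and `REQUIRED_AWS_VARS` are hypothetical names introduced here for illustration, and treating `AWS_SESSION_TOKEN` as required assumes you are using temporary credentials.

```python
import os

# The three variables named above. AWS_SESSION_TOKEN is only needed for
# temporary credentials, so treating it as required is an assumption.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def missing_aws_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print(f"Missing AWS environment variables: {', '.join(missing)}")
else:
    print("All expected AWS environment variables are set.")
```

The check only looks at whether the variables are set; it does not verify that the credentials are valid for your account.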



## Sample inference

Run the cell below to check that everything is working.

Before running it, set the name of your SageMaker endpoint and the AWS region it is deployed in.

In [None]:
import boto3
import json
import time

# Initialize the SageMaker runtime client
# Update the region if needed
sagemaker_runtime = boto3.client('sagemaker-runtime', region_name='us-east-1')

# Your endpoint configuration
endpoint_name = 'PiScorer'

# Request payload: one query and the candidate passages to rank against it
payload = json.dumps({
    "query": "What is the capital of France?",
    "passages": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
    ],
})

# Send the same request 10 times and record the round-trip time of each call
latencies = []
for _ in range(10):
    start = time.perf_counter()
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    stop = time.perf_counter()
    latencies.append(f"{stop - start:.3f}")

print(f"Latencies (seconds): {latencies}")

# Parse the body of the last response only
results = json.loads(response["Body"].read().decode())
display("Inference complete")
display(f"{results}")
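
The raw list of ten latencies can be hard to read at a glance, and the first call may include connection setup time. As a rough sketch, the timings can be condensed with the standard-library `statistics` module. `summarize_latencies` is a hypothetical helper introduced here, and the sample values are illustrative placeholders, not measurements from a real endpoint.

```python
import statistics


def summarize_latencies(latencies):
    """Return min, median, mean, and max of latencies given in seconds."""
    # Accept either floats or formatted strings such as "0.123"
    values = [float(x) for x in latencies]
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


# Illustrative values only, not measurements from a real endpoint.
example = ["0.412", "0.105", "0.098", "0.101"]
summary = summarize_latencies(example)
print({name: round(value, 3) for name, value in summary.items()})
```

In the notebook, pass the `latencies` list from the cell above instead of `example`. The median is usually a more representative figure than the mean when one request is much slower than the rest.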