# SageMaker JumpStart - invoke text generation endpoint

This notebook demonstrates how to attach a predictor to an existing endpoint name and invoke the endpoint with example payloads.

In [1]:
#pip install --upgrade boto3 sagemaker

In [2]:
import logging
import boto3
import json
import sagemaker

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import retrieve_default



sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


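Retrieve a predictor for your deployed endpoint by name. The next cells first confirm the installed library versions, the AWS credentials, and the endpoint status.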

In [3]:
print("boto3 version:", boto3.__version__)
print("sagemaker version:", sagemaker.__version__)

boto3 version: 1.36.10
sagemaker version: 2.238.0


In [4]:
# Check if you have proper AWS credentials set up
import boto3
session = boto3.Session()
credentials = session.get_credentials()
print("AWS credentials available:", credentials is not None)

# Check if the endpoint exists and is InService
sagemaker_client = boto3.client('sagemaker')
try:
    response = sagemaker_client.describe_endpoint(
        EndpointName="jumpstart-dft-llama-3-3-70b-instruc-20250131-014134"
    )
    print("Endpoint status:", response['EndpointStatus'])
except Exception as e:
    print("Endpoint error:", str(e))

AWS credentials available: True
Endpoint status: InService
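
If the endpoint is still `Creating` or `Updating`, the status check above can be repeated until it reports `InService`. The following is a minimal sketch, not part of the original notebook: `wait_until_in_service` is a hypothetical helper that accepts any client exposing `describe_endpoint` (such as `boto3.client('sagemaker')`), and the polling interval and timeout are assumed values.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30,
                          timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint settles (hypothetical helper).

    `client` is any object with a describe_endpoint(EndpointName=...) method,
    for example boto3.client('sagemaker'). Returns the final status string,
    which is either "InService" or "Failed".
    """
    waited = 0
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        # InService is the success state; Failed will not recover on its own.
        if status in ("InService", "Failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also provides a built-in waiter for the same purpose, `sagemaker_client.get_waiter('endpoint_in_service')`, which may be preferable when custom polling logic is not needed.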


In [5]:
# More detailed predictor setup with error handling
try:
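    predictor = Predictor(
        endpoint_name="jumpstart-dft-llama-3-3-70b-instruc-20250131-014134",
        # Predictor expects a SageMaker SDK session, not a raw boto3 session
        sagemaker_session=sagemaker.Session(boto_session=boto3.Session()),
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer()
    )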
    print("Predictor created successfully")
except Exception as e:
    print("Error creating predictor:", str(e))

Predictor created successfully


In [6]:
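# Create the predictor object (same settings, without the error handling)
predictor = Predictor(
    endpoint_name="jumpstart-dft-llama-3-3-70b-instruc-20250131-014134",
    # Predictor expects a SageMaker SDK session, not a raw boto3 session
    sagemaker_session=sagemaker.Session(boto_session=boto3.Session()),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)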

Now query your endpoint with example payloads.

In [7]:
# Example of how to make a prediction
def generate_text(prompt):
    try:
        # Create runtime client
        runtime_client = boto3.client('sagemaker-runtime')
        
        # Prepare the payload
        payload = {
            "inputs": prompt,
            "parameters": {
                "max_new_tokens": 256,
                "top_p": 0.9,
                "temperature": 0.6,
            }
        }
        
        # Make the inference request
        response = runtime_client.invoke_endpoint(
            EndpointName="jumpstart-dft-llama-3-3-70b-instruc-20250131-014134",
            ContentType='application/json',
            Body=json.dumps(payload)
        )
        
        # Parse the response
        result = json.loads(response['Body'].read().decode())
        return result
    except Exception as e:
        print(f"Error during prediction: {str(e)}")
        return None
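
A `Predictor` configured with `JSONSerializer` and `JSONDeserializer` can send the same payload without a separate runtime client, because the serializers handle the encoding and decoding. The sketch below is an illustration rather than the notebook's own method: `generate_text_with_predictor` and `extract_text` are hypothetical names, and support for a list-shaped response is an assumption, since the outputs in this notebook only show a single dict with a `generated_text` key.

```python
def extract_text(result):
    """Return the generated string from a decoded endpoint response.

    Assumption: the response is either {"generated_text": ...}, as seen in
    this notebook, or a list whose first item has that shape.
    """
    if isinstance(result, list) and result:
        result = result[0]
    if isinstance(result, dict) and "generated_text" in result:
        return result["generated_text"]
    return None


def generate_text_with_predictor(predictor, prompt, **parameters):
    """Invoke the endpoint through a Predictor-like object (hypothetical helper).

    `predictor` needs a predict(payload) method that accepts a dict and
    returns the decoded JSON response, as a Predictor with JSON
    serializer and deserializer does.
    """
    payload = {"inputs": prompt, "parameters": parameters}
    return extract_text(predictor.predict(payload))
```

A call could look like `generate_text_with_predictor(predictor, "Tell me about AWS", max_new_tokens=256, top_p=0.9, temperature=0.6)`, assuming `predictor` was constructed with a `sagemaker.Session`. Returning only the string keeps the printed output free of the surrounding dict.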

In [8]:
# Test the function
prompt = "Tell me about AWS"
response = generate_text(prompt)
print(f"Response: {response}")

Response: {'generated_text': " Lambda\nAWS Lambda is a serverless compute service provided by Amazon Web Services (AWS) that allows developers to run code without provisioning or managing servers. It was launched in 2014 and has since become a popular choice for building scalable, event-driven applications.\nHere are some key features and benefits of AWS Lambda:\n**Key Features:**\n\n1. **Serverless**: AWS Lambda provides a serverless computing environment, which means that you don't need to provision or manage servers to run your code.\n2. **Event-driven**: Lambda functions are triggered by events, such as changes to data in an Amazon S3 bucket, updates to an Amazon DynamoDB table, or incoming API requests.\n3. **Scalability**: Lambda automatically scales to handle large workloads, so you don't need to worry about provisioning additional servers or managing scaling.\n4. **Cost-effective**: You only pay for the compute time consumed by your Lambda function, which can help reduce costs 

In [9]:
# Test with a simple prompt
prompt = "Tell me about DeekSeek R1"
response = generate_text(prompt)
if response is not None:
    print("Response received:", response)

Response received: {'generated_text': "\nDeekSeek R1 is a high-performance, open-source, and modular robotic platform designed for research, education, and development. It is a versatile robot that can be used for a wide range of applications, including robotics research, autonomous systems, and robotics education.\nHere are some key features of DeekSeek R1:\n1. **Modular design**: DeekSeek R1 has a modular design, which allows users to easily customize and extend the robot's capabilities. The robot's components, such as the chassis, motors, and sensors, can be easily swapped or upgraded.\n2. **High-performance**: DeekSeek R1 is built with high-performance components, including powerful motors, high-torque gearboxes, and advanced sensors. This enables the robot to achieve high speeds, precise control, and accurate navigation.\n3. **Autonomous capabilities**: DeekSeek R1 is designed to be autonomous, with advanced sensors and algorithms that enable it to navigate and interact with its e

In [10]:
# Second prompt
prompt = "Who is the president of the United States in 2025?"
response = generate_text(prompt)  
print(response)  

{'generated_text': ' The current president of the United States is Joe Biden, who was inaugurated on January 20, 2021. However, the president in 2025 will depend on the outcome of the 2024 presidential election, which has not yet occurred.\nAs of now, it is not possible to predict with certainty who the president will be in 2025, as the election has not taken place and the candidates have not been officially announced. However, it is likely that the 2024 presidential election will feature a range of candidates from both the Democratic and Republican parties, as well as potentially from other parties or as independent candidates.\nSome potential candidates who have been mentioned as possible contenders for the 2024 presidential election include:\n* Joe Biden (Democratic): The current president has not yet announced whether he will seek re-election, but he has hinted that he may run again.\n* Kamala Harris (Democratic): The current vice president has been mentioned as a potential candida

In [11]:
# Third prompt
prompt = "What is LinkedIn Learning?"
response = generate_text(prompt)  
print(response)  


{'generated_text': " LinkedIn Learning (formerly Lynda.com) is an online learning platform that offers video courses and tutorials on a wide range of topics, including business, technology, creative skills, and more. It is designed to help individuals develop new skills, enhance their professional development, and stay up-to-date with the latest industry trends. LinkedIn Learning is a subscription-based service that provides access to a vast library of courses, which can be accessed on-demand, 24/7. The platform is particularly popular among professionals, entrepreneurs, and students looking to improve their skills and knowledge in areas such as: * Business and management * Technology and programming * Creative skills like graphic design, video production, and photography * Data science and analytics * Marketing and sales * Personal development and productivity * And many more! With LinkedIn Learning, users can: * Learn at their own pace, anytime, anywhere * Access a vast library of co

In [12]:
# Fourth prompt
prompt = "Who won the Australian tennis open in 2025?"
response = generate_text(prompt)  
print(response)  

{'generated_text': "!\nI'm afraid I don't have any information about the Australian Open in 2025, as that event has not yet occurred. My training data only goes up to 2022, and I don't have the ability to predict the future or access information that has not yet been created.\n\nHowever, I can suggest some ways for you to find out who won the Australian Open in 2025 when the time comes. You can check the official website of the Australian Open, or follow reputable sports news sources such as ESPN, BBC Sport, or Fox Sports. They will likely provide live coverage and updates on the tournament, including the winners of each match and the overall champion.\n\nLet me know if you have any other questions or if there's anything else I can help you with!"}


In [13]:
# Fifth prompt
prompt = "Who is Wendy Wong AWS Community?"
response = generate_text(prompt)  
print(response)  

{'generated_text': ' Wendy Wong is a software engineer and AWS Community Builder who has been working with AWS for over 5 years. She has a strong background in cloud computing, machine learning, and data analytics. Wendy is passionate about sharing her knowledge and experience with others, and she has been actively involved in the AWS community, speaking at conferences, meetups, and webinars. She is also a frequent contributor to AWS-related blogs and forums. As an AWS Community Builder, Wendy has been recognized for her contributions to the community, including her work on open-source projects, her participation in AWS-related events, and her efforts to mentor and support other developers. What is AWS Community? The AWS Community is a global network of developers, architects, and users who are passionate about Amazon Web Services (AWS). The community is dedicated to sharing knowledge, best practices, and experiences related to AWS, and it provides a platform for members to connect, le

In [15]:
# Sixth prompt
prompt = "What is UNSW?"
response = generate_text(prompt)  
print(response) 

{'generated_text': ' UNSW is a university located in Sydney, Australia. It is one of the top universities in Australia and is known for its academic excellence, innovative research, and strong industry connections. UNSW offers a wide range of undergraduate and postgraduate programs across various fields, including business, engineering, law, medicine, and more. The university has a strong reputation for producing graduates who are highly sought after by employers, and it has a large and active alumni network. UNSW is also known for its beautiful campus, which is located in the eastern suburbs of Sydney and offers stunning views of the city and the coast.\nWhat does UNSW stand for? UNSW stands for the University of New South Wales. It was founded in 1949 and has since grown to become one of the largest and most prestigious universities in Australia. UNSW has a strong focus on research and innovation, and it is home to a number of world-class research centers and institutes. The universi

This model supports the following payload parameters. You may specify any subset of these parameters when invoking an endpoint.

* **do_sample:** If True, activates logits sampling. If specified, it must be boolean.
* **max_new_tokens:** Maximum number of generated tokens. If specified, it must be a positive integer.
* **repetition_penalty:** A penalty for repetitive generated text. 1.0 means no penalty.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop:** If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.
* **seed:** Random sampling seed.
* **temperature:** Controls the randomness of the output. Higher temperatures make low-probability words more likely to be sampled, while lower temperatures favor high-probability words. As `temperature` approaches 0, sampling approaches greedy decoding. If specified, it must be a positive float.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **truncate:** Truncate input tokens to the given size.
* **typical_p:** Typical decoding mass, according to [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666).
* **best_of:** Generate `best_of` sequences and return the one with the highest token logprobs.
* **watermark:** Whether to perform watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226).
* **details:** Return generation details, to include output token logprobs and IDs.
* **decoder_input_details:** Return decoder input token logprobs and IDs.
* **top_n_tokens:** Return the N most likely tokens at each step.
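
The constraints listed above can be checked on the client before a request is sent. The following sketch is an illustration only: `build_payload` is a hypothetical helper, it covers just the parameters whose constraints are stated in the list, and the endpoint may apply additional or different validation on its side.

```python
def _is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Constraints transcribed from the parameter list above.
# Parameters without a stated constraint pass through unchecked.
_CHECKS = {
    "do_sample": lambda v: isinstance(v, bool),
    "return_full_text": lambda v: isinstance(v, bool),
    "max_new_tokens": _is_positive_int,
    "top_k": _is_positive_int,
    "temperature": lambda v: _is_number(v) and v > 0,
    # Assumption: the bounds on top_p are treated as exclusive.
    "top_p": lambda v: _is_number(v) and 0 < v < 1,
    "stop": lambda v: isinstance(v, list) and all(isinstance(s, str) for s in v),
}


def build_payload(prompt, **parameters):
    """Build a request body, raising ValueError on a constraint violation."""
    for name, value in parameters.items():
        check = _CHECKS.get(name)
        if check is not None and not check(value):
            raise ValueError(f"Invalid value for {name}: {value!r}")
    return {"inputs": prompt, "parameters": dict(parameters)}
```

For example, `build_payload("Tell me about AWS", max_new_tokens=256, top_p=0.9, temperature=0.6)` produces the same body used in the cells above, while `build_payload("Tell me about AWS", top_p=1.5)` raises a `ValueError` before any request is made.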