# Using Nova Lite as a sub-agent for Nova Pro

In this notebook, we'll analyze Amazon's 2023 quarterly earnings reports. Nova Pro acts as the orchestrator: it writes a focused prompt for Nova Lite sub-agents, each of which extracts the relevant information from one quarterly earnings PDF. Nova Pro then uses the extracted information to answer our question and to generate matplotlib code for a graph that complements the response.

## Step 1: Environment Setup
Let's install the required libraries and set up the Nova API client.

In [None]:
%pip install IPython PyMuPDF matplotlib boto3 requests Pillow --quiet

In [None]:
# Import the modules needed for this exercise
import boto3
import json
from typing import Optional
import fitz
import base64
from PIL import Image
import io
import requests
import os
from concurrent.futures import ThreadPoolExecutor

# Set up Bedrock Nova API client
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1"
)

## Step 2: Gather the documents and ask a question
We will be using Amazon's quarterly earnings releases. URLs for both 2023 and 2024 are listed below, but this example uses only the 2023 results and asks a question about net sales for that year.

In [None]:
# List of Amazon's earnings release PDF URLs from 2023 and 2024. In this example we will consider 2023 quarterly
# results.

#2024 
#     "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q1/AMZN-Q1-2024-Earnings-Release.pdf",
#     "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q2/AMZN-Q2-2024-Earnings-Release.pdf",
#     "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/AMZN-Q3-2024-Earnings-Release.pdf"

pdf_urls = [ 
    "https://s2.q4cdn.com/299287126/files/doc_financials/2023/q1/Q1-2023-Amazon-Earnings-Release.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2023/q2/Q2-2023-Amazon-Earnings-Release.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2023/q3/AMZN-Q3-2023-Earnings-Release.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/AMZN-Q4-2023-Earnings-Release.pdf"    
]

# User's question
QUESTION = "How did Amazon's net sales change quarter to quarter in the 2023 financial year and what contributed to the changes?"

## Step 3: Download and convert PDFs to images
In this step, we'll define functions to download the earnings PDFs and convert them to base64-encoded PNG images. These PDFs are full of tables, which are hard to parse with traditional PDF parsers, so it's easier to convert each page to an image and pass the images to the Nova model.

- `download_pdf`: downloads a PDF file from one of the URLs provided in the previous step and saves it to the specified folder.
- `pdf_to_base64_pngs`: converts a PDF to a list of base64-encoded PNG images, one per page.
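
To get a feel for what the resize step does, the arithmetic can be sketched on its own. This is an illustration only: `fitted_size` is a hypothetical helper, and the US Letter page size (612 x 792 points) is an assumption about the PDFs rather than something stated in them. It mirrors the aspect-preserving fit that `Image.thumbnail` performs, although Pillow's own rounding may differ by a pixel.

```python
def fitted_size(width, height, max_size=(1024, 1024)):
    """Scale (width, height) down to fit inside max_size, keeping the aspect ratio."""
    max_w, max_h = max_size
    if width <= max_w and height <= max_h:
        return width, height  # already small enough; nothing is enlarged
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Assumed page: US Letter, 612 x 792 points, rendered at 300 DPI (zoom = 300 / 72)
zoom = 300 / 72
rendered = (round(612 * zoom), round(792 * zoom))
print(rendered)                # pixel size of the rendered page before resizing
print(fitted_size(*rendered))  # pixel size after fitting into 1024 x 1024
```

Under these assumptions, the 300 DPI render is downsampled by roughly a factor of three before it reaches the model.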

In [None]:
# Function to download a PDF file from a URL and save it to a specified folder
def download_pdf(url, folder):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        file_name = os.path.join(folder, url.split("/")[-1])
        with open(file_name, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        return file_name
    except requests.RequestException as e:
        print(f"Failed to download PDF from {url}: {e}")
        return None


# Function to convert a PDF file to a list of base64-encoded PNG images (one per page)
def pdf_to_base64_pngs(pdf_path, quality=75, max_size=(1024, 1024)):
    base64_encoded_pngs = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
            image = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
            if image.size[0] > max_size[0] or image.size[1] > max_size[1]:
                image.thumbnail(max_size, Image.Resampling.LANCZOS)
            with io.BytesIO() as image_data:
                image.save(image_data, format='PNG', optimize=True, quality=quality)
                base64_encoded = base64.b64encode(image_data.getvalue()).decode('utf-8')
                base64_encoded_pngs.append(base64_encoded)
    return base64_encoded_pngs

# Function to download a PDF and return its local path together with its base64-encoded page images
def process_pdf(url, folder):
    pdf_path = download_pdf(url, folder)
    if pdf_path:
        return pdf_path, pdf_to_base64_pngs(pdf_path)
    return None

# Folder to save the downloaded PDFs
folder = "../downloads/pdfs"

# Create the directory if it doesn't exist
os.makedirs(folder, exist_ok=True)

# Process PDFs concurrently
with ThreadPoolExecutor() as executor:
    pdf_results = list(executor.map(
        process_pdf, pdf_urls, [folder] * len(pdf_urls)))

# Remove any None values (failed downloads) from the results
pdf_results = [r for r in pdf_results if r is not None]
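
Before sending anything to the model, it can be useful to check how many pages each PDF produced and how large the encoded images are. The helper below is a sketch: `summarize_pages` is a hypothetical name, and it assumes only that `pdf_results` is a list of `(pdf_path, base64_pages)` tuples as built above.

```python
import base64
import os


def summarize_pages(results):
    """Return one summary dict per (pdf_path, base64_pages) tuple."""
    summary = []
    for pdf_path, pages in results:
        # Decode each page to measure the real PNG size rather than the base64 length
        total_bytes = sum(len(base64.b64decode(page)) for page in pages)
        summary.append({
            "file": os.path.basename(pdf_path),
            "pages": len(pages),
            "png_kib": round(total_bytes / 1024, 1),
        })
    return summary


# Tiny stand-in data so the sketch runs on its own; in the notebook, pass pdf_results instead
sample = [("downloads/example.pdf", [base64.b64encode(b"x" * 2048).decode("utf-8")])]
for row in summarize_pages(sample):
    print(row)
```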


## Step 4: Generate a specific prompt for Nova Lite using Pro
Let's use the Nova Pro model as an orchestrator and have it write a specific prompt for the Lite model based on the user-provided question.

Instructions:
- Summarize the overall trend in Amazon's net sales for the 2023 financial year.
- Highlight the significant changes and the corresponding contributing factors.
- Offer insights into how these factors might impact future performance and what strategies Amazon might consider based on the analysis.

These prompts will guide the Nova model through the process of analyzing Amazon's net sales data, identifying trends, and understanding the contributing factors, ultimately providing a comprehensive analysis of the company'

In [None]:
def generate_prompt_for_lite(question: str) -> Optional[str]:
    if not question or not isinstance(question, str):
        raise ValueError("Question must be a non-empty string")

    try:
        request_body = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "text": f'''Based on the following question,
                            please generate a specific prompt for an LLM agent to extract relevant information from an earnings report PDF.
                            Each sub-agent only has access to a single quarter's earnings report.
                            Output only the prompt and nothing else.
                            \n\nQuestion: {question}'''
                        }
                    ]
                }
            ],
            "inferenceConfig": {
                "max_new_tokens": 300,
                "top_p": 0.9,
                "top_k": 20,
                "temperature": 0.7,
            }
        }

        response = client.invoke_model(
            modelId="us.amazon.nova-pro-v1:0",
            body=json.dumps(request_body)
        )

        model_response = json.loads(response["body"].read().decode('utf-8'))
        content_text = model_response["output"]["message"]["content"][0]["text"]
        print("\n\033[1m[Formatted Response Content Text]\033[0m")
        print(content_text)
        print('-' * 60)
        return content_text

    except Exception as e:
        print(f"Error generating prompt: {str(e)}")
        return None
    
if __name__ == "__main__":

    # Generate the lite prompt
    lite_prompt = generate_prompt_for_lite(QUESTION)
    if not lite_prompt:
        print("Failed to generate prompt. Exiting.")
        exit(1)

## Step 5: Extract information from PDFs
Now, let's extract information from each PDF using Nova Lite sub-agents, passing them the prompt generated in the previous step. We then wrap each sub-agent's output in an `<info>` tag labeled with its quarter.

In [None]:
def extract_info(pdf_path, base64_encoded_pngs, lite_prompt):
    request_body = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "image": {
                            "source": {"bytes": base64_encoded_png},
                            "format": "png"
                        }
                    } for base64_encoded_png in base64_encoded_pngs
                ] + [
                    {
                        "text": lite_prompt
                    }
                ]
            }
        ]
    }
    response = client.invoke_model(
        modelId="us.amazon.nova-lite-v1:0",
        body=json.dumps(request_body)
    )

    model_response = json.loads(response["body"].read().decode('utf-8'))
    return model_response["output"]["message"]["content"][0]["text"], pdf_path


if __name__ == "__main__":

    # Extract information from PDFs
    with ThreadPoolExecutor() as executor:
        extracted_info_list = list(executor.map(
            lambda x: extract_info(x[0], x[1], lite_prompt), pdf_results))

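    # Process and display the extracted information
    import re  # used to pull the quarter label (Q1..Q4) out of each filename

    extracted_info = ""
    for info, pdf_path in extracted_info_list:
        # Filenames differ ("Q1-2023-Amazon-..." vs "AMZN-Q3-2023-..."), so search for
        # the quarter label instead of relying on its position in the name
        match = re.search(r"Q[1-4]", os.path.basename(pdf_path))
        quarter = match.group(0) if match else "unknown"
        extracted_info += f"<info quarter=\"{quarter}\">{info}</info>\n"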

    print(extracted_info)

We extract information from the PDFs concurrently using the sub-agent models and combine the results into a single string. Next, we prepare the message for the more capable Pro model, including the question and the extracted information, and ask it to generate a response and matplotlib code.
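
Because the sub-agent calls run in parallel, a single throttled or failed request would otherwise raise out of `executor.map` and discard the other results. One possible mitigation, sketched here under assumptions, is to wrap each call in a small retry helper with exponential backoff. `call_with_retries` and its parameters are hypothetical names, not part of boto3 or of this notebook.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... by default


# Stand-in for a flaky sub-agent call: fails twice, then succeeds
calls = {"count": 0}


def flaky_extract(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated throttling")
    return f"extracted:{name}"


# The sleep function is swapped out here so the demonstration does not actually wait
result = call_with_retries(flaky_extract, "Q1", sleep=lambda seconds: None)
print(result, calls["count"])
```

In the notebook, the lambda passed to `executor.map` could call `call_with_retries(extract_info, x[0], x[1], lite_prompt)` instead of calling `extract_info` directly.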

## Step 6: Pass the information to Nova Pro model to generate Python code to create a graph
Now that we have fetched the information from each PDF using the sub-agents, let's call the Pro model to answer the question and write code to create a graph.

In [None]:
# Prepare the messages for the Pro model
request_body = {
    "messages": [
    {
        "role": "user",
        "content": [
            {"text": f'''Based on the following extracted information from Amazon's earnings releases, 
             please provide a response to the question: {QUESTION}\n\n
             Also, please generate Python code using the matplotlib library to accompany your response. 
             Enclose the code within <python_code> tags.\n\nExtracted Information:\n{extracted_info}'''}
        ]
    }
]
}

# Generate the answer and the matplotlib code using the Pro model
response = client.invoke_model(
    modelId="us.amazon.nova-pro-v1:0",
    body=json.dumps(request_body)
)

model_response = json.loads(response["body"].read().decode('utf-8'))
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n\033[1m[Formatted Response Content Text]\033[0m")
print(content_text)
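
The same `["output"]["message"]["content"][0]["text"]` lookup appears in several cells. A small helper could centralize it and fail with a clearer message when the response contains no text. This is a sketch: `response_text` is a hypothetical name, and it assumes only the response shape already used in this notebook.

```python
def response_text(model_response):
    """Join the text blocks of a parsed Nova response dict."""
    content = model_response.get("output", {}).get("message", {}).get("content", [])
    texts = [block["text"] for block in content
             if isinstance(block, dict) and "text" in block]
    if not texts:
        raise ValueError("No text content found in model response")
    return "\n".join(texts)


# Hand-written stand-in with the same shape as the parsed response body
example = {"output": {"message": {"role": "assistant",
                                  "content": [{"text": "Net sales rose."}]}}}
print(response_text(example))
```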

## Step 7: Extract response and execute Matplotlib code
Finally, let's extract the matplotlib code from the response generated above and use it to visualize the net sales trend.

We define the `extract_code` function to split that response into the matplotlib code and the non-code answer.


In [None]:
import matplotlib.pyplot as plt


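def extract_code(response):
    # The Pro model was asked to wrap its code in <python_code> tags
    open_tag, close_tag = "<python_code>", "</python_code>"
    start = response.find(open_tag)
    end = response.find(close_tag)
    if start != -1 and end != -1:
        # Skip past the opening tag; the text before it is the prose answer
        return response[start + len(open_tag):end].strip(), response[:start].strip()
    return None, response.strip()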

# content_text is the Pro model's response from Step 6
code, text = extract_code(content_text)

print(text)
if code:
    try:
        # Caution: this runs model-generated code; review it before executing
        exec(code)
    except Exception as e:
        print(f"Error executing code: {e}")
        
        
# Net sales for each quarter, in billions of USD, as reported in the 2023 earnings releases
quarters = ['Q1 2023', 'Q2 2023', 'Q3 2023', 'Q4 2023']
net_sales = [127.4, 134.4, 143.1, 170.0]

# Plotting the net sales
plt.figure(figsize=(10, 6))
plt.plot(quarters, net_sales, marker='o', linestyle='-', color='b', label='Net Sales (in billions)')

# Adding titles and labels
plt.title('Amazon Net Sales by Quarter in 2023')
plt.xlabel('Quarter')
plt.ylabel('Net Sales (in billions)')
plt.grid(True)

# Adding labels for each data point
for i, value in enumerate(net_sales):
    plt.text(quarters[i], value, f'{value:.1f}', ha='right', va='bottom')

# Adding legend
plt.legend()

# Show the plot
plt.show()
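
Since `exec` runs whatever the model returned, it may be worth screening the generated code first. The sketch below uses the standard-library `ast` module to list imports that fall outside an allow-list. `find_disallowed_imports` and `ALLOWED_MODULES` are hypothetical names, the allow-list is an assumption about what plotting code needs, and this check is not a sandbox: code can still cause harm without importing anything.

```python
import ast

# Assumed allow-list for plotting code; adjust to your own needs
ALLOWED_MODULES = {"matplotlib", "numpy", "math"}


def find_disallowed_imports(code, allowed=ALLOWED_MODULES):
    """Return the sorted top-level module names imported by code that are not allowed."""
    tree = ast.parse(code)
    disallowed = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                disallowed.add(root)
    return sorted(disallowed)


# Stand-in for model output; in the notebook, pass the extracted `code` string
generated = "import matplotlib.pyplot as plt\nimport os\nfrom subprocess import run\n"
print(find_disallowed_imports(generated))
```

A caller could skip `exec` and print the code for manual review whenever the returned list is non-empty.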