# Summarize Customer Reviews Using Amazon Bedrock

Before starting, please make sure this notebook is using the **conda_python3** kernel (selectable at the top right)!

To run this notebook, go to **Cell -> Run All**. Inspect the output of each cell block.

<b> Please read the following instructions carefully! </b>
    <ul>
    <li>We highly recommend running all cells and inspecting the output, rather than running cells individually, to save time and avoid issues.
    <li>Please read the comments in the markdown cells and inspect the output of each cell.
    <li>Please limit experimentation within this notebook, since there is a dedicated **Explore** section for that.
    <li>We have provided a quick explanation of all the concepts you need to know for this feature within this notebook, along with some hyperlinks for further study. In the interest of time, we recommend reviewing the supplementary details in the hyperlinks **after** the workshop. You can always find this notebook in this [Github repository](https://github.com/aws-samples/retails-generative-ai-workshop/blob/main/notebooks/summarize_customer_reviews.ipynb).
    </ul>

### Introduction

In this notebook, let's learn how to use Amazon Bedrock to summarize the customer reviews for a product. Usually, we can summarize reviews using a simple prompt such as: 

`prompt = """Summarize the following customer reviews: {all customer reviews}"""`

This works well for products with only a few reviews. However, a product on a real-life retail website can have a very large number of reviews, and passing all of them to the LLM at once can lead to **Out-Of-Memory** errors or exceed the model's context length. To avoid these issues, we use Langchain's [TextSplitter](https://js.langchain.com/docs/modules/data_connection/document_transformers/) transformer, which splits the reviews into smaller chunks. These chunks are then passed to the LLM to generate the overall summary.
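To make the context-length problem concrete, here is a minimal sketch of a pre-check that estimates whether a set of reviews fits into a single prompt. It assumes the rough rule of thumb of about 4 characters per token for English text; `estimate_tokens`, `fits_in_context`, and the 8,000-token budget are hypothetical illustrations, not the documented limits of any specific Bedrock model. Later in this notebook, `get_num_tokens()` gives a tokenizer-based count instead of this character heuristic.

```python
from typing import List

# ASSUMPTION: ~4 characters per token is a rough heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4
# ASSUMPTION: hypothetical token budget for illustration only.
MAX_CONTEXT_TOKENS = 8000


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate the token count from the character length, rounding up."""
    return -(-len(text) // chars_per_token)


def fits_in_context(reviews: List[str], prompt_overhead_tokens: int = 500,
                    max_tokens: int = MAX_CONTEXT_TOKENS) -> bool:
    """True if all reviews plus the instruction text fit in the token budget."""
    total = sum(estimate_tokens(r) for r in reviews) + prompt_overhead_tokens
    return total <= max_tokens


few_reviews = ["Great shoes, very light."] * 10
many_reviews = ["Great shoes, very light, but the tread wore out fast."] * 5000

print(fits_in_context(few_reviews))   # True
print(fits_in_context(many_reviews))  # False
```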


### Architecture

![Text Summarization](../images/text-summarization.png)

<h3> Install required dependencies </h3>
<p> <b>Note:</b> If you notice any ERRORs from the following cell, ignore them and proceed with the next cells.</p><br>

In [None]:
%pip install --quiet --no-build-isolation --upgrade \
    "boto3==1.28.63" \
    "awscli==1.29.63" \
    "botocore==1.31.63" \
    "langchain==0.0.309" \
    "transformers==4.34.0" \
    "tensorflow==2.15.0"

In [None]:
import json
import os
import sys
import boto3
import botocore
from langchain import PromptTemplate
from langchain.llms.bedrock import Bedrock

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

<h3> Initialize Bedrock client </h3><br>

In [None]:
boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

<h3> Initialize LLM </h3><br>

<p> Using Langchain, initialize the Claude Instant LLM for text summarization. </p>

In [None]:
textsumm_llm = Bedrock(
                model_id="anthropic.claude-instant-v1",
                client=boto3_bedrock)
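For reference, this is a sketch of the kind of JSON request body that, as we understand it, the Claude Instant text-completion API on Bedrock expects: a `prompt` made of `\n\nHuman:` and `\n\nAssistant:` turns, plus sampling parameters such as `max_tokens_to_sample` and `temperature`. The helper `build_claude_body` is hypothetical; the Langchain `Bedrock` wrapper builds the request for you, so the notebook does not need it.

```python
import json


def build_claude_body(instruction: str, max_tokens_to_sample: int = 512,
                      temperature: float = 0.0) -> str:
    """Return a JSON request body for a Claude text-completion model (assumed layout)."""
    # ASSUMPTION: the prompt starts with "\n\nHuman:" and ends with "\n\nAssistant:"
    prompt = f"\n\nHuman: {instruction.strip()}\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens_to_sample,  # upper bound on generated tokens
        "temperature": temperature,                    # 0 gives the least random output
    })


body = build_claude_body("Summarize the customer reviews for the product Treadlite Shoe.")
print(json.loads(body)["prompt"])
```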

<h3> Create sample customer reviews </h3>

Create 4 sample customer reviews for the product *Treadlite Shoe*. These reviews will be inserted into our prompt template, which is then passed to the LLM to summarize them.

We will also enclose each customer review in \<review\>\<\/review\> tags and load the result into a variable called *review_digest* to pass to the LLM.
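The tagging step in the next cell can also be written as a single `str.join` expression. This sketch (the helper name `build_review_digest` is ours, not part of the workshop code) produces exactly the same string as the loop in that cell: each review wrapped in `<review>` tags and followed by a blank line.

```python
from typing import List


def build_review_digest(reviews: List[str]) -> str:
    """Wrap each review in <review></review> tags, separated by a blank line."""
    return "".join(f"<review>\n{review}\n</review>\n\n" for review in reviews)


sample = ["Light and comfy.", "Tread wore out quickly."]
digest = build_review_digest(sample)
print(digest)
```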

In [None]:
product_name="Treadlite Shoe"

# There are 4 customer reviews for this product 

product_reviews = ["""I've owned these shoes for about 6 months now and have put over 300 miles on them. They are super lightweight and provide excellent cushioning and support for long runs. The breathable mesh keeps my feet from overheating even on hot summer days. The only minor issue is that the tread is starting to show some light wear but for a $200 shoe I'm really impressed with the durability. Overall these are a fantastic value for any serious runner.""",
                   """While the Treadlite shoes look sleek and are very lightweight, I'm finding they don't provide enough support for high intensity workouts. On long runs or easy miles they feel fine but anytime I do speedwork or hill repeats my feet and ankles feel beat up afterwards. The cushioning also seems to flatten out quickly. I'm only a few months into a training plan and they are barely holding up. For the price I expected them to last longer. I like the brand but may look elsewhere for my next pair of shoes.""",
                   """I bought these shoes to use for light gym workouts and occasional runs but after a few weeks I realized they weren't supportive enough for any intense exercise. The upper material is very thin and offers little protection or structure for high-impact activities. On my long run last weekend my feet and ankles were sore afterwards. They look and feel lightweight but lack sturdiness. Fine for walking around casually but I wouldn't recommend them for serious athletes or those training for races and would look for a shoe with better cushioning and stability.""",
                   """Treadlite shoes promise lightweight comfort, but they failed to deliver for me. Within a few weeks of regular use, the thin material started wearing down already. There are holes forming on the sides and small tears along the seams. The fabric just does not feel durable enough. Additionally, I found these shoes to be poorly cushioned. After a 5 mile run, my feet and knees were sore from the lack of support and bounce. It was like running directly on the hard pavement. The shoe provides almost no impact absorption for a runner. The sizing is also off. I bought my normal size but the shoes feel restrictive, like my feet are being squeezed. They did not stretch or mold to the shape of my foot over time like other running shoes. It's an uncomfortable fit that leaves my feet feeling constricted after runs. Between the lack of cushioning, poor durability of materials, and sizing issues, these Treadlite shoes have been a big disappointment. For a brand focused on running, the design flaws mean they are not well-suited for the needs of active individuals. I cannot recommend these shoes and would not purchase from this brand again based on my experience. Runners deserve better quality and performance than what Treadlite provided."""]

# Wrap each review in <review></review> tags before passing the digest to the LLM.
# Clearly delimiting each review helps the LLM follow our instructions.

review_digest = ''

for review in product_reviews:
    review_digest += "<review>" + '\n'
    review_digest += review + '\n'
    review_digest += "</review>" + '\n\n'
        
print_ww(review_digest)

<h4> Let's check the total number of tokens in the 4 sample customer reviews </h4>

In [None]:
total_num_tokens = textsumm_llm.get_num_tokens(review_digest)


print(
    f"The entire review_digest has {total_num_tokens} tokens. Let's split it into chunks using Langchain's TextSplitter"
)

<h3>Split the customer reviews into chunks </h3>

*chunk_size* controls the maximum size of each chunk, measured in characters, when splitting is possible. With the separators used below (`\n\n` and `\n`), a single line longer than this, such as one long review, cannot be split further and is kept as one larger chunk. We set this to **1000**. <br>

*chunk_overlap* specifies how many characters consecutive chunks share. The overlap helps avoid cutting the text at an awkward point and preserves context across chunk boundaries: a larger overlap repeats more text between neighboring chunks, while a smaller overlap repeats less. We set this to **100**.
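To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a simplified illustration using a fixed-size character window. Note that this is not `RecursiveCharacterTextSplitter`'s algorithm, which first splits on the given separators and then merges the pieces up to `chunk_size`; `sliding_window_chunks` is a hypothetical helper that only demonstrates how overlap makes neighboring chunks share characters.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the overlap at work.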

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

chunk_size=1000
chunk_overlap=100

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

customer_reviews = text_splitter.create_documents([review_digest])

In [None]:
num_docs = len(customer_reviews)

print(f"After splitting the customer reviews, we have {num_docs} chunks.")

# Report the token count of each chunk
for j, chunk in enumerate(customer_reviews, start=1):
    num_tokens_in_chunk = textsumm_llm.get_num_tokens(chunk.page_content)
    print(f"Chunk {j} has {num_tokens_in_chunk} tokens")

<p> Create a prompt template with two variables: the product name and the consolidated list of customer reviews for the product. </p>
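Because `PromptTemplate` uses Python-style `{placeholder}` formatting by default, one quick way to confirm that a template uses exactly the variables you plan to pass (here `product_name` and `text`) is the standard library's `string.Formatter().parse`. This is a sketch; `template_variables` and `mini_prompt` are hypothetical names, and in the notebook you could pass `summary_prompt` instead of the shortened stand-in.

```python
from string import Formatter


def template_variables(template: str) -> set:
    """Return the {placeholder} names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Hypothetical, shortened stand-in for summary_prompt
mini_prompt = (
    "Summarize the customer reviews for the product {product_name}.\n"
    "<customer_reviews>\n{text}\n</customer_reviews>"
)
print(template_variables(mini_prompt) == {"product_name", "text"})  # True
```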

In [None]:
summary_prompt='''

            Human: 

            Your task is to summarize the customer reviews for the product {product_name}. 
            The following customer reviews are enclosed in <customer_reviews> tags. 
            
            <customer_reviews>
                `{text}`
            </customer_reviews>
            
            <example_review_summary_format>

            Here's a customer review summary of {product_name}
            Pros:
                
                - pro 1
                - pro 2 
                
            Cons:
            
                - con 1 
                - con 2
            
            Overall summary of the customer reviews. 

            </example_review_summary_format>

            Do not advise the customer on whether or not to purchase the product. 
            The overall summary should be objective and should only reflect the customer reviews.
            
            
            Assistant:
            
        '''

#### Create Prompt Template with input variables

In [None]:
summary_prompt_template = PromptTemplate(
    template=summary_prompt, 
    input_variables=['product_name','text']
)

#### Summarize the product reviews with Bedrock

Use Langchain's [load_summarize_chain](https://python.langchain.com/docs/use_cases/summarization) to summarize the product reviews. **load_summarize_chain()** builds a summarization chain, which is then run over the customer review chunks we created above to produce a concise summary.

The [stuff](https://python.langchain.com/docs/modules/chains/document/stuff) chain type takes the list of review chunks, inserts them all into a single prompt, and passes that prompt to the LLM in one call. This is the simplest chain type, but if a product has a very large number of reviews (say 5,000 or 10,000), the prompt may still exceed the model's maximum token limit.

The [map_reduce](https://python.langchain.com/docs/modules/chains/document/map_reduce) chain type summarizes each chunk separately, combines those summaries, and finally summarizes the combined text. This chain type can handle a very large number of reviews.

For a smaller number of reviews (say, fewer than 100), **map_reduce** adds latency because it makes a separate LLM call for every chunk. Since a product on our retail website has at most 50 reviews, let's use the **stuff** chain type.
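The difference in the number of LLM calls, which drives the latency trade-off described above, can be seen with a stand-in for the model. In this sketch `fake_llm` is a hypothetical placeholder that only records calls; it is not how Langchain implements these chains internally, and Langchain's real **map_reduce** chain may also collapse intermediate summaries that are still too long.

```python
from typing import List

calls: List[str] = []  # prompts sent to the stand-in model


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: records the prompt, returns a stub."""
    calls.append(prompt)
    return f"summary of {len(prompt)} chars"


def stuff_summarize(chunks: List[str]) -> str:
    # One call: every chunk is placed into a single prompt.
    return fake_llm("Summarize:\n\n" + "\n\n".join(chunks))


def map_reduce_summarize(chunks: List[str]) -> str:
    # Map step: one call per chunk.
    partial_summaries = [fake_llm("Summarize:\n\n" + chunk) for chunk in chunks]
    # Reduce step: one more call over the combined partial summaries.
    return fake_llm("Combine these summaries:\n\n" + "\n\n".join(partial_summaries))


chunks = ["review chunk 1", "review chunk 2", "review chunk 3"]

stuff_summarize(chunks)
print("stuff calls:", len(calls))        # stuff calls: 1

calls.clear()
map_reduce_summarize(chunks)
print("map_reduce calls:", len(calls))   # map_reduce calls: 4
```

With N chunks, **stuff** makes one call while **map_reduce** makes N + 1, which is why **stuff** is the faster choice when everything fits in one prompt.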

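We run the chain with the following inputs:

1. *product_name* is "Treadlite Shoe".
2. *input_documents* is `customer_reviews`, i.e., the customer reviews split into chunks using Langchain's TextSplitter API. The **stuff** chain concatenates the content of these chunks and fills it into the *text* variable of the prompt template.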
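Here is a sketch of how, as we understand Langchain's defaults, the **stuff** chain turns the list of chunks into the `{text}` variable: each chunk's `page_content` is joined with a blank line and the result is substituted into the prompt alongside `product_name`. `Doc` is a hypothetical stand-in for Langchain's `Document` class, and `MINI_TEMPLATE` is a shortened, hypothetical version of `summary_prompt`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Langchain Document."""
    page_content: str


MINI_TEMPLATE = (
    "Summarize the customer reviews for the product {product_name}.\n"
    "<customer_reviews>\n{text}\n</customer_reviews>"
)


def fill_stuff_prompt(template: str, product_name: str, docs: List[Doc]) -> str:
    # ASSUMPTION: chunks are joined with "\n\n" before filling {text}
    text = "\n\n".join(doc.page_content for doc in docs)
    return template.format(product_name=product_name, text=text)


docs = [Doc("<review>\nGreat cushioning.\n</review>"),
        Doc("<review>\nTore after two weeks.\n</review>")]
prompt = fill_stuff_prompt(MINI_TEMPLATE, "Treadlite Shoe", docs)
print(prompt)
```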

In [None]:
# Set verbose=True if you want to see the prompts being used
from langchain.chains.summarize import load_summarize_chain

summary_chain = load_summarize_chain(
    llm=textsumm_llm,
    chain_type='stuff',
    prompt=summary_prompt_template,
    verbose=False
)

summary = summary_chain.run({
    "product_name": product_name,
    "input_documents": customer_reviews
})

In [None]:
print_ww(summary.strip())

#### Now let's try Amazon Titan Text LLM with the same inputs

In [None]:
textsumm_llm = Bedrock(
                model_id="amazon.titan-text-express-v1",
                model_kwargs={
                        "maxTokenCount": 512,
                        "stopSequences": [],
                        "temperature": 0,
                        "topP": 1,
                    },
                client=boto3_bedrock)
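Similarly, here is a sketch of the Titan Text request body that we understand the `model_kwargs` above map to: the prompt goes in `inputText` and the generation settings in `textGenerationConfig`. `build_titan_body` is a hypothetical helper; the Langchain `Bedrock` wrapper builds this body for you.

```python
import json
from typing import List, Optional


def build_titan_body(prompt: str, max_token_count: int = 512, temperature: float = 0.0,
                     top_p: float = 1.0, stop_sequences: Optional[List[str]] = None) -> str:
    """Return a JSON request body for Amazon Titan Text (assumed layout)."""
    return json.dumps({
        "inputText": prompt,
        # Same keys and values as the model_kwargs passed to Bedrock(...) above
        "textGenerationConfig": {
            "maxTokenCount": max_token_count,
            "stopSequences": stop_sequences or [],
            "temperature": temperature,
            "topP": top_p,
        },
    })


body = build_titan_body("Summarize the customer reviews for the product Treadlite Shoe.")
print(body)
```

If you wanted to call the model directly, such a body could be sent with `boto3_bedrock.invoke_model(modelId="amazon.titan-text-express-v1", body=body, contentType="application/json", accept="application/json")`, assuming `boto3_bedrock` is a `bedrock-runtime` client. Treat this as a sketch and check the Bedrock model-parameters documentation for the current request format.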

In [None]:
summary_chain = load_summarize_chain(
    llm=textsumm_llm,
    chain_type='stuff',
    prompt=summary_prompt_template,
    verbose=False
)

summary = summary_chain.run({
    "product_name": product_name,
    "input_documents": customer_reviews
})

print_ww(summary.strip())

<h3> You've successfully summarized customer reviews for a product with Amazon Bedrock!</h3>

<p> Please stop the notebook kernel before proceeding. </p>

<h4> Now, let's learn how to integrate Amazon Bedrock and Langchain into your web application to do the same. Please go back to Workshop Studio and follow the instructions to replicate this code into your Cloud9 environment. </h4>