# Baseline Model
## Extractive Summarization

The code below generates an extractive summary of one product's reviews. It first filters the DataFrame to the reviews for a single product, identified by its ASIN (Amazon Standard Identification Number). The goal is to pick out the most representative sentences from these reviews, so that the summary captures the key points and sentiments customers express. To do this, it uses BERT embeddings and cosine similarity, as follows:


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from transformers import BertTokenizer, BertModel
import torch
from sklearn.metrics.pairwise import cosine_similarity

import networkx as nx
from nltk.tokenize import sent_tokenize



In [2]:
# Read the data
df_neg = pd.read_json('/Users/williamfussell/Downloads/df_neg_sample.json')
df_pos = pd.read_json('/Users/williamfussell/Downloads/df_pos_sample.json')


In [3]:
# Inspect the data
df_neg.head()

| | asin | title | reviewText |
|---|---|---|---|
| 0 | B0000AXRH5 | FloTool 10701 Spill Saver Multi-Purpose Funnel | product is in good shape but very small. wasn'... |
| 1 | B0000AXU02 | Hopkins 48735 4 Wire Flat Weatherproof Replace... | Does not fit properly and pops open often as a... |
| 2 | B0000AY3DS | CIPA 11119 Economy 2.5" x 8" Marine Mirror | Flimsy cheap mirror that has an extremely dist... |
| 3 | B0000AY3SR | Meguiar's PlastX Clear Plastic Cleaner &amp; P... | Waste of money. Virtually no improvement.\nI g... |
| 4 | B0000AZ9KS | Keeper 05059 Ratcheting Cargo Bar | item is to lite for my needs\nI agree, it is f... |


In [5]:
df_pos.head()

| | asin | title | reviewText |
|---|---|---|---|
| 0 | B0012TYYKU | Show Chrome Replacement Black Grommets Honda 5... | looks good a lot cheaper than oem\nThese items... |
| 1 | B00NGT7QI6 | Polaris 2879969 Rear View Mirror | Wide view.\nGood product\nFit my '17 570 range... |
| 2 | B00GHT95EK | DORMAN 924-807 Replacement Center Console Latch | Fits perfectly and so much better than the dea... |
| 3 | B005FL6K1W | Power Stop K2391 Rear Z23 Evolution Brake Kit ... | Used it for my Honda Accord 2007 EX rear break... |
| 4 | B0078U83CM | Yukon Gear &amp; Axle (AK 1559) Torrington 2.5... | appears to be the same Torrington brand as OEM... |


In [8]:
print(df_neg['reviewText'][2])

Flimsy cheap mirror that has an extremely distorted view.  Not worth the price.ther
Cheesy!
Really small
Very dissatisfied with mounted it on my pontoon boat drove to the lake when I got to the lake did not realize it the glass part of it was missing now all I have is a plastic frame you wood thinking would last a lot longer  then one trip than that very very dissatisfied
No good, rusted out after one season. Boooooo!
great price, looks good, but hard to see any thing with it.
Cheap plastic with a mirror finish. Doubt it will last out the season.
no universel mounting bracket!
Junk, avoid this.  It will not stay clipped on.  The plastic is too flimsy for the clamp to stay attached.
Cheap mirror..  Gets job done but wouldn't recommend for anything more than that.
I wouldn't put this on a jon boat.  Straight to the trash.
Cheap product, too small, flimsy. Would not buy it again if I would need to. I just used the clamp from it that I fabricated to attached to the old better - bigger mirr

---
The next cell works as follows:

- BERT (Bidirectional Encoder Representations from Transformers) converts each review text into a numerical embedding (the mean of its last-layer token vectors) that captures the text's semantic meaning.

- Cosine similarity is computed between every pair of review embeddings to measure how close they are in BERT's high-dimensional embedding space. This helps identify reviews with similar content and sentiment.

- Each review's average cosine similarity to all reviews is calculated. Reviews with higher averages are treated as more representative of the whole set.

- The code selects the three reviews with the highest average cosine similarity, on the assumption that they contain the most relevant information.

- From each of these top reviews, up to the first three sentences (split naively on `'. '`) are extracted to form the final extractive summary.
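The ranking step described above can be sketched with NumPy on toy vectors. This is a minimal illustration, assuming small hand-made 3-dimensional vectors in place of 768-dimensional BERT embeddings; `rank_by_centrality` is a hypothetical helper, not part of the notebook. Cosine similarity is the dot product of L2-normalized vectors, and each item's score is its mean similarity to every item (including itself).

```python
import numpy as np

def rank_by_centrality(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows with the highest mean cosine similarity, best first."""
    # Normalize each row to unit length so that dot products equal cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T                      # (n, n) cosine similarity matrix
    centrality = sims.mean(axis=1)            # mean similarity of each item to all items
    return np.argsort(centrality)[::-1][:k]   # sort descending, keep the top k

# Toy "embeddings": rows 0 and 1 point in similar directions, row 2 is an outlier
toy = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
])
print(rank_by_centrality(toy, k=2))  # → [1 0]
```

Reversing the `argsort` result returns the best-scoring item first; the notebook instead takes the tail of the ascending order, which selects the same items in ascending score order.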


In [7]:
# Filter the DataFrame to the reviews of one product, identified by its ASIN.
# Each row appears to hold all of a product's reviews joined by newlines,
# so this list may contain a single string.
product_asin = df_neg['asin'][2]
product_reviews = df_neg[df_neg['asin'] == product_asin]['reviewText'].tolist()

# Number of top-ranked reviews used to build the summary
num_sentences_in_summary = 3
# Maximum number of sentences taken from each selected review
sentences_per_review = 3

# Load the BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Embed a text as the mean of BERT's last-layer token vectors
def get_bert_embedding(text):
    # Truncate to BERT's 512-token limit so long texts do not raise an error
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).numpy()  # shape (1, 768)

# Generate one embedding per review text
sentence_embeddings = [get_bert_embedding(review) for review in product_reviews]

# Calculate pairwise cosine similarity between reviews
cosine_similarities = cosine_similarity(np.vstack(sentence_embeddings))

# Average cosine similarity of each review to all reviews
average_cosine_similarities = cosine_similarities.mean(axis=1)

# Indices of the top-scoring reviews (argsort is ascending, so take the tail)
top_review_indices = np.argsort(average_cosine_similarities)[-num_sentences_in_summary:]

# Generate the extractive summary from the leading sentences of each top review
extractive_summary = []
for i in top_review_indices:
    sentences = product_reviews[i].split('. ')  # naive split; newlines are not treated as boundaries
    extractive_summary.extend(sentences[:sentences_per_review])  # keep up to 3 sentences

# Wrap each extracted sentence in quotes, separated by blank lines
formatted_summary = '\n'.join(f'"{sentence}"\n' for sentence in extractive_summary)

print(formatted_summary)


"Flimsy cheap mirror that has an extremely distorted view"

" Not worth the price.ther
Cheesy!
Really small
Very dissatisfied with mounted it on my pontoon boat drove to the lake when I got to the lake did not realize it the glass part of it was missing now all I have is a plastic frame you wood thinking would last a lot longer  then one trip than that very very dissatisfied
No good, rusted out after one season"

"Boooooo!
great price, looks good, but hard to see any thing with it.
Cheap plastic with a mirror finish"
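The output above shows that splitting on `'. '` leaves separate reviews glued together (for example, `Cheesy!` and `Really small` end up in one chunk), because the reviews in this text are separated by newlines rather than by `'. '`. A minimal alternative, assuming newline-separated reviews, first splits on newlines and then after sentence-ending punctuation with a regular expression. `split_review_sentences` is a hypothetical helper; NLTK's `sent_tokenize` (already imported above) is a more robust choice for the second step.

```python
import re

def split_review_sentences(text: str) -> list[str]:
    """Split a block of newline-separated reviews into individual sentences."""
    sentences = []
    for line in text.split("\n"):  # treat each line as a separate review
        # Split after '.', '!' or '?' when followed by whitespace; keep the punctuation
        for part in re.split(r"(?<=[.!?])\s+", line.strip()):
            if part:  # skip empty strings from blank lines
                sentences.append(part)
    return sentences

sample = "Flimsy cheap mirror.  Not worth the price.\nCheesy!\nReally small"
print(split_review_sentences(sample))
# ['Flimsy cheap mirror.', 'Not worth the price.', 'Cheesy!', 'Really small']
```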



---

Next, we evaluate the quality of the extractive summary against a reference summary using the ROUGE metric. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a widely used metric for assessing text summaries; it measures the content overlap between a generated summary and a reference summary.

***What ROUGE Does***:
ROUGE evaluates text summarization systems by measuring how well a generated summary aligns with a reference (ground-truth) summary.
Its variants capture different kinds of overlap, including unigram overlap (ROUGE-1), bigram overlap (ROUGE-2), and longest common subsequence (ROUGE-L), among others.
Each variant reports precision, recall, and their harmonic mean, the F1 score.
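To make these definitions concrete, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L F1, assuming simple lowercase whitespace tokenization. The helper names are hypothetical, and the `rouge` package used below applies its own preprocessing (and computes ROUGE-L at the summary level), so its numbers will not match this sketch exactly. ROUGE-N counts clipped n-gram matches; ROUGE-L uses the length of the longest common subsequence (LCS); both combine precision P and recall R as F1 = 2PR / (P + R).

```python
from collections import Counter

def _f1(matches: int, hyp_len: int, ref_len: int) -> float:
    """F1 from a match count, with precision = matches/hyp_len and recall = matches/ref_len."""
    if matches == 0:
        return 0.0
    p, r = matches / hyp_len, matches / ref_len
    return 2 * p * r / (p + r)

def rouge_n_f1(hyp: str, ref: str, n: int = 2) -> float:
    """ROUGE-N F1: overlap of n-gram counts, each match clipped by the rarer side."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    h, r = ngrams(hyp.lower().split()), ngrams(ref.lower().split())
    matches = sum((h & r).values())  # Counter '&' keeps the minimum count per n-gram
    return _f1(matches, sum(h.values()), sum(r.values()))

def rouge_l_f1(hyp: str, ref: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    a, b = hyp.lower().split(), ref.lower().split()
    # Dynamic-programming table: dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return _f1(dp[len(a)][len(b)], len(a), len(b))

hyp = "the mirror is cheap and flimsy"
ref = "the mirror is flimsy and cheap"
print(round(rouge_n_f1(hyp, ref, n=2), 3), round(rouge_l_f1(hyp, ref), 3))  # 0.4 0.667
```

The example shows why ROUGE-L is usually higher than ROUGE-2: reordering two words breaks several bigrams (2 of 5 match) but keeps most of the in-order subsequence (4 of 6 tokens).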

The reference summary we use was generated with OpenAI's API using GPT-3.5 Turbo.

---

In [15]:
from rouge import Rouge

# Instantiate the ROUGE scorer
rouge = Rouge()

# Define your extractive summary and reference summary as strings
extractive_summary = (
"Flimsy cheap mirror that has an extremely distorted view"

" Not worth the price.ther"
"Cheesy!"
"Really small"
"Very dissatisfied with mounted it on my pontoon boat drove to the lake when I got to the lake did not realize it the glass part of it was missing now all I have is a plastic frame you wood thinking would last a lot longer  then one trip than that very very dissatisfied"
"No good, rusted out after one season"

"Boooooo!"
"great price, looks good, but hard to see any thing with it."
"Cheap plastic with a mirror finish"
)
# Reference summary generated with GPT-3.5 Turbo via the OpenAI API
reference_summary = (
    'The product is described as flimsy and cheap, with an extremely distorted view, making it not worth the price.\n'
    "Many customers expressed dissatisfaction with its quality, mentioning issues like rusting after one season, difficulty in seeing clearly, and the plastic frame not lasting long.\n"
    'Several users recommended avoiding this product and suggested investing in a higher-quality mirror for better visibility and durability.')

# Calculate ROUGE scores
scores = rouge.get_scores(extractive_summary, reference_summary)

# Access specific ROUGE metrics (e.g., ROUGE-2, ROUGE-L)
rouge_2_f1 = scores[0]["rouge-2"]["f"]
rouge_l_f1 = scores[0]["rouge-l"]["f"]

# Print the scores
print("ROUGE-2 F1 Score:", rouge_2_f1)
print("ROUGE-L F1 Score:", rouge_l_f1)

ROUGE-2 F1 Score: 0.08053690791585993
ROUGE-L F1 Score: 0.2079999950924801
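One caveat about the ROUGE cell above: Python concatenates adjacent string literals with no separator, so the hard-coded `extractive_summary` string contains runs such as `price.therCheesy!Really smallVery`, which merge neighboring words. The sketch below shows the difference using sentences from the summary; joining the summary's sentences explicitly (for example with `" ".join(...)` on the list built in the summarization cell) keeps word boundaries, though ROUGE scores computed that way may differ from those reported here.

```python
# Adjacent string literals are concatenated at compile time with no separator
implicit = (
    "Cheesy!"
    "Really small"
)
print(implicit)  # Cheesy!Really small

# Joining a list explicitly keeps a space between sentences
sentences = ["Cheesy!", "Really small", "No good, rusted out after one season"]
explicit = " ".join(sentences)
print(explicit)  # Cheesy! Really small No good, rusted out after one season
```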


---

The ROUGE-2 F1 score is approximately 0.0805, meaning the extractive summary shares few consecutive word pairs (bigrams) with the reference summary. This suggests the extractive summary misses many of the reference summary's key phrases. Part of this gap is expected, since the reference is an abstractive summary that paraphrases the reviews rather than quoting them.

The ROUGE-L F1 score is approximately 0.2080: higher than ROUGE-2, but still low. The extractive summary shares some in-order word sequences with the reference summary, but there is substantial room for improvement in reproducing its content and word order.

Taken together, the ROUGE-2 and ROUGE-L F1 scores indicate that the extractive summary falls short of the reference summary in both content overlap and word order. This motivates trying a more robust, state-of-the-art summarization model.

---