## Detailed Article Explanation

The detailed code explanation for this article is available at the following link:

https://www.daniweb.com/programming/computer-science/tutorials/542292/gpt-4o-snapshot-vs-meta-llama-3-1-70b-for-zero-shot-text-summarization
    
For my other articles on Daniweb.com, please see this link:

https://www.daniweb.com/members/1235222/usmanmalik57

## Installing and Importing Required Libraries

In [6]:
!pip install openai
!pip install groq
!pip install rouge-score
!pip install --upgrade pandas openpyxl



In [8]:
import os
import time
import pandas as pd
from rouge_score import rouge_scorer
from openai import OpenAI
from groq import Groq


## Importing the Dataset

In [35]:
# Dataset download link (GitHub)
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx

# Update this path to wherever you saved dataset.xlsx
dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
# Shuffle the rows
dataset = dataset.sample(frac=1)
dataset['summary_length'] = dataset['human_summary'].apply(len)
average_length = dataset['summary_length'].mean()
print(f"Average length of summaries: {average_length:.2f} characters")
print(dataset.shape)
dataset.head()

Average length of summaries: 1168.78 characters
(1000, 11)


| | Unnamed: 0 | id | human_summary | publication | author | date | year | month | theme | content | summary_length |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 578 | 259 | 17966 | President Donald J. Trump’s decision to build ... | New York Times | Azam Ahmed | 2017-01-27 | 2017.0 | 1.0 | politics | MEXICO CITY — President Donald J. Trump’s d... | 1259 |
| 790 | 259 | 18229 | Pinault the chief executive of kering suggeste... | New York Times | Vanessa Friedman | 2017-02-03 | 2017.0 | 2.0 | entertainment | Yet another seismic shift is taking place in F... | 993 |
| 390 | 259 | 17751 | Senator Bernie Sanders of Vermont, an independ... | New York Times | Jennifer Steinhauer, John Schwartz, Emmarie Hu... | 2017-01-19 | 2017.0 | 1.0 | politics | Democrats complained mightily about their ques... | 1028 |
| 25 | 0 | 17314 | Istanbul the Islamic state on Monday issued a ... | New York Times | Tim Arango | 2017-01-03 | 2017.0 | 1.0 | crime | ISTANBUL — The Islamic State on Monday issu... | 879 |
| 265 | 259 | 17599 | Good morning. Here’s what you need to know: • ... | New York Times | Charles McDermid | 2017-01-13 | 2017.0 | 1.0 | politics | Good morning. Here’s what you need to know: •... | 891 |


## Text Summarization with GPT-4o Snapshot

In [26]:
client = OpenAI(
    # This is the default and can be omitted
    api_key = os.environ.get('OPENAI_API_KEY'),
)

# Function to calculate ROUGE scores
def calculate_rouge(reference, candidate):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, candidate)
    return {key: value.fmeasure for key, value in scores.items()}
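
To make the F-measure returned by `calculate_rouge` more concrete, here is a minimal pure-Python sketch of a ROUGE-1 style score. It is a simplification and not the `rouge_score` implementation: it assumes lowercase whitespace tokenization and no stemming, so its numbers will not match the library exactly. The function name `rouge1_f` is hypothetical.

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F-measure (simplified: whitespace tokens, no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate tokens that match
    recall = overlap / sum(ref_counts.values())      # share of reference tokens that are covered
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 5 of 6 tokens overlap in both directions -> 0.833
```

ROUGE-2 applies the same idea to word pairs, and ROUGE-L uses the longest common subsequence instead of a bag of tokens.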


In [21]:
%%time

results = []

i = 0

for _, row in dataset[:20].iterrows():
    article = row['content']
    human_summary = row['human_summary']
    
    i = i + 1
    print(f"Summarizing article {i}.")
    
    prompt = f"Summarize the following article in 1150 characters. The summary should look like human created:\n\n{article}\n\nSummary:"
    
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1150,
        temperature=0.7
    )
    generated_summary = response.choices[0].message.content
    rouge_scores = calculate_rouge(human_summary, generated_summary)
    
    results.append({
        'article_id': row['id'],
        'generated_summary': generated_summary,
        'rouge1': rouge_scores['rouge1'],
        'rouge2': rouge_scores['rouge2'],
        'rougeL': rouge_scores['rougeL']
    })

Summarizing article 1.
Summarizing article 2.
Summarizing article 3.
Summarizing article 4.
Summarizing article 5.
Summarizing article 6.
Summarizing article 7.
Summarizing article 8.
Summarizing article 9.
Summarizing article 10.
Summarizing article 11.
Summarizing article 12.
Summarizing article 13.
Summarizing article 14.
Summarizing article 15.
Summarizing article 16.
Summarizing article 17.
Summarizing article 18.
Summarizing article 19.
Summarizing article 20.
CPU times: total: 391 ms
Wall time: 59.4 s


In [22]:
results_df = pd.DataFrame(results)
mean_values = results_df[["rouge1", "rouge2", "rougeL"]].mean()
print(mean_values)

rouge1    0.386724
rouge2    0.100371
rougeL    0.187491
dtype: float64


In [27]:
def llm_evaluate_summary(article, summary):
    prompt = f"""Evaluate the following summary for the given article. Rate it on a scale of 1-10 for:
    1. Completeness: Does it capture all key points?
    2. Conciseness: Is it brief and to the point?
    3. Coherence: Is it well-structured and easy to understand?

    Article: {article}

    Summary: {summary}

    Provide the ratings as a comma-separated list (completeness,conciseness,coherence).
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        temperature=0.7
    )
    return [float(score) for score in response.choices[0].message.content.strip().split(',')]
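
The last line of `llm_evaluate_summary` assumes the model replies with exactly three comma-separated numbers; any extra wording would make `float()` raise. A slightly more tolerant parser could look like the sketch below. The helper name `parse_scores` is hypothetical, and it assumes the reply contains no other numbers before the three ratings (a reply such as "8/10" would still be misread).

```python
import re

def parse_scores(reply: str, expected: int = 3) -> list:
    """Extract the first `expected` numbers from a model reply as floats."""
    numbers = re.findall(r"\d+(?:\.\d+)?", reply)
    if len(numbers) < expected:
        raise ValueError(f"Expected {expected} scores, got: {reply!r}")
    return [float(n) for n in numbers[:expected]]

print(parse_scores("8,9,9"))
print(parse_scores("Ratings: 8, 9.5, 9"))
```

Raising on a short reply, rather than padding with defaults, keeps a malformed response from silently skewing the averages.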

In [28]:
scores_dict = {'completeness': [], 'conciseness': [], 'coherence': []}

for _, row in results_df.iterrows():
    # Look up the article content by article_id
    article = dataset.loc[dataset['id'] == row['article_id'], 'content'].iloc[0]
    scores = llm_evaluate_summary(article, row['generated_summary'])
    print(f"Article ID: {row['article_id']}, Scores: {scores}")

    # Store the scores in the dictionary
    scores_dict['completeness'].append(scores[0])
    scores_dict['conciseness'].append(scores[1])
    scores_dict['coherence'].append(scores[2])

Article ID: 18009, Scores: [8.0, 9.0, 9.0]
Article ID: 17531, Scores: [8.0, 9.0, 9.0]
Article ID: 18159, Scores: [8.0, 9.0, 9.0]
Article ID: 18450, Scores: [8.0, 7.0, 9.0]
Article ID: 17751, Scores: [8.0, 9.0, 8.0]
Article ID: 17648, Scores: [8.0, 9.0, 9.0]
Article ID: 17937, Scores: [8.0, 9.0, 9.0]
Article ID: 17469, Scores: [8.0, 9.0, 9.0]
Article ID: 17955, Scores: [8.0, 9.0, 9.0]
Article ID: 18298, Scores: [8.0, 9.0, 9.0]
Article ID: 17367, Scores: [8.0, 9.0, 9.0]
Article ID: 17542, Scores: [8.0, 9.0, 8.0]
Article ID: 17606, Scores: [8.0, 9.0, 9.0]
Article ID: 17800, Scores: [8.0, 9.0, 9.0]
Article ID: 18329, Scores: [8.0, 9.0, 9.0]
Article ID: 17887, Scores: [8.0, 9.0, 8.0]
Article ID: 18275, Scores: [8.0, 9.0, 8.0]
Article ID: 18297, Scores: [8.0, 9.0, 9.0]
Article ID: 18274, Scores: [8.0, 9.0, 9.0]
Article ID: 18126, Scores: [8.0, 9.0, 9.0]


In [29]:
# Calculate the average scores
average_scores = {
    'completeness': sum(scores_dict['completeness']) / len(scores_dict['completeness']),
    'conciseness': sum(scores_dict['conciseness']) / len(scores_dict['conciseness']),
    'coherence': sum(scores_dict['coherence']) / len(scores_dict['coherence']),
}

# Convert to DataFrame for better visualization (optional)
average_scores_df = pd.DataFrame([average_scores])
average_scores_df.columns = ['Completeness', 'Conciseness', 'Coherence']

# Display the DataFrame
average_scores_df.head()

| | Completeness | Conciseness | Coherence |
|---|---|---|---|
| 0 | 8.0 | 8.9 | 8.8 |


## Text Summarization with Llama 3.1 70B

In [30]:
client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

In [32]:
%%time

results = []

i = 0

for _, row in dataset[:20].iterrows():
    article = row['content']
    human_summary = row['human_summary']
    
    i = i + 1
    print(f"Summarizing article {i}.")
    
    prompt = f"Summarize the following article in 1150 characters. The summary should look like human created:\n\n{article}\n\nSummary:"
    
    response = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1150,
        temperature=0.7
    )

    generated_summary = response.choices[0].message.content
    rouge_scores = calculate_rouge(human_summary, generated_summary)
    
    results.append({
        'article_id': row['id'],
        'generated_summary': generated_summary,
        'rouge1': rouge_scores['rouge1'],
        'rouge2': rouge_scores['rouge2'],
        'rougeL': rouge_scores['rougeL']
    })

Summarizing article 1.
Summarizing article 2.
Summarizing article 3.
Summarizing article 4.
Summarizing article 5.
Summarizing article 6.
Summarizing article 7.
Summarizing article 8.
Summarizing article 9.
Summarizing article 10.
Summarizing article 11.
Summarizing article 12.
Summarizing article 13.
Summarizing article 14.
Summarizing article 15.
Summarizing article 16.
Summarizing article 17.
Summarizing article 18.
Summarizing article 19.
Summarizing article 20.
CPU times: total: 281 ms
Wall time: 24.7 s
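
The GPT-4o and Llama loops differ only in the client object and the model name, since both clients are called through `client.chat.completions.create`. One possible way to share the request code is sketched below. The names `PROMPT_TEMPLATE`, `summarize`, and `FakeCompletions` are hypothetical, and the fake client exists only so the sketch runs without an API key.

```python
from types import SimpleNamespace

# Same prompt wording as the notebook cells, with a placeholder for the article
PROMPT_TEMPLATE = (
    "Summarize the following article in 1150 characters. "
    "The summary should look like human created:\n\n{article}\n\nSummary:"
)

def summarize(client, model, article, max_tokens=1150, temperature=0.7):
    """Request one summary from any client exposing chat.completions.create."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(article=article)}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content

# Hypothetical stand-in that mimics the response shape, for offline testing only
class FakeCompletions:
    def create(self, model, messages, max_tokens, temperature):
        text = f"[{model}] summary of {len(messages[0]['content'])} prompt characters"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
summary = summarize(fake_client, "demo-model", "Some article text.")
print(summary)
```

With real clients, the same function could be called as `summarize(client, "gpt-4o-2024-08-06", article)` or `summarize(client, "llama-3.1-70b-versatile", article)`, depending on which client is active.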


In [34]:
results_df = pd.DataFrame(results)
mean_values = results_df[["rouge1", "rouge2", "rougeL"]].mean()
print(mean_values)

rouge1    0.335863
rouge2    0.080865
rougeL    0.170834
dtype: float64


In [33]:
client = OpenAI(
    # This is the default and can be omitted
    api_key = os.environ.get('OPENAI_API_KEY'),
)

scores_dict = {'completeness': [], 'conciseness': [], 'coherence': []}

for _, row in results_df.iterrows():
    # Look up the article content by article_id
    article = dataset.loc[dataset['id'] == row['article_id'], 'content'].iloc[0]
    scores = llm_evaluate_summary(article, row['generated_summary'])
    print(f"Article ID: {row['article_id']}, Scores: {scores}")

    # Store the scores in the dictionary
    scores_dict['completeness'].append(scores[0])
    scores_dict['conciseness'].append(scores[1])
    scores_dict['coherence'].append(scores[2])
    
# Calculate the average scores
average_scores = {
    'completeness': sum(scores_dict['completeness']) / len(scores_dict['completeness']),
    'conciseness': sum(scores_dict['conciseness']) / len(scores_dict['conciseness']),
    'coherence': sum(scores_dict['coherence']) / len(scores_dict['coherence']),
}

# Convert to DataFrame for better visualization (optional)
average_scores_df = pd.DataFrame([average_scores])
average_scores_df.columns = ['Completeness', 'Conciseness', 'Coherence']

# Display the DataFrame
average_scores_df.head()

Article ID: 18009, Scores: [8.0, 9.0, 9.0]
Article ID: 17531, Scores: [8.0, 9.0, 8.0]
Article ID: 18159, Scores: [8.0, 9.0, 9.0]
Article ID: 18450, Scores: [8.0, 7.0, 9.0]
Article ID: 17751, Scores: [8.0, 9.0, 9.0]
Article ID: 17648, Scores: [8.0, 9.0, 8.0]
Article ID: 17937, Scores: [8.0, 9.0, 9.0]
Article ID: 17469, Scores: [8.0, 9.0, 9.0]
Article ID: 17955, Scores: [8.0, 9.0, 8.0]
Article ID: 18298, Scores: [8.0, 9.0, 9.0]
Article ID: 17367, Scores: [8.0, 9.0, 9.0]
Article ID: 17542, Scores: [8.0, 9.0, 8.0]
Article ID: 17606, Scores: [8.0, 9.0, 9.0]
Article ID: 17800, Scores: [8.0, 9.0, 9.0]
Article ID: 18329, Scores: [8.0, 9.0, 9.0]
Article ID: 17887, Scores: [8.0, 9.0, 8.0]
Article ID: 18275, Scores: [8.0, 9.0, 8.0]
Article ID: 18297, Scores: [8.0, 9.0, 9.0]
Article ID: 18274, Scores: [8.0, 9.0, 8.0]
Article ID: 18126, Scores: [8.0, 9.0, 9.0]


| | Completeness | Conciseness | Coherence |
|---|---|---|---|
| 0 | 8.0 | 8.9 | 8.65 |
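
To read the two runs side by side, the short sketch below collects the numbers printed in the cells above (mean ROUGE scores, mean LLM ratings, and wall time for 20 articles) and prints the difference per metric. The dictionary layout and variable names are assumptions made for illustration; the values are copied from the notebook outputs.

```python
# Values copied from the outputs above (20 articles per model)
gpt4o = {"rouge1": 0.386724, "rouge2": 0.100371, "rougeL": 0.187491,
         "completeness": 8.0, "conciseness": 8.9, "coherence": 8.8,
         "wall_time_s": 59.4}
llama = {"rouge1": 0.335863, "rouge2": 0.080865, "rougeL": 0.170834,
         "completeness": 8.0, "conciseness": 8.9, "coherence": 8.65,
         "wall_time_s": 24.7}

# Positive difference means the GPT-4o value is higher
differences = {metric: round(gpt4o[metric] - llama[metric], 6) for metric in gpt4o}

for metric, diff in differences.items():
    print(f"{metric:>13}: GPT-4o {gpt4o[metric]:<9} Llama {llama[metric]:<9} diff {diff:+}")
```

On this 20-article sample, GPT-4o scores higher on all three ROUGE metrics and slightly higher on coherence, while the Llama 3.1 70B run finished in less than half the wall time.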
