<a href="https://colab.research.google.com/github/teja-1403/TextSummarization-Using-PEGASUS-BART/blob/main/Text_Summazation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# News Article Summarization Using PEGASUS and BART Models

In this project, we explore news article summarization with two pre-trained sequence-to-sequence models: PEGASUS and BART. We fine-tune PEGASUS on a small dataset and compare its summaries with those produced by an off-the-shelf BART checkpoint. The dataset consists of 112 synthetic news articles with the following features: Newspaper Name, Published Date, URL, Headline, Content, Human Summary, and Category. Our goal is to generate automatic summaries and evaluate them against the human summaries using ROUGE scores and a token-level precision measure.

**1. Libraries and Dependencies**

In this section, we install and import the libraries needed to load and preprocess the data, fine-tune the model, and evaluate the generated summaries.

In [2]:
# Install required libraries
!pip install transformers datasets rouge-score


In [24]:
# Import necessary libraries
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
from datasets import Dataset
from transformers import PegasusForConditionalGeneration, PegasusTokenizer, Trainer, TrainingArguments
from sklearn.model_selection import train_test_split
from rouge_score import rouge_scorer
import torch

**2. Loading and Inspecting the Data**
   
We load the dataset from a CSV file and inspect its first rows, summary statistics, and column names.

In [6]:
# Load the dataset
data_path = "/Copy of Synthetic News Dataset - Synthetic News Dataset.csv"
df = pd.read_csv(data_path)

In [7]:
# Display dataset information
print(df.head())

   Sr. No    Newspaper Name Published Date  \
0       1         The Hindu     2023-12-01   
1       2   Hindustan Times     2022-08-15   
2       3    Indian Express     2021-04-10   
3       4     The Telegraph     2023-05-18   
4       5  Deccan Chronicle     2020-10-05   

                                                 URL  \
0  https://www.thehindu.com/news/national/sample-...   
1  https://www.hindustantimes.com/india/sample-ne...   
2   https://www.indianexpress.com/news/sample-news-3   
3  https://www.telegraphindia.com/nation/sample-n...   
4  https://www.deccanchronicle.com/nation/sample-...   

                                            Headline  \
0        "India Launches Chandrayaan-4 Successfully"   
1  "PM Announces Digital India 2.0 on Independenc...   
2              "Economic Growth Rebounds in Q1 2021"   
3  "Cyclone Yaas Causes Widespread Damage in East...   
4         "Hyderabad Emerges as India’s Vaccine Hub"   

                                             Cont

In [8]:
print(df.describe())

           Sr. No
count  112.000000
mean    56.500000
std     32.475632
min      1.000000
25%     28.750000
50%     56.500000
75%     84.250000
max    112.000000


In [9]:
print(df.columns)

Index(['Sr. No', 'Newspaper Name', 'Published Date', 'URL', 'Headline',
       'Content', 'Human Summary', 'Category'],
      dtype='object')


**3. Preprocessing the Dataset**
   
We clean the `Content` and `Human Summary` columns by stripping leading and trailing whitespace, and store the results in new `article` and `summary` columns.

In [10]:
# Preprocessing
def preprocess_text(text):
    return text.strip()

# Preprocessing the 'Content' and 'Human Summary' columns
df['article'] = df['Content'].apply(preprocess_text)
df['summary'] = df['Human Summary'].apply(preprocess_text)
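
One caveat: `str.strip()` raises an `AttributeError` on missing values, which pandas represents as `NaN` floats. The following is a minimal sketch of a more defensive cleaner, assuming that an empty string is an acceptable stand-in for missing text. The name `clean_text` is hypothetical and not part of the notebook.

```python
import math


def clean_text(text):
    """Strip surrounding whitespace; map missing values to an empty string."""
    # pandas represents missing cells as float NaN; None can also appear.
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return ""
    # Coerce non-string values (e.g. numbers) to str before stripping.
    return str(text).strip()


print(repr(clean_text("  India launches mission.  \n")))
print(repr(clean_text(float("nan"))))
```

It could be passed to `.apply()` in place of `preprocess_text` if the CSV turns out to contain empty cells.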

**4. Split the Dataset into Train, Validation, and Test Sets**
   
We split the dataset into training (60%), validation (20%), and test (20%) sets. The validation set is used during fine-tuning and the test set for the final evaluation.

In [11]:
# Split the dataset into train, validation, and test sets (60% train, 20% test, 20% validation)
train_df, temp_df = train_test_split(df, test_size=0.4, random_state=42)  # 60% for train, 40% for temp (split into val + test)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)  # Split remaining 40% into 20% val and 20% test

print("Training Data Size:", len(train_df))
print("Validation Data Size:", len(val_df))
print("Test Data Size:", len(test_df))

Training Data Size: 67
Validation Data Size: 22
Test Data Size: 23
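
The sizes are not exactly 60/20/20 because 112 rows do not divide evenly. Here is a small sketch of the arithmetic, assuming that `train_test_split` rounds the fractional test size up and gives the remainder to the other part. The helper `split_sizes` is illustrative only.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_first, n_second) for a fractional split, rounding the second part up."""
    n_second = math.ceil(test_fraction * n_samples)
    return n_samples - n_second, n_second


n_train, n_temp = split_sizes(112, 0.4)   # first call: train vs. temp
n_val, n_test = split_sizes(n_temp, 0.5)  # second call: validation vs. test
print(n_train, n_val, n_test)
```

Under that assumption the sketch reproduces the 67 / 22 / 23 split printed above.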


**5. Convert DataFrames to Hugging Face Dataset Format**
   
We convert the pandas DataFrames into the Hugging Face Dataset format.

In [12]:
# Convert the DataFrames to the Hugging Face Dataset format
train_dataset = Dataset.from_pandas(train_df[['article', 'summary']])
val_dataset = Dataset.from_pandas(val_df[['article', 'summary']])
test_dataset = Dataset.from_pandas(test_df[['article', 'summary']])

**6. Load Pre-trained PEGASUS Model and Tokenizer**
   
We load the pre-trained PEGASUS model and tokenizer from Hugging Face.

In [13]:
# Load pre-trained PEGASUS model and tokenizer
model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



**7. Tokenize the Data**

We tokenize the text data for both the input articles and target summaries.

In [14]:
# Tokenization function
def tokenize_function(examples):
    model_inputs = tokenizer(examples['article'], padding="max_length", truncation=True, max_length=512)
    # Tokenize the target summaries; their input_ids become the training labels.
    # `text_target` replaces the deprecated `as_target_tokenizer()` context manager.
    labels = tokenizer(text_target=examples['summary'], padding="max_length", truncation=True, max_length=150)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Apply tokenization
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/67 [00:00<?, ? examples/s]



Map:   0%|          | 0/22 [00:00<?, ? examples/s]

Map:   0%|          | 0/23 [00:00<?, ? examples/s]
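
Because the summaries are padded to `max_length`, the `labels` contain padding token ids, and those positions count toward the training loss unless they are masked. A common convention in Hugging Face seq2seq fine-tuning is to replace padding ids in the labels with `-100`, the index that PyTorch's cross-entropy loss ignores by default. Below is a minimal sketch of that masking step in plain Python. The helper name `mask_label_padding` is hypothetical, and `pad_token_id=0` is an assumption that should be replaced by `tokenizer.pad_token_id`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_label_padding(label_ids, pad_token_id):
    """Replace padding ids with IGNORE_INDEX in a batch of label sequences."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]


# Toy batch: two summaries padded to length 6 with an assumed pad id of 0.
labels = [[812, 45, 1, 0, 0, 0], [77, 902, 13, 5, 1, 0]]
print(mask_label_padding(labels, pad_token_id=0))
```

Inside `tokenize_function`, this would be applied to `labels["input_ids"]` before the assignment to `model_inputs["labels"]`. The notebook does not do this, so the loss values it reports include the padded positions.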

In [15]:
# Training Arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    save_total_limit=2,
    report_to="none"
)



In [16]:
from transformers import DataCollatorForSeq2Seq

# Define the DataCollator for PEGASUS
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

In [17]:
# Define Trainer
trainer = Trainer(
    model=model,                         # The model being trained
    args=training_args,                  # The training arguments
    train_dataset=train_dataset,         # Training dataset
    eval_dataset=val_dataset,            # Validation dataset
    tokenizer=tokenizer,                 # Tokenizer
    data_collator=data_collator,         # Data collator for Seq2Seq
)



**8. Train the Model**
   
We start training the PEGASUS model on the training dataset.

In [18]:
# Train the model
trainer.train()

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 6.1415 | 5.596849 |
| 2 | 7.1695 | 5.352637 |
| 3 | 5.6634 | 5.205256 |
| 4 | 6.0772 | 5.117069 |
| 5 | 5.4577 | 5.054686 |
| 6 | 5.9066 | 5.015361 |
| 7 | 5.7254 | 4.991099 |
| 8 | 5.7591 | 4.971573 |
| 9 | 4.6919 | 4.963615 |
| 10 | 5.2133 | 4.959926 |




TrainOutput(global_step=340, training_loss=5.578684144861558, metrics={'train_runtime': 381.1528, 'train_samples_per_second': 1.758, 'train_steps_per_second': 0.892, 'total_flos': 967970578759680.0, 'train_loss': 5.578684144861558, 'epoch': 10.0})
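
The `global_step=340` and the throughput figures in this output follow from the dataset size and the training arguments. Here is a short sketch of that arithmetic, assuming a single device and no gradient accumulation (neither is overridden in the `TrainingArguments` above).

```python
import math

train_examples = 67         # training set size printed earlier
batch_size = 2              # per_device_train_batch_size
epochs = 10                 # num_train_epochs
runtime_seconds = 381.1528  # train_runtime reported by the Trainer

# The last batch of each epoch holds the single leftover example.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)

# Throughput figures, rounded to three decimals like the reported metrics.
samples_per_second = round(train_examples * epochs / runtime_seconds, 3)
steps_per_second = round(total_steps / runtime_seconds, 3)
print(samples_per_second, steps_per_second)
```

With 34 optimizer steps per epoch, the model sees only 340 updates in total, which is worth keeping in mind when reading the loss values.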

**9. Evaluate the Model Using ROUGE Metrics**
    
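We generate summaries for the test set and score them against the human summaries using ROUGE-1, ROUGE-2, and ROUGE-L F-measures. We also report a score labeled Average Precision: the fraction of unique tokens in each generated summary that also appear in the reference summary, averaged over the test set. It measures token overlap and is not the ranking-based Average Precision used in information retrieval.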

In [19]:
# Evaluation: Generate summaries for test set
def generate_summary(texts):
    device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available
    model.to(device)  # Move the model to the correct device

    # Tokenize the input and move tensors to the correct device
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)

    # Generate summaries
    summaries = model.generate(inputs['input_ids'], max_length=150, num_beams=4, early_stopping=True)

    # Decode the generated summaries
    return tokenizer.decode(summaries[0], skip_special_tokens=True)

# Generate summaries for the test set
test_df['generated_summary'] = test_df['article'].apply(generate_summary)

In [23]:
# Evaluation
nltk.download('punkt_tab')

def evaluate_metrics(predictions, references):
    # Initialize ROUGE scorer
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = [scorer.score(ref, pred) for ref, pred in zip(references, predictions)]

    # Compute ROUGE scores
    rouge1 = sum(score['rouge1'].fmeasure for score in scores) / len(scores)
    rouge2 = sum(score['rouge2'].fmeasure for score in scores) / len(scores)
    rougeL = sum(score['rougeL'].fmeasure for score in scores) / len(scores)

    # Compute the mean unique-token precision (reported below as "Average Precision")
    precisions = []
    for pred, ref in zip(predictions, references):
        pred_tokens = word_tokenize(pred.lower())  # Tokenize prediction
        ref_tokens = word_tokenize(ref.lower())    # Tokenize reference
        pred_set, ref_set = set(pred_tokens), set(ref_tokens)

        # Precision calculation
        true_positives = len(pred_set & ref_set)
        predicted_positives = len(pred_set)
        precisions.append(true_positives / predicted_positives if predicted_positives > 0 else 0)

    avg_precision = sum(precisions) / len(precisions) if precisions else 0

    return rouge1, rouge2, rougeL, avg_precision

# Calculate metrics for the test set
rouge1, rouge2, rougeL, avg_precision = evaluate_metrics(
    test_df['generated_summary'], test_df['summary']
)

# Display the evaluation results
print(f"Evaluation Metrics of PEGASUS:")
print(f"ROUGE-1: {rouge1:.4f}")
print(f"ROUGE-2: {rouge2:.4f}")
print(f"ROUGE-L: {rougeL:.4f}")
print(f"Average Precision: {avg_precision:.4f}")

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


Evaluation Metrics of PEGASUS:
ROUGE-1: 0.4103
ROUGE-2: 0.2144
ROUGE-L: 0.3142
Average Precision: 0.6169
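
To make these numbers easier to interpret, here is a simplified sketch of ROUGE-1 and of the set-based precision computed above. It assumes lowercased whitespace tokenization and no stemming, so its values will not match `rouge_score` or the NLTK-tokenized precision exactly. The function names and the example sentences are illustrative.

```python
from collections import Counter


def rouge1(reference, prediction):
    """Return (precision, recall, f1) from clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # A predicted token is credited at most as often as it occurs in the reference.
    overlap = sum((ref_counts & pred_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def unique_token_precision(reference, prediction):
    """Share of distinct predicted tokens that also occur in the reference."""
    pred_set = set(prediction.lower().split())
    ref_set = set(reference.lower().split())
    return len(pred_set & ref_set) / len(pred_set) if pred_set else 0.0


reference = "india launches chandrayaan mission successfully"
prediction = "india launches new lunar mission"
print(rouge1(reference, prediction))
print(unique_token_precision(reference, prediction))
```

The set-based score ignores repeated words and word order and has no recall component, so it is not directly comparable with the ROUGE F-measures.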


**10. Summarize Text with PEGASUS Model**
    
Here, we use the PEGASUS model to generate summaries for new text samples.

In [25]:
# Set device (use GPU if available, otherwise fall back to CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Function for summarization
def summarize_text(text, model, tokenizer, max_length=120):
    """Generates a summary for the given text using the Pegasus model."""
    # Tokenize the input text and move tensors to the same device as the model
    tokens = tokenizer(text, truncation=True, padding="longest", return_tensors="pt").to(device)

    # Generate the summary
    summary_ids = model.generate(
        tokens["input_ids"],
        max_length=max_length,
        num_beams=5,
        early_stopping=True
    )

    # Decode the generated summary
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Example Usage
# Sample text to summarize
sample_text = """
The meaning of NLP is Natural Language Processing (NLP) which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
NLP focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language in a way that is both meaningful and useful.
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
In this article, we will explore the fundamental concepts and techniques of Natural Language Processing, shedding light on how it transforms raw text into actionable information.
From tokenization and parsing to sentiment analysis and machine translation, NLP encompasses a wide range of applications that are reshaping industries and enhancing human-computer interactions.
Whether you are a seasoned professional or new to the field, this overview will provide you with a comprehensive understanding of NLP and its significance in today’s digital age.
"""

# Generate and print the summary
summary = summarize_text(sample_text, model, tokenizer)
print("Original Text:", sample_text)
print("\nGenerated Summary (PEGASUS):", summary)

Original Text: 
The meaning of NLP is Natural Language Processing (NLP) which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics. 
NLP focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language in a way that is both meaningful and useful. 
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
In this article, we will explore the fundamental concepts and techniques of Natural Language Processing, shedding light on how it transforms raw text into actionable information. 
From tokenization and parsing to sentiment analysis and machine translation, NLP encompasses a wide range of applications that are reshaping industries and enhancing human-computer interactions. 
Whether you are a seasoned professi

**11. Save the Fine-Tuned Model and Tokenizer**

Finally, we save the fine-tuned PEGASUS model and tokenizer for later use.

In [26]:
# Save the fine-tuned Pegasus model
model.save_pretrained("/colab/working/pegasus_model")
print("Pegasus model saved!!!")
# Save the tokenizer
tokenizer.save_pretrained("/colab/working/pegasus_tokenizer")

Pegasus model saved!!!


('/colab/working/pegasus_tokenizer/tokenizer_config.json',
 '/colab/working/pegasus_tokenizer/special_tokens_map.json',
 '/colab/working/pegasus_tokenizer/spiece.model',
 '/colab/working/pegasus_tokenizer/added_tokens.json')

**12. Compare PEGASUS with BART Model**
    
Now we compare the fine-tuned PEGASUS model with BART on the same test set. The BART checkpoint, `facebook/bart-large-cnn`, is used as released, without further fine-tuning on this dataset.

In [27]:
# Compare with another summarization model (BART)
from transformers import BartTokenizer, BartForConditionalGeneration

In [28]:
# Initialize BART model and tokenizer
bart_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
bart_tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

In [29]:
# Evaluation: Generate summaries for test set
def generate_bart_summary(texts):
    device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available
    bart_model.to(device)  # Move the model to the correct device

    # Tokenize the input and move tensors to the correct device
    inputs = bart_tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)

    # Generate summaries
    summaries = bart_model.generate(inputs['input_ids'], max_length=150, num_beams=4, early_stopping=True)

    # Decode the generated summaries
    return bart_tokenizer.decode(summaries[0], skip_special_tokens=True)

# Generate summaries for the test set
test_df['generated_summary'] = test_df['article'].apply(generate_bart_summary)


In [30]:
rouge1, rouge2, rougeL, avg_precision = evaluate_metrics(
    test_df['generated_summary'], test_df['summary']
)

# Display the evaluation results
print(f"Evaluation Metrics of BART:")
print(f"ROUGE-1: {rouge1:.4f}")
print(f"ROUGE-2: {rouge2:.4f}")
print(f"ROUGE-L: {rougeL:.4f}")
print(f"Average Precision: {avg_precision:.4f}")

Evaluation Metrics of BART:
ROUGE-1: 0.4258
ROUGE-2: 0.2063
ROUGE-L: 0.3060
Average Precision: 0.5170
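
The two result blocks can be put side by side. The sketch below only restates the scores printed in this notebook and computes their differences. It adds no new measurements.

```python
# Scores copied from the evaluation outputs of the two models.
pegasus = {"ROUGE-1": 0.4103, "ROUGE-2": 0.2144, "ROUGE-L": 0.3142, "Average Precision": 0.6169}
bart = {"ROUGE-1": 0.4258, "ROUGE-2": 0.2063, "ROUGE-L": 0.3060, "Average Precision": 0.5170}

# Positive differences favor PEGASUS, negative ones favor BART.
differences = {name: round(pegasus[name] - bart[name], 4) for name in pegasus}

for name, delta in differences.items():
    print(f"{name:<18} PEGASUS={pegasus[name]:.4f}  BART={bart[name]:.4f}  diff={delta:+.4f}")
```

With only 23 test articles, the ROUGE differences are small. The largest gap is in the precision score.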


In [31]:
# Set device (use GPU if available, otherwise fall back to CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Function for summarization
def summarize_text(text, model, tokenizer, max_length=120):
    """Generates a summary for the given text using the BART model."""
    # Move the model to the same device
    model.to(device)

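    # Tokenize the input text and move tensors to the same device as the model.
    # An explicit max_length (BART's 1024-token input limit) is needed for truncation to take effect.
    tokens = tokenizer(text, truncation=True, max_length=1024, padding="longest", return_tensors="pt").to(device)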

    # Generate the summary
    summary_ids = model.generate(
        tokens["input_ids"],
        max_length=max_length,
        num_beams=5,
        early_stopping=True
    )

    # Decode the generated summary
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary


# Sample text to summarize
sample_text = """
The meaning of NLP is Natural Language Processing (NLP) which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
NLP focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language in a way that is both meaningful and useful.
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
In this article, we will explore the fundamental concepts and techniques of Natural Language Processing, shedding light on how it transforms raw text into actionable information.
From tokenization and parsing to sentiment analysis and machine translation, NLP encompasses a wide range of applications that are reshaping industries and enhancing human-computer interactions.
Whether you are a seasoned professional or new to the field, this overview will provide you with a comprehensive understanding of NLP and its significance in today’s digital age.
"""

# Generate and print the summary
summary_bart = summarize_text(sample_text, bart_model, bart_tokenizer)
print("Original Text:", sample_text)
print("\nGenerated Summary (BART):", summary_bart)


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Original Text: 
The meaning of NLP is Natural Language Processing (NLP) which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics. 
NLP focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language in a way that is both meaningful and useful. 
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
In this article, we will explore the fundamental concepts and techniques of Natural Language Processing, shedding light on how it transforms raw text into actionable information. 
From tokenization and parsing to sentiment analysis and machine translation, NLP encompasses a wide range of applications that are reshaping industries and enhancing human-computer interactions. 
Whether you are a seasoned professi

In [32]:
# Save the Bart model
bart_model.save_pretrained("/colab/working/bart_model")
print("Bart model saved!!!")
# Save the tokenizer
bart_tokenizer.save_pretrained("/colab/working/bart_tokenizer")




Bart model saved!!!


('/colab/working/bart_tokenizer/tokenizer_config.json',
 '/colab/working/bart_tokenizer/special_tokens_map.json',
 '/colab/working/bart_tokenizer/vocab.json',
 '/colab/working/bart_tokenizer/merges.txt',
 '/colab/working/bart_tokenizer/added_tokens.json')

### **Conclusions:**

1. **Performance of BART:**
   BART (`facebook/bart-large-cnn`, used without fine-tuning on this dataset) reached ROUGE-1, ROUGE-2, and ROUGE-L F-measures of 0.4258, 0.2063, and 0.3060, respectively. Its Average Precision of 0.5170 means that, on average, about 52% of the unique tokens in a generated summary also appear in the human summary. This score measures token overlap and says nothing about ranking.

2. **Performance of PEGASUS:**
   After fine-tuning for 10 epochs on the 67 training articles, PEGASUS (`google/pegasus-xsum`) reached ROUGE-1, ROUGE-2, and ROUGE-L F-measures of 0.4103, 0.2144, and 0.3142, respectively. Its ROUGE-1 was about 0.016 lower than BART's, while its ROUGE-2 and ROUGE-L were about 0.008 higher. On a 23-article test set, differences of this size are too small to establish that either model is better. The clearest gap is in Average Precision, 0.6169 versus 0.5170, which indicates that PEGASUS's summaries contain a smaller share of words absent from the human summaries. Overall, the fine-tuned PEGASUS is a competitive choice alongside BART.

3. **Fine-Tuning Impact:**
   The notebook does not evaluate PEGASUS before fine-tuning, so the size of the improvement from fine-tuning cannot be read from these results. What the results do show is that the fine-tuned PEGASUS ended up comparable to a BART checkpoint that was not fine-tuned on this dataset. The validation loss fell from 5.60 to 4.96 over the 10 epochs and was still decreasing slowly at the end. How much fine-tuning helps depends on the quality, size, and diversity of the training data, and 67 training articles is very little.

4. **Dataset and Model Behavior:**
   With only 112 articles in total (67 for training, 22 for validation, and 23 for testing), the dataset's size was a significant constraint, both for fine-tuning and for drawing reliable conclusions from the test scores. Larger and more diverse datasets are likely needed for either model to reach its full capability, particularly on complex and varied text such as news articles.

---

### **Future Scope:**

1. **Larger and More Diverse Dataset:**
   A larger, more diverse dataset would likely help in improving the performance of both BART and PEGASUS. Expanding the dataset could better represent the variety of topics and writing styles in news articles, allowing the models to better generalize.

2. **Fine-Tuning with Larger Data:**
   Future work should involve fine-tuning PEGASUS on a much larger and more diverse dataset. Given its potential for summarization, PEGASUS could perform better if trained with sufficient data, especially with more domain-specific text. The performance improvement could be particularly noticeable in tasks requiring nuanced understanding.

3. **Model Comparison with More Models:**
   To further enhance the evaluation, more summarization models like T5, BERTSUM, and GPT-based models should be incorporated into future experiments. This would provide a better understanding of the strengths and weaknesses of various models and lead to more informed decisions regarding model selection.

4. **Hyperparameter Optimization:**
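   Both BART and PEGASUS could benefit from hyperparameter tuning. For PEGASUS, adjusting the learning rate, batch size, and number of epochs may improve the fine-tuned model, which might have been affected by the current settings. For both models, generation settings such as the number of beams and the maximum summary length can also be tuned. BART was not trained in this project, so training hyperparameters would apply to it only once it is fine-tuned as well.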

5. **Incorporating Human Evaluation:**
   While ROUGE scores offer valuable insight into the performance of the models, human evaluation is necessary for assessing summary quality in terms of readability, coherence, and informativeness. This can be especially useful for real-world applications where user experience is critical.

6. **Real-Time Summarization:**
   Optimizing the models for real-time summarization can be explored. The current models, although effective, could benefit from speed improvements for applications that require on-the-fly summarization, such as in news aggregation or content curation platforms.

7. **Cross-Domain Summarization:**
   Future research could involve applying these models to other domains such as healthcare, technology, or scientific research. Domain-specific fine-tuning can significantly enhance the model’s ability to generate accurate and informative summaries tailored to particular industries or topics.
