# Summarization

***

In this notebook, you will work with two different approaches to text summarization: extractive and abstractive summarization. We ask you to create summaries with both approaches on the CNN/DailyMail data set, evaluate both approaches, and discuss your results.




In [1]:
!pip install bert-extractive-summarizer rouge-score -q --disable-pip-version-check --root-user-action=ignore

In [2]:
from transformers import logging, pipeline
from rouge_score import rouge_scorer
from summarizer import Summarizer
from tqdm.auto import tqdm
import numpy as np
import pandas as pd
import torch
import json
import csv
import os

os.environ["KMP_SETTINGS"] = "false"
logging.set_verbosity_error()
tqdm.pandas()

In [3]:
from IPython.display import display, HTML

def displaySummary(article="", summary_ext="", summary_abs="", title=""):
    """Render an article and its summaries in a highlighted HTML box."""
    msg = ("<div style='position:relative;padding:0.75rem 1.25rem;"
           "margin-bottom:1rem;border:1px solid transparent;"
           "border-radius:.25rem;background-color:#fdf7e2;"
           "border-color:#D6B656;color:#3c4046'>")
    if title:
        msg += "<h2>" + title + "</h2>"
    if article:
        msg += "<b>Article:</b><br>" + article
    if summary_ext:
        msg += "<hr><b>Summary (Extractive):</b><br>" + summary_ext
    if summary_abs:
        msg += "<hr><b>Summary (Abstractive):</b><br>" + summary_abs
    msg += "</div>"
    display(HTML(msg))


## Task 1: Extractive Summarization

Extractive summarization is the task of summarizing a longer document by 
selecting a subset of sentences that (1) contain the key points of the 
original text, and (2) have minimal overlap. Since the content of the 
sentences is not changed, this can be modelled as a (joint) 
classification task. In this part of the assignment, we ask you to 
perform extractive summarization on the [CNN/DailyMail news data set](https://aclanthology.org/K16-1028.pdf) (test set) and 
evaluate your performance using the [ROUGE metric](https://huggingface.co/spaces/evaluate-metric/rouge).
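The two selection criteria above (cover the key points, avoid overlap) can be illustrated with a minimal, hypothetical sketch. It is *not* what `Summarizer()` does: that library embeds sentences with BERT, clusters the embeddings, and picks the sentences closest to the cluster centroids. Instead, the sketch swaps in Maximal Marginal Relevance (MMR), using average word frequency as a stand-in for relevance and Jaccard word overlap as a stand-in for redundancy. The helper names (`split_sentences`, `tokenize`, `jaccard`, `mmr_extract`) and the trade-off weight `lam` are illustrative assumptions.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mmr_extract(text, k=2, lam=0.7):
    """Pick k sentences by Maximal Marginal Relevance and return them in article order."""
    sentences = split_sentences(text)
    toks = [tokenize(s) for s in sentences]
    freq = Counter(t for sent in toks for t in sent)
    # Relevance: average document frequency of a sentence's words, scaled to [0, 1]
    raw = [sum(freq[t] for t in sent) / len(sent) if sent else 0.0 for sent in toks]
    top = max(raw, default=0.0) or 1.0
    relevance = [r / top for r in raw]

    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Redundancy: highest word overlap with any sentence already chosen
            redundancy = max((jaccard(toks[i], toks[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    # Keep the original sentence order, as extractive summaries usually do
    return " ".join(sentences[i] for i in sorted(selected))

text = ("The cat sat on the mat. The cat sat on the mat. "
        "Dogs chase the cat in the park. Rain fell.")
print(mmr_extract(text, k=2))  # -> The cat sat on the mat. Dogs chase the cat in the park.
```

The duplicated second sentence is as relevant as the first, but its full overlap with the already selected sentence pushes its score below the third sentence. Raising `lam` favours relevance; lowering it penalises repeated content more strongly.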

Please implement the following steps:
1. Add the CNN/DailyMail data set to your notebook and load the data. For our evaluation, we only need the test set.
Link to Kaggle dataset: https://www.kaggle.com/gowrishankarp/newspaper-text-summarization-cnn-dailymail.
2. Use the [Summarizer()](https://pypi.org/project/bert-extractive-summarizer/) class to create summaries for the first 200 articles. We have already provided the code for downloading the model, which is a BERT-based architecture adapted for extractive text summarization. Summarizer() provides high-level implementations of summarization models, so you do not have to write much code yourself. A tutorial on how to work with the class can be found here: https://analyticsindiamag.com/hands-on-guide-to-extractive-text-summarization-with-bertsum/. 
3. Calculate the average ROUGE-1 and ROUGE-2 scores between your generated summaries and the reference summaries (highlights) included in the CNN/DailyMail corpus. 

*Hint:* As long as you have enabled GPU usage via the Accelerator tab, Summarizer() automatically uses GPU resources. You do not have to push the model and data to the GPU explicitly.
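To make step 3 concrete, here is a minimal from-scratch sketch of ROUGE-N: precision is the number of overlapping n-grams divided by the number of n-grams in the generated summary, recall divides the same overlap by the number of n-grams in the reference, and F1 is their harmonic mean. It assumes plain whitespace tokenization without stemming; `rouge_score` applies its own tokenizer and, with `use_stemmer=True`, Porter stemming, so its numbers will differ slightly. The function names are illustrative, not part of any library.

```python
from collections import Counter

def ngram_counts(tokens, n):
    # Multiset of all contiguous n-grams in the token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    """ROUGE-N precision, recall and F1 with whitespace tokens and no stemming."""
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    # Clipped overlap: an n-gram counts at most min(count in ref, count in cand) times
    overlap = sum((ref & cand).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))  # 5 of 6 unigrams match
print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2))  # 3 of 5 bigrams match
# A long candidate that contains the whole reference: recall 1.0, precision 3/7
print(rouge_n("the cat sat", "the cat sat on the mat today", n=1))
```

The last example shows why long summaries tend to score high recall but low precision, which is one way to read the extractive scores in this task.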


In [4]:
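# Read the CNN/DailyMail data set (first 200 rows)
# Note: the task asks for the test split (test.csv); this run reads train.csv,
# so all scores below refer to the first 200 training articles.
path = "../input/newspaper-text-summarization-cnn-dailymail/cnn_dailymail/train.csv"
df = pd.read_csv(path, engine="python", index_col=False, sep=",", nrows=200)
df.head()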

|   | id | article | highlights |
|---|---|---|---|
| 0 | 0001d1afc246a7964130f43ae940af6bc6c57f01 | By . Associated Press . PUBLISHED: . 14:11 EST... | Bishop John Folda, of North Dakota, is taking ... |
| 1 | 0002095e55fcbd3a2f366d9bf92a95433dc305ef | (CNN) -- Ralph Mata was an internal affairs li... | Criminal complaint: Cop used his role to help ... |
| 2 | 00027e965c8264c35cc1bc55556db388da82b07f | A drunk driver who killed a young woman in a h... | Craig Eccleston-Todd, 27, had drunk at least t... |
| 3 | 0002c17436637c4fe1837c935c04de47adb18e9a | (CNN) -- With a breezy sweep of his pen Presid... | Nina dos Santos says Europe must be ready to a... |
| 4 | 0003ad6ef0c37534f80b55b4235108024b407f0b | Fleetwood are the only team still to have a 10... | Fleetwood top of League One after 2-0 win at S... |


In [5]:
# Load model: distilbert-base-uncased,
# which is not fine-tuned on the CNN/DailyMail data set
summarizer = Summarizer(model="distilbert-base-uncased")


In [6]:
# Get predicted summaries for first 200 articles

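def getSummary(summarizer, article):
    # Summarizer returns the selected sentences as a single string by default;
    # min_length=20 ignores candidate sentences shorter than 20 characters
    return summarizer(article, min_length=20)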

df["summary"] = df["article"].progress_apply(
    lambda row: getSummary(summarizer, row))


In [7]:
# Show highlights, summaries, and articles of the first five rows
df[["highlights", "summary", "article"]].head()

|   | highlights | summary | article |
|---|---|---|---|
| 0 | Bishop John Folda, of North Dakota, is taking ... | 14:11 EST, 25 October 2013 . The bishop of the... | By . Associated Press . PUBLISHED: . 14:11 EST... |
| 1 | Criminal complaint: Cop used his role to help ... | (CNN) -- Ralph Mata was an internal affairs li... | (CNN) -- Ralph Mata was an internal affairs li... |
| 2 | Craig Eccleston-Todd, 27, had drunk at least t... | A drunk driver who killed a young woman in a h... | A drunk driver who killed a young woman in a h... |
| 3 | Nina dos Santos says Europe must be ready to a... | (CNN) -- With a breezy sweep of his pen Presid... | (CNN) -- With a breezy sweep of his pen Presid... |
| 4 | Fleetwood top of League One after 2-0 win at S... | Fleetwood are the only team still to have a 10... | Fleetwood are the only team still to have a 10... |


In [8]:
# Print the full content and the summary of a single article
displaySummary(df["article"][1], df["summary"][1])

In [9]:
# Calculate ROUGE-1 and ROUGE-2 scores (precision, recall, and F1)
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)

def calcRougeScore(highlights, summary):
    # score(target, prediction): the reference highlights are the target
    return scorer.score(highlights, summary)

df["score"] = df[["highlights", "summary"]].progress_apply(
    lambda row: calcRougeScore(row["highlights"], row["summary"]), axis=1)

# ROUGE-1 (each entry is a Score namedtuple: precision, recall, fmeasure)
df["r1_precision"] = df["score"].apply(lambda s: s["rouge1"].precision)
df["r1_recall"] = df["score"].apply(lambda s: s["rouge1"].recall)
df["r1_fmeasure"] = df["score"].apply(lambda s: s["rouge1"].fmeasure)

# ROUGE-2
df["r2_precision"] = df["score"].apply(lambda s: s["rouge2"].precision)
df["r2_recall"] = df["score"].apply(lambda s: s["rouge2"].recall)
df["r2_fmeasure"] = df["score"].apply(lambda s: s["rouge2"].fmeasure)


In [10]:
# Print average scores (precision, recall, and F1)
df[["r1_precision", "r1_recall", "r1_fmeasure",
    "r2_precision", "r2_recall", "r2_fmeasure"]].mean()

r1_precision    0.209724
r1_recall       0.553023
r1_fmeasure     0.290870
r2_precision    0.077360
r2_recall       0.195652
r2_fmeasure     0.105586
dtype: float64

## Task 2: Abstractive Summarization

Unlike extractive summarization, abstractive summarization aims to 
produce a coherent summary by generating new text, which may paraphrase 
and condense the input instead of copying its sentences verbatim. 
Formally, it can be modelled as a sequence-to-sequence task. In this part 
of the assignment, we ask you to perform abstractive summarization on the 
CNN/DailyMail news data set and evaluate your performance using the ROUGE 
metric.
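In a sequence-to-sequence model, the decoder writes the summary one token at a time, conditioned on the encoded article, until it emits an end-of-sequence token. The `min_length=60` argument used later in `getSummaryAbstr` constrains this loop. The toy sketch below uses greedy decoding and a hypothetical `toy_model` scoring function in place of the BART decoder (the real pipeline may use beam search instead); it only illustrates how a minimum length can be enforced by blocking the end-of-sequence token, similar in spirit to the min-length logits processor in HuggingFace generation.

```python
import math

EOS = "</s>"  # end-of-sequence token

def greedy_decode(next_token_scores, min_length=0, max_length=20):
    """Greedy decoding in which EOS is blocked until min_length tokens exist."""
    output = []
    while len(output) < max_length:
        # next_token_scores(prefix) returns {token: log-probability}
        scores = dict(next_token_scores(output))
        if len(output) < min_length:
            scores[EOS] = -math.inf  # the model is not allowed to stop yet
        token = max(scores, key=scores.get)
        if token == EOS:
            break
        output.append(token)
    return output

# Hypothetical stand-in for a decoder: after two words it strongly prefers to stop.
WORDS = ["police", "arrest", "suspect", "in", "city"]

def toy_model(prefix):
    nxt = WORDS[len(prefix) % len(WORDS)]
    if len(prefix) >= 2:
        return {EOS: -0.1, nxt: -2.0}
    return {nxt: -0.1, EOS: -3.0}

print(greedy_decode(toy_model, min_length=0, max_length=10))  # stops after 2 tokens
print(greedy_decode(toy_model, min_length=4, max_length=10))  # forced to 4 tokens
```

Raising the minimum length forces the decoder to continue even where it would prefer to stop, which is one reason longer minimum lengths can produce padded or repetitive text.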

Please implement the following steps:
1. Use the *pipeline* framework provided by HuggingFace to create summaries for the first 200 articles of the CNN/DailyMail data set. The pipeline framework includes high-level implementations for abstractive summarization. Set up a [summarization pipeline](https://huggingface.co/docs/transformers/v4.15.0/en/main_classes/pipelines#transformers.SummarizationPipeline) using the [distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) or [bart-base](https://huggingface.co/facebook/bart-base) model and its tokenizer. You are welcome to experiment with [other models](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads) fine-tuned for summarization. Make sure to truncate the input to the maximum length the model can process (1024 tokens) and enable GPU usage. The BART model uses an encoder-decoder architecture and achieves near state-of-the-art results on several summarization tasks. The official documentation of the pipeline framework can be found [here](https://huggingface.co/docs/transformers/v4.15.0/en/main_classes/pipelines#transformers.SummarizationPipeline). 
2. Calculate the ROUGE-1 and ROUGE-2 metrics for your abstractive summaries.
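Truncation, as requested in step 1, keeps the beginning of the input and drops everything past the model's limit (HuggingFace tokenizers truncate from the right by default). A minimal sketch on a plain list of token IDs; `truncate_head` is a hypothetical helper, and the two reserved positions assume that the BART tokenizer adds one begin and one end special token to every input.

```python
def truncate_head(token_ids, max_length=1024, num_special_tokens=2):
    """Keep the first tokens that fit the model's input window.

    Returns the kept tokens and the fraction of the input that was dropped.
    """
    budget = max_length - num_special_tokens  # room left for content tokens
    kept = token_ids[:budget]
    dropped = len(token_ids) - len(kept)
    return kept, (dropped / len(token_ids) if token_ids else 0.0)

# A hypothetical 1,500-token article: 1,022 content tokens fit, about 32% is cut off
kept, dropped_fraction = truncate_head(list(range(1500)))
print(len(kept), round(dropped_fraction, 3))
```

Any content past the limit is never seen by the model, which may contribute to summaries of long articles focusing on their opening.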

In [11]:
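# Instantiate the summarization pipeline on the first GPU (device=0);
# truncation=True cuts inputs that exceed the model's maximum input length
model_name = "sshleifer/distilbart-cnn-12-6"
summarizer = pipeline("summarization", model=model_name, tokenizer=model_name,
                      framework="pt", truncation=True, device=0)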


In [12]:
# Reload the CNN/DailyMail data set (same 200 rows of train.csv as in Task 1)
path = "../input/newspaper-text-summarization-cnn-dailymail/cnn_dailymail/train.csv"
df = pd.read_csv(path, engine="python", index_col=False, sep=",", nrows=200)
df.head()

|   | id | article | highlights |
|---|---|---|---|
| 0 | 0001d1afc246a7964130f43ae940af6bc6c57f01 | By . Associated Press . PUBLISHED: . 14:11 EST... | Bishop John Folda, of North Dakota, is taking ... |
| 1 | 0002095e55fcbd3a2f366d9bf92a95433dc305ef | (CNN) -- Ralph Mata was an internal affairs li... | Criminal complaint: Cop used his role to help ... |
| 2 | 00027e965c8264c35cc1bc55556db388da82b07f | A drunk driver who killed a young woman in a h... | Craig Eccleston-Todd, 27, had drunk at least t... |
| 3 | 0002c17436637c4fe1837c935c04de47adb18e9a | (CNN) -- With a breezy sweep of his pen Presid... | Nina dos Santos says Europe must be ready to a... |
| 4 | 0003ad6ef0c37534f80b55b4235108024b407f0b | Fleetwood are the only team still to have a 10... | Fleetwood top of League One after 2-0 win at S... |


In [13]:
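# Define the abstractive summary helper and try it on a single article

def getSummaryAbstr(summarizer, article):
    # min_length is a lower bound on the generated summary length, in tokens
    result = summarizer(article, min_length=60)
    # The pipeline returns a list with one dict per input text
    return result[0]["summary_text"]

article = df["article"][50]
getSummaryAbstr(summarizer, article)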

' 42 per cent of those housed at Category A Whitemoor jail consider themselves to be of Islamic faith . More than a quarter of inmates in London jails are Muslim, figures show . Experts fear large numbers are being radicalised on the inside, where they say the spread of Jihadist ideas is rife . Shadow Justice Minister Sadiq Khan says ministers are not doing enough to tackle issue of radicalisation .'

In [14]:
# Get predicted summaries for the first 200 articles
df["summary"] = df["article"].progress_apply(lambda row: getSummaryAbstr(summarizer, row))




In [15]:
df[["highlights", "summary"]].head()

|   | highlights | summary |
|---|---|---|
| 0 | Bishop John Folda, of North Dakota, is taking ... | Bishop John Folda of the Fargo Catholic Dioce... |
| 1 | Criminal complaint: Cop used his role to help ... | Ralph Mata was an internal affairs lieutenant... |
| 2 | Craig Eccleston-Todd, 27, had drunk at least t... | Craig Eccleston-Todd, 27, was driving home fr... |
| 3 | Nina dos Santos says Europe must be ready to a... | EU must be ready to accept sanctions are a tw... |
| 4 | Fleetwood top of League One after 2-0 win at S... | Fleetwood are the only team still to have a 1... |


In [16]:
# Print the full content and the summary of a single article
displaySummary(df["article"][1], summary_abs=df["summary"][1])

In [17]:
# Calculate ROUGE-1 and ROUGE-2 scores for the abstractive summaries
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)

def calcRougeScore(highlights, summary):
    # score(target, prediction): the reference highlights are the target
    return scorer.score(highlights, summary)

df["score"] = df[["highlights", "summary"]].progress_apply(
    lambda row: calcRougeScore(row["highlights"], row["summary"]), axis=1)

# ROUGE-1 (each entry is a Score namedtuple: precision, recall, fmeasure)
df["r1_precision"] = df["score"].apply(lambda s: s["rouge1"].precision)
df["r1_recall"] = df["score"].apply(lambda s: s["rouge1"].recall)
df["r1_fmeasure"] = df["score"].apply(lambda s: s["rouge1"].fmeasure)

# ROUGE-2
df["r2_precision"] = df["score"].apply(lambda s: s["rouge2"].precision)
df["r2_recall"] = df["score"].apply(lambda s: s["rouge2"].recall)
df["r2_fmeasure"] = df["score"].apply(lambda s: s["rouge2"].fmeasure)


In [18]:
df[["r1_precision", "r1_recall", "r1_fmeasure",
    "r2_precision", "r2_recall", "r2_fmeasure"]].mean()

r1_precision    0.386087
r1_recall       0.544550
r1_fmeasure     0.440526
r2_precision    0.195945
r2_recall       0.274682
r2_fmeasure     0.222270
dtype: float64

## Task 3: Summarization Comparison

To compare the performance of extractive and abstractive summarization, 
we ask you to select 5 articles from our [news data set](https://www.kaggle.com/datasets/julianschelb/newsdata) and repurpose the 
code from Tasks 1 and 2 to create extractive and abstractive summaries. 
Compare your results and discuss the flaws of both approaches.
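One reference-free measure that can complement a qualitative comparison is the fraction of *novel* n-grams: the share of summary n-grams that never appear in the article. A purely extractive summary should score close to 0, while an abstractive one that paraphrases scores higher. A minimal sketch, assuming simple `\w+` word tokenization; `novel_ngram_fraction` is an illustrative helper, not part of any library used here.

```python
import re

def novel_ngram_fraction(article, summary, n=2):
    """Share of the summary's n-grams that never occur in the article."""
    art = re.findall(r"\w+", article.lower())
    summ = re.findall(r"\w+", summary.lower())
    article_ngrams = {tuple(art[i:i + n]) for i in range(len(art) - n + 1)}
    summary_ngrams = [tuple(summ[i:i + n]) for i in range(len(summ) - n + 1)]
    if not summary_ngrams:
        return 0.0
    novel = sum(g not in article_ngrams for g in summary_ngrams)
    return novel / len(summary_ngrams)

article = "The cat sat on the mat."
print(novel_ngram_fraction(article, "The cat sat."))                   # copied text
print(novel_ngram_fraction(article, "A cat was sitting on the mat."))  # 4 of 6 bigrams are new
```

Applied row-wise to the `articles` DataFrame (for example with `articles.apply(lambda r: novel_ngram_fraction(r["text"], r["summary_abs"]), axis=1)`), it gives one number per article and method.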


In [19]:
# Read news data
with open("/kaggle/input/newsdata/relevant_articles.json", "rb") as f:
    df_articles = pd.json_normalize(json.load(f))

# Select five news articles (fixed seed for reproducibility)
articles = df_articles.sample(n=5, random_state=42)

In [20]:
# Apply extractive summarization on the selected articles
# Note: this uses bert-base-uncased, whereas Task 1 used distilbert-base-uncased
summarizer = Summarizer(model="bert-base-uncased")
articles["summary_ext"] = articles["text"].progress_apply(lambda row: getSummary(summarizer, row))

# Apply abstractive summarization on the selected articles
model_name = "sshleifer/distilbart-cnn-12-6"
summarizer = pipeline("summarization", model=model_name, tokenizer=model_name,
                      framework="pt", truncation=True, device=0)

articles["summary_abs"] = articles["text"].progress_apply(lambda row: getSummaryAbstr(summarizer, row))


In [21]:
# Display the generated summaries for qualitative evaluation

for _, row in articles.iterrows():
    displaySummary(row["text"], row["summary_ext"], row["summary_abs"], row["title"])

**Discuss your results:**

Abstractive summarization tends to produce shorter summaries. However, the generated text is not guaranteed to be grammatically correct. Special characters in the original text, such as bullet points, appear to be ignored. In my opinion, this method nevertheless produces more useful summaries, at least with the models chosen here.
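The observation that abstractive summaries are shorter can be checked numerically with a compression ratio: summary length divided by article length. A minimal sketch that counts whitespace-separated words (an assumption; token counts would give slightly different ratios); `compression_ratio` is an illustrative helper.

```python
def compression_ratio(article, summary):
    """Summary length as a fraction of article length, in whitespace-separated words."""
    n_article = len(article.split())
    return len(summary.split()) / n_article if n_article else 0.0

article = "one two three four five six seven eight nine ten"
print(compression_ratio(article, "one two three"))  # 3 of 10 words
```

Averaging it per method, for example `articles.apply(lambda r: compression_ratio(r["text"], r["summary_ext"]), axis=1).mean()` and the same for `summary_abs`, turns the impression into a number.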

*Further observations:*

* Abstractive summaries tend to be slightly more coherent and fluent.
* Extractive summaries sound quite rough due to only concatenating sentences selected from the original article.
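* Both approaches tend to reuse the beginning of the original article as the beginning of the summary.
* The abstractive approach yields substantially shorter summaries.
* The ROUGE scores point the same way: on CNN/DailyMail, abstractive summarization (Task 2) reaches substantially higher F1 than extractive summarization (Task 1), with ROUGE-1 at 0.44 vs. 0.29 and ROUGE-2 at 0.22 vs. 0.11. ROUGE-1 recall is almost identical (0.54 vs. 0.55), so the gain comes from precision, which fits the shorter abstractive summaries. Two caveats: distilbart-cnn-12-6 is fine-tuned on CNN/DailyMail while the extractive BERT model is not, and the articles were read from train.csv, so the abstractive scores may be inflated by overlap with the model's training data. ROUGE was only computed on CNN/DailyMail; for our own news data set, the comparison is qualitative.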
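The tendency of both approaches to reuse the article's opening matches the lead bias of news writing, where the most important facts come first. A common sanity check on CNN/DailyMail is the Lead-3 baseline, which simply returns the first three sentences. A minimal sketch with a naive regex sentence splitter (an assumption; bylines such as "By . Associated Press ." will be split into tiny fragments); `lead_k` is an illustrative helper.

```python
import re

def lead_k(article, k=3):
    """Lead-k baseline: return the first k sentences of the article."""
    # Naive split after ".", "!" or "?" followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s.strip()]
    return " ".join(sentences[:k])

print(lead_k("One. Two! Three? Four."))  # -> One. Two! Three?
```

Scoring `lead_k` outputs against the `highlights` column with `calcRougeScore`, as in Tasks 1 and 2, would show how much of each model's score could be reached by copying the opening alone.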