# Common summarization approaches

This notebook introduces some common summarization approaches, as well as third-party solutions capable of performing summarization. For each one I provide a usage example and a brief note on the main ideas behind it.

I did some brief research on the topic using [paperswithcode](https://paperswithcode.com). My impression of the state of things is the following:
+ *The code that accompanies papers is rarely adapted for use as a tool; mostly it is a proof of concept, a means to calculate metrics. So there are not many examples below.*
+ The extractive summarization task, as it stands, aims to eliminate the less information-rich sentences from a text, so many works do exactly that;
+ To achieve this goal, two main steps are performed:
    1. Sentences are measured, in some way, by their importance;
    2. A selection algorithm is applied (e.g. take the top-n most "important" sentences).
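
The two steps above can be sketched in a few lines. This is only a minimal illustration, not any particular paper's method: the importance score used here (mean word frequency within the text) and the helper names `split_sentences`, `score_sentence` and `summarize_top_n` are assumptions made for the example, and the regex-based sentence splitter is deliberately naive (it will mis-split abbreviations such as "F.A.").

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, word_freq):
    """Illustrative importance score: mean in-text frequency of the sentence's words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(word_freq[w] for w in words) / len(words)


def summarize_top_n(text, n=2):
    sentences = split_sentences(text)
    word_freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Step 1: measure the importance of every sentence.
    scores = [score_sentence(s, word_freq) for s in sentences]
    # Step 2: keep the n highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


print(summarize_top_n("Cats sleep. Cats sleep a lot. Dogs bark.", n=2))
```

In practice the scoring function is where approaches differ most; the selection step often stays this simple.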

Further exploration confirmed this impression. Nowadays the summarization task mostly consists of taking some BERT embeddings and designing an algorithm that chooses which of them to keep.
These algorithms can vary a lot.

The example text body is from [The New Yorker](https://www.newyorker.com/culture/personal-history/the-author-the-work-and-the-no-1-fan).

In [5]:
body = "Resonance is the literary magazine put out by the students of Falmouth Academy, the Massachusetts private school I attended for six years, starting in the seventh grade. During my time at F.A., I had at least one poem published in each issue of Resonance. In high school, I was also a member of the staff. But that wasn’t why I loved it. I loved it — and I swear I am not exaggerating here — because I thought the writing in its pages was more beautiful than anything I’d ever read. I was not a happy or popular adolescent, and the emotional stance I adopted toward most of my peers at F.A. might best be described as a defensive crouch. I was scared of my classmates, and I resented them; I could tell they didn’t like me, but I couldn’t figure out why. To the extent that I was able to lift myself out of my own sodden self-loathing to contemplate their inner worlds, I imagined their minds to be filled, like mine, with a whirlwind of criticism and judgment. But, once a year, at the end of the spring semester, I would open my copy of Resonance and be forced to face the unsettling possibility that my classmates were not the shallow bullies I imagined them to be but actual people, with souls."

## bert-extractive-summarizer
[Link to repository](https://github.com/dmmiller612/bert-extractive-summarizer)

The method used in the library utilizes the BERT model for text embeddings and K-Means clustering to identify sentences closest to the centroid for summary selection.

From https://arxiv.org/pdf/1906.04165.pdf:
> When creating summaries \[...], the \[...] engine leveraged a pipeline which
> tokenized the incoming paragraph text into clean sentences,
> passed the tokenized sentences to the BERT model for
> inference to output embeddings, and then clustered the
> embeddings with K-Means, **selecting the embedded
> sentences that were closest to the centroid** as the candidate
> summary sentences.

I briefly looked through the paper and did not find any benchmarks or metrics.
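
To make the quoted selection step concrete, here is a small sketch of "cluster the embeddings, then keep the sentence closest to each centroid". It is not the library's implementation: the tiny 2-D vectors stand in for real BERT sentence embeddings, and the hand-written `kmeans` and `select_closest_to_centroids` helpers (with deterministic seeding from the first `k` points) are assumptions made purely for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_closest_to_centroids(sentences, embeddings, k):
    """Return, in document order, the sentence nearest to each cluster centroid."""
    centroids = kmeans(embeddings, k)
    chosen = set()
    for centroid in centroids:
        idx = min(range(len(embeddings)),
                  key=lambda i: math.dist(embeddings[i], centroid))
        chosen.add(idx)
    return [sentences[i] for i in sorted(chosen)]


# Toy stand-ins for sentence embeddings: two well-separated groups.
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
embeddings = [(0, 0), (10, 10), (2, 0), (1, 0), (12, 10), (11, 10)]
print(select_closest_to_centroids(sentences, embeddings, k=2))  # ['s3', 's5']
```

The number of clusters `k` plays the role of the summary length: one sentence is kept per cluster.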

In [8]:
from summarizer import Summarizer

model = Summarizer()


Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
# defaults
print(model(body))

Resonance is the literary magazine put out by the students of Falmouth Academy, the Massachusetts private school I attended for six years, starting in the seventh grade. But, once a year, at the end of the spring semester, I would open my copy of Resonance and be forced to face the unsettling possibility that my classmates were not the shallow bullies I imagined them to be but actual people, with souls.


In [9]:
print(model(body, ratio=0.2))  # Specified with ratio
print(model(body, num_sentences=3))  # Will return 3 sentences 

Resonance is the literary magazine put out by the students of Falmouth Academy, the Massachusetts private school I attended for six years, starting in the seventh grade. But, once a year, at the end of the spring semester, I would open my copy of Resonance and be forced to face the unsettling possibility that my classmates were not the shallow bullies I imagined them to be but actual people, with souls.
Resonance is the literary magazine put out by the students of Falmouth Academy, the Massachusetts private school I attended for six years, starting in the seventh grade. In high school, I was also a member of the staff. To the extent that I was able to lift myself out of my own sodden self-loathing to contemplate their inner worlds, I imagined their minds to be filled, like mine, with a whirlwind of criticism and judgment.


## AMR summarization methods

I recently learned about Abstract Meaning Representation (AMR) as a field of research, and fortunately it suggests some approaches to the summarization task:
+ [Liu et al., 2015](http://www.cs.ucf.edu/~feiliu/papers/COLING2018_AMRSumm.pdf)
+ [Takase et al., 2016](https://aclanthology.org/D16-1112.pdf)
+ [Dohare et al., 2017](https://arxiv.org/abs/1706.01678)
+ [Liao et al., 2018](https://aclanthology.org/C18-1101/)


## Some tips for summarization algorithms
+ *Skip the sentence that has trigram overlapping with the previously selected sentences. Surprisingly, this simple method of removing duplication brings a remarkable performance improvement on CNN/DailyMail.* - from [Extractive Summarization as Text Matching, Ming Zhong et al., 2020, page 1](https://arxiv.org/pdf/2004.08795v1.pdf)
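
The trigram-blocking tip can be sketched as follows. This is a minimal illustration under my own assumptions rather than the paper's code: sentences are assumed to arrive already ranked from most to least important, tokenization is a simple lowercase word regex, and the names `trigrams` and `select_with_trigram_blocking` are hypothetical.

```python
import re


def trigrams(sentence):
    """Set of word trigrams in a sentence (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", sentence.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def select_with_trigram_blocking(ranked_sentences, max_sentences):
    """Greedily keep ranked sentences, skipping any that repeats an already selected trigram."""
    selected = []
    seen = set()
    for sentence in ranked_sentences:
        if len(selected) >= max_sentences:
            break
        grams = trigrams(sentence)
        if grams & seen:  # shares at least one trigram with the summary so far
            continue
        selected.append(sentence)
        seen |= grams
    return selected


ranked = [
    "The cat sat on the mat.",
    "Yesterday the cat sat on a chair.",  # repeats "the cat sat", so it is skipped
    "Dogs bark loudly at night.",
]
print(select_with_trigram_blocking(ranked, max_sentences=2))
```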