# Extended solution

*Papers that somewhat resemble my ideas or may simply be useful*:  
+ [Neural Extractive Text Summarization with Syntactic Compression](https://aclanthology.org/D19-1324.pdf) *by Jiacheng Xu and Greg Durrett*:
    + The approach encodes the text and *then* performs the compression;
    + https://github.com/jiacheng-xu/neu-compression-sum (gosh, what horrible code)
+ [Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting](https://aclanthology.org/P18-1063.pdf) *by Yen-Chun Chen and Mohit Bansal*:
    + This approach uses RL to select sentences, which are then rewritten abstractively;
    + https://github.com/ChenRocks/fast_abs_rl (much more readable repo)
+ [Extractive Summarization as Text Matching](https://arxiv.org/pdf/2004.08795v1.pdf) *by Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang*:
    + https://github.com/maszhongming/MatchSum
    + Pushes CNN/DailyMail results to a new level (44.41 in ROUGE-1).

### Work structure
#### 1. Dataset loading and preprocessing

#### 2. Implementing heuristics as standalone preprocessing functionality
This work suggests using some preprocessing tricks to improve the actual quality of summarization. The following heuristics are suggested (each is to be implemented in a separate notebook):
1. Use coreference resolution across sentences so we don't lose important nouns in the summary.
2. Try to split compound sentences into a few smaller ones.
3. Compress the resulting sentences by excluding low-informative words without (hopefully) sacrificing readability.
    - Named entities intuitively seem more important than common nouns, so they are not to be deleted.
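
The compression heuristic (item 3) can be sketched without any NLP library, assuming the tokens have already been tagged. The `Token` tuple, the `LOW_INFO_POS` set and the `compress` function are hypothetical names used only for illustration; in the notebook the tags would come from the spaCy `doc` (for example `token.pos_` and `token.ent_type_`), and the choice of which tags count as low-informative is an assumption to be tuned.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    pos: str         # coarse part-of-speech tag, e.g. "ADV", "NOUN"
    is_entity: bool  # True if the token belongs to a named entity


# Assumption: adverbs and interjections are treated as low-informative.
LOW_INFO_POS = {"ADV", "INTJ"}


def compress(tokens):
    """Drop low-informative tokens, but never drop named-entity tokens."""
    kept = [t.text for t in tokens if t.is_entity or t.pos not in LOW_INFO_POS]
    return " ".join(kept)


sentence = [
    Token("The", "DET", True),
    Token("New", "PROPN", True),
    Token("Yorker", "PROPN", True),
    Token("is", "AUX", False),
    Token("really", "ADV", False),
    Token("a", "DET", False),
    Token("magazine", "NOUN", False),
]
print(compress(sentence))
```

Only the adverb "really" is removed here; the entity tokens survive regardless of their tag.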
    
#### 3. Implementation of the summarization block per se
#### 4. Evaluation and metrics


## 1. Dataset import

From here: https://cs.nyu.edu/~kcho/DMQA/

In [None]:
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_lg")

In [None]:
# We will still be using the same chunk from The New Yorker, as its grammatical structure is quite diverse.
body = "Resonance is the literary magazine put out by the students of Falmouth Academy, the Massachusetts private school I attended for six years, starting in the seventh grade. During my time at F.A., I had at least one poem published in each issue of Resonance. In high school, I was also a member of the staff. But that wasn’t why I loved it. I loved it — and I swear I am not exaggerating here — because I thought the writing in its pages was more beautiful than anything I’d ever read. I was not a happy or popular adolescent, and the emotional stance I adopted toward most of my peers at F.A. might best be described as a defensive crouch. I was scared of my classmates, and I resented them; I could tell they didn’t like me, but I couldn’t figure out why. To the extent that I was able to lift myself out of my own sodden self-loathing to contemplate their inner worlds, I imagined their minds to be filled, like mine, with a whirlwind of criticism and judgment. But, once a year, at the end of the spring semester, I would open my copy of Resonance and be forced to face the unsettling possibility that my classmates were not the shallow bullies I imagined them to be but actual people, with souls."

In [94]:
body = "The things are quite complicated, we shall move on."

In [95]:
# Process the text with the spaCy pipeline and split it into sentences:
doc = nlp(body)
sentences = list(doc.sents)

## 3. Summarization block

For my summarization block I'd like to make use of the novel method suggested by 

The key point of the work:
> Instead of scoring and extracting sentences
> one by one to form a summary, we formulate
> extractive summarization as a semantic text matching
> problem and propose a novel summary-level
> framework. Our approach bypasses the difficulty
> of summary-level optimization by contrastive learning,
> that is, a good summary should be more
> semantically similar to the source document than the
> unqualified summaries.

*IMO, it resembles the centroidal approach from my baseline, but I cannot lose the chance to use some deep learning for that ;-)*

> Inspired by siamese network structure (Bromley et al., 1994), we construct a Siamese-BERT architecture to match the document D and the candidate summary C. Our Siamese-BERT consists of two BERTs with tied-weights and a cosine-similarity layer during the inference phase.

Okay, we will need some BERT...
> we use the vector of the ‘[CLS]’ token from the top BERT layer as the representation of a document or summary.


We also compare similarities...
> Let $r_D$ and $r_C$ denote the embeddings of the document $D$ and candidate summary $C$. Their similarity score is measured by $f(D, C) = \text{cosine}(r_D, r_C)$.
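
To make the similarity score concrete, here is a minimal pure-Python sketch of the cosine between two embedding vectors. The vectors are toy stand-ins for the '[CLS]' embeddings, not real BERT outputs, and the function assumes neither vector is all zeros.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for the document and candidate-summary embeddings.
r_D = [1.0, 2.0, 0.0]
r_C_good = [2.0, 4.0, 0.0]  # same direction as r_D
r_C_bad = [0.0, 0.0, 3.0]   # orthogonal to r_D

print(cosine(r_D, r_C_good))
print(cosine(r_D, r_C_bad))
```

A candidate pointing in the same direction as the document scores close to 1, while an orthogonal one scores 0.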

And the loss...
> In order to fine-tune Siamese-BERT, we use a margin-based triplet loss to update the weights
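
A margin-based loss of this kind usually has the hinge form $\max(0, s_{neg} - s_{pos} + \text{margin})$. The sketch below assumes the "positive" is the gold summary's similarity to the document and the "negative" is a candidate's. The function name and the margin value are illustrative choices, and this is not claimed to be the paper's exact formulation.

```python
def margin_loss(score_pos, score_neg, margin=0.1):
    """Hinge loss: zero once the positive beats the negative by at least `margin`."""
    return max(0.0, score_neg - score_pos + margin)


# Gold summary clearly closer to the document than the candidate: no penalty.
print(margin_loss(score_pos=0.9, score_neg=0.5))

# Candidate scores higher than the gold summary: positive loss.
print(margin_loss(score_pos=0.6, score_neg=0.7))
```

The loss only pushes the scores apart until the margin is satisfied; beyond that point the pair contributes nothing.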

Doesn't searching through all possible candidates mean up to $\sum_{i=1}^{n}C_n^i = 2^n - 1$ variants? We'll see.
> In the inference phase, we formulate extractive summarization as a task to search for the best summary among all the candidates C extracted from the document D.
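
A quick way to see how fast the candidate space grows is to count and enumerate the non-empty sentence subsets directly. The helper names below are hypothetical, and the sentence list is a placeholder.

```python
from itertools import combinations
from math import comb


def count_candidates(n):
    """Number of non-empty sentence subsets: sum of C(n, i) for i = 1..n."""
    return sum(comb(n, i) for i in range(1, n + 1))


def enumerate_candidates(sentences):
    """Yield every non-empty subset of sentences, keeping document order."""
    for size in range(1, len(sentences) + 1):
        yield from combinations(sentences, size)


sents = ["s1", "s2", "s3"]
candidates = list(enumerate_candidates(sents))
print(len(candidates), count_candidates(len(sents)))
print(count_candidates(30))
```

The count equals $2^n - 1$, so a 30-sentence document already yields over a billion candidates.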

... and yeah, exactly:
> The matching idea is more intuitive while it suffers from combinatorial explosion problems. \[...] we introduce a content selection module to pre-select salient sentences.
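
The effect of pre-selection can be sketched as follows: keep only the top-`k` sentences by some salience score, then build candidates from those alone. The salience scores, `k` and the allowed summary sizes are placeholder assumptions; how the scores are actually produced is left out here.

```python
from itertools import combinations


def preselect(sentences, salience, k):
    """Keep the k most salient sentences, restoring document order."""
    top = sorted(range(len(sentences)), key=lambda i: salience[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def candidate_summaries(sentences, sizes):
    """All combinations of the given sizes over the pre-selected sentences."""
    for size in sizes:
        yield from combinations(sentences, size)


sents = ["s1", "s2", "s3", "s4", "s5", "s6"]
salience = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]  # hypothetical scores
pruned = preselect(sents, salience, k=3)
cands = list(candidate_summaries(pruned, sizes=(2, 3)))
print(pruned)
print(len(cands))
```

With six sentences the unrestricted search has 63 non-empty subsets; pruning to three sentences and summary sizes of 2 or 3 leaves only a handful to score.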

## 4. Eval and metrics