## Data

In [3]:
abstract = """
For Pretrained Language Models (PLMs), their susceptibility to noise has recently been linked to subword segmentation. However, it is unclear which aspects of segmentation affect their understanding. This study assesses the robustness of PLMs against various disrupted segmentation caused by noise. An evaluation framework for subword segmentation, named Contrastive Lexical Semantic (CoLeS) probe, is proposed. It provides a systematic categorization of segmentation corruption under noise and evaluation protocols by generating contrastive datasets with canonical-noisy word pairs. 
Experimental results indicate that PLMs are unable to accurately compute word meanings if the noise introduces completely different subwords, small subword fragments, or a large number of additional subwords, particularly when they are inserted within other subwords.
These findings provide insight into the PLMs' performance improvement ( guiding future work to design methods to resolve specific corruption types one by one, which relieves the challenges of resolving them all at once) and protection from potential attacks that exploit these issues.
"""
bert_abstract = """
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
"""

shortcut_abstract = """
Recent studies report that many machine reading comprehension (MRC) models can perform closely to or even better than humans on benchmark datasets. However, existing works indicate that many MRC models may learn shortcuts to outwit these benchmarks, but the performance is unsatisfactory in real-world applications. In this work, we attempt to explore, instead of the expected comprehension skills, why these models learn the shortcuts. Based on the observation that a large portion of questions in current datasets have shortcut solutions, we argue that larger proportion of shortcut questions in training data make models rely on shortcut tricks excessively. To investigate this hypothesis, we carefully design two synthetic datasets with annotations that indicate whether a question can be answered using shortcut solutions. We further propose two new methods to quantitatively analyze the learning difficulty regarding shortcut and challenging questions, and revealing the inherent learning mechanism behind the different performance between the two kinds of questions. A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions, and the high proportions of shortcut questions in training sets hinder models from exploring the sophisticated reasoning skills in the later stage of training.
"""

dict_keys(['lay_summary', 'article', 'headings', 'keywords', 'id'])


## BART and PEGASUS

## CTRLSUM - Keyword Summarization and prompt summarization (no prompt in decoder)

In [2]:
from transformers import AutoModelForSeq2SeqLM, PreTrainedTokenizerFast
from transformers.models.bart.modeling_bart import BartForConditionalGeneration

# model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-cnndm")
model = BartForConditionalGeneration.from_pretrained("hyunwoongko/ctrlsum-arxiv")
# model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-bigpatent")

# tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-cnndm")
tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-arxiv")
# tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-bigpatent")


In [1]:
def summarize(prompt, abstract, include_decoder_input=False):
    print("Keyword/Prompt \n \t ", prompt)
    data = tokenizer(abstract+prompt, return_tensors="pt")
    input_ids, attention_mask = data["input_ids"], data["attention_mask"]
    generate_kwargs = {"attention_mask": attention_mask, "num_beams": 5}
    if include_decoder_input:
        # Seed the decoder with the prompt, dropping the last token (the end-of-sequence marker the tokenizer appends).
        generate_kwargs["decoder_input_ids"] = tokenizer(prompt, return_tensors="pt")["input_ids"][:, :-1]
    summary = tokenizer.batch_decode(model.generate(input_ids, **generate_kwargs))[0]
        
    print("Summary \n \t ", summary)
    print('\n')
    return summary
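
The comments in the cells that follow compare two input orders, `prompt+abstract` and `abstract+prompt`, while `summarize` hard-codes the second. A minimal sketch of a helper that makes the order an explicit choice is shown here; the name `build_input` and its `prompt_first` flag are hypothetical and not part of the notebook or of any library.

```python
def build_input(prompt: str, text: str, prompt_first: bool = True) -> str:
    """Concatenate a control prompt and a source text in the chosen order.

    Hypothetical helper: prompt_first=True gives prompt+text,
    prompt_first=False gives text+prompt (the order used by summarize above).
    """
    return prompt + text if prompt_first else text + prompt


# Usage with a toy document instead of a full abstract.
prompt_then_text = build_input("findings => ", "Some source text.")
text_then_prompt = build_input("findings => ", "Some source text.", prompt_first=False)
```

Passing the result to the tokenizer in place of `abstract+prompt` would allow both orders to be compared from the same function, assuming nothing else in `summarize` changes.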

In [4]:
# prompt
_ = summarize("Q:What is the the research question or questions in the paper? A: => ", abstract)
# prompt+abstract: </s> in this paper, we propose a novel evaluation framework for subword segmentation, named the</s>
# abstract+prompt: </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>

_ = summarize("Q:What is the the proposed method in the paper? A: => ", abstract)
#  prompt+abstract: </s> in this paper, we propose an evaluation framework for subword segmentation, named contrastive</s>
#  abstract+prompt: </s> in this paper, we propose a novel evaluation framework for subword segmentation, named contrast</s>

_ = summarize("Q:What are the findings? A: => ", abstract  )
#  prompt+abstract: </s> this paper presents an evaluation framework for subword segmentation based on the concept of contrastive</s>
#  abstract+prompt: </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>

# keyword
_ = summarize("findings => ", abstract )
#  prompt+abstract: </s> in this paper, we investigate the impact of noise on the robustness of word segmentation</s>
#  abstract+prompt: </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>

_ = summarize("What are the findings? => ", abstract  ) # with prompt+abstract, the model works like a language model and completes the question
#  prompt+abstract: </s> what are the findings of this study on the robustness of word segmentation under noise?</s>
#  abstract+prompt: </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>

# _ = summarize("What are the findings of this study on the robustness of word segmentation under noise? => ", abstract  )

_ = summarize("the main contributions of this paper are: (1) => ", abstract )
#  prompt+abstract: </s> in this paper, we propose an evaluation framework for subword segmentation, named contrastive</s>
#  abstract+prompt: </s> in this paper, we investigate the impact of subword segmentation corruption on the performance of</s>


Keyword/Prompt 
 	  Q:What is the the research question or questions in the paper? A: => 




Summary 
 	  </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>


Keyword/Prompt 
 	  Q:What is the the proposed method in the paper? A: => 
Summary 
 	  </s> in this paper, we propose a novel evaluation framework for subword segmentation, named contrast</s>


Keyword/Prompt 
 	  Q:What are the findings? A: => 
Summary 
 	  </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>


Keyword/Prompt 
 	  findings => 
Summary 
 	  </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>


Keyword/Prompt 
 	  What are the findings? => 
Summary 
 	  </s> in this paper, we investigate the impact of subword segmentation corruption on the robustness</s>


Keyword/Prompt 
 	  the main contributions of this paper are: (1) => 
Summary 
 	  </s> in this paper, we investigate the impact of subword segmentation corruption on the performance of</s>




In [80]:
# prompt
# _ = summarize("Q:What is the the research question or questions in the paper? A: => ", bert_abstract)
# </s> in this paper, we introduce a new language representation model called bidirectional encoder and</s>

# _ = summarize("Q:What is the the proposed method in the paper? A: => ", bert_abstract)
# </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>

# _ = summarize("Q:What are the findings? A: => ", bert_abstract  )
# </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>



# keyword
# _ = summarize("findings => ", bert_abstract )
# </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>

# _ = summarize("the main contributions of this paper are: (1) => ", bert_abstract )
# </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>

# _ = summarize("What are the findings? => ", bert_abstract  ) # unlike the previous abstract, where the model completed the question like a language model
# </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>

# _ = summarize("What are the findings of this study on the robustness of word segmentation under noise? => ", bert_abstract  )
# </s> what are the findings of this study on the robustness of word segmentation under noise?</s>



Keyword/Prompt 
 	  What are the findings of this study on the robustness of word segmentation under noise? => 
Summary 
 	  </s> what are the findings of this study on the robustness of word segmentation under noise?</s>




## CTRLSUM - Different decoding algorithms

In [5]:

from allennlp_models.generation.models import Bart
from allennlp.data import Vocabulary
import torch
from allennlp.nn.beam_search import BeamSearch

def summary_by_allennlp( prompt, context ):
    print("Keyword/Prompt \n \t ", prompt)
    vocab = Vocabulary.from_pretrained_transformer("hyunwoongko/ctrlsum-arxiv") 
    bart_model = Bart(model_name="hyunwoongko/ctrlsum-arxiv", vocab=vocab)
    data = tokenizer([context+prompt], return_tensors="pt", padding="longest" )
    input_ids, attention_mask = data["input_ids"], data["attention_mask"]
    beam_search = BeamSearch(bart_model._end_id, vocab=vocab, beam_size=5)
    initial_decoder_id = torch.tensor( [[bart_model._decoder_start_id]], dtype=input_ids.dtype, device=input_ids.device ).repeat(input_ids.shape[0], 1)
    initial_state = {
        "input_ids": input_ids,
        "input_mask": attention_mask
    }

    beam_result = beam_search.search(initial_decoder_id, initial_state, bart_model.take_step)
    predictions = beam_result[0] # (bsz, beam_size, seq_len)

    max_pred_indices = (
        beam_result[1].argmax(dim=-1).view(-1, 1, 1).expand(-1, -1, predictions.shape[-1])
    ) # (bsz, 1, seq_len)
    out = predictions.gather(dim=1, index=max_pred_indices)# (bsz, 1, seq_len)
    out = out.squeeze(dim=1) 

    predicted_text = bart_model.make_output_human_readable({"predictions": out})['predicted_text']

    print("Summary \n \t ", predicted_text[0])
    print('\n')
    return predicted_text[0]

    # import numpy as np
    # print(max_pred_indices.numpy().shape)
    # print(predictions.numpy().shape)
    # np.take(predictions.numpy(), indices=max_pred_indices.numpy(), axis=1) # (2, 2, 1, 50, 50)
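
The `argmax`/`gather` step in `summary_by_allennlp` keeps, for each batch item, the beam with the highest score. The same selection can be sketched without tensors, using nested Python lists; the function name `select_best_beam` and the toy values are illustrative assumptions, and the sketch mirrors only the selection logic, not AllenNLP's API.

```python
def select_best_beam(predictions, log_probs):
    """Pick the highest-scoring beam for every batch item.

    predictions: nested list shaped [batch][beam][seq_len] of token ids.
    log_probs:   nested list shaped [batch][beam] of beam scores.
    Returns a nested list shaped [batch][seq_len].
    """
    best = []
    for beams, scores in zip(predictions, log_probs):
        # Index of the largest score, analogous to argmax over the beam axis.
        best_index = max(range(len(scores)), key=scores.__getitem__)
        # Keep only that beam, analogous to gather followed by squeeze.
        best.append(beams[best_index])
    return best


# Toy example: two batch items, two beams each, sequences of length two.
toy_predictions = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
toy_log_probs = [[-0.5, -0.1], [-0.2, -0.9]]
toy_best = select_best_beam(toy_predictions, toy_log_probs)
```

In the tensor version, `expand` broadcasts the chosen beam index along the sequence axis so that `gather` can pull out every position of that beam at once.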


In [6]:
# prompt
_ = summary_by_allennlp("Q:What is the the research question or questions in the paper? A: => ", bert_abstract)
# prompt+abstract: </s> in this paper, we introduce a new language representation model called bidirectional encoder and</s>
# prompt+abstract allennlp: in this paper, we introduce a new language representation model, called bidirectional language representation model called ( bidirectional language representation model ) called ( bidirectional language representation model ), which is designed to pre- train deep bidirectional representations
# abstract+prompt allennlp: in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional


_ = summary_by_allennlp("Q:What is the the proposed method in the paper? A: => ", bert_abstract)
# prompt+abstract: </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>
# prompt+abstract allennlp:  in this paper, we introduce a new language representation model called the bidirectional language representation model ( dla ), which is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context
# abstract+prompt allennlp:  in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional

_ = summary_by_allennlp("Q:What are the findings? A: => ", bert_abstract  )
# prompt+abstract: </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>
# abstract+prompt allennlp:  in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional

# keyword
_ = summary_by_allennlp("findings => ", bert_abstract )
# prompt+abstract: </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>
# abstract+prompt allennlp: in this paper, we introduce a new language representation model called _ bidirectional symbolic encoder and symbolic encoder _ ( bidirectional sceoder and symbolic encoder ), which stands for bidirectional symbolic encoder and symbolic encoder

_ = summary_by_allennlp("the main contributions of this paper are: (1) => ", bert_abstract )
# prompt+abstract: </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>
# abstract+prompt allennlp: in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ), which stands for bidirectional encoder and discriminator ( bidirection

# _ = summary_by_allennlp("What are the findings? => ", bert_abstract  ) # unlike the previous abstract, where the model completed the question like a language model
# prompt+abstract: </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>




Keyword/Prompt 
 	  Q:What is the the research question or questions in the paper? A: => 
Summary 
 	   in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional


Keyword/Prompt 
 	  Q:What is the the proposed method in the paper? A: => 
Summary 
 	   in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional


Keyword/Prompt 
 	  Q:What are the findings? A: => 
Summary 
 	   in this paper, we introduce a new language representation model called the bidirectional encoder and discriminator ( bidirectional encoder and discriminator ) ( darc ) model called the bidirectional encoder and discriminator ( bidirectional


Keyword/Prom

In [88]:
# prompt
_ = summary_by_allennlp("Q:What is the the research question or questions in the paper? A: => ", abstract)
# prompt+abstract: </s> in this paper, we propose a novel evaluation framework for subword segmentation, named the</s>
# prompt+abstract allennlp: in this paper, we propose a novel evaluation framework for subword segmentation under noise, named the _ contrastive comparative subword segmentation probe _, named the _ contrastive comparative subword segmentation probe _, to be applied to real -

_ = summary_by_allennlp("Q:What is the the proposed method in the paper? A: => ", abstract)
# prompt+abstract: </s> in this paper, we propose an evaluation framework for subword segmentation, named contrastive</s>

_ = summary_by_allennlp("Q:What are the findings? A: => ", abstract  )
# prompt+abstract: </s> this paper presents an evaluation framework for subword segmentation based on the concept of contrastive</s>



# keyword
_ = summary_by_allennlp("findings => ", abstract )
# prompt+abstract: </s> in this paper, we investigate the impact of noise on the robustness of word segmentation</s>

_ = summary_by_allennlp("What are the findings? => ", abstract  ) # with prompt+abstract, the model works like a language model and completes the question
# prompt+abstract: </s> what are the findings of this study on the robustness of word segmentation under noise?</s>

_ = summary_by_allennlp("the main contributions of this paper are: (1) => ", abstract )
# prompt+abstract: </s> in this paper, we propose an evaluation framework for subword segmentation, named contrastive</s>



Keyword/Prompt 
 	  Q:What is the the research question or questions in the paper? A: => 
Summary 
 	   in this paper, we propose a novel evaluation framework for subword segmentation under noise, named the _ contrastive comparative subword segmentation probe _, named the _ contrastive comparative subword segmentation probe _, to be applied to real -


Keyword/Prompt 
 	  Q:What is the the proposed method in the paper? A: => 
Summary 
 	   in this paper, we propose an evaluation framework for subword segmentation, named contrastive lexical semantic probe, to assess the robustness of subword segmentation against various disrupted segmentation caused by noise. the proposed framework is applied to the


Keyword/Prompt 
 	  Q:What are the findings? A: => 
Summary 
 	   this paper presents an evaluation framework for subword segmentation, named contrastive lexical semantic probe, to assess the robustness of subword segmentation against various disrupted segmentation caused by noise. the fra

## CTRLSUM - Prompt summarization (prompt in decoder)

In [35]:
# putting keywords into the decoder generates nonsense
# _ = summarize("findings => ", abstract , include_decoder_input=True ) 

# QA prompt
_ = summarize("Q:What is the the proposed method in the paper? A: => ", abstract , include_decoder_input=True)
_ = summarize("Q:What are the findings? A: => ", abstract , include_decoder_input=True )

# Contribution prompt
_ = summarize("the main contributions of this paper are: (1) => ", abstract, include_decoder_input=True )
# _ = summarize("The main contribution of this paper are: (1) => ", abstract, include_decoder_input=True ) # is the model sensitive to small changes in the prompt?


# Input length of decoder_input_ids is 21, but ``max_length`` is set to 20.
# _ = summarize("What are the findings of this study on the robustness of word segmentation under noise? => ", abstract, include_decoder_input=True )
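
The message quoted above says the decoder prompt is 21 tokens long while `max_length` is 20. When `decoder_input_ids` is passed, the prompt tokens count toward that limit, so a long prompt can use up the whole budget before anything new is generated. A small sketch of the arithmetic follows; `generation_budget` is a hypothetical helper, and the default of 20 is taken from the message rather than from any configuration file.

```python
def generation_budget(decoder_prompt_len: int, max_length: int = 20) -> int:
    """Number of new tokens that still fit after the decoder prompt.

    A result of zero or less means the prompt alone already reaches or
    exceeds max_length, which is the situation the message describes.
    """
    return max_length - decoder_prompt_len


# The case from the message: a 21-token prompt against a limit of 20.
remaining = generation_budget(21, 20)
```

One possible way around this, assuming the installed `transformers` version supports it, is to pass a larger `max_length` or a `max_new_tokens` value to `model.generate` so that the limit is sized relative to the prompt.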



Keyword/Prompt 
 	  Q:What is the the proposed method in the paper? A: => 
Summary 
 	  <s>Q:What is the the proposed method in the paper? A: =>  scrutinizing</s>


Keyword/Prompt 
 	  Q:What are the findings? => 
Summary 
 	  <s>Q:What are the findings? => :\ { # 1 } # 1 # 1</s>


Keyword/Prompt 
 	  the main contributions of this paper are: (1) => 
Summary 
 	  <s>the main contributions of this paper are: (1) => ength - based segmentation</s>
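
With the prompt fed to the decoder, each decoded summary above starts with `<s>`, repeats the prompt, and ends with `</s>`, so the generated continuation is only the tail of the string. A minimal post-processing sketch follows, assuming the prompt is echoed verbatim; `strip_echo` is a hypothetical helper and not part of the notebook.

```python
def strip_echo(summary: str, prompt: str, specials=("<s>", "</s>")) -> str:
    """Remove special-token markers and a leading copy of the prompt."""
    for token in specials:
        summary = summary.replace(token, "")
    summary = summary.strip()
    prompt = prompt.strip()
    # Drop the echoed prompt only when it really is a prefix of the output.
    if summary.startswith(prompt):
        summary = summary[len(prompt):]
    return summary.strip()


# Applied to the first output recorded above.
continuation = strip_echo(
    "<s>Q:What is the the proposed method in the paper? A: =>  scrutinizing</s>",
    "Q:What is the the proposed method in the paper? A: => ",
)
```

The special tokens could also be dropped at decode time by passing `skip_special_tokens=True` to `tokenizer.batch_decode`; the string-based version here simply avoids touching `summarize`.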




In [25]:
abstract = """
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
"""

_ = summarize("Q:What is the the proposed method in the paper? A: => ", abstract)
_ = summarize("findings => ", abstract)
_ = summarize("What are the findings? => ", abstract)
_ = summarize("The main contribution of this paper are: (1) => ", abstract)

Prompt 
 	  Q:What is the the proposed method in the paper? A: => 
Summary 
 	  </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>


Prompt 
 	  findings => 
Summary 
 	  </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>


Prompt 
 	  What are the findings? => 
Summary 
 	  </s> in this paper, we introduce a new language representation model called the bidirectional encoder</s>


Prompt 
 	  The main contribution of this paper are: (1) => 
Summary 
 	  </s> the main contribution of this paper are : ( 1 ) we introduce a new language representation model</s>


