Data for the AACL-IJCNLP 2020 paper "A Cascaded Approach to Neural Abstractive Summarization with Content Selection and Fusion"
Our data consists of more than 1 million sentence fusion instances, of the form:
Input: one or two article sentences + token-level highlights indicating which tokens were used to create the output summary sentence
Output: the summary sentence formed by compressing/fusing the input sentences
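As a rough illustration of the instance structure described above, here is a minimal Python sketch. It is not the released file format: the class name `FusionInstance`, its field names, and the example sentences are all hypothetical and made up for illustration, and the on-disk layout of the actual data may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FusionInstance:
    """Hypothetical in-memory view of one instance (not the released format)."""

    source_sentences: List[List[str]]  # one or two tokenized article sentences
    highlights: List[List[bool]]       # per-token flags, parallel to source_sentences
    summary_sentence: List[str]        # tokenized output summary sentence

    def __post_init__(self) -> None:
        # An instance has one input sentence (compression) or two (fusion).
        if len(self.source_sentences) not in (1, 2):
            raise ValueError("expected one or two article sentences")
        if len(self.highlights) != len(self.source_sentences):
            raise ValueError("need one highlight list per article sentence")
        for sentence, flags in zip(self.source_sentences, self.highlights):
            if len(sentence) != len(flags):
                raise ValueError("need one highlight flag per token")

    def highlighted_tokens(self) -> List[List[str]]:
        """Return, per input sentence, the tokens marked as highlighted."""
        return [
            [token for token, keep in zip(sentence, flags) if keep]
            for sentence, flags in zip(self.source_sentences, self.highlights)
        ]


# Made-up example, not taken from the dataset.
example = FusionInstance(
    source_sentences=[
        "The storm hit the coast on Monday .".split(),
        "It left thousands without power .".split(),
    ],
    highlights=[
        [True, True, True, True, True, False, False, False],
        [False, True, True, True, True, False],
    ],
    summary_sentence="The storm hit the coast , leaving thousands without power .".split(),
)

print(example.highlighted_tokens())
```

Storing the highlights as boolean lists parallel to the tokens is only one possible choice; character or token index spans would work equally well.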
Our data is derived from the CNN/Daily Mail summarization dataset.
Link to our data: https://www.dropbox.com/sh/227japgc3q3klkd/AABwwFi6LueV8F6sokjHqSsTa?dl=0