Code for the EMNLP 2018 paper "Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization"


Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

We provide the source code for the paper "Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization", accepted at EMNLP'18. If you find the code useful, please cite the following paper:

@inproceedings{lebanoff-song-liu:2018,
 Author = {Logan Lebanoff and Kaiqiang Song and Fei Liu},
 Title = {Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization},
 Booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
 Year = {2018}}

Goal

  • Our system summarizes a set of articles (about 10) on the same topic.

  • The code takes as input a text file containing a set of articles. See below for the expected input format.

Dependencies

The code is written in Python (v2.7) and TensorFlow (v1.4.1).

How to Generate Summaries

  1. Clone this repo. Download the ZIP file containing the pretrained model from See et al., and move the folder pretrained_model_tf1.2.1 into the ./logs/ directory.

    $ git clone https://github.com/ucfnlp/multidoc_summarization/
    $ mv pretrained_model_tf1.2.1.zip multidoc_summarization/logs
    $ cd multidoc_summarization/logs
    $ unzip pretrained_model_tf1.2.1.zip
    $ rm pretrained_model_tf1.2.1.zip
    $ cd ..
    
  2. Format your data in the following way:

    Use one file per topic. Separate distinct articles with a single blank line (i.e., two consecutive newline characters \n). Put each sentence of an article on its own line. See ./example_custom_dataset/ for an example.
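As a concrete illustration of this layout, the sketch below writes one topic file in the described format. The helper name and output path are hypothetical, not part of the repo:

```python
# Hypothetical helper illustrating the input layout described above:
# one sentence per line, articles separated by a single blank line.
def write_topic_file(path, articles):
    """articles: a list of articles, each a list of sentence strings."""
    with open(path, "w") as f:
        f.write("\n\n".join("\n".join(sents) for sents in articles))
        f.write("\n")

articles = [
    ["First sentence of article one.", "Second sentence of article one."],
    ["First sentence of article two."],
]
write_topic_file("topic_0.txt", articles)
```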

  3. Convert your data to TensorFlow examples that can be fed to the PG-MMR model.

    $ python convert_data.py --dataset_name=example_custom_dataset --custom_dataset_path=./example_custom_dataset/
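A plausible sketch of the on-disk record framing that pointer-generator data pipelines in the style of See et al. use is shown below: an 8-byte length prefix followed by a serialized tf.Example proto. The exact format produced by convert_data.py is an assumption here, and stand-in payload bytes are used so the sketch runs without a TensorFlow install:

```python
import struct
from io import BytesIO

# Assumption (not verified against convert_data.py): each record is an
# 8-byte length prefix followed by a serialized tf.Example proto.
# Stand-in bytes replace the proto so no TensorFlow install is needed.
def write_record(f, payload):
    f.write(struct.pack("<q", len(payload)))
    f.write(payload)

def read_records(f):
    while True:
        header = f.read(8)
        if not header:
            break
        (length,) = struct.unpack("<q", header)
        yield f.read(length)

buf = BytesIO()
for payload in [b"example-1", b"example-2"]:
    write_record(buf, payload)
buf.seek(0)
records = list(read_records(buf))
```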
    
  4. Run the testing script. This creates a file called logs/tfidf_vectorizer/example_custom_dataset.dill. If you change the contents of your dataset, delete this file so that the script re-creates the TF-IDF vectorizer to reflect the changes. The generated summaries are written to the ./logs/example_custom_dataset/decoded/ directory.

    $ python run_summarization.py --dataset_name=example_custom_dataset --pg_mmr
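The --pg_mmr flag selects source sentences with Maximal Marginal Relevance (MMR), trading off a sentence's importance against its redundancy with sentences already selected. Below is a minimal sketch of that criterion with toy scores; the paper estimates importance with a learned regressor and measures redundancy with TF-IDF similarity, and the function name here is illustrative:

```python
# Minimal MMR sketch (illustrative, not the repo's implementation).
# importance: {sentence_id: importance score}
# sim: {(i, j): pairwise similarity}, looked up in either key order
def mmr_select(importance, sim, k, lam=0.7):
    selected = []
    candidates = set(importance)
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = max similarity to any already-selected sentence.
            redundancy = max((sim.get((i, j), sim.get((j, i), 0.0))
                              for j in selected), default=0.0)
            return lam * importance[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam=0.5, a highly important sentence that duplicates an already-selected one can lose to a less important but novel sentence, which is exactly the redundancy control MMR provides.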
    

License

This project is licensed under the BSD License - see the LICENSE.md file for details.

Acknowledgments

We gratefully acknowledge the work of Abigail See, whose code was used as the basis for this project.
