Repository for Unsupervised Sentence Compression using Denoising Auto-Encoders
Latest commit 657a1b7 Aug 24, 2018
Repository contents (all added in the initial commit, Sep 8, 2018):
env_configs, runs/default, sample_scripts, src, test_data, .gitignore, README.md, architecture.png, requirements.txt


Experiments in Unsupervised summarization

This is a PyTorch implementation of the summarization methods described in Unsupervised Sentence Compression using Denoising Auto-Encoders (CoNLL 2018). It features denoising additive auto-encoders with optional NLI-based hidden state initialization (based on InferSent).
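The idea behind a denoising additive auto-encoder can be illustrated with a short sketch. It assumes a noising scheme along the lines of the paper: words borrowed from another sentence are appended to the input, the result is locally shuffled, and the model is trained to recover the original sentence, which means learning to delete words. The function name `additive_noise`, the ratio of added words and the shuffle distance are hypothetical choices for illustration, not the code or settings in src.

```python
import random


def additive_noise(sentence, other_sentence, extra_ratio=0.5, max_shift=3, rng=None):
    """Hypothetical sketch: build a noisy input for a denoising auto-encoder.

    `sentence` and `other_sentence` are lists of tokens. The returned list
    contains every token of `sentence` plus some tokens of `other_sentence`,
    locally shuffled. The training target is the original `sentence`.
    """
    rng = rng or random.Random()
    # Number of words to borrow: a fraction of the input length, at least one,
    # capped by how many words the other sentence actually has.
    n_extra = min(len(other_sentence), max(1, int(len(sentence) * extra_ratio)))
    extra = rng.sample(other_sentence, n_extra)
    noisy = list(sentence) + extra
    # Local shuffle: jitter each position by U(0, max_shift) and re-sort,
    # so two tokens can only swap if they start fewer than max_shift apart.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]
```

A training pair would then be `(additive_noise(s, other), s)`: to reconstruct `s`, the decoder has to drop the borrowed words and restore word order, which is what pushes it toward producing shorter outputs.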

Model architecture (see architecture.png)


Requirements

pip install -r requirements.txt

Quickstart

Step 1: Get the data and create the vocabulary

Gigaword data can be downloaded from https://github.com/harvardnlp/sent-summary. Extract the archive with tar -xzf summary.tar.gz, then create the vocabulary by running:

python src/datasets/preprocess.py train_data_file output_voc_file

where train_data_file is the training data file and output_voc_file is the path of the vocabulary file to create.
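What vocabulary creation involves can be sketched as follows. This is a hypothetical illustration, not the contents of src/datasets/preprocess.py: it counts whitespace-separated tokens in a training file and writes the most frequent ones, one per line. The special tokens, the default size limit and the output format are all assumptions.

```python
from collections import Counter

# Assumed special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_vocab(lines, max_size=50000):
    """Return special tokens followed by the most frequent words in `lines`."""
    counts = Counter(tok for line in lines for tok in line.split())
    # most_common() orders by count; equal counts keep first-seen order.
    words = [w for w, _ in counts.most_common() if w not in SPECIAL_TOKENS]
    return SPECIAL_TOKENS + words[: max(0, max_size - len(SPECIAL_TOKENS))]


def write_vocab(train_path, voc_path, max_size=50000):
    """Hypothetical helper: read a training file and write one token per line."""
    with open(train_path, encoding="utf-8") as f:
        vocab = build_vocab(f, max_size)
    with open(voc_path, "w", encoding="utf-8") as out:
        out.write("\n".join(vocab) + "\n")
    return vocab
```

Capping the vocabulary keeps the embedding and output layers small; rarer words would be mapped to the unknown token at training time.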

Step 2: Create an environment configuration file

This file tells the code where to find the datasets and embeddings on your machine, whether to use a GPU, and so on. An example configuration is provided at env_configs/env_config.json. The nli variables only need to be set if you use InferSent embeddings.
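A loader for such a file might look like the sketch below. It assumes the configuration is a flat JSON object and that the NLI-related settings are keys whose names start with "nli"; the function name `load_env_config` and the key names in the usage are hypothetical, so check env_configs/env_config.json for the real ones.

```python
import json


def load_env_config(path, use_infersent=False):
    """Load an environment configuration JSON file (hypothetical sketch).

    Assumes NLI settings are keys whose names start with "nli"; they are
    only required to be non-empty when InferSent embeddings are used.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    nli_keys = [k for k in config if k.lower().startswith("nli")]
    if use_infersent:
        missing = [k for k in nli_keys if not config[k]]
        if not nli_keys or missing:
            raise ValueError(
                f"InferSent requested but NLI settings are unset: {missing or 'no nli* keys'}"
            )
    return config
```

Failing early on missing NLI paths avoids discovering the problem only after data loading has started.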

Step 3: Train the model

Simply run:

python sample_scripts/dae_json.py runs/default/default.json

Step 4: Run inference

python sample_scripts/simple_inference.py model_path test_data_path [output_data_path]

Step 5: Evaluate ROUGE scores

To compute ROUGE scores, we use files2rouge, which itself relies on pythonrouge.

Installation instructions:

pip install git+https://github.com/tagucci/pythonrouge.git
git clone https://github.com/pltrdy/files2rouge.git
cd files2rouge
python setup_rouge.py
python setup.py install

To run the evaluation:

files2rouge summaries.txt references.txt
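For intuition about what the reported scores measure, here is a simplified ROUGE-1 sketch. It assumes line-aligned files with one summary per line, as files2rouge expects, and uses lower-cased whitespace tokens without stemming, so its numbers will not match files2rouge exactly. It is an illustration, not a replacement for the official toolkit.

```python
from collections import Counter


def rouge1(summary, reference):
    """Simplified ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    sys_counts = Counter(summary.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((sys_counts & ref_counts).values())
    p = overlap / max(sum(sys_counts.values()), 1)
    r = overlap / max(sum(ref_counts.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def average_rouge1_f(summaries_path, references_path):
    """Mean ROUGE-1 F1 over line-aligned summary and reference files."""
    with open(summaries_path, encoding="utf-8") as s, open(references_path, encoding="utf-8") as r:
        sys_lines, ref_lines = s.read().splitlines(), r.read().splitlines()
    if len(sys_lines) != len(ref_lines):
        raise ValueError("summaries and references must have the same number of lines")
    scores = [rouge1(a, b)[2] for a, b in zip(sys_lines, ref_lines)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because compression shortens the output, precision and recall move in opposite directions as summaries get shorter, which is why F1 is the usual headline number.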