Hercules: Attributable and Scalable Opinion Summarization

Code for the paper "Attributable and Scalable Opinion Summarization", Tom Hosking, Hao Tang & Mirella Lapata (ACL 2023).

By representing sentences from reviews as paths through a discrete hierarchy, we can generate abstractive summaries that are informative and attributable, and that scale to hundreds of input reviews.
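To make the idea concrete, here is a deliberately simplified sketch. It is not the paper's model or its selection procedure. It assumes each review sentence has already been encoded as a path of integer codes (one code per level of the hierarchy). Counting shared path prefixes then surfaces the most common opinions, and keeping the indices of the sentences behind each prefix gives attribution for free. The function `frequent_subpaths`, the toy sentences and the codes are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Toy representation (hypothetical): a sentence encoded as one discrete code per level.
CodePath = Tuple[int, ...]


def frequent_subpaths(
    paths: Sequence[CodePath], depth: int, top_k: int
) -> List[Tuple[CodePath, List[int]]]:
    """Return the top_k most frequent path prefixes of length `depth`,
    each paired with the indices of the input sentences that share it."""
    counts: Counter = Counter()
    sources: Dict[CodePath, List[int]] = defaultdict(list)
    for idx, path in enumerate(paths):
        if len(path) < depth:
            continue  # path is too short to have a prefix at this depth
        prefix = tuple(path[:depth])
        counts[prefix] += 1
        sources[prefix].append(idx)
    return [(prefix, sources[prefix]) for prefix, _ in counts.most_common(top_k)]


# Hypothetical toy input: four sentences and made-up code paths.
sentences = [
    "The room was spotless.",
    "Very clean room.",
    "Staff were friendly.",
    "The room was clean and quiet.",
]
paths = [(3, 1, 4), (3, 1, 2), (7, 0, 5), (3, 1, 4)]

for prefix, idxs in frequent_subpaths(paths, depth=2, top_k=2):
    # A real system would decode `prefix` into a summary sentence; here we only
    # show which source sentences support it (the attribution).
    print(prefix, [sentences[i] for i in idxs])
```

Shorter prefixes group sentences more coarsely (more general opinions), while longer prefixes keep them more specific, which is why the depth is exposed as a parameter here.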

Setup

Create a fresh environment:

conda create -n herculesenv python=3.9
conda activate herculesenv

or

python3 -m venv herculesenv
source herculesenv/bin/activate

Then install dependencies:

pip install -r requirements.txt

Download the data and trained models, and place them as follows:

SPACE -> ./data/opagg/
AmaSum -> ./data/opagg/
Trained checkpoints -> ./models

Tested with Python 3.9.
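Before running anything, it can help to confirm that the downloads ended up where the code expects them. The sketch below checks for the directories listed above and warns if the interpreter is not Python 3.9. The helper names are hypothetical and it only checks that the directories exist, not their contents.

```python
import sys
from pathlib import Path
from typing import List

# Directories this README asks you to populate.
EXPECTED_DIRS = ["data/opagg", "models"]


def missing_dirs(root: str = ".") -> List[str]:
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


def check_setup(root: str = ".") -> bool:
    """Print warnings about the environment; return False if any directory is missing."""
    if sys.version_info[:2] != (3, 9):
        # Only a warning: other versions may work but have not been tested.
        print(f"Warning: tested with Python 3.9, running {sys.version_info.major}.{sys.version_info.minor}")
    ok = True
    for d in missing_dirs(root):
        print(f"Missing directory: {Path(root) / d}")
        ok = False
    return ok


if __name__ == "__main__":
    print("Setup looks OK" if check_setup() else "Setup incomplete")
```

Run it from the repository root, since the paths are relative.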

Evaluation with trained models

See ./examples/Space-Eval.ipynb, or run:

from torchseq.utils.model_loader import model_from_path
from torchseq.metric_hooks.hrq_agg import HRQAggregationMetricHook

model_slug = 'hercules_space' # Which model to load? (name of the checkpoint directory under ./models/)

# Load the trained checkpoint
instance = model_from_path('./models/' + model_slug, output_path='./runs/', data_path='./data/', silent=True)

# Generate summaries for the test split and score them against the reference summaries
scores, res = HRQAggregationMetricHook.eval_generate_summaries_and_score(instance.config, instance, test=True)

# Report ROUGE-2 for both the abstractive and the extractive summaries
print("Model {:}: Abstractive R2 = {:0.2f}, Extractive R2 = {:0.2f}".format(model_slug, scores['abstractive']['rouge2'], scores['extractive']['rouge2']))
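If you have downloaded more than one checkpoint, the evaluation above can be wrapped in a loop. This sketch reuses only the calls shown above; the helper names are hypothetical, and torchseq is imported inside the function so that the formatting helper can be used on its own.

```python
from typing import Dict, Iterable, List


def format_scores(model_slug: str, scores: Dict[str, Dict[str, float]]) -> str:
    """Format ROUGE-2 scores in the same way as the snippet above."""
    return "Model {:}: Abstractive R2 = {:0.2f}, Extractive R2 = {:0.2f}".format(
        model_slug, scores["abstractive"]["rouge2"], scores["extractive"]["rouge2"]
    )


def evaluate_models(model_slugs: Iterable[str]) -> List[str]:
    """Evaluate each checkpoint under ./models/ and return one summary line per model."""
    # Lazy imports: torchseq is only needed when models are actually evaluated.
    from torchseq.utils.model_loader import model_from_path
    from torchseq.metric_hooks.hrq_agg import HRQAggregationMetricHook

    lines = []
    for slug in model_slugs:
        instance = model_from_path(
            "./models/" + slug, output_path="./runs/", data_path="./data/", silent=True
        )
        scores, _ = HRQAggregationMetricHook.eval_generate_summaries_and_score(
            instance.config, instance, test=True
        )
        lines.append(format_scores(slug, scores))
    return lines
```

For example, call evaluate_models(['hercules_space']) and print each returned line; add any other checkpoint names you actually have under ./models.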

Training on SPACE/AmaSum from scratch

To train on SPACE, download the datasets as described above, then run:

torchseq --train --reload_after_train --validate --config ./configs/hercules_space.json
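If you prefer to launch runs from Python (for example, to queue several configs), a minimal sketch is to wrap the same command with subprocess. It assumes the torchseq CLI is on your PATH; the helper names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_train_command(config_path: str) -> List[str]:
    """Return the torchseq training command used in this README as an argument list."""
    return [
        "torchseq",
        "--train",
        "--reload_after_train",
        "--validate",
        "--config",
        config_path,
    ]


def train(config_name: str, configs_dir: str = "./configs") -> None:
    """Train with <configs_dir>/<config_name>.json, failing fast if the config is missing."""
    config_path = Path(configs_dir) / f"{config_name}.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No such config: {config_path}")
    cmd = build_train_command(str(config_path))
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if torchseq exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, train('hercules_space') runs the same command as above. Passing an argument list rather than a shell string avoids quoting problems with unusual config paths.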

Training on a new dataset (WIP)

You will need to:

Install allennlp==2.10.1 and allennlp-models==2.10.1 via pip (you can ignore the warnings about version conflicts)
Make a copy of your dataset in the format expected by the scripts below
Run the dataset filtering scripts ./scripts/opagg_filter_space.py and ./scripts/opagg_filter_space_eval.py
Run ./scripts/generate_opagg_pairs.py to generate the training pairs
Make a copy of one of the training configs and update it to point at your data
Finally, train the model:

torchseq --train --reload_after_train --validate --config ./configs/{YOUR_CONFIG}.json
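Copying a training config and pointing it at your data can be scripted. The sketch below copies a JSON config and applies overrides given as dotted keys (e.g. "a.b" sets config["a"]["b"]). This README does not list which config fields hold the dataset paths, so inspect an existing config in ./configs to find them; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"] = value for dotted_key "a.b", creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Assumes any existing intermediate value is itself a dict.
        node = node.setdefault(key, {})
    node[leaf] = value


def copy_config(src: str, dst: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a JSON training config from src to dst, applying dotted-key overrides."""
    config = json.loads(Path(src).read_text())
    for dotted_key, value in overrides.items():
        set_nested(config, dotted_key, value)
    Path(dst).write_text(json.dumps(config, indent=4))
    return config
```

The source config is left untouched, so you can diff the copy against it to check that only the intended fields changed.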

Please feel free to raise a GitHub issue or email me if you run into any difficulties!

Citation

@inproceedings{hosking-etal-2023-attributable,
  title = "Attributable and Scalable Opinion Summarization",
  author = "Hosking, Tom  and
      Tang, Hao  and
      Lapata, Mirella",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.acl-long.473",
  pages = "8488--8505",
}
