AutoACU - Interpretable and Efficient Automatic Summarization Evaluation

This GitHub repository contains the source code of the AutoACU package for automatic summarization evaluation, proposed in our paper "Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation" (EMNLP 2023).

AutoACU contains two types of automatic evaluation metrics:

A2CU: a two-step automatic evaluation metric that first extracts atomic content units (ACUs) from one text sequence and then evaluates the extracted ACUs against another text sequence.
A3CU: an accelerated version of A2CU that directly computes the similarity between two text sequences without extracting ACUs, while keeping a similar evaluation target.
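
To make the two-step idea behind A2CU concrete, here is a toy sketch of "extract units, then check them against the other text". It is not the package's implementation: the sentence splitter and the token-overlap check are hypothetical stand-ins for the learned ACU generator and matcher, and the function names are invented for illustration. It also assumes recall is computed over units taken from the reference and precision over units taken from the candidate.

```python
import re


def extract_units(text):
    """Stand-in for ACU generation: split text into sentence-like units."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def unit_is_supported(unit, text, threshold=0.8):
    """Stand-in for ACU matching: token-overlap check against the other text."""
    unit_tokens = set(re.findall(r"\w+", unit.lower()))
    text_tokens = set(re.findall(r"\w+", text.lower()))
    if not unit_tokens:
        return False
    return len(unit_tokens & text_tokens) / len(unit_tokens) >= threshold


def two_step_score(source, target):
    """Fraction of units extracted from `source` that are supported by `target`."""
    units = extract_units(source)
    if not units:
        return 0.0
    return sum(unit_is_supported(u, target) for u in units) / len(units)


reference = "The cat sat on the mat. It was raining outside."
candidate = "The cat sat on the mat."

# Assumption: recall checks reference units against the candidate,
# precision checks candidate units against the reference.
recall = two_step_score(reference, candidate)
precision = two_step_score(candidate, reference)
f1 = 0.0 if recall + precision == 0 else 2 * recall * precision / (recall + precision)

print(f"Recall: {recall:.4f}, Precision: {precision:.4f}, F1: {f1:.4f}")
```

A3CU skips the explicit extraction step and predicts a comparable score directly from the two texts, which is what makes it faster.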

Installation

You can install AutoACU using pip:

pip install autoacu

or clone the repository and install it manually:

git clone https://github.com/Yale-LILY/AutoACU
cd AutoACU
pip install .

The required dependencies are PyTorch and HuggingFace's Transformers. AutoACU should be compatible with any recent version of both. To make sure the dependency versions are compatible, you can instead run the following command:

pip install autoacu[stable]
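
If you are unsure which versions of the two dependencies are present in your environment, a small check like the following may help. This is only a convenience sketch using the standard library; the helper name installed_versions is hypothetical and not part of AutoACU.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "transformers")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```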

You may also use the metrics directly without installing the package by importing the metric classes in autoacu/a2cu.py and autoacu/a3cu.py.

Usage

The model checkpoints for A2CU and A3CU are available on the HuggingFace model hub.

A2CU

A2CU needs to be initialized with two models, one for ACU generation and one for ACU matching. The default models are the following:

ACU generation: Yale-LILY/a2cu-generator, which is a T0-3B model finetuned on the RoSE dataset.
ACU matching: Yale-LILY/a2cu-classifier, which is a DeBERTa-XLarge model finetuned on the RoSE dataset.

Please note that A2CU may require a GPU with at least 16GB of memory.
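
Before loading the A2CU models, you may want to check how much memory your GPU has. The sketch below uses PyTorch's CUDA utilities; the helper name gpu_total_memory_gib is hypothetical. Note that a card sold as "16GB" usually reports slightly less than 16 GiB, so the value is printed rather than compared against a hard threshold.

```python
def gpu_total_memory_gib(device=0):
    """Return total memory of a CUDA device in GiB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available() or device >= torch.cuda.device_count():
        return None  # no usable CUDA device at this index
    total_bytes = torch.cuda.get_device_properties(device).total_memory
    return total_bytes / (1024 ** 3)


memory = gpu_total_memory_gib(device=0)
if memory is None:
    print("No CUDA device found.")
else:
    print(f"GPU 0 has {memory:.1f} GiB of memory.")
```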

Below is an example of using A2CU to evaluate the similarity between two text sequences.

from autoacu import A2CU
candidates, references = ["This is a test"], ["This is a test"]
a2cu = A2CU(device=0)  # the GPU device to use
recall_scores, prec_scores, f1_scores = a2cu.score(
    references=references,
    candidates=candidates,
    generation_batch_size=2,  # the batch size for ACU generation
    matching_batch_size=16,  # the batch size for ACU matching
    output_path=None,  # the path to save the evaluation results
    recall_only=False,  # whether to only compute the recall score
    acu_path=None,  # the path to save the generated ACUs
)
print(f"Recall: {recall_scores[0]:.4f}, Precision: {prec_scores[0]:.4f}, F1: {f1_scores[0]:.4f}")
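
The example above prints the scores for the first pair only. When evaluating many pairs, you will usually want an average over all of them. The following sketch assumes that each returned value is a sequence with one numeric score per input pair, as the indexing in the example suggests; the helper name summarize_scores is hypothetical and not part of AutoACU.

```python
from statistics import fmean


def summarize_scores(recall_scores, prec_scores, f1_scores):
    """Average per-example scores into a single summary per metric."""
    return {
        "recall": fmean(float(s) for s in recall_scores),
        "precision": fmean(float(s) for s in prec_scores),
        "f1": fmean(float(s) for s in f1_scores),
    }


# Made-up scores for two pairs, used here only to show the call.
summary = summarize_scores([0.5, 1.0], [1.0, 1.0], [0.75, 0.25])
print(summary)
```

The same helper can be applied to the three values returned by the score method of either metric. It raises statistics.StatisticsError on empty input.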

A3CU

The default model checkpoint for A3CU is Yale-LILY/a3cu, which is based on the BERT-Large model. Below is an example of using A3CU to evaluate the similarity between two text sequences.

from autoacu import A3CU
candidates, references = ["This is a test"], ["This is a test"]
a3cu = A3CU(device=0)  # the GPU device to use
recall_scores, prec_scores, f1_scores = a3cu.score(
    references=references,
    candidates=candidates,
    batch_size=16,  # the batch size for scoring
    output_path=None,  # the path to save the evaluation results
)
print(f"Recall: {recall_scores[0]:.4f}, Precision: {prec_scores[0]:.4f}, F1: {f1_scores[0]:.4f}")
