
AutoACU - Interpretable and Efficient Automatic Summarization Evaluation

This GitHub repository contains the source code of the AutoACU package for automatic summarization evaluation, proposed in our paper "Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation" (EMNLP 2023).

AutoACU contains two types of automatic evaluation metrics:

  • A2CU: a two-step automatic evaluation metric that first extracts atomic content units (ACUs) from one text sequence and then evaluates the extracted ACUs against another text sequence.
  • A3CU: an accelerated version of A2CU that directly computes the similarity between two text sequences without extracting ACUs, while targeting the same evaluation objective.
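To make the recall, precision, and F1 scores reported by these metrics concrete, here is a minimal sketch. It assumes that recall is the fraction of reference ACUs supported by the candidate, precision is the fraction of candidate ACUs supported by the reference, and F1 is their harmonic mean. The function `acu_scores` and its boolean match lists are hypothetical inputs for illustration, not part of the AutoACU API.

```python
from typing import Sequence, Tuple


def acu_scores(
    ref_acu_matched: Sequence[bool],
    cand_acu_matched: Sequence[bool],
) -> Tuple[float, float, float]:
    """Hypothetical helper: combine per-ACU match decisions into recall, precision, F1.

    ref_acu_matched[i] is True if the i-th reference ACU is supported by the candidate.
    cand_acu_matched[j] is True if the j-th candidate ACU is supported by the reference.
    """
    # Recall: share of reference ACUs found in the candidate.
    recall = sum(ref_acu_matched) / len(ref_acu_matched) if ref_acu_matched else 0.0
    # Precision: share of candidate ACUs found in the reference.
    precision = sum(cand_acu_matched) / len(cand_acu_matched) if cand_acu_matched else 0.0
    # F1: harmonic mean of recall and precision, defined as 0 when both are 0.
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


if __name__ == "__main__":
    r, p, f = acu_scores([True, True, False, True], [True, False])
    print(f"Recall: {r:.4f}, Precision: {p:.4f}, F1: {f:.4f}")
    # Recall: 0.7500, Precision: 0.5000, F1: 0.6000
```

Under this reading, computing precision requires extracting ACUs from the candidate as well, which is presumably why a recall-only mode is cheaper.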

Installation

You can install AutoACU using pip:

pip install autoacu

or clone the repository and install it manually:

git clone https://github.com/Yale-LILY/AutoACU
cd AutoACU
pip install .

The main dependencies are PyTorch and HuggingFace's Transformers. AutoACU should work with any recent version of both; to install dependency versions known to be compatible, run:

pip install autoacu[stable]

You may also use the metrics directly without installing the package by importing the metric classes in autoacu/a2cu.py and autoacu/a3cu.py.
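If you take the no-install route, one way to make those classes importable is to put the root of the cloned repository on `sys.path` and import the module by name. This is a minimal sketch under that assumption; the helper `load_metric_class` and its arguments are hypothetical, and PyTorch and Transformers still need to be installed for the real metric modules to import.

```python
import importlib
import sys
from pathlib import Path


def load_metric_class(repo_dir: str, module: str, class_name: str):
    """Hypothetical helper: import a class (e.g. autoacu.a3cu.A3CU) from a cloned repo."""
    root = str(Path(repo_dir).resolve())
    # Make the repository root importable so the `autoacu` package directory is found.
    if root not in sys.path:
        sys.path.insert(0, root)
    # Clear import finder caches in case files were created after the interpreter started.
    importlib.invalidate_caches()
    # Import the module by dotted name and fetch the requested class from it.
    mod = importlib.import_module(module)
    return getattr(mod, class_name)


# Usage (assumes the repository was cloned into ./AutoACU):
# A3CU = load_metric_class("AutoACU", "autoacu.a3cu", "A3CU")
```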

Usage

The model checkpoints for A2CU and A3CU are available on the HuggingFace model hub.

A2CU

A2CU needs to be initialized with two models, one for ACU generation and one for ACU matching. The default models are the following:

Please note that A2CU may require a GPU with at least 16 GB of memory.
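The two-step design can be illustrated with a toy pipeline. This is a sketch only: `extract_acus` and `is_supported` are hypothetical stand-ins (a naive sentence split and a word-overlap check) for the ACU generation and ACU matching models, which A2CU loads from the HuggingFace model hub. Only the control flow (extract units from one text, then check each against the other) mirrors the description above.

```python
import re
from typing import Callable, List, Set


def _words(text: str) -> Set[str]:
    """Lowercased word set used by the toy matcher."""
    return set(re.findall(r"\w+", text.lower()))


def extract_acus(text: str) -> List[str]:
    """Toy stand-in for the ACU generation model: treat each sentence as one unit."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def is_supported(acu: str, text: str, threshold: float = 0.5) -> bool:
    """Toy stand-in for the ACU matching model: enough of the unit's words appear in text."""
    acu_words = _words(acu)
    if not acu_words:
        return False
    return len(acu_words & _words(text)) / len(acu_words) >= threshold


def two_step_score(
    source: str,
    target: str,
    extract: Callable[[str], List[str]] = extract_acus,
    match: Callable[[str, str], bool] = is_supported,
) -> float:
    """Step 1: extract units from `source`. Step 2: return the share supported by `target`."""
    acus = extract(source)
    if not acus:
        return 0.0
    return sum(match(acu, target) for acu in acus) / len(acus)


if __name__ == "__main__":
    reference = "The cat sat on the mat. It was raining."
    candidate = "A cat sat on a mat."
    # Recall-style: units from the reference, checked against the candidate.
    print(two_step_score(reference, candidate))  # 0.5
    # Precision-style: units from the candidate, checked against the reference.
    print(two_step_score(candidate, reference))  # 1.0
```

Passing different `extract` and `match` callables keeps the two steps independent, which is the property that makes the extracted units inspectable.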

Below is an example of using A2CU to evaluate the similarity between two text sequences.

from autoacu import A2CU
candidates, references = ["This is a test"], ["This is a test"]
a2cu = A2CU(device=0)  # the GPU device to use
recall_scores, prec_scores, f1_scores = a2cu.score(
    references=references,
    candidates=candidates,
    generation_batch_size=2,  # the batch size for ACU generation
    matching_batch_size=16,  # the batch size for ACU matching
    output_path=None,  # the path to save the evaluation results
    recall_only=False,  # whether to only compute the recall score
    acu_path=None,  # the path to save the generated ACUs
)
print(f"Recall: {recall_scores[0]:.4f}, Precision: {prec_scores[0]:.4f}, F1: {f1_scores[0]:.4f}")
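The indexing in the example (`recall_scores[0]`) suggests that `score` returns one value per reference/candidate pair. Assuming that, a system-level result can be reported as the mean over pairs. The helper `system_level` below is a hypothetical sketch that works on any three equal-length sequences of floats.

```python
from statistics import mean
from typing import Dict, Sequence


def system_level(
    recall: Sequence[float],
    precision: Sequence[float],
    f1: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: average per-pair scores into one system-level score per metric."""
    # Use len() rather than truthiness so array-like inputs are also accepted.
    if len(recall) == 0 or not (len(recall) == len(precision) == len(f1)):
        raise ValueError("expected three non-empty sequences of equal length")
    return {"recall": mean(recall), "precision": mean(precision), "f1": mean(f1)}


# Usage with the outputs of a2cu.score(...) above (assumed to be per-pair lists):
# print(system_level(recall_scores, prec_scores, f1_scores))
```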

A3CU

The default model checkpoint for A3CU is Yale-LILY/a3cu, which is based on the BERT-Large model. Below is an example of using A3CU to evaluate the similarity between two text sequences.

from autoacu import A3CU
candidates, references = ["This is a test"], ["This is a test"]
a3cu = A3CU(device=0)  # the GPU device to use
recall_scores, prec_scores, f1_scores = a3cu.score(
    references=references,
    candidates=candidates,
    batch_size=16,  # the batch size for scoring
    output_path=None,  # the path to save the evaluation results
)
print(f"Recall: {recall_scores[0]:.4f}, Precision: {prec_scores[0]:.4f}, F1: {f1_scores[0]:.4f}")
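Because `a2cu.score` and `a3cu.score` both accept `references=` and `candidates=` and return recall, precision, and F1 scores, a comparison loop can be written once for either metric. The sketch below ranks several systems by mean F1. `rank_systems` and `OverlapScorer` are hypothetical; `OverlapScorer` is a toy word-overlap stand-in so the example runs without a GPU, and extra keyword arguments (such as `batch_size`) are forwarded unchanged to `score`.

```python
import re
from statistics import mean
from typing import Dict, List, Set, Tuple


class OverlapScorer:
    """Toy stand-in with the same call shape as the metric objects above."""

    @staticmethod
    def _words(text: str) -> Set[str]:
        return set(re.findall(r"\w+", text.lower()))

    def score(self, references: List[str], candidates: List[str], **kwargs):
        recalls, precs, f1s = [], [], []
        for ref, cand in zip(references, candidates):
            ref_words, cand_words = self._words(ref), self._words(cand)
            common = len(ref_words & cand_words)
            r = common / len(ref_words) if ref_words else 0.0
            p = common / len(cand_words) if cand_words else 0.0
            f = 2 * r * p / (r + p) if r + p else 0.0
            recalls.append(r)
            precs.append(p)
            f1s.append(f)
        return recalls, precs, f1s


def rank_systems(
    scorer, references: List[str], systems: Dict[str, List[str]], **score_kwargs
) -> List[Tuple[str, float]]:
    """Score each system's outputs against the shared references; sort by mean F1, best first."""
    results = []
    for name, outputs in systems.items():
        _, _, f1_scores = scorer.score(references=references, candidates=outputs, **score_kwargs)
        results.append((name, mean(f1_scores)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    refs = ["The cat sat on the mat."]
    systems = {"sys_a": ["The cat sat on the mat."], "sys_b": ["A dog ran."]}
    print(rank_systems(OverlapScorer(), refs, systems))
    # [('sys_a', 1.0), ('sys_b', 0.0)]
    # Assumed usage with the real metric: rank_systems(A3CU(device=0), refs, systems, batch_size=16)
```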
