
# AdvFact

This repository contains the trained models, diagnostic test sets, and augmented training data for the paper *Factuality Checker is not Faithful: Adversarial Meta-evaluation of Factuality in Summarization*.

## Factuality metrics

The six representative factuality checkers included in the paper, with their model types and training data, are listed in the table below.

| Model | Type | Training data |
| --- | --- | --- |
| MnliBert | NLI-S | MNLI |
| MnliRoberta | NLI-S | MNLI |
| MnliElectra | NLI-S | MNLI |
| Dae | NLI-A | PARANMT-G |
| FactCC | NLI-S | CNNDM-G |
| Feqa | QA | QA2D, SQuAD |

The model type and training data of the factuality metrics. NLI-A and NLI-S denote NLI-based metrics that define facts as dependency arcs and spans, respectively. PARANMT-G and CNNDM-G denote training data automatically generated from PARANMT and CNN/DailyMail.
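To make the span-based (NLI-S) setting concrete, here is a minimal sketch, not code from this repository, of how a checker could turn per-span entailment probabilities into a single factuality verdict. The `checker_verdict` name and the min-based aggregation rule are illustrative assumptions.

```python
def checker_verdict(entail_probs, threshold=0.5):
    """Aggregate per-span NLI entailment probabilities into one verdict.

    entail_probs: one entailment probability per claim span, as an
    NLI-S style metric would produce. The summary counts as factual
    only if every span is entailed (min aggregation is an assumption).
    """
    return "CORRECT" if min(entail_probs) >= threshold else "INCORRECT"

print(checker_verdict([0.9, 0.8, 0.7]))  # every span entailed
print(checker_verdict([0.9, 0.2, 0.7]))  # one unsupported span
```

A single unsupported span flips the whole verdict, which is why span-level metrics are sensitive to localized perturbations of the claim.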

## Adversarial transformation code

The code for the adversarial transformations is in the `adversarial_transformation` directory. To apply the transformations, run:

```shell
CUDA_VISIBLE_DEVICES=0 python ./adversarial_transformation/main.py -path DATA_PATH -save_dir SAVE_DIR -trans_type all
```

Replace `DATA_PATH` and `SAVE_DIR` with your own data path and save directory.
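As an illustration of what a number-editing transformation like NumEdit does, the sketch below shifts every integer in a claim so it no longer matches the source. This is a simplified stand-in, not the repository's implementation; the `num_edit` function and the choice of offsets are assumptions.

```python
import random
import re

def num_edit(claim: str, seed: int = 0) -> str:
    """Illustrative NumEdit-style transform: shift every integer in the
    claim by a small non-zero offset, so the edited claim contradicts
    the source document it was drawn from."""
    rng = random.Random(seed)

    def shift(match):
        # Replace the matched integer with a nearby but different value.
        return str(int(match.group()) + rng.choice([-2, -1, 1, 2]))

    return re.sub(r"\d+", shift, claim)

print(num_edit("The storm killed 12 people and destroyed 300 homes."))
```

A checker that is faithful to the source should flip its verdict on such an edited claim; the paper's diagnostic sets probe exactly this behavior.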

## Diagnostic evaluation sets

Six base evaluation datasets and four adversarial transformations are included in the paper.

Each adversarial transformation can be applied to each of the six base evaluation datasets, yielding 24 diagnostic evaluation sets. All base evaluation datasets and diagnostic evaluation sets can be found here. Detailed information for the 6 baseline test sets and the 24 diagnostic sets is shown in the table below:

| Base Test Set | Dataset | Nov.(%) | #Sys. | #Sam. | AntoSub | NumEdit | EntRep | SynPrun |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DocAsClaim | CNNDM | 0.00 | 0 | 11490 | 26487 | 25283 | 6816 | 9533 |
| RefAsClaim | CNNDM | 77.7 | 0 | 10000 | 14131 | 11621 | 28758 | 4572 |
| FaccTe | CNNDM | 54.0 | 10 | 503 | 670 | 515 | 440 | 245 |
| QagsC | CNNDM | 28.6 | 1 | 504 | 711 | 615 | 539 | 351 |
| RankTe | CNNDM | 52.5 | 3 | 1072 | 1646 | 1310 | 767 | 540 |
| FaithFact | XSum | 99.2 | 5 | 2332 | 363 | 94 | 114 | 118 |

Detailed statistics of the baseline (left columns) and diagnostic (right columns) test sets. For the baseline test sets, Dataset is the dataset that the source documents and summaries come from (CNNDM is CNN/DailyMail). Nov.(%) is the proportion of trigrams in the claims that do not appear in the source documents. #Sys. and #Sam. are the number of summarization systems that produced the summaries and the test set size, respectively. For the diagnostic test sets (AntoSub through SynPrun), each cell gives the sample size of the corresponding set.
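The Nov.(%) statistic described above can be computed directly. Below is a minimal sketch; the `novelty` helper and the whitespace tokenization are assumptions, and the paper may tokenize differently.

```python
def novelty(source: str, claim: str) -> float:
    """Percentage of claim trigrams that do not occur in the source
    document (the Nov.(%) column). Whitespace tokenization is an
    assumption made for this sketch."""
    def trigrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

    src, clm = trigrams(source), trigrams(claim)
    if not clm:
        return 0.0
    return 100.0 * len(clm - src) / len(clm)

doc = "the cat sat on the mat near the door"
print(novelty(doc, "the cat sat on the mat"))      # copied span -> 0.0
print(novelty(doc, "a dog ran through the park"))  # novel claim -> 100.0
```

This also explains the first row of the table: DocAsClaim uses the source documents themselves as claims, so its novelty is 0.00.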

## Error analysis samples

The 140 samples misclassified by FactCC are in the `data` directory.

## Augmented training data

The augmented training data can be downloaded here.
