SPOR

Dataset and code for the paper "SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation".

Dataset

The datasets are available as dataset.jsonl files in dataset_webnlg and dataset_e2e. The files contain the following fields:

data: list[list[str]]. Input data units.
texts: list[str]. Reference texts.
input_order_1 / input_order_2: list[int]. The input order of the data units. Data units are indicated by their index in data. These fields appear only in order-invariance_main.
output_order: list[list[int]]. The order in which data units occur in the reference texts. This field appears only in order-invariance_main.
replaced_entity: dict[str, str]. Replaced entities and their representations after replacement. This field appears only in test_webnlg_rule-learnability.
replaced_kv_pair: dict[str, str]. The attributes whose values were replaced, and the representations of the replaced values. This field appears only in test_e2e_rule-learnability; possible_value.json lists all possible values for each attribute.
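
As a rough illustration of how these fields fit together, the sketch below reads JSON Lines records and arranges the data units according to input_order_1. It is not part of the repository code: the helper names read_jsonl and reorder are hypothetical, the record is invented for the example, and the indices are assumed to be 0-based.

```python
import io
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def reorder(data: list, order: list) -> list:
    """Arrange data units by an index list such as input_order_1.

    Assumption: indices are 0-based positions in `data`.
    """
    return [data[i] for i in order]


# Invented record for illustration only; it is not taken from the dataset.
example_line = json.dumps({
    "data": [["unit_a", "value_a"], ["unit_b", "value_b"], ["unit_c", "value_c"]],
    "texts": ["placeholder reference text"],
    "input_order_1": [2, 0, 1],
    "input_order_2": [1, 2, 0],
    "output_order": [[0, 1, 2]],
})

# An in-memory stream stands in for an opened .jsonl file.
records = list(read_jsonl(io.StringIO(example_line + "\n")))
first = records[0]
print(reorder(first["data"], first["input_order_1"]))
```

To read a real file, pass an open file handle for one of the .jsonl files in place of the in-memory stream. Fields that appear only in specific files are best read with dict.get so that a missing key does not raise an error.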

Code

Getting Started

Start by:

cd code

Install required packages with:

pip install -r requirements.txt

Dataset construction

See code/prepare_corpus.sh for commands for dataset construction.
See code/construction for the specific code for dataset construction.
If you want to reproduce the construction process, first delete the existing dataset files.

Training and Evaluation

See code/run.sh for commands for training and evaluation.

Citation

If you find this work helpful, please cite:

@inproceedings{xu-wang-2024-spor,
    title = "{SPOR}: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation",
    author = "Xu, Ziyao  and
      Wang, Houfeng",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.36",
    doi = "10.18653/v1/2024.acl-long.36",
    pages = "604--621"
}

Acknowledgements

The code for the PARENT metric is based on KaijuML/parent.
The code for the corpus reader is based on WebNLG/corpus-reader.
