This repository contains the code for our ACL 2022 (Findings) paper: AMR-DA: Data Augmentation by Abstract Meaning Representation.
The figure shows an overview of AMR-DA: the AMR parser first transduces each sentence into an AMR graph; an AMR graph extender then diversifies the graph with different augmentation strategies; finally, the AMR generator synthesizes augmented sentences from the modified graphs.
This work adopts SPRING as the AMR parser and plms-graph2text as the AMR generator.
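For readers unfamiliar with AMR, the sketch below shows what a graph looks like in PENMAN notation, using the standard textbook sentence "The boy wants to go." It is only an illustration (not output from this repository) and assumes the penman Python library is available.

import penman

# "The boy wants to go." rendered as an AMR graph in PENMAN notation.
amr = """(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))"""

graph = penman.decode(amr)                # parse the PENMAN string into a Graph object
print(graph.triples)                      # e.g. ('w', ':instance', 'want-01'), ...
print(penman.encode(graph, indent=2))     # re-serialize with 2-space indentation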
Augmentation examples for the wiki data: original sentences and their generated augmentations.
Install the requirements
pip install -r requirements.txt
Parse the plain text into AMR graphs
cd amr-parser-spring
bash predict_amr.sh <plain_text_file_path>   (e.g. ../data/wiki_data/wiki.txt)
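The parser writes the graphs to an .amr file in PENMAN notation, typically one graph per sentence with the source sentence kept as metadata (the exact metadata depends on SPRING's output settings, so treat that as an assumption). A quick way to inspect the output with the penman library:

import penman

# Path taken from the example above; adjust to your own output file.
graphs = penman.load("../data/wiki_data/wiki.amr")
print(len(graphs), "graphs parsed")
print(graphs[0].metadata.get("snt"))       # original sentence, if metadata is present
print(penman.encode(graphs[0], indent=2))  # the first AMR graph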
Preprocess the AMR graphs and convert them into source and target strings
cd data-utils/preprocess
bash prepare_data.sh <amr_file_path>   (e.g. ../../data/wiki_data/wiki.amr)
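prepare_data.sh produces the source/target files later consumed by the generator. The script defines the exact source format (handling of variables, special tokens, and so on); conceptually, though, each graph is flattened into a single-line string, roughly as in this penman-based sketch:

import penman

graph = penman.decode("""(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02 :ARG0 b))""")

# Conceptual illustration only: one single-line rendering per graph.
# The actual source format is whatever prepare_data.sh emits.
source_line = penman.encode(graph, indent=None)
print(source_line)   # (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))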
Augment the AMR graphs
cd data_utils
python augment.py   (modify the parameters according to your requirements)
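augment.py implements the graph-level augmentation strategies described in the paper. As a rough illustration of what operating on a graph rather than on raw text means (a hypothetical stand-in, not the repository's implementation), the snippet below swaps one concept for a hand-picked synonym using the penman library:

import penman

def replace_concept(graph, old_concept, new_concept):
    """Illustrative graph-level augmentation: substitute one concept label."""
    triples = [(s, r, new_concept if r == ':instance' and t == old_concept else t)
               for s, r, t in graph.triples]
    return penman.Graph(triples)

graph = penman.decode("(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))")
augmented = replace_concept(graph, "boy", "kid")   # "kid" is a hand-picked synonym
print(penman.encode(augmented, indent=2))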
Generate text from the AMR graphs
cd plms-graph2text
bash decode_AMR.sh <model-path> <checkpoint> <gpu_id> <source file> <output-name>
(bash decode_AMR.sh /path/to/t5-base amr-t5-base.ckpt 0 ../data/wiki_data/wiki.source wiki-pred-t5-base.txt)
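The decoded file contains the generated augmentations as plain text, assumed here to be one sentence per line (check plms-graph2text's output if your version differs). A minimal way to load them:

# Assumption: one generated sentence per line; file name from the example above.
with open("wiki-pred-t5-base.txt") as f:
    augmentations = [line.strip() for line in f]

print(len(augmentations), "augmented sentences")
print(augmentations[:3])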
For STS tasks, we directly used the code from SimCSE. For text classification tasks, please refer to the code of EDA.
Please cite this repository using the following reference:
@inproceedings{shou-etal-2022-amr,
    title = "{AMR-DA}: {D}ata Augmentation by {A}bstract {M}eaning {R}epresentation",
    author = "Shou, Ziyi and
      Jiang, Yuxin and
      Lin, Fangzhen",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.244",
    pages = "3082--3098"
}