Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation

This repository contains the code for the EMNLP 2021 paper "Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation".

Datasets

In our experiments, we use the following datasets: LDC2017T10, LDC2020T07.

Environment

The easiest way to proceed is to create a conda environment:

conda create -n structadapt python=3.6

Further, install PyTorch:

conda install -c pytorch pytorch=1.7.0

Finally, install the packages required:

pip install -r requirements.txt

Finetuning

To fine-tune mT5 on SilverAMR and SilverSent, run:

./finetune.sh <SILVER_SENT_FILE> <SILVER_AMR_FILE> <DEV_FILE> <MODEL_DIR>

where <SILVER_SENT_FILE> and <SILVER_AMR_FILE> point to the JSON training files, <DEV_FILE> points to the development file, and <MODEL_DIR> is the folder where checkpoints will be saved.

Each line of a JSON file holds one example, for instance:

{"source": "translate AMR to Spanish: ( relevant :polarity - :ARG1 ( or :op1 ( involve :ARG1 ( and :op1 ( face :ARG1-of ( black ) ) :op2 ( noose ) ) :ARG2 ( thing :ARG2-of ( costume :ARG1 ( you ) ) ) ) :op2 ( costume :ARG1 you :ARG2 ( sandwich :ARG1-of ( grill ) :mod ( cheese :ARG1-of ( drip :degree ( too :degree ( little ) ) ) :ARG1-of ( think :ARG0 ( involve-01 ) ) ) ) ) ) )", "target": "Si tu traje tiene un rostro negro y un nausea, o si se trata de un sándwich fritado en el que creo que el queso es un poco demasiado ardiente es irrelevante."}
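The following is a minimal sketch of how such a file could be produced, assuming one JSON object per line with the "source" and "target" keys shown above. The helper names `make_example` and `write_jsonl` are hypothetical and not part of this repository. The "translate AMR to Spanish: " prefix is copied from the example; the exact prefix wording for other languages is an assumption.

```python
import json


def make_example(linearized_amr, sentence, language="Spanish"):
    """Build one training record in the format of the example line above."""
    # The prefix mirrors the example; other language names are an assumption.
    return {
        "source": f"translate AMR to {language}: {linearized_amr}",
        "target": sentence,
    }


def write_jsonl(examples, path):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    record = make_example("( sandwich :ARG1-of ( grill ) )", "Un sándwich a la parrilla.")
    write_jsonl([record], "train_example.json")
```

Passing `ensure_ascii=False` keeps characters such as "á" readable in the file instead of escaping them.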

Preprocessing AMR graphs

The AMR graphs need to be linearized to be fed into the model. We used the method from Ribeiro et al. 2021 for linearization: https://github.com/UKPLab/plms-graph2text.
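To give a rough idea of what linearization involves, here is a simplified sketch. It is not the preprocessing script from plms-graph2text, and its output may differ from that script in details. It assumes a PENMAN-style input string, removes variable names, replaces re-entrant variable references with their concepts, strips sense suffixes such as "-01", and separates parentheses with spaces. The function name `linearize` is hypothetical.

```python
import re


def linearize(penman):
    """Approximate linearization of a PENMAN-style AMR string (illustrative only)."""
    # Map each variable to its concept, e.g. "y" -> "you".
    var_to_concept = dict(
        re.findall(r"\(\s*([^\s()/]+)\s*/\s*([^\s()]+)", penman)
    )
    # Drop the "var /" part of every node definition.
    text = re.sub(r"\(\s*[^\s()/]+\s*/\s*", "( ", penman)
    # Surround closing parentheses with spaces so they become separate tokens.
    text = text.replace(")", " ) ")
    tokens = text.split()

    output = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1] if i > 0 else ""
        # A bare variable right after a role is a re-entrancy: use its concept.
        if previous.startswith(":") and token in var_to_concept:
            token = var_to_concept[token]
        # Strip sense suffixes from concepts ("costume-01" -> "costume"),
        # leaving roles and the polarity marker "-" untouched.
        if not token.startswith(":"):
            token = re.sub(r"(?<=[A-Za-z])-\d+$", "", token)
        output.append(token)
    return " ".join(output)


if __name__ == "__main__":
    graph = "(c / costume-01 :ARG1 (y / you) :ARG2 (s / sandwich) :ARG0 y)"
    print(linearize(graph))
    # ( costume :ARG1 ( you ) :ARG2 ( sandwich ) :ARG0 you )
```

For real experiments, use the linked repository so that the input format matches what the released checkpoint was trained on.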

Decoding

For decoding, run:

./test.sh <MODEL_DIR> <TEST_FILE> <GPU_ID>

Trained Model

A checkpoint trained on SilverAMR and SilverSent can be found here. This model achieves BLEU scores of 30.7 (ES), 26.4 (IT), 20.6 (DE) and 24.2 (ZH). The outputs can be downloaded here.

More

For more details on the hyperparameters, please refer to the HuggingFace documentation.

Contact person: Leonardo Ribeiro, ribeiro@aiphes.tu-darmstadt.de

Citation

@inproceedings{ribeiro-etal-2021-smelting,
    title = "Smelting Gold and Silver for Improved Multilingual {AMR}-to-{T}ext Generation",
    author = "Ribeiro, Leonardo F. R.  and
      Pfeiffer, Jonas  and
      Zhang, Yue  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.57",
    pages = "742--750",
    abstract = "Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the target task. In this paper, we investigate different techniques for automatically generating AMR annotations, where we aim to study which source of information yields better multilingual results. Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR. We find that combining both complementary sources of information further improves multilingual AMR-to-text generation. Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.",
}
