
Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation

This repository contains the code for the EMNLP 2021 paper "Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation".

Datasets

In our experiments, we use the following datasets: LDC2017T10, LDC2020T07.

Environment

The easiest way to proceed is to create a conda environment:

conda create -n structadapt python=3.6

Next, install PyTorch:

conda install -c pytorch pytorch=1.7.0

Finally, install the required packages:

pip install -r requirements.txt
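
As a quick sanity check (this snippet is a suggestion, not part of the repository's scripts), you can verify that PyTorch was installed correctly:

```python
# Sanity check for the environment; not part of the repository's scripts.
import torch

print(torch.__version__)          # expect 1.7.x
print(torch.cuda.is_available())  # True if a GPU is visible
```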

Finetuning

To train mT5 using silverAMR and silverSent data, execute:

./finetune.sh <SILVER_SENT_FILE> <SILVER_AMR_FILE> <DEV_FILE> <MODEL_DIR> 

where <SILVER_SENT_FILE> and <SILVER_AMR_FILE> point to the JSON training files, <DEV_FILE> points to the dev file, and <MODEL_DIR> is the folder where the checkpoint will be saved.

This is an example of a line in the JSON file:

{"source": "translate AMR to Spanish: ( relevant :polarity - :ARG1 ( or :op1 ( involve :ARG1 ( and :op1 ( face :ARG1-of ( black ) ) :op2 ( noose ) ) :ARG2 ( thing :ARG2-of ( costume :ARG1 ( you ) ) ) ) :op2 ( costume :ARG1 you :ARG2 ( sandwich :ARG1-of ( grill ) :mod ( cheese :ARG1-of ( drip :degree ( too :degree ( little ) ) ) :ARG1-of ( think :ARG0 ( involve-01 ) ) ) ) ) ) )", "target": "Si tu traje tiene un rostro negro y un nausea, o si se trata de un sándwich fritado en el que creo que el queso es un poco demasiado ardiente es irrelevante."}

Preprocessing AMR graphs

The AMR graphs need to be linearized before they are fed into the model. We use the linearization method from Ribeiro et al. (2021): https://github.com/UKPLab/plms-graph2text.
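
The exact preprocessing scripts live in the plms-graph2text repository; purely as an illustration of the idea (dropping PENMAN variable names while keeping brackets and edge labels), a rough sketch could look like this:

```python
import re

# Rough illustration only: drop PENMAN variables ("w / want-01" -> "want-01")
# while keeping the bracketed structure. Use the plms-graph2text scripts
# for the actual preprocessing.
def linearize(penman_str: str) -> str:
    s = re.sub(r"\(\s*\S+\s*/\s*", "( ", penman_str)
    s = s.replace(")", " ) ")
    return " ".join(s.split())

print(linearize("(w / want-01 :ARG0 (b / boy))"))
# -> ( want-01 :ARG0 ( boy ) )
```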

Decoding

For decoding, run:

./test.sh <MODEL_DIR> <TEST_FILE> <GPU_ID>
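
Since the checkpoint is a fine-tuned mT5 model, you can also decode a single linearized graph directly with HuggingFace transformers; a minimal sketch, assuming a placeholder checkpoint directory and generic generation settings (neither is the repository's default):

```python
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

# Placeholder path: point this at your <MODEL_DIR> checkpoint.
model_dir = "checkpoints/mt5-silver"
tokenizer = MT5Tokenizer.from_pretrained(model_dir)
model = MT5ForConditionalGeneration.from_pretrained(model_dir)

source = ("translate AMR to Spanish: "
          "( want-01 :ARG0 ( boy ) :ARG1 ( go-01 :ARG0 boy ) )")
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```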

Trained Model

A checkpoint trained on silverAMR and silverSent can be found here. This model achieves BLEU scores of 30.7 (ES), 26.4 (IT), 20.6 (DE), and 24.2 (ZH). The outputs can be downloaded here.
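
To score your own outputs against references, a minimal sacrebleu sketch (the file names are placeholders; the paper's exact BLEU configuration is not documented here):

```python
import sacrebleu

# Placeholder file names: one sentence per line, hypotheses aligned
# with references.
hypotheses = open("outputs.es.txt", encoding="utf-8").read().splitlines()
references = open("references.es.txt", encoding="utf-8").read().splitlines()

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```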

More

For more details regarding hyperparameters, please refer to the HuggingFace documentation.

Contact person: Leonardo Ribeiro, ribeiro@aiphes.tu-darmstadt.de

Citation

@inproceedings{ribeiro-etal-2021-smelting,
    title = "Smelting Gold and Silver for Improved Multilingual {AMR}-to-{T}ext Generation",
    author = "Ribeiro, Leonardo F. R.  and
      Pfeiffer, Jonas  and
      Zhang, Yue  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.57",
    pages = "742--750",
    abstract = "Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the target task. In this paper, we investigate different techniques for automatically generating AMR annotations, where we aim to study which source of information yields better multilingual results. Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR. We find that combining both complementary sources of information further improves multilingual AMR-to-text generation. Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.",
}
