This repository contains the code and dataset for the paper *The Relative Clauses AMR Parsers Hate Most* by Xiulin Yang and Nathan Schneider.
First, create a virtual environment with Python 3.8 and install the dependencies:

```
conda create -n venv python=3.8
conda activate venv
pip install -r requirements.txt
```
The parsed results for EWT can be found in the `parse_results` folder.
- `rc-types.py`: the code to classify relative clauses without distinguishing types of reduced relative clauses.
- `rrc-types.py`: the code to classify reduced relative clauses and add EUD annotations.
- `reentrancy.py`: the code to evaluate the output from am-parser.
- `reentrancy_amrlib.py`: the code to evaluate the output from other parsers that need alignment from LEAMR.
- `dep_parse_amr.py`: the code to generate dependency trees for data from AMR 3.0.
- `amrbart_postprocess.py`: the code that post-processes the parses from AMRBART.
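As background for what `reentrancy.py` and `reentrancy_amrlib.py` evaluate: a reentrancy is an AMR variable that fills more than one semantic role (e.g. the shared argument introduced by a relative clause). The following is a minimal, hypothetical sketch of counting reentrant variables from a list of AMR triples; it is an illustration of the concept, not the repository's actual evaluation code.

```python
from collections import Counter

def count_reentrancies(triples):
    """Count variables that are the target of more than one non-instance
    edge, i.e., reentrant nodes in an AMR graph."""
    targets = Counter(
        tgt for src, role, tgt in triples
        if role != ":instance"
    )
    return sum(1 for n in targets.values() if n > 1)

# AMR for "The boy wants to go": 'b' is both the wanter and the goer,
# so it is reentrant.
triples = [
    ("w", ":instance", "want-01"),
    ("b", ":instance", "boy"),
    ("g", ":instance", "go-02"),
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
]
print(count_reentrancies(triples))  # -> 1
```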
First, go to the `eud_ewt` folder, where you will find the following files:

- `en_ewt-ud-{dev,test,train}.conllu`: downloaded from the ewt-dev-branch.
- `eud_ewt_{dev,test,train}.conllu`: the post-processed files with the recovered EUD annotations.
- `rc-types.py`: the script used to classify sentences based on their EUD annotations (for reduced relative clauses, the Cxn value in the MISC column will be xxx-red-missingdep-xxx).
- `rrc-types.py`: the script used to classify reduced relative clauses. The output is stored in the `eud_{train,dev,test}` folders (each folder contains the necessary files for each split of the EWT treebank).
- `verb_transitivity.tsv`: the TSV file that contains verb transitivity information.
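The reduced-RC marking described above can be inspected programmatically. Below is a minimal sketch of pulling Cxn values out of the MISC (10th) column of a CoNLL-U token line; the helper functions are hypothetical (not part of this repository), and the example tag reuses the `xxx-red-missingdep-xxx` pattern from the description above.

```python
def misc_cxn_values(conllu_line):
    """Extract all Cxn values from the MISC (10th) column of a
    CoNLL-U token line; MISC holds '|'-separated key=value pairs."""
    cols = conllu_line.rstrip("\n").split("\t")
    if len(cols) != 10 or cols[9] == "_":
        return []
    return [kv.split("=", 1)[1]
            for kv in cols[9].split("|")
            if kv.startswith("Cxn=")]

def is_reduced_rc(cxn_value):
    """Reduced relative clauses carry 'red' and 'missingdep' in the tag."""
    parts = cxn_value.split("-")
    return "red" in parts and "missingdep" in parts

line = "5\tstanding\tstand\tVERB\tVBG\t_\t2\tacl\t2:acl\tCxn=xxx-red-missingdep-xxx"
for value in misc_cxn_values(line):
    print(value, is_reduced_rc(value))
```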
To check whether any reduced relative clauses have been misclassified, follow this pipeline:
- Go to the `eud_{train,dev,test}` folder and check the following two documents: `{orc,oblrc}.txt`. Check whether any sentences are misclassified (you can also check the `.conllu` files if you like).
- Once you find a misclassified example, go to `eud_ewt_{train,dev,test}.conllu`; the EUD annotation of that sentence is likely wrong and should be corrected.
- After all corrections for one split, run `rc-types.py` to generate an updated `eud_ewt_split.conllu`. You need to change the PATH variable at the beginning of each script to point to the correct split.
- Once you have corrected all sentences, double-check by (1) running `rc-types.py` to get the updated `eud_ewt_split.conllu` under the `eud_ewt` folder, and (2) running `rrc-types.py` to get the updated reduced relative clause classification and rechecking that it is correct. If any annotation is still wrong, repeat the correction steps above.
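The re-run step of the pipeline above can be scripted. A minimal sketch of a hypothetical wrapper (not part of this repository) that runs both classification scripts in order, assuming each script's PATH variable has already been pointed at the split you corrected:

```python
import subprocess
import sys

def recheck(scripts=("rc-types.py", "rrc-types.py")):
    """Re-run the classification scripts after manual EUD corrections.

    Assumes the PATH variable inside each script has already been set
    to the split being corrected, as described above.
    """
    for script in scripts:
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            raise SystemExit(f"{script} failed; fix errors before rechecking")
```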
```
@inproceedings{yang-schneider-2024-relative,
    title = "The Relative Clauses {AMR} Parsers Hate Most",
    author = "Yang, Xiulin and
      Schneider, Nathan",
    editor = "Bonial, Claire and
      Bonn, Julia and
      Hwang, Jena D.",
    booktitle = "Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.dmr-1.16",
    pages = "151--161",
    abstract = "This paper evaluates how well English Abstract Meaning Representation parsers process an important and frequent kind of Long-Distance Dependency construction, namely, relative clauses (RCs). On two syntactically parsed datasets, we evaluate five AMR parsers at recovering the semantic reentrancies triggered by different syntactic subtypes of relative clauses. Our findings reveal a general difficulty among parsers at predicting such reentrancies, with recall below 64{\%} on the EWT corpus. The sequence-to-sequence models (regardless of whether structural biases were included in training) outperform the compositional model. An analysis by relative clause subtype shows that passive subject RCs are the easiest, and oblique and reduced RCs the most challenging, for AMR parsers.",
}
```
- Add argparse and a bash script to make the code easier to run.
- Add more detailed instructions to the README.