relative-amr-eval

This repository contains the code and dataset for the paper The Relative Clauses AMR Parsers Hate Most by Xiulin Yang and Nathan Schneider.

To replicate the experiments

First, create a conda environment with Python 3.8, activate it, and install the dependencies.

conda create -n venv python==3.8
conda activate venv
pip install -r requirements.txt

dataset

The parsed results for EWT can be found in the parse_results folder.

classification code

  • rc-types.py: the code to classify the relative clauses without distinguishing types of reduced relative clauses.
  • rrc-types.py: the code to classify reduced relative clauses and add EUD annotations.

evaluation code

  • reentrancy.py: the code to evaluate the output from am-parser.
  • reentrancy_amrlib.py: the code to evaluate the output from other parsers, which require alignments from LEAMR.
  • dep_parse_amr.py: the code to generate dependency trees for data from AMR 3.0.
  • amrbart_postprocess.py: the code that post-processes the parses from AMRBART.

To recover the Enhanced Universal Dependencies (EUD) relations

First, go to the eud_ewt folder, where you will find the following files.

  • en_ewt-ud-{dev,test,train}.conllu are downloaded from ewt-dev-branch

  • eud_ewt_{dev,test,train}.conllu are the post-processed files that contain the recovered EUD annotations.

  • rc-types.py: the script used to classify sentences based on their EUD annotations (which means that, for reduced relative clauses, the Cxn value in the MISC column will be xxx-red-missingdep-xxx).

  • rrc-types.py: the script used to classify reduced relative clauses. The output is stored in the eud_{train,dev,test} folders (each folder contains the necessary files for one split of the EWT treebank).

  • verb_transitivity.tsv: the tsv file that contains verb transitivity information.
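As a rough illustration of where the Cxn value lives, the sketch below reads CoNLL-U token lines and pulls the Cxn entry out of the MISC column (the tenth column). It is not taken from rc-types.py: the helper names parse_misc, iter_cxn and looks_reduced are hypothetical, the sample rows are made up, and the substring test for reduced relative clauses is only an assumption based on the xxx-red-missingdep-xxx pattern mentioned above.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U MISC field ("Key=Value|Key=Value") into a dict."""
    if misc == "_":
        return {}
    pairs = (item.split("=", 1) for item in misc.split("|") if "=" in item)
    return {key: value for key, value in pairs}


def iter_cxn(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (token ID, form, Cxn value) for each token line that has a Cxn entry."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        cxn = parse_misc(cols[9]).get("Cxn")
        if cxn is not None:
            yield cols[0], cols[1], cxn


def looks_reduced(cxn: str) -> bool:
    """Assumption: reduced RCs are marked by "-red-missingdep-" inside the Cxn value."""
    return "-red-missingdep-" in cxn


# Made-up rows for illustration only; the Cxn value is the README's placeholder.
sample = [
    "# text = the book I read",
    "1\tthe\tthe\tDET\tDT\t_\t2\tdet\t2:det\t_",
    "2\tbook\tbook\tNOUN\tNN\t_\t0\troot\t0:root\tCxn=xxx-red-missingdep-xxx",
]
for token_id, form, cxn in iter_cxn(sample):
    print(token_id, form, cxn, looks_reduced(cxn))
```

To run this over a real split, pass an open file handle for one of the eud_ewt_{dev,test,train}.conllu files to iter_cxn instead of the sample list.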

correction

To check whether any reduced relative clauses have been misclassified, follow this pipeline:

  1. Go to the eud_{train,dev,test} folder and check the following two files:

    • {orc,oblrc}.txt: check whether any sentences are misclassified (you can also check the .conllu files if you like).
  2. Once you find a misclassified example, go to eud_ewt_{train,dev,test}.conllu; the EUD annotation of that sentence is likely to be wrong and should be corrected.

  3. After correcting one split, run rc-types.py to generate an updated eud_ewt_split.conllu (where split is train, dev, or test). Change the PATH variable at the beginning of each script to point to the correct split.
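For step 2, locating the sentence to correct can be done with a small lookup like the one sketched here. It is not part of the repository: find_sentences is a hypothetical helper, and it assumes that each sentence block in the .conllu file carries the standard "# text = ..." comment and that blocks are separated by a blank line.

```python
from typing import List

TEXT_PREFIX = "# text = "


def find_sentences(conllu_text: str, query: str) -> List[str]:
    """Return the sentence blocks whose "# text = ..." comment contains `query`."""
    hits = []
    # Sentence blocks in CoNLL-U are separated by a blank line.
    for block in conllu_text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith(TEXT_PREFIX) and query in line[len(TEXT_PREFIX):]:
                hits.append(block)
                break  # one match per block is enough
    return hits
```

A possible use is to read eud_ewt_dev.conllu into a string, call find_sentences with a fragment of the misclassified sentence, and print the returned blocks to inspect their DEPS column before editing the file by hand.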

double check

  1. Once you have corrected all sentences, you can double-check by (1) running rc-types.py to get the updated eud_ewt_split.conllu under the eud_ewt folder and (2) running rrc-types.py to get the updated reduced relative clause classification, then rechecking whether the classifications are correct. If any annotation is still wrong, follow the steps in the correction section again.

Citation

@inproceedings{yang-schneider-2024-relative,
    title = "The Relative Clauses {AMR} Parsers Hate Most",
    author = "Yang, Xiulin  and
      Schneider, Nathan",
    editor = "Bonial, Claire  and
      Bonn, Julia  and
      Hwang, Jena D.",
    booktitle = "Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.dmr-1.16",
    pages = "151--161",
    abstract = "This paper evaluates how well English Abstract Meaning Representation parsers process an important and frequent kind of Long-Distance Dependency construction, namely, relative clauses (RCs). On two syntactically parsed datasets, we evaluate five AMR parsers at recovering the semantic reentrancies triggered by different syntactic subtypes of relative clauses. Our findings reveal a general difficulty among parsers at predicting such reentrancies, with recall below 64{\%} on the EWT corpus. The sequence-to-sequence models (regardless of whether structural biases were included in training) outperform the compositional model. An analysis by relative clause subtype shows that passive subject RCs are the easiest, and oblique and reduced RCs the most challenging, for AMR parsers.",
}

TODO

  • Add argparse and bash script for easier implementation of the code.
  • Add more detailed instructions in readme.
