- The code for the paper "Discourse Representation Structure Parsing for Chinese".
- The data can be found in the PMB's new release, or you can download it from my Google Drive.
- If you want to test the models for the comparison experiments, you can download them from Google Drive.
- The models are trained with OpenNMT; alternatively, you can use AllenNLP with the code from Rik's GitHub.
- python>=3.6
- pytorch==1.7.1+cu110
- torchtext==1.8.1
- cuda==11.0
- First, get the data, which you can download directly. The data includes English text, Chinese text, and DRSs for the English text; use the tokenizers HanLP and Moses to preprocess the Chinese and the English text, respectively.
- Second, use GIZA++ to process the English and Chinese text together and obtain the alignment file "z2e.A3.final". Using this file, replace the Chinese named entities with the aligned English named entities in the DRSs; this gives you DRSs for Chinese.
- Third, convert the clause-format DRSs to the sequential format used by the neural models.
- Finally, split the data into train, valid, and test sets. Remember that the valid and test sets are gold data, while the training data includes both gold and silver data.
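The named-entity replacement step above can be sketched in pure Python. This is a minimal illustration, not the repository's actual script: the three-line record layout of GIZA++ `*.A3.final` files is standard, but the zh→en orientation of "z2e.A3.final", the clause shape `b1 Name x1 "..."`, and joining multiword names with `~` are assumptions made for the example.

```python
import re

def parse_a3_record(target_line, source_line):
    """Parse one GIZA++ *.A3.final record (each record is three lines:
    a comment, the target sentence, and the aligned source sentence;
    this takes the last two). Returns the target tokens and a list of
    (source_token, [1-based target indices]) pairs; NULL links are dropped."""
    target = target_line.split()
    pairs = re.findall(r'(\S+) \(\{([\d ]*)\}\)', source_line)
    links = [(tok, [int(j) for j in idxs.split()])
             for tok, idxs in pairs if tok != 'NULL']
    return target, links

def alignment_map(target, links):
    """Map each source (here: Chinese) token to the English tokens it
    aligns to, joined with '~' as a single multiword name."""
    return {tok: '~'.join(target[j - 1] for j in idxs)
            for tok, idxs in links if idxs}

def replace_names(clauses, align):
    """Swap quoted Chinese names in clause-format DRS lines for their
    aligned English counterparts; unaligned quoted strings are kept."""
    swap = lambda m: '"%s"' % align.get(m.group(1), m.group(1))
    return [re.sub(r'"([^"]+)"', swap, c) for c in clauses]

# Toy sentence pair (invented for illustration):
target, links = parse_a3_record(
    'Tom sleeps',
    'NULL ({ }) 汤姆 ({ 1 }) 睡觉 ({ 2 })')
align = alignment_map(target, links)
print(replace_names(['b1 Name x1 "汤姆"', 'b1 sleep "v.01" e1'], align))
# → ['b1 Name x1 "Tom"', 'b1 sleep "v.01" e1']
```

Note that sense tags such as `"v.01"` pass through unchanged because they never appear as source tokens in the alignment map.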
- For the English parser, you can find the training commands in the file `silver_en_run`, and for the Chinese parser in `silver_zh_run`.
- Note: the data file paths in these commands should be changed to the path where you put the data.
```shell
# preprocess data
sh preproc_sbn_goldsilver.sh
# train the parser
sh train_sbn_seq_goldsilver.sh
# test the parser
sh predict_sbn_silver.sh
# or you can use the English parser to parse the English test set translated from Chinese
sh predict_sbn_silver_trans.sh
```
- Once you have the sequential DRS output, you can compare it with the gold DRS data.
- Our evaluation tool is provided in SBN-evaluation-tool.
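As a quick sanity check before running the full tool, a rough micro-averaged F1 over matched items can be sketched as below. This is a simplified stand-in, not the metric computed by SBN-evaluation-tool, and the example tokens are invented.

```python
from collections import Counter

def f1(pred_items, gold_items):
    """Micro F1 over multisets of items (e.g. sequential-DRS tokens or
    clause triples). A rough sanity check, not the official metric."""
    pred, gold = Counter(pred_items), Counter(gold_items)
    overlap = sum((pred & gold).values())  # multiset intersection size
    if not overlap:
        return 0.0
    p = overlap / sum(pred.values())  # precision
    r = overlap / sum(gold.values())  # recall
    return 2 * p * r / (p + r)

pred = 'person.n.01 Name "Tom" sleep.v.01 Agent -1'.split()
gold = 'person.n.01 Name "Tom" sleep.v.01 Agent -1'.split()
print(f1(pred, gold))  # identical sequences give 1.0
```

Comparing multisets rather than positions makes the check insensitive to token order, which is a deliberate simplification.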
```bibtex
@inproceedings{Wang_Zhang_Bos_2023,
  title={Discourse {R}epresentation {S}tructure {P}arsing for {C}hinese},
  author={Wang, Chunliu and Zhang, Xiao and Bos, Johan},
  booktitle={Proceedings of the 4th Natural Logic meets Machine Learning (NALOMAIV 2023)},
  year={2023},
  month={Jun},
  publisher={Association for Computational Linguistics},
}
```