Chinese-SBN-parsing

The code for the paper "Discourse Representation Structure Parsing for Chinese".
The data can be found in PMB's new release version, or you can download it from my Google Drive.
If you want to test models for comparison experiments, you can download them from Google Drive.
The models are trained by OpenNMT, or you can use AllenNLP by using the code from Rik's Github.

Usage

Environment (Not sure)

python>=3.6
pytorch==1.7.1+cu110
torchtext==1.8.1
cuda==11.0

Preprocess data

First, you need to get the data, which you can download directly. Once you get the data including English text and Chinese text and DRS for English text, you need to use tokenizer HanLP and Moses to preprocess Chinese and English respectively.
Second, you need to use GIZA++ to process English and Chinese text together and get the alignment file "z2e.A3.final", you need to use it to replace the Chinese named-entities with English named-entities in DRS, and then you can get DRS for Chinese.
Third, you must process the clause format DRS to the sequential format used for neural models.
Finally, you need to split the data into valid, test, and train data, remember that the valid set and test set are gold data, and training data includes gold data and silver data.

Train the model

For the English parser, you can find the commands for training in file silver_en_run, and for Chinese silver_zh_run.
Note, the data file path for commands should be changed to the data file path where you put it.

# preprocess data
sh preproc_sbn_goldsilver.sh
# train the parser
sh train_sbn_seq_goldsilver.sh
# test the parser
sh predict_sbn_silver.sh
# or you can use the English parser to parse the English test translated from Chinese
sh predict_sbn_silver_trans.sh

Evaluation

Once you get the sequential DRS data, you can use it to compare it with the gold DRS data.
Our evaluation tool is provided in SBN-evaluation-tool

Cite

@inproceedings{Wang_Zhang_Bos_2023,  
 title={Discourse {R}epresentation {S}tructure {P}arsing for {C}hinese}, 
 author={Wang, Chunliu and Zhang, Xiao and Bos, Johan}, 
 booktitle={Proceedings of the 4th Natural Logic meets Machine Learning (NALOMAIV 2023).},
 year={2023}, 
 month={Jun}, 
 publisher = "Association for Computational Linguistics",
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
SBN_data_preprocess		SBN_data_preprocess
__pycache__		__pycache__
config		config
data_preprocess		data_preprocess
onmt		onmt
silver_en_run		silver_en_run
silver_zh_run		silver_zh_run
tools		tools
README.md		README.md
SBN.py		SBN.py
preprocess.py		preprocess.py
server.py		server.py
setup.py		setup.py
train.py		train.py
translate.py		translate.py
tree.py		tree.py

wangchunliu/Chinese-SBN-parsing

Folders and files

Latest commit

History

Repository files navigation

Chinese-SBN-parsing

Usage

Environment (Not sure)

Preprocess data

Train the model

Evaluation

Cite

About

Resources

Stars

Watchers

Forks

Languages