The code for the paper "Multi-Layer Pseudo-Siamese Biaffine Model for Dependency Parsing" (COLING 2022).
Install the required packages with:

```
pip install -r requirements.txt
```
The datasets are available here. Place the `PTB`, `CTB`, and `UD2.2` folders inside the `corpus` folder.
To conduct experiments on a specific dataset, modify `corp` in `src/dependency/config.py`. The value of `corp` can be selected from `["PTB", "CTB", "bg", "ca", "cs", "de", "en", "es", "fr", "it", "nl", "no", "ro", "ru"]`.
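For example, to run on the Penn Treebank, the setting could look like this (a minimal sketch; the surrounding contents of `config.py` are omitted):

```python
# src/dependency/config.py (sketch)
corp = "PTB"   # or "CTB", or a UD2.2 language code such as "de"
```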
Each `*.npy` file stores a list of data samples. Each sample consists of 4 lists, which store the token, POS tag, head, and relation of each position in the sentence. The `*log.txt` file stores all possible POS tags and relations in the dataset.
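A minimal sketch of inspecting such a file, assuming the data was saved with `numpy` as pickled Python objects (the file name `train.npy` and the field order are assumptions for illustration):

```python
import numpy as np

# Load the preprocessed dataset (hypothetical file name).
data = np.load("corpus/PTB/train.npy", allow_pickle=True)

# Each sample is assumed to hold 4 parallel lists:
# tokens, POS tags, head indices, and relation labels.
tokens, pos_tags, heads, relations = data[0]
for position, (token, pos, head, rel) in enumerate(
        zip(tokens, pos_tags, heads, relations)):
    print(f"{position}: {token}\t{pos}\thead={head}\trel={rel}")
```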
To use a specific pre-trained model, place the corresponding model folder inside the `data` folder, and set `pretrain_name` in `src/dependency/config.py` to the name of the model folder (see the sketch after this list). The pre-trained models we use are as follows:

- for PTB: bert-base, bert-large, XLNet-base, XLNet-large
- for CTB: bert-base-chinese
- for UD2.2: bert-base-multilingual
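For example, to parse PTB with BERT-large, the setting could look like this (a sketch; the value is assumed to match the folder name under `data`):

```python
# src/dependency/config.py (sketch)
pretrain_name = "bert_large"   # must match a folder inside data/
```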
The prepared directory structure looks like this:

```
.
├── README.md
├── corpus
│   ├── CTB
│   ├── PTB
│   └── UD2.2
├── data
│   ├── XLNet_base
│   ├── XLNet_large
│   ├── bert_base
│   ├── bert_base_chinese
│   ├── bert_base_multilingual
│   └── bert_large
├── requirements.txt
└── src
    └── dependency_parsing
```
You can keep only the folders you need in `corpus` and `data`.
Go into the code folder with:

```
cd ./src/dependency_parsing
```
You can modify `device` in `dir.py` to choose the device you want to use (a sketch of this setting follows the command below). Train the model with:

```
python main.py
```
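A minimal sketch of the `device` setting in `dir.py`, assuming it accepts a standard PyTorch device string:

```python
# src/dependency_parsing/dir.py (sketch)
device = "cuda:0"   # use "cpu" if no GPU is available
```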
Evaluation on the test set happens automatically after training completes. The results are saved in the `result` folder, and the models are saved in the `model` folder. Training can be interrupted; the next run resumes from the checkpoint (the latest model and optimizer). The program finds the checkpoint by name, so if you want to restart training from the beginning, remove or rename the corresponding model folder.
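The resume behavior could be implemented along these lines (a hypothetical sketch, not the repository's actual code):

```python
import os
import torch

def load_checkpoint(model, optimizer, path="model/checkpoint.pt"):
    """Resume from the checkpoint if one exists under the expected name."""
    if os.path.exists(path):
        state = torch.load(path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        return state["epoch"] + 1   # continue from the next epoch
    return 0                        # no checkpoint found: start from scratch
```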
You can evaluate the latest model on the test set with:

```
python main.py test
```
`src/dependency/config.py` includes the experiment settings used in our paper. You can modify the setting values to conduct a specific experiment (see the sketch after this list). The settings related to specific sections of our paper are as follows:

- `biaff_layers`: the number of layers of the biaffine model (Sections 3.6 and 4.1)
- `use_lstm`: whether to use the LSTM (Section 3.7)
- `attn_type`: the attention function, can be selected from `["biaffine", "dot", "general", "concat"]` (Section 4.2)
- `siamese`: the siamese method, can be selected from `["P", "T", "N"]` (Section 4.3)
- `test_limit`: limits the length of sentences to be tested (Section 4.4)
- `use_gold_head`: whether to use gold heads in inference (Section 4.5)
- `time_check`: whether to measure running time (Section 4.7)
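For instance, a single-layer biaffine baseline without the pseudo-siamese structure could be configured like this (illustrative values only; `0` disabling `test_limit` is an assumption):

```python
# src/dependency/config.py (illustrative values)
biaff_layers = 1          # a single biaffine layer
use_lstm = True           # keep the LSTM encoder
attn_type = "biaffine"    # attention function
siamese = "N"             # no siamese structure
test_limit = 0            # assumed: 0 disables the length limit
use_gold_head = False     # predict heads at inference
time_check = False        # skip timing measurements
```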
```
@inproceedings{xu-etal-2022-multi,
    title = "Multi-Layer Pseudo-{S}iamese Biaffine Model for Dependency Parsing",
    author = "Xu, Ziyao  and
      Wang, Houfeng  and
      Wang, Bingdong",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.486",
    pages = "5476--5487",
}
```
- The code for pre-trained model embeddings is based on second-order-parser.
- The code for the Eisner algorithm is based on bist-parser.