GitHub - txAnnie/ZP-DNLG: Code for paper "Coupling Context Modeling with Zero Pronoun Recovering for Document-Level Natural Language Generation"

Description

This repository accompanies the conference paper "Coupling Context Modeling with Zero Pronoun Recovering for Document-Level Natural Language Generation" (EMNLP 2021). The project is built on the PyTorch version of OpenNMT (v0.2.1).

Usage

Processing Source Data with Zero Pronoun Position Detected

Source data in a pro-drop language must first be processed to detect zero pronoun positions. For this preprocessing step, we provide a model trained on Chinese in DPro_model.

Training NMT Model

Step 1: Data Preprocessing

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
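The preprocessing step reads the source and target files as line-aligned parallel text, so a mismatch in line counts usually points to a damaged corpus. The following is a minimal sketch of a sanity check that could be run before preprocess.py. The helper names `count_lines` and `check_parallel` are hypothetical and not part of this repo; the file paths are the ones from the command above.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a UTF-8 text file without loading it all into memory."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(src_path, tgt_path):
    """Return (ok, n_src, n_tgt); ok is True when both files have equally many lines."""
    n_src = count_lines(src_path)
    n_tgt = count_lines(tgt_path)
    return n_src == n_tgt, n_src, n_tgt


if __name__ == "__main__":
    # Paths taken from the preprocessing command; pairs that are missing are skipped.
    pairs = [
        ("data/src-train.txt", "data/tgt-train.txt"),
        ("data/src-val.txt", "data/tgt-val.txt"),
    ]
    for src, tgt in pairs:
        if not (Path(src).exists() and Path(tgt).exists()):
            print(f"skipping {src} / {tgt}: file not found")
            continue
        ok, n_src, n_tgt = check_parallel(src, tgt)
        print(f"{src}: {n_src} lines, {tgt}: {n_tgt} lines, aligned={ok}")
```

If `aligned=False` is reported for a pair, it is probably worth fixing the data before running preprocess.py.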

Step 2: Model Training

Training the sentence-level NMT baseline:

python train.py -data path_to_preprocessed_data/data -freeze_d True -save_model path_to_saved_model/model -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 300000 -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 30000 -save_checkpoint_steps 30000
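To make the optimizer flags above more concrete, here is a small sketch of the learning rate that `-decay_method noam` would produce with `-learning_rate 2`, `-rnn_size 512`, and `-warmup_steps 8000`. It assumes this repo keeps the upstream OpenNMT-py behaviour, where `-learning_rate` acts as a multiplier on the Noam schedule and `-rnn_size` serves as the model size; the function names are hypothetical. The token count per update additionally assumes a single GPU.

```python
def noam_learning_rate(step, learning_rate=2.0, model_size=512, warmup_steps=8000):
    """Noam schedule: linear warmup, then inverse-square-root decay.

    Assumed formula (upstream OpenNMT-py style):
    lr = learning_rate * model_size^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return learning_rate * model_size ** -0.5 * scale


def tokens_per_update(batch_size=4096, accum_count=2):
    """Tokens per parameter update with token batching on one GPU (assumption)."""
    return batch_size * accum_count


if __name__ == "__main__":
    for step in (1, 4000, 8000, 30000, 300000):
        print(f"step {step:>6}: lr = {noam_learning_rate(step):.6e}")
    print("tokens per update:", tokens_per_update())
```

Under these assumptions the rate rises linearly until step 8000, where it peaks just below 1e-3, and then decays with the inverse square root of the step.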

Training the document-level NMT model with zero pronoun recovery:

python train.py -data path_to_preprocessed_data/data -freeze_d False -train_from path_to_pretrained_sentence-level_model -save_model path_to_saved_model/model -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 300000 -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 200 -save_checkpoint_steps 200
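The two training commands share almost all hyperparameters and differ only in `-freeze_d`, `-train_from`, and the validation and checkpoint intervals. The sketch below builds both commands from one shared argument list, which may help avoid typos when editing the flags. The helper `build_train_command` is hypothetical and not part of this repo; all flags and placeholder paths are copied from the commands in this section.

```python
import shlex

# Hyperparameters common to both training stages, copied from the commands above.
SHARED_ARGS = [
    "-layers", "6", "-rnn_size", "512", "-word_vec_size", "512",
    "-transformer_ff", "2048", "-heads", "8",
    "-encoder_type", "transformer", "-decoder_type", "transformer",
    "-position_encoding", "-train_steps", "300000",
    "-max_generator_batches", "2", "-dropout", "0.1",
    "-batch_size", "4096", "-batch_type", "tokens", "-normalization", "tokens",
    "-accum_count", "2", "-optim", "adam", "-adam_beta2", "0.998",
    "-decay_method", "noam", "-warmup_steps", "8000", "-learning_rate", "2",
    "-max_grad_norm", "0", "-param_init", "0", "-param_init_glorot",
    "-label_smoothing", "0.1",
]


def build_train_command(data, save_model, freeze_d, train_from=None, interval=30000):
    """Return the train.py argument list for one training stage."""
    cmd = ["python", "train.py", "-data", data, "-freeze_d", str(freeze_d)]
    if train_from is not None:
        # Only the document-level stage starts from a pretrained checkpoint.
        cmd += ["-train_from", train_from]
    cmd += ["-save_model", save_model] + SHARED_ARGS
    cmd += ["-valid_steps", str(interval), "-save_checkpoint_steps", str(interval)]
    return cmd


if __name__ == "__main__":
    baseline = build_train_command(
        "path_to_preprocessed_data/data", "path_to_saved_model/model", freeze_d=True
    )
    document_level = build_train_command(
        "path_to_preprocessed_data/data", "path_to_saved_model/model",
        freeze_d=False, train_from="path_to_pretrained_sentence-level_model",
        interval=200,
    )
    print(shlex.join(baseline))
    print(shlex.join(document_level))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` directly.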

Translation

python translate.py -model path_to_trained_document-level_model -src path_to_source_data -output path_to_save_output

Folders and files

.idea
DPro_model
__pycache__
onmt
tools
CHANGELOG.md
Dockerfile
LICENSE.md
README.md
config.py
floyd.yml
floyd_requirements.txt
preprocess.py
read_pro_predict.py
requirements.opt.txt
train.py
translate.py
