
Dense Information Flow for Neural Machine Translation

This is an implementation of the DenseNMT architecture described in the paper Dense Information Flow for Neural Machine Translation:

@inproceedings{shen2018dense,
  title={Dense Information Flow for Neural Machine Translation},
  author={Shen, Yanyao and Tan, Xu and He, Di and Qin, Tao and Liu, Tie-Yan},
  booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  volume={1},
  pages={1294--1303},
  year={2018}
}

It is built on top of fairseq, a sequence-to-sequence learning toolkit for Torch from Facebook AI Research tailored to neural machine translation (NMT). Specifically, the DenseNMT architecture is implemented in the file fairseq/models/fconvdensemopt.lua.
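The core idea is DenseNet-style connectivity: instead of each layer consuming only the previous layer's output, it consumes the concatenation of the outputs of all preceding layers. The snippet below is a minimal, hypothetical Torch sketch of this connectivity pattern, not code from fconvdensemopt.lua; the name denseBlock is illustrative, and nn.Linear stands in for the convolutional layers used in the actual model.

require 'nn'

-- Dense connectivity: step i sees the concatenation of the block input and
-- the outputs of all previous steps, so the feature dimension grows by
-- `growth` at every step.
local function denseBlock(nlayers, indim, growth)
  local block = nn.Sequential()
  local dim = indim
  for i = 1, nlayers do
    local layer = nn.Sequential()
      :add(nn.Linear(dim, growth))
      :add(nn.ReLU())
    -- Keep the current features (identity branch) and append the new ones.
    block:add(nn.ConcatTable()
      :add(nn.Identity())
      :add(layer))
    block:add(nn.JoinTable(2))  -- concatenate along the feature dimension
    dim = dim + growth
  end
  return block, dim
end

local block, outdim = denseBlock(3, 16, 8)
print(block:forward(torch.randn(4, 16)):size())  -- 4 x 40, i.e. 16 + 3 * 8

The paper applies this connectivity inside both the convolutional encoder and the decoder, and also densifies the attention; see the paper for the exact formulation.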

Model

[Figure: overview of the DenseNMT model architecture; image not included here]

Requirements and Installation

  • The required packages are listed on the GitHub page of the facebookresearch/fairseq project. In short, you need an NVIDIA GPU plus the nccl, torch, and nn packages. For installation details, please check facebookresearch/fairseq.
  • For data pre-processing you may need additional packages, such as subword-nmt for BPE-level training (a typical workflow is sketched after this list).
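If you train at the BPE level, a typical subword-nmt workflow looks like the following. This is a generic sketch rather than a script from this repository; the file names and the number of merge operations are illustrative:

$ pip install subword-nmt
$ cat train.de train.en | subword-nmt learn-bpe -s 10000 -o bpe.codes
$ subword-nmt apply-bpe -c bpe.codes < train.de > train.bpe.de
$ subword-nmt apply-bpe -c bpe.codes < train.en > train.bpe.en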

Install fairseq by cloning the GitHub repository and running

luarocks make rocks/fairseq-scm-1.rockspec

Training a New Model

Data Pre-processing

The fairseq source distribution contains an example pre-processing script for the IWSLT14 German-English corpus. Pre-process and binarize the data as follows:

$ cd data/
$ bash prepare-iwslt14.sh
$ cd ..
$ TEXT=data/iwslt14.tokenized.de-en
$ fairseq preprocess -sourcelang de -targetlang en \
  -trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test \
  -thresholdsrc 3 -thresholdtgt 3 -destdir data-bin/iwslt14.tokenized.de-en

This will write binarized data that can be used for model training to data-bin/iwslt14.tokenized.de-en.

Training

Use fairseq train to train a new model. Here is the command for training the original fairseq model on the IWSLT14 dataset.

# Fully convolutional sequence-to-sequence model
$ mkdir -p trainings/fconv
$ fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -pretrain

We include the file run-iwlst-de-en-example.sh as an example of training our DenseNMT architecture on the IWSLT14 dataset, which yields a significant BLEU improvement over the baseline; a rough sketch of such a command follows.
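Assuming the dense model registers under the name of its source file (fconvdensemopt), as fairseq's -model flag does for fconv, a DenseNMT run would look roughly like the baseline command with the model swapped in. This is a hypothetical sketch; the exact flags, including any dense-specific hyperparameters, are set in run-iwlst-de-en-example.sh:

# Hypothetical DenseNMT training command; see run-iwlst-de-en-example.sh for the real one
$ mkdir -p trainings/fconvdense
$ fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconvdensemopt -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconvdense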

Generation

We include the file run-generate-iwlst-de-en-example.sh as an example of generating translations and computing BLEU scores; a rough invocation is sketched below.
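Following upstream fairseq's generation interface, a batch generation command looks roughly like the following. The checkpoint path, beam size, and batch size here are illustrative; see the script for the exact flags and for how BLEU is computed:

# Hypothetical generation command; see run-generate-iwlst-de-en-example.sh for the real one
$ fairseq generate -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -path trainings/fconv/model_best_opt.th7 -beam 10 -batchsize 32 | tee /tmp/gen.out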
