Dense Information Flow for Neural Machine Translation

This is an implementation of the DenseNMT architecture described in the paper Dense Information Flow for Neural Machine Translation:

  Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu. Dense Information Flow for Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).

It is built on fairseq, a sequence-to-sequence learning toolkit for Torch from Facebook AI Research tailored to Neural Machine Translation (NMT). Specifically, the DenseNMT architecture is implemented in the file fairseq/models/fconvdensemopt.lua.


Requirements and Installation

  • The required packages are listed on the GitHub page of the facebookresearch/fairseq project. In short, you need a GPU, NCCL, Torch, and the nn package. For installation details, please check facebookresearch/fairseq.
  • For data preprocessing, you may need other packages, such as subword-nmt for BPE-level training.
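
Before installing, it can help to confirm that the basic tools are on your PATH. The following is a minimal sketch and not part of fairseq: the helper name check_tools is hypothetical, and the tool names th (the Torch7 interpreter), luarocks, and nvidia-smi are assumptions about a typical Torch/CUDA setup, so adjust them to your environment.

```shell
# Report which of the given commands are available on PATH.
# Returns the number of missing commands (0 means all were found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Assumed tool names; this only inspects PATH and installs nothing.
check_tools th luarocks nvidia-smi || echo "Install the missing tools before building fairseq."
```

The check does not verify NCCL or the nn package, which are libraries rather than commands; for those, follow the facebookresearch/fairseq instructions.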

Install fairseq by cloning the GitHub repository and running

luarocks make rocks/fairseq-scm-1.rockspec

Training a New Model

Data Pre-processing

The fairseq source distribution contains an example pre-processing script for the IWSLT14 German-English corpus. Pre-process and binarize the data as follows:

$ cd data/
$ bash
$ cd ..
$ TEXT=data/
$ fairseq preprocess -sourcelang de -targetlang en \
  -trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test \
  -thresholdsrc 3 -thresholdtgt 3 -destdir data-bin/

This writes binarized data that can be used for model training to data-bin/.
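
Since the corpus is parallel text, each split should have one source file and one target file with the same number of lines. The sketch below checks this before binarization. It assumes the files are named <prefix>.<lang> (for example train.de and train.en under $TEXT), which is our reading of the -trainpref, -validpref, and -testpref arguments above; the helper name check_parallel is hypothetical and not part of fairseq.

```shell
# Check that every split has a source and a target file with equal line counts.
# Usage: check_parallel DIR SRC_LANG TGT_LANG
check_parallel() {
  dir=$1; src=$2; tgt=$3
  rc=0
  for split in train valid test; do
    s="$dir/$split.$src"
    t="$dir/$split.$tgt"
    if [ ! -f "$s" ] || [ ! -f "$t" ]; then
      echo "$split: missing $src or $tgt file"
      rc=1
      continue
    fi
    # Arithmetic expansion strips any padding that wc adds.
    ns=$(($(wc -l < "$s")))
    nt=$(($(wc -l < "$t")))
    if [ "$ns" -eq "$nt" ]; then
      echo "$split: ok ($ns lines)"
    else
      echo "$split: line count mismatch ($ns vs $nt)"
      rc=1
    fi
  done
  return "$rc"
}

# Toy demonstration on a temporary directory (not the real corpus).
demo=$(mktemp -d)
for split in train valid test; do
  printf 'ein Satz\nnoch ein Satz\n' > "$demo/$split.de"
  printf 'a sentence\nanother sentence\n' > "$demo/$split.en"
done
check_parallel "$demo" de en
rm -rf "$demo"
```

On the real data, the equivalent call would be check_parallel "$TEXT" de en, run before fairseq preprocess.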


Use fairseq train to train a new model. Here is the command for training the original fairseq model on the IWSLT14 dataset.

# Fully convolutional sequence-to-sequence model
$ mkdir -p trainings/fconv
$ fairseq train -sourcelang de -targetlang en -datadir data-bin/ \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -pretrain

We include the file as an example of training our DenseNMT architecture on the IWSLT14 dataset, which gives a significant improvement over the original fairseq model above.


We include the file as an example of generating predictions and calculating the BLEU scores.