This repository contains the code used for the WMT14 translation experiments in the paper *Mixture of Attention Heads: Selecting Attention Heads Per Token*.
Python 3, fairseq and PyTorch are required for the current codebase.
- Install PyTorch and fairseq.
- Generate the WMT14 translation dataset with Transformer Clinic (a setup sketch covering both steps follows this list).
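Neither exact package versions nor the preprocessing commands are pinned by this README; the following is a minimal setup sketch, assuming a pip-based environment and the Transformer-Clinic repository by Liyuan Liu (the clone URL and the data path are assumptions to adapt):

```
# Install dependencies (versions unpinned here; match the paper's setup if needed)
pip install torch fairseq

# Fetch Transformer Clinic and follow its WMT14 En-De instructions to
# download, tokenize, and binarize the data (assumed URL below)
git clone https://github.com/LiyuanLucasLiu/Transformer-Clinic.git
# ... run Transformer Clinic's WMT14 preprocessing per its own README ...
# The binarized output directory is what run.sh expects as /path/to/your/data
```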
Scripts and commands
Train the translation model

sh run.sh /path/to/your/data
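The contents of run.sh are not reproduced in this README; as a rough sketch of what such a script typically wraps, a standard fairseq training command on binarized WMT14 En-De data looks like the following (the architecture name and hyperparameters are generic fairseq choices for this task, not the verified MoA configuration):

```
# Illustrative fairseq-train invocation; run.sh presumably adds
# MoA-specific architecture options on top of a setup like this.
fairseq-train /path/to/your/data \
    --arch transformer_wmt_en_de \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 7e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --save-dir checkpoints/moa_wmt14_en_de
```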
Evaluate on the WMT14 test set

sh test.sh /path/to/checkpoint
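Likewise, test.sh's internals are not shown here; a common fairseq decoding command for a WMT14 En-De checkpoint, sketched with typical (assumed) beam settings, is:

```
# Illustrative fairseq-generate invocation; beam size and length penalty
# follow common WMT14 practice rather than the paper's exact settings.
fairseq-generate /path/to/your/data \
    --path /path/to/checkpoint \
    --beam 4 --lenpen 0.6 --remove-bpe > gen.out
```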
With the default settings, MoA achieves a BLEU score of approximately 28.4 on the WMT14 EN-DE test set.
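To recompute a BLEU score from the decoding output, the extraction-and-scoring pattern from fairseq's translation examples can be applied to the (assumed) gen.out file from the sketch above:

```
# Split fairseq-generate output into hypotheses and references, then score.
grep ^H gen.out | cut -f3- > gen.out.sys
grep ^T gen.out | cut -f2- > gen.out.ref
fairseq-score --sys gen.out.sys --ref gen.out.ref
```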