Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation

This repo provides the implementation for Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation (NAACL 2021).


Non-Autoregressive machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy. The common practice to tackle the problem is transferring the Autoregressive machine Translation (AT) knowledge to NAT models, e.g., with knowledge distillation. In this work, we hypothesize and empirically verify that AT and NAT encoders capture different linguistic properties of source sentences. Therefore, we propose to adopt the multi-task learning to transfer the AT knowledge to NAT models through the encoder sharing. Specifically, we take the AT model as an auxiliary task to enhance NAT model performance. Experimental results on WMT14 En-De, WMT16 En-Ro, and WMT19 En-De datasets show that the proposed Multi-Task NAT achieves significant improvements over the baseline NAT models. In addition, experimental results demonstrate that our Multi-Task NAT is complementary to the standard knowledge transfer method, knowledge distillation.

Reference Performance

Main results

We evaluate our proposed model on several well-established datasets. The results are as follows (image: Main Results).

Large-scale experiments

To further confirm the improvement, we conduct additional large-scale experiments on the WMT19 and WMT20 En-De datasets, as shown in the following table (image: Large Scale).

Reproducing Our Results

Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.7

To install from source and develop locally:

cd multi-task-nat
pip install -e .

Data Preparation

In our paper, we use the standard benchmarks: WMT's En-De and En-Ro data. You can also replace these with your private dataset.

To feed the data to the Fairseq models, you need to execute first. For more details, please refer to Fairseq's documentation.


Use provided settings

We have already provided the launch scripts used in our paper. You can run the corresponding script with:

sh run/

Note that you must fill in the directories left blank in run/

Use your customized arguments

You can set different hyperparameters to suit your needs.

To use our model properly, you must explicitly set the following arguments when executing

--arch mt_transformer
--task translation_mt
--criterion mt_loss
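
If you assemble the command line from a Python launcher, a small helper can guarantee that the three required arguments are always present. This is only a sketch: `REQUIRED_ARGS` and `with_required_args` are hypothetical names that do not exist in this repository, and the helper only builds a list of strings without invoking any command.

```python
# Hypothetical helper, not part of this repo: prepend the three required
# arguments listed above to any user-supplied arguments.
REQUIRED_ARGS = {
    "--arch": "mt_transformer",
    "--task": "translation_mt",
    "--criterion": "mt_loss",
}


def with_required_args(extra_args):
    """Return a new argument list: required flags first, then extra_args."""
    clashing = [arg for arg in extra_args if arg in REQUIRED_ARGS]
    if clashing:
        # Overriding a required flag would silently select a different model.
        raise ValueError(f"do not override required arguments: {clashing}")
    args = []
    for flag, value in REQUIRED_ARGS.items():
        args += [flag, value]
    return args + list(extra_args)
```

Rejecting overrides, rather than letting the last occurrence win, is a design choice of this sketch so that a mistyped `--task` is caught early.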

We provide the following extra arguments in this project for you to customize:

  • --share-encoder

    Share the encoder's parameters between the AT model and the NAT model

  • --selection-criterion: str = 'nat'

    Which loss is used to select the best checkpoints
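
The two options above can be pictured with a toy sketch in plain Python. It is not this repository's implementation: `ToyEncoder`, `ToyModel`, `build_models`, and `best_checkpoint` are hypothetical names, and the loss keys `"at"` and `"nat"` are assumed for illustration (only `'nat'` is documented above as the default).

```python
from dataclasses import dataclass, field


@dataclass
class ToyEncoder:
    # Stand-in for real encoder parameters.
    weights: list = field(default_factory=lambda: [0.0, 0.0])


@dataclass
class ToyModel:
    encoder: ToyEncoder


def build_models(share_encoder):
    """Return (at_model, nat_model); with sharing, both hold the same encoder object."""
    nat_encoder = ToyEncoder()
    at_encoder = nat_encoder if share_encoder else ToyEncoder()
    return ToyModel(at_encoder), ToyModel(nat_encoder)


def best_checkpoint(valid_losses, selection_criterion="nat"):
    """Pick the checkpoint with the lowest validation loss under the chosen criterion.

    valid_losses maps a checkpoint name to a dict of losses, e.g.
    {"ckpt1": {"at": 1.0, "nat": 3.0}}.
    """
    return min(valid_losses, key=lambda name: valid_losses[name][selection_criterion])
```

With sharing enabled, an update made to the encoder through the AT model is visible to the NAT model, which is the sense in which the AT task acts as an auxiliary signal for the NAT encoder.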


Before running inference, you may need to average several checkpoints to get the final model. The script is provided below:

python scripts/ \
  --inputs $CKPT_DIR \
  --output $CKPT_DIR/ \
  --num-update-checkpoints 5 \

where $CKPT_DIR is the checkpoints directory, and $NUM_UPD is the number of update steps.
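
Conceptually, checkpoint averaging takes the element-wise mean of each parameter across the selected checkpoints. The sketch below shows that idea on plain Python lists. It is an illustration under simplifying assumptions, not the provided script: `average_checkpoints` is a hypothetical function, and real checkpoints hold tensors rather than lists of floats.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of parameters across checkpoints.

    Each checkpoint is a dict mapping a parameter name to a list of floats.
    All checkpoints must contain the same parameter names.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints have different parameter names")
    averaged = {}
    for name in checkpoints[0]:
        # zip(*...) groups the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / len(checkpoints) for col in columns]
    return averaged
```

For example, averaging `{"w": [1.0, 2.0]}` and `{"w": [3.0, 4.0]}` gives `{"w": [2.0, 3.0]}`.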

After obtaining the final model, you must explicitly set the following when executing

--task translation_mt

You can try different settings by modifying the following:

  • --iter-decode-max-iter: int = 10 Number of decoding iterations.

  • --iter-decode-length-beam: int = 5 Number of length predictions.
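
To give an intuition for these two settings, here is a minimal sketch in plain Python. It assumes that the length beam keeps the most probable target lengths and that decoding refines a hypothesis for a bounded number of iterations. `top_length_candidates` and `iterative_decode` are hypothetical names, and the early stop when the output no longer changes is an assumption of this sketch rather than documented behaviour of this repo.

```python
def top_length_candidates(length_probs, length_beam=5):
    """Return the length_beam most probable target lengths, most probable first.

    length_probs[n] is the predicted probability that the target has length n.
    """
    ranked = sorted(range(len(length_probs)), key=lambda n: length_probs[n], reverse=True)
    return ranked[:length_beam]


def iterative_decode(refine, initial, max_iter=10):
    """Apply refine at most max_iter times, stopping early once the output is stable."""
    current = initial
    for _ in range(max_iter):
        updated = refine(current)
        if updated == current:
            break
        current = updated
    return current
```

A larger length beam or more iterations usually costs more decoding time, so both values trade speed against translation quality.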


If you find this work helpful, please consider citing as follows:

    title = "Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation",
    author = "Hao, Yongchang  and
      He, Shilin  and
      Jiao, Wenxiang  and
      Tu, Zhaopeng and
      Lyu, Michael and
      Wang, Xing",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2021",

