
yikangshen/MoA


Mixture of Attention Heads

This repository contains the code used for the WMT14 translation experiments in the paper Mixture of Attention Heads: Selecting Attention Heads Per Token.

Software Requirements

Python 3, fairseq, and PyTorch are required for the current codebase.

Steps

  1. Install PyTorch and fairseq

  2. Generate the WMT14 translation dataset with Transformer Clinic.

  3. Scripts and commands

    • Train the translation model: sh run.sh /path/to/your/data

    • Evaluate a trained checkpoint: sh test.sh /path/to/checkpoint

    With the default settings, MoA achieves a BLEU score of approximately 28.4 on the WMT14 EN-DE test set.
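The steps above can be sketched as a single shell session. This is a dry-run sketch, not an official script: the /path/to/... values are placeholders you must replace with your own data and checkpoint locations, and the unpinned pip install line is an assumption about how to satisfy the dependencies.

```shell
# Dry-run sketch of the full workflow; each command is printed, not executed.
set -e

DATA=/path/to/your/data   # preprocessed WMT14 EN-DE data (placeholder)
CKPT=/path/to/checkpoint  # trained model checkpoint (placeholder)

run() { echo "+ $*"; }    # dry-run helper: print the command instead of running it

run pip install torch fairseq  # step 1: install dependencies (versions unpinned; an assumption)
run sh run.sh "$DATA"          # step 3: train the translation model
run sh test.sh "$CKPT"         # step 3: evaluate the trained checkpoint
```

Dropping the `run` prefix turns the sketch into the actual commands; training assumes the dataset from step 2 is already in place under $DATA.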
