Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"

zhuohan123/macaron-net

macaron-net

This repo contains the code and pretrained models for our paper:

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu

The two sub-directories include reproducible code, pretrained models, and instructions for the machine translation and unsupervised pretraining (BERT) tasks. See the README in each sub-directory for detailed reproduction instructions.

Both implementations are based on the open-source fairseq (v0.6.0). The code for the unsupervised pretraining tasks is based on StackingBERT. Note that the code in the bert sub-directory currently cannot be used to train translation models. We are working on merging the two code bases and plan to release a unified version in the near future.

Citation

@article{lu2019understanding,
  title={Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View},
  author={Lu, Yiping and Li, Zhuohan and He, Di and Sun, Zhiqing and Dong, Bin and Qin, Tao and Wang, Liwei and Liu, Tie-Yan},
  journal={arXiv preprint arXiv:1906.02762},
  year={2019}
}
