PyTorch/Keras implementation of Fastformer. The Keras version includes only the core Fastformer attention module, while the PyTorch version is written in the Hugging Face Transformers style. The Jupyter notebooks contain quickstart code for text classification on AG's News (without pretrained word embeddings, for simplicity) and can be run directly.

In our experiments we found that NOT all tasks need the feed-forward network, residual connections, layer normalization, or even position embeddings. For example, in news recommendation it is better to use Fastformer directly, without layer normalization and position embeddings, whereas in ad CVR prediction both position embeddings and layer normalization are needed.
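For orientation, below is a minimal, self-contained sketch of the additive attention at the core of Fastformer, written against the paper's formulation (global query pooling, element-wise query-key interaction, global key pooling, and query-value weight sharing). The module and variable names are illustrative, not the repository's API; see the source files for the actual implementations.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class FastAttention(nn.Module):
    """Hypothetical minimal sketch of Fastformer additive attention."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        # per-head learned vectors that score tokens for the additive pooling
        self.query_att = nn.Linear(hidden_size, num_heads)
        self.key_att = nn.Linear(hidden_size, num_heads)
        self.transform = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x):
        B, N, _ = x.shape
        return x.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)  # (B, H, N, D)

    def forward(self, x, mask=None):
        # x: (B, N, hidden); mask: (B, N) with 1 for real tokens, 0 for padding
        B, N, _ = x.shape
        q, k = self.query(x), self.key(x)

        # global query: additive-attention pooling over all query vectors
        alpha = self.query_att(q) / math.sqrt(self.head_dim)          # (B, N, H)
        if mask is not None:
            alpha = alpha.masked_fill(mask.unsqueeze(-1) == 0, -1e4)
        alpha = F.softmax(alpha, dim=1).transpose(1, 2).unsqueeze(2)  # (B, H, 1, N)
        q_heads = self._split_heads(q)                                # (B, H, N, D)
        global_q = alpha @ q_heads                                    # (B, H, 1, D)

        # element-wise interaction of the global query with every key
        p = self._split_heads(k) * global_q                           # (B, H, N, D)

        # global key: additive-attention pooling over the interaction vectors
        p_flat = p.transpose(1, 2).reshape(B, N, -1)
        beta = self.key_att(p_flat) / math.sqrt(self.head_dim)
        if mask is not None:
            beta = beta.masked_fill(mask.unsqueeze(-1) == 0, -1e4)
        beta = F.softmax(beta, dim=1).transpose(1, 2).unsqueeze(2)    # (B, H, 1, N)
        global_k = beta @ p                                           # (B, H, 1, D)

        # value = query (weight sharing); transform, then add the query residual
        u = (global_k * q_heads).transpose(1, 2).reshape(B, N, -1)
        return self.transform(u) + q


# quick shape check
attn = FastAttention(hidden_size=256, num_heads=4)
print(attn(torch.randn(2, 50, 256)).shape)  # torch.Size([2, 50, 256])
```

Note how per-head token scores (`query_att`, `key_att`) replace the quadratic query-key dot products, which is what makes this attention linear in sequence length.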
Keras version: 2.2.4 (may not be compatible with higher versions)
TensorFlow version: 1.12 to 1.15 (may be compatible with lower versions)
PyTorch version: 1.6.0 (may be compatible with higher or lower versions)
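As an untested convenience, an environment matching the versions above could be set up roughly as follows (the exact pins are assumptions, and TF 1.x wheels generally require Python 3.7 or earlier):

```
pip install keras==2.2.4 tensorflow==1.15 torch==1.6.0
```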
If you use this code, please cite the Fastformer paper:

@article{wu2021fastformer,
  title={Fastformer: Additive Attention Can Be All You Need},
  author={Wu, Chuhan and Wu, Fangzhao and Qi, Tao and Huang, Yongfeng},
  journal={arXiv preprint arXiv:2108.09084},
  year={2021}
}