
Implementing efficient self attention in T5 #10612

Open · 2 of 3 tasks
JamesDeAntonis opened this issue Mar 9, 2021 · 1 comment

JamesDeAntonis commented Mar 9, 2021

🌟 New model addition

My teammates and I (including @ice-americano) would like to use efficient self-attention methods such as Linformer, Performer, and Nyströmformer.

Model description

These methods approximate standard attention while reducing complexity from quadratic in the sequence length to linear. We would like to add a parameter to T5 that lets users specify an efficient attention method to use in place of standard attention. Ideally this would be implemented across all models, but the models tend to have differing attention implementations, which makes such a generalization fairly tedious.
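
For concreteness, here is a rough sketch of what such a switch could dispatch to, using Linformer-style low-rank projection of keys and values as the example. The class name, the `proj_len` argument, and the idea of an `attention_type` field on `T5Config` are hypothetical and only meant to illustrate the shape of the change, not the existing transformers API:

```python
import torch
import torch.nn as nn


class LinformerStyleSelfAttention(nn.Module):
    """Linformer-style attention sketch: keys and values are projected along
    the sequence dimension from length n down to a fixed length k, so the
    attention matrix is n x k rather than n x n (linear, not quadratic, in
    sequence length)."""

    def __init__(self, d_model: int, n_heads: int, seq_len: int, proj_len: int = 256):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        # Low-rank projections (E and F in the Linformer paper), applied
        # along the sequence axis.
        self.e_proj = nn.Linear(seq_len, proj_len, bias=False)
        self.f_proj = nn.Linear(seq_len, proj_len, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model)
        b, n, _ = hidden_states.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.q_proj(hidden_states))
        k = split_heads(self.k_proj(hidden_states))
        v = split_heads(self.v_proj(hidden_states))

        # Compress keys/values along the sequence axis: (batch, heads, proj_len, d_head)
        k = self.e_proj(k.transpose(-1, -2)).transpose(-1, -2)
        v = self.f_proj(v.transpose(-1, -2)).transpose(-1, -2)

        # Attention weights are (batch, heads, seq_len, proj_len), i.e. O(n * k)
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.o_proj(out)
```

A config-driven version of this would then pick the attention class inside T5's attention module based on the (hypothetical) `attention_type` setting, falling back to standard attention by default.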

Open source status

@NielsRogge

There are already some PRs regarding these models. I'm working on adding the Linformer (#10587), and there is also a PR for the Performer (#9325; see further down that thread, where people can already train T5 with Performer).
