My teammates and I (including @ice-americano) would like to use efficient self-attention methods such as Linformer, Performer, and Nystromformer.
Model description
These methods approximate regular attention while reducing its complexity from quadratic to linear in the input length. We would like to add a parameter to T5 that lets users specify an efficient attention method to use instead of regular attention. Ideally this would be implemented across all models, but the models tend to have varying implementations of attention, which makes that generalization fairly tedious.
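To make the complexity claim concrete, here is a rough NumPy sketch of the Linformer idea: keys and values are projected from sequence length `n` down to a fixed size `k_dim`, so the attention score matrix is `n × k_dim` rather than `n × n`. This is an illustrative sketch only, not the implementation from any of the PRs mentioned here; the function and parameter names (`linformer_attention`, `proj_k`, `proj_v`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(q, k, v, proj_k, proj_v):
    """Linformer-style attention sketch.

    q, k, v:          (n, d) query/key/value matrices
    proj_k, proj_v:   (k_dim, n) learned length-reduction projections
    """
    k_low = proj_k @ k                      # (k_dim, d) compressed keys
    v_low = proj_v @ v                      # (k_dim, d) compressed values
    scores = q @ k_low.T / np.sqrt(q.shape[-1])  # (n, k_dim), linear in n
    return softmax(scores) @ v_low          # (n, d)
```

With `k_dim` held fixed, both memory and compute scale as O(n · k_dim) instead of O(n²), which is the property all of these methods share in one form or another.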
There are already some PRs for these models: I'm working on adding the Linformer (#10587), and there's also a PR for the Performer (#9325; see further down that thread, where people can already train T5 with the Performer).