
Implementing efficient self attention in T5 #10612

Open · 2 of 3 tasks
JamesDeAntonis opened this issue Mar 9, 2021 · 1 comment

JamesDeAntonis commented Mar 9, 2021

🌟 New model addition

My teammates and I (including @ice-americano) would like to use efficient self-attention methods such as Linformer, Performer, and Nyströmformer.

Model description

These methods approximate standard attention while reducing complexity from quadratic in the sequence length to linear. We would like to add a parameter to T5 that lets users specify an efficient attention method to use in place of standard attention. Ideally this would be implemented across all models, but the models tend to have differing attention implementations, which makes such a generalization fairly tedious.
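
For concreteness, here is a rough sketch of what such a switch could dispatch to, using Linformer-style low-rank projection of keys and values as the example. The class name, the `proj_len` argument, and the idea of an `attention_type` field on `T5Config` are hypothetical and only meant to illustrate the shape of the change, not the existing transformers API:

```python
import torch
import torch.nn as nn


class LinformerStyleSelfAttention(nn.Module):
    """Linformer-style attention sketch: keys and values are projected along
    the sequence dimension from length n down to a fixed length k, so the
    attention matrix is n x k rather than n x n (linear, not quadratic, in
    sequence length)."""

    def __init__(self, d_model: int, n_heads: int, seq_len: int, proj_len: int = 256):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        # Low-rank projections (E and F in the Linformer paper), applied
        # along the sequence axis.
        self.e_proj = nn.Linear(seq_len, proj_len, bias=False)
        self.f_proj = nn.Linear(seq_len, proj_len, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model)
        b, n, _ = hidden_states.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.q_proj(hidden_states))
        k = split_heads(self.k_proj(hidden_states))
        v = split_heads(self.v_proj(hidden_states))

        # Compress keys/values along the sequence axis: (batch, heads, proj_len, d_head)
        k = self.e_proj(k.transpose(-1, -2)).transpose(-1, -2)
        v = self.f_proj(v.transpose(-1, -2)).transpose(-1, -2)

        # Attention weights are (batch, heads, seq_len, proj_len), i.e. O(n * k)
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.o_proj(out)
```

A config-driven version of this would then pick the attention class inside T5's attention module based on the (hypothetical) `attention_type` setting, falling back to standard attention by default.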

Open source status

@NielsRogge

There are already some PRs regarding these models. I'm working on adding the Linformer (#10587), and there is also a PR for the Performer (#9325; see further down that thread, where people can already train T5 with Performer).
