
Making fused attention work with GPUs other than A100 #616

Open
pearlli98 opened this issue Aug 9, 2022 · 7 comments

Comments

@pearlli98

I am curious why the fused attention code only works with the A100. Is there a way to make it work on other GPUs such as the Quadro GP100?

@ptillet
Collaborator

ptillet commented Aug 10, 2022

Right now there is a compiler bug that makes the trans arguments of dot not work on pre-Ampere architectures. There is a way to rewrite the forward pass without transpositions, and that works on V100. But no such workaround exists for the backward pass right now.
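For readers wondering what the transposition-free forward pass looks like, here is a minimal sketch (assumptions: a single tile with no boundary masking, and the kernel name and signature are hypothetical rather than the actual tutorial kernel). The idea is to load the K tile through swapped strides so it arrives in registers already laid out as K^T, which lets a plain tl.dot compute Q @ K^T without any trans argument or tl.trans:

```python
import triton
import triton.language as tl


@triton.jit
def qk_scores_kernel(Q, K, Out,                      # hypothetical sketch, not the tutorial kernel
                     stride_qm, stride_qd,           # Q is (M, D)
                     stride_kn, stride_kd,           # K is (N, D)
                     stride_om, stride_on,           # Out is (M, N), fp32
                     BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_D: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_D)

    # Load a (BLOCK_M, BLOCK_D) tile of Q the usual way.
    q = tl.load(Q + offs_m[:, None] * stride_qm + offs_d[None, :] * stride_qd)

    # Load K with the head dimension as rows and the sequence dimension as
    # columns: the loaded tile is (BLOCK_D, BLOCK_N), i.e. already K^T,
    # so no transposition is needed inside the kernel.
    k_t = tl.load(K + offs_d[:, None] * stride_kd + offs_n[None, :] * stride_kn)

    # Plain dot: computes Q @ K^T without any trans argument.
    qk = tl.dot(q, k_t)

    tl.store(Out + offs_m[:, None] * stride_om + offs_n[None, :] * stride_on, qk)
```

As noted above, this trick only helps the forward pass; the backward pass still needs a genuine transposition, which is where pre-Ampere GPUs get stuck.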

@CarterMcClellan

Hey @ptillet @Jokeren,

This also popped up as an issue for me last week while running through the Triton fused attention (Flash Attention) example on my Turing-generation GPU. It sounds like an interesting bug and I wanted to contribute to the repository. Is help still wanted on this issue?

Thanks

@ptillet
Collaborator

ptillet commented Feb 21, 2023

I think it's probably one of the hardest open issues to fix. V100 hardware is really bad at transpositions. It may be trivial for Turing, though, since I think sm_75 supports transpositions?

@jhoareau

It would be great to implement it for Turing first if it's trivial, since that generation is very common in the cloud as it offers the best compute per dollar (T4 GPUs).

@CarterMcClellan

Sounds good, I'll start by looking into that.

@ptillet
Collaborator

ptillet commented Feb 22, 2023

I think there are a lot of checks for sm < 80 that can be replaced by sm < 75.
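To make the suggestion concrete, here is a purely hypothetical illustration of the kind of capability check being discussed (this is not actual Triton compiler source; the function name is made up):

```python
# Hypothetical illustration only -- not actual Triton compiler code.
def dot_operands_may_be_transposed(compute_capability: int) -> bool:
    # Such checks reportedly gate transposed dot operands on Ampere (sm_80+);
    # the suggestion is to relax them to Turing (sm_75+), which is believed
    # to support the needed transposed MMA operands as well.
    return compute_capability >= 75  # previously: compute_capability >= 80
```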

@mss1213

mss1213 commented Mar 25, 2024

Hi, I see Triton now uses tl.trans instead of tl.dot(trans_*=True), but it seems this problem still exists on V100? @ptillet
