Making fused attention work with GPUs other than A100 #616
Comments
Right now there is a compiler bug that makes the …
I think it's probably one of the hardest open issues to fix. V100 hardware is really bad at transpositions. It's possible it may be trivial for Turing, since I think sm_75 has transpositions?
It would be great to implement it for Turing first if it's trivial, since that generation is very common in the cloud, being the cheapest in terms of compute per dollar (T4 GPUs).
Sounds good, I'll start by looking into that.
I think there are a lot of checks for …
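For what it's worth, here is a minimal sketch of the kind of compute-capability gate such checks could implement, assuming PyTorch is used for the device query (the actual guard in the tutorial may look different):

```python
import torch


def fused_attention_supported() -> bool:
    # Hypothetical guard: the tutorial kernel currently targets Ampere (sm_80+),
    # so Volta (sm_70) and Turing (sm_75) devices would need a fallback path.
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 0)
```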
Hi, I see Triton now uses tl.trans instead of tl.dot(trans_*=True), but it seems this problem still exists on V100? @ptillet
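For reference, a minimal sketch of the two styles being contrasted above — the current tl.trans form versus the older tl.dot(..., trans_b=True) form. This is not the tutorial kernel itself; the kernel name, block sizes, and pointer math are illustrative only:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def qk_block_kernel(q_ptr, k_ptr, out_ptr,
                    BLOCK: tl.constexpr, HEAD_DIM: tl.constexpr):
    offs_m = tl.arange(0, BLOCK)
    offs_n = tl.arange(0, BLOCK)
    offs_d = tl.arange(0, HEAD_DIM)
    # Load one [BLOCK, HEAD_DIM] tile of Q and of K (row-major, single block).
    q = tl.load(q_ptr + offs_m[:, None] * HEAD_DIM + offs_d[None, :])
    k = tl.load(k_ptr + offs_n[:, None] * HEAD_DIM + offs_d[None, :])
    # Current API: transpose K explicitly with tl.trans, then take the dot product.
    # Older Triton releases expressed the same thing as tl.dot(q, k, trans_b=True).
    qk = tl.dot(q, tl.trans(k))
    tl.store(out_ptr + offs_m[:, None] * BLOCK + offs_n[None, :], qk)


# Hypothetical launch with a single 64x64 tile (names and sizes are illustrative).
q = torch.randn(64, 64, device="cuda", dtype=torch.float16)
k = torch.randn(64, 64, device="cuda", dtype=torch.float16)
out = torch.empty(64, 64, device="cuda", dtype=torch.float32)
qk_block_kernel[(1,)](q, k, out, BLOCK=64, HEAD_DIM=64)
```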
I am curious why the fused attention code only works with the A100. Is there a way to make it work on other GPUs, such as the Quadro GP100?