Making fused attention work with GPUs other than A100 #616
Comments
Right now there is a compiler bug that makes the …
I think it's probably one of the hardest open issues to fix. V100 hardware is really bad at transpositions. It's possible it may be trivial for Turing, since I think sm_75 has transpositions?
It would be great to implement it for Turing first if it's trivial, since that generation is very common in the cloud, being the cheapest in terms of compute per dollar (T4 GPUs).
Sounds good, I'll start by looking into that.
I think there are a lot of checks for …
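For what it's worth, here is a minimal sketch of the kind of compute-capability gate such checks could implement, assuming PyTorch is used for the device query (the actual guard in the tutorial may look different):

```python
import torch


def fused_attention_supported() -> bool:
    # Hypothetical guard: the tutorial kernel currently targets Ampere (sm_80+),
    # so Volta (sm_70) and Turing (sm_75) devices would need a fallback path.
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 0)
```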
Hi, I see Triton now uses tl.trans instead of tl.dot(trans_*=True), but it seems this problem still exists on V100? @ptillet
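For reference, a minimal sketch of the two styles being contrasted above — the current tl.trans form versus the older tl.dot(..., trans_b=True) form. This is not the tutorial kernel itself; the kernel name, block sizes, and pointer math are illustrative only:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def qk_block_kernel(q_ptr, k_ptr, out_ptr,
                    BLOCK: tl.constexpr, HEAD_DIM: tl.constexpr):
    offs_m = tl.arange(0, BLOCK)
    offs_n = tl.arange(0, BLOCK)
    offs_d = tl.arange(0, HEAD_DIM)
    # Load one [BLOCK, HEAD_DIM] tile of Q and of K (row-major, single block).
    q = tl.load(q_ptr + offs_m[:, None] * HEAD_DIM + offs_d[None, :])
    k = tl.load(k_ptr + offs_n[:, None] * HEAD_DIM + offs_d[None, :])
    # Current API: transpose K explicitly with tl.trans, then take the dot product.
    # Older Triton releases expressed the same thing as tl.dot(q, k, trans_b=True).
    qk = tl.dot(q, tl.trans(k))
    tl.store(out_ptr + offs_m[:, None] * BLOCK + offs_n[None, :], qk)


# Hypothetical launch with a single 64x64 tile (names and sizes are illustrative).
q = torch.randn(64, 64, device="cuda", dtype=torch.float16)
k = torch.randn(64, 64, device="cuda", dtype=torch.float16)
out = torch.empty(64, 64, device="cuda", dtype=torch.float32)
qk_block_kernel[(1,)](q, k, out, BLOCK=64, HEAD_DIM=64)
```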
I am curious why the fused attention code only works with the A100. Is there a way to make it work on other GPUs, such as the Quadro GP100?