Cross-attention optimizations #109
Replies: 8 comments 39 replies
-
I noticed that Tiled VAE/Tiled Diffusion doesn't work unless I turn on xformers (which I installed along with Torch 2.0). Am I missing something in the UI config, maybe?
-
No, I didn't say you suggested that, but it was posted in a few places; one big study is here: https://pytorch.org/blog/accelerated-diffusers-pt-20/
The PyTorch devs are claiming these improvements, but honestly I don't see them.
P.S.: off-topic, but is there any way to customize the font in your fork? My eyes are getting tired of the boldish-looking default font :D
On 14 Apr 2023, at 02:53, Vladimir Mandic wrote:
I haven't said SDP is 50% faster than xformers, but it's much cleaner implementation-wise, with far fewer installation problems (xformers is quite often not available for this-or-that platform in binary form and needs to be recompiled separately for any minor change to torch, etc.).
So if SDP "just works", I'm happy.
xformers were amazing, but now I'd recommend them only for much older GPUs that don't have a sufficient pipeline; for example, the nVidia 1xxx series is better with xformers. The rest? Why bother, just my $0.02.
Also, xformers break quite a few things related to training, so you need to unwind them each time, etc. The end user may not care, but from a dev perspective, I hate doing that all the time.
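Anyone who wants to sanity-check the PyTorch 2.0 claims on their own card can time the fused SDP path against the plain math fallback. A minimal sketch, assuming CUDA and Torch 2.0 (shapes are made up for illustration; `torch.backends.cuda.sdp_kernel` is the Torch 2.0 context manager for picking a backend):

```python
# Rough timing sketch only: fused SDP backends vs. the plain math fallback.
# Shapes are illustrative, not taken from any particular SD model.
import time
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)  # (batch, heads, tokens, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

def bench(label, **flags):
    with torch.backends.cuda.sdp_kernel(**flags):
        for _ in range(3):                                  # warmup
            F.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(20):
            F.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        print(f"{label}: {(time.perf_counter() - t0) / 20 * 1000:.2f} ms/iter")

bench("fused (flash / mem-efficient)", enable_flash=True, enable_mem_efficient=True, enable_math=False)
bench("math fallback", enable_flash=False, enable_mem_efficient=False, enable_math=True)
```

If the two numbers come out nearly identical, the fused kernels probably aren't kicking in on that GPU, which is roughly the situation being described for the 1xxx series.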
-
Not cleanly yet, it's on my todo. But for now, you can edit
-
When you say xformers is better than SDP on old GPUs, like my 1660 Ti, is the difference noticeable? Are there any measurements?
-
--xformers still works better than --opt-sdp-attention (see the "RTX 3090 vs RTX 3060 Ultimate Showdown for Stable Diffusion, ML, AI & Video Rendering Performance" comparison). You really should support --xformers along with --opt-sdp-attention.
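For reference on what the two options actually call under the hood: the xformers path routes cross-attention through `xformers.ops.memory_efficient_attention`, while the SDP path uses torch's built-in `scaled_dot_product_attention`. A rough side-by-side sketch (shapes are illustrative; note the two libraries expect different tensor layouts):

```python
# Illustrative sketch of the two attention calls being compared in this thread.
import torch
import torch.nn.functional as F
import xformers.ops as xops

q = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)  # (batch, heads, tokens, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# PyTorch 2.0 SDP expects (batch, heads, tokens, head_dim)
out_sdp = F.scaled_dot_product_attention(q, k, v)

# xformers expects (batch, tokens, heads, head_dim), so transpose first
q_x, k_x, v_x = (t.transpose(1, 2).contiguous() for t in (q, k, v))
out_xf = xops.memory_efficient_attention(q_x, k_x, v_x).transpose(1, 2)

# The two should agree up to numerical noise; any speed difference is purely the kernel
print(torch.allclose(out_sdp, out_xf, atol=1e-2))
```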
-
I tried both xformers and SDP in the vlad fork on an RTX 4070 but got exactly the same speed, so why is SDP better?
-
On my 3060 12 GB on Windows 10 I can't get the same speed as in A1111. Vlad fork, Torch 2.0 + cu118, SDP or xformers: 6.7 it/s (live previews optimized or turned off, it doesn't matter). So it is not possible to reach the same speeds, sadly.
-
On a 6900 XT I found that sub-quad attention performs the worst but has the lowest VRAM usage. So it's kind of useful to quickly switch to (thanks to it being moved to the options) if you want to render something at high resolution and avoid OOM.
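For anyone curious why sub-quad is slower but lighter on VRAM: instead of materializing the full tokens × tokens score matrix at once, it walks over the queries in chunks, so only a chunk × tokens slice exists at any time. A simplified sketch of the idea (the real sub-quadratic implementation also chunks the keys and uses an online softmax; `chunk_size` here is an arbitrary illustrative value):

```python
# Simplified sketch of query-chunked ("sub-quadratic" style) attention.
# Peak memory for the score matrix drops from O(n^2) to O(chunk_size * n).
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, chunk_size=512):
    # q, k, v: (batch, heads, tokens, head_dim)
    scale = q.shape[-1] ** -0.5
    out = []
    for i in range(0, q.shape[2], chunk_size):
        q_chunk = q[:, :, i:i + chunk_size]                  # (b, h, chunk, d)
        scores = q_chunk @ k.transpose(-2, -1) * scale       # (b, h, chunk, tokens)
        out.append(scores.softmax(dim=-1) @ v)               # (b, h, chunk, d)
    return torch.cat(out, dim=2)

q = torch.randn(1, 8, 2048, 64)
k, v = torch.randn_like(q), torch.randn_like(q)
ref = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(chunked_attention(q, k, v), ref, atol=1e-4))
```

The loop over chunks is where the speed goes: every pass re-reads all of k and v, which is why it loses to the fused kernels while winning on peak memory.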
-
As of recently, I've moved all command line flags regarding cross-optimization options to UI settings, so things like `--xformers` are gone.
The default method is scaled dot product from `torch` 2.0, and it's probably best unless you're running on a low-powered GPU (e.g. nVidia 1xxx), in which case `xformers` are still better.
Note that `xformers` are not released in binary form on Windows for Torch 2.0 and CUDA 11.8. So at the moment, the only way to get `xformers` on Windows is to either compile them manually (requires a compiler to be installed) or find a 3rd-party compiled build (I cannot guarantee the quality of any 3rd-party compiled code).
And AMD users report that sub-quad attention achieves the best results on some systems, so give that a try.