Unlike GP100, the GP104 architecture keeps its FP16 and FP32 cores completely separate. That means you could potentially take advantage of both at once. This page details it:
https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5
"On GP100, these FP16x2 cores are used throughout the GPU as both the GPU’s primarily FP32 core and primary FP16 core. However on GP104, NVIDIA has retained the old FP32 cores. The FP32 core count as we know it is for these pure FP32 cores."
I don't know the limitations, but it seems like it should be possible to run a mix of FP16 and FP32 at the same time, or more accurately, to run FP16 data on the FP32 cores (or vice versa). I don't know what it would take, but one approach might be a script that automatically pads each FP16 value out to FP32 width with zeros, "throwing away" the extra precision. That would keep the memory usage of FP16 while using the FP32 processing power, which on the 1080 Ti is actually stronger than FP16. I know nothing about this, so I could be way off the mark.
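To make the "store as FP16, compute as FP32" idea a bit more concrete, here is a rough CPU-only sketch using just the Python standard library. It is only an illustration of the number formats, not GPU code, and the function names (`to_fp16_bytes`, `widen_to_fp32_bits`) and sample values are made up for this example. One caveat it shows: widening is not literally appending zeros to the 16 bits, because FP16 has a 5-bit exponent and FP32 an 8-bit one, so the exponent gets re-encoded. What does come out as zeros are the 13 low mantissa bits that FP32 has and FP16 lacks.

```python
import struct

def to_fp16_bytes(values):
    """Pack Python floats as IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)

def widen_to_fp32_bits(fp16_bytes):
    """Unpack FP16 values and return each one's FP32 bit pattern as an int."""
    n = len(fp16_bytes) // 2
    halves = struct.unpack(f"<{n}e", fp16_bytes)
    # Every FP16 value is exactly representable in FP32, so this step is lossless.
    return [struct.unpack("<I", struct.pack("<f", h))[0] for h in halves]

# Hypothetical sample "weights", chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 1000.0]
stored = to_fp16_bytes(weights)    # what would sit in memory
bits = widen_to_fp32_bits(stored)  # what an FP32 unit would be fed

print(len(stored), "bytes stored as FP16 vs", 4 * len(weights), "bytes as FP32")
for b in bits:
    # Mask 0x1FFF selects the 13 low mantissa bits of the FP32 pattern.
    print(f"{b:032b}", "low 13 mantissa bits zero:", b & 0x1FFF == 0)
```

So the memory-saving half of the idea holds up at the format level. Whether the conversion cost on the GPU leaves any of the FP32 speed advantage intact is a separate question this sketch does not answer.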
This might be a card-specific hack, but the 1080 Ti is also one of the best-selling cards of all time, so it would definitely be a worthwhile venture. Right now it gets almost 4 it/s at FP32 with 0.3 token merging. With that unused processing potential, mixed-precision processing should be able to bump it up to about 7.5 it/s, maybe higher, which is pretty substantial. That means 512x512 generations could take as little as 2-3 seconds rather than 6-7 seconds. All we would need is a way to run FP16 and FP32 in parallel, and there might already be precedent for this type of behavior:
https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html