Possible solution to basically double performance on 1080ti's... #1243

pflky · 2023-06-01T04:46:35Z

So the GP104 architecture is unusual in that its FP16 and FP32 cores are completely separate, which means you could potentially take advantage of both at once. (The 1080 Ti itself uses the GP102 chip, which shares the same SM design as GP104.) This page details the architecture:

https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5

"On GP100, these FP16x2 cores are used throughout the GPU as both the GPU’s primarily FP32 core and primary FP16 core. However on GP104, NVIDIA has retained the old FP32 cores. The FP32 core count as we know it is for these pure FP32 cores."

I don't know anything about the limitations, but it seems like it should be possible to run a mix of FP16 and FP32 at the same time, or more accurately, to run FP16 data on the FP32 cores (or vice versa). I don't know what it would take, but one approach might be a script that automatically pads each FP16 value out to FP32 width with zeros, "throwing away" the extra precision. That would keep the memory usage of FP16 while getting the processing power of FP32, which on the 1080 Ti is actually stronger than FP16. I know nothing about this, so I could be way off the mark.
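To make the "store as FP16, compute as FP32" idea concrete, here is a minimal CPU-side sketch in plain Python. It is not GPU code and not anything from an existing project. The function names are made up for illustration, and `struct` only stands in for what a real kernel would do in hardware. One caveat: widening FP16 to FP32 is lossless, but it is not literally zero padding, because the exponent field grows from 5 to 8 bits and has to be re-biased (bfloat16 is the format where padding with zeros works).

```python
import struct

def to_fp16_bytes(values):
    """Pack floats into IEEE 754 half precision, 2 bytes per value."""
    return struct.pack(f"<{len(values)}e", *values)

def from_fp16_bytes(buf):
    """Unpack half-precision bytes. Every FP16 value widens exactly."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))

def round_to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16_storage_fp32_math(a_buf, b_buf):
    """Dot product: inputs stored as FP16, arithmetic rounded to FP32."""
    a = from_fp16_bytes(a_buf)
    b = from_fp16_bytes(b_buf)
    acc = 0.0
    for x, y in zip(a, b):
        # Emulate FP32 multiply and add by rounding after each operation.
        acc = round_to_fp32(acc + round_to_fp32(x * y))
    return acc

# Hypothetical data: 4 values take 8 bytes in FP16 instead of 16 in FP32.
weights = to_fp16_bytes([0.1, 0.2, 0.3, 0.4])
inputs = to_fp16_bytes([1.0, 2.0, 3.0, 4.0])
result = dot_fp16_storage_fp32_math(weights, inputs)
print(len(weights), "bytes stored, dot product =", result)
```

The result is close to 3.0 but not exact, because precision was already lost when the values were rounded to FP16 for storage. Doing the math in FP32 afterwards does not bring it back.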

This might be a card-specific hack, but the 1080 Ti is also one of the best-selling cards of all time, so it would definitely be a worthwhile venture. Right now it gets almost 4 it/s at FP32 with 0.3 token merging. With that unused processing potential put to work through mixed-precision processing, it should be able to reach about 7.5 it/s, maybe higher, which is pretty substantial. That means 512x512 generations could finish in as little as 2-3 seconds rather than 6-7 seconds. All we would need is a way to run FP16 and FP32 in parallel, and there might already be precedent for this type of behavior:
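For a rough sense of what those rates mean per image, here is a back-of-the-envelope sketch. The step count of 20 is an assumption, since the post does not say how many sampling steps were used. Fixed overhead such as VAE decode is ignored, so real wall-clock times would be somewhat higher.

```python
def seconds_per_image(steps, iterations_per_second):
    """Sampling time only: number of steps divided by steps per second."""
    return steps / iterations_per_second

STEPS = 20          # assumed step count, not stated in the post
CURRENT_RATE = 4.0  # it/s, the "almost 4 it/s" FP32 figure
HOPED_RATE = 7.5    # it/s, the speculative mixed-precision figure

current = seconds_per_image(STEPS, CURRENT_RATE)
hoped = seconds_per_image(STEPS, HOPED_RATE)
speedup = HOPED_RATE / CURRENT_RATE

print(f"current: {current:.2f} s, hoped: {hoped:.2f} s, speedup: {speedup:.3f}x")
```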

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html

Replies: 0 comments
