TransformerEngine FP8 support #21
Hello! Could this work utilize the new H100 TransformerEngine for a speedup? If so, I would be very interested in that, and I would also pay for your H100 cloud GPU access if you could estimate how long you would need it. Thank you very much!

Comments
That's generous, but the project is really aimed at getting good performance out of consumer hardware. The H100 is the only GPU I'm aware of that even supports compute capability 9.0 and FP8, and at a price of around $40,000 it's not really relevant here.

I'm going to give TransformerEngine a proper look in the near future, but mostly to examine NVIDIA's decoder implementation. There might be some good stuff in there, but it would have to be ported over and rewritten to support GPTQ; otherwise I'd basically be starting from scratch. Just from a quick peek at their TransformerLayer code, I don't immediately see any support for KV caching, which hints that TE has a very different use case in mind than fast language-model inference. Although maybe I'm just missing something; there's a lot of code to parse, after all.
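For context, here is a minimal sketch (hypothetical shapes and names, not exllama's or TE's actual code) of the kind of KV caching fast autoregressive inference depends on: keys and values from earlier tokens are stored so each generated token costs one attention step instead of re-encoding the whole sequence.

```python
import torch

class KVCache:
    """Keys/values from all previous positions, appended one step at a time."""
    def __init__(self):
        self.k = None  # (batch, heads, seq_so_far, head_dim)
        self.v = None

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q, k_new, v_new, cache):
    # One generated token: attend over every cached position without
    # recomputing keys/values for the prompt.
    k, v = cache.append(k_new, v_new)
    scores = (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# Hypothetical shapes: batch 1, 8 heads, head_dim 64, one token per step.
cache = KVCache()
for _ in range(4):
    q = torch.randn(1, 8, 1, 64)
    out = decode_step(q, torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64), cache)
```

Without something like this, every generated token requires a full forward pass over the entire sequence, which is what makes a training-oriented layer a poor fit for inference.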
I see, thank you very much for the detailed evaluation of TE; that really helps me gain insight into the topic. And thank you for your work: 151 tokens/s is a really mind-bending number!
The README only mentions the H100, but Ada (the 4000 series) is compute capability 8.9, which includes a number of the same improvements, including FP8. A capability check like the sketch below could distinguish these cards.
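As a hedged illustration (assuming PyTorch is available; the threshold comes from NVIDIA's published compute capabilities, not from this repo), detecting FP8-capable hardware at runtime could look like this:

```python
import torch

def fp8_capable(device=0):
    # FP8 tensor cores arrived with compute capability 8.9 (Ada, e.g. 4090)
    # and 9.0 (Hopper, H100); Ampere cards like the 3090 are 8.6.
    major, minor = torch.cuda.get_device_capability(device)
    return (major, minor) >= (8, 9)

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0), "-> FP8 capable:", fp8_capable(0))
```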
It does look like the 4090 has some FP8 support, despite NVIDIA's announcements, but I don't really want to write too much code that's incompatible with the 3090, at least for the time being. It's also questionable where FP8 would be useful, since FP16 already has a small (but measurable) impact on overall accuracy. Closing this for now.