TransformerEngine FP8 support #21

Closed

SinanAkkoyun opened this issue May 31, 2023 · 4 comments

Comments

@SinanAkkoyun
Contributor

Hello! Could this project utilize the new H100 TransformerEngine for a speedup? If so, I would be very interested in that, and I would also pay for your H100 cloud GPU access if you could estimate how long you would need it.

Thank you very much!

@turboderp
Owner

That's generous, but the project is really aimed at getting good performance out of consumer hardware. The H100 is the only GPU I'm aware of that even supports compute 9.0 and FP8, and at a price of $40,000 it's not really relevant.

I think I'm going to give TransformerEngine a good look in the near future, but mostly to examine NVIDIA's decoder implementation. There might be some good stuff in there, but it would have to be ported over and rewritten to support GPTQ. Otherwise I'd just be starting from scratch, basically.

Just from a quick peek at their TransformerLayer code, I don't immediately see any support for caching, which does hint that TE probably has a very different use case in mind from fast language model inference. Although maybe I'm just missing something. There's a lot of code to parse, after all.
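For context, the caching in question is the per-layer key/value cache that lets each decoding step attend over previously computed keys and values instead of re-encoding the whole sequence. A minimal PyTorch sketch of the idea (illustrative only; this is not TE's API, and all names here are made up):

```python
import torch
import torch.nn.functional as F

class CachedSelfAttention(torch.nn.Module):
    """Toy single-head self-attention with a KV cache (illustration only)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = torch.nn.Linear(d_model, 3 * d_model, bias=False)
        self.scale = d_model ** -0.5
        self.k_cache = None  # grows to (batch, tokens_so_far, d_model)
        self.v_cache = None

    def forward(self, x):
        # x: (batch, new_tokens, d_model); new_tokens == 1 while decoding.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.k_cache is not None:
            # Append the new keys/values and attend over the full history,
            # so past tokens are never re-encoded.
            k = torch.cat([self.k_cache, k], dim=1)
            v = torch.cat([self.v_cache, v], dim=1)
        self.k_cache, self.v_cache = k, v
        attn = F.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        return attn @ v

layer = CachedSelfAttention(64)
out = layer(torch.randn(1, 10, 64))  # prompt pass (causal mask omitted for brevity)
out = layer(torch.randn(1, 1, 64))   # each new token reuses the cached K/V
```

Without that second path, every generated token would pay for re-running attention over the entire prompt, which is exactly what makes uncached layers a poor fit for fast autoregressive inference.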

@SinanAkkoyun
Contributor Author

I see, thank you for the thorough evaluation of TE; that really helps me gain insight into the topic. And thank you for your work, 151 tokens/second is a mind-bending number!

@green-s

green-s commented May 31, 2023

The readme only mentions the H100, but Ada (the 4000 series) is compute capability 8.9, which includes a number of the same improvements, including FP8.
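For anyone wondering about their own card, compute capability can be checked at runtime. A small sketch using PyTorch (the 8.9 cutoff for FP8 tensor cores below is an assumption taken from this thread, not verified against NVIDIA's docs):

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")

# Assumption from this thread: Ada (8.9) and Hopper (9.0) expose
# FP8 tensor-core paths; earlier architectures do not.
if (major, minor) >= (8, 9):
    print("This GPU should have FP8 tensor cores.")
else:
    print("No FP8 here; tensor cores top out at FP16/BF16.")
```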

@turboderp
Owner

It does look like the 4090 has some FP8 support, despite NVIDIA's announcements, but I don't really want to write too much code that's incompatible with the 3090, at least for the time being. It's also questionable where FP8 would be useful, since FP16 already has a small (but measurable) impact on overall accuracy. Closing this for now.
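To put the precision concern in numbers: FP8 E4M3 keeps only 3 mantissa bits versus FP16's 10, so round-trip quantization error is far larger. A rough sketch using PyTorch's float8_e4m3fn dtype (an assumption: that dtype only ships in PyTorch builds newer than this thread):

```python
import torch

x = torch.randn(10_000)

# Round-trip through each reduced-precision format, then measure the error.
err_fp16 = (x - x.to(torch.float16).float()).abs().mean().item()
err_fp8 = (x - x.to(torch.float8_e4m3fn).float()).abs().mean().item()

print(f"mean abs error, FP16:     {err_fp16:.1e}")
print(f"mean abs error, FP8 E4M3: {err_fp8:.1e}")  # roughly two orders of magnitude worse
```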

@turboderp closed this as not planned on Jun 17, 2023