TransformerEngine FP8 support #21
Hello! Could this work utilize the new H100 TransformerEngine for a speedup? If so, I would be very interested in that, and I would also pay for your H100 cloud GPU access if you could estimate how long you would need it. Thank you very much!

Comments
That's generous, but the project is really aimed at getting good performance out of consumer hardware. The H100 is the only GPU I'm aware of that even supports compute capability 9.0 and FP8, and at a price of around $40,000 it's not really relevant here.

I'm going to give TransformerEngine a proper look in the near future, but mostly to examine NVIDIA's decoder implementation. There might be some good stuff in there, but it would have to be ported over and rewritten to support GPTQ; otherwise I'd basically be starting from scratch. Just from a quick peek at their TransformerLayer code, I don't immediately see any support for KV caching, which hints that TE has a very different use case in mind than fast language-model inference. Although maybe I'm just missing something; there's a lot of code to parse, after all.
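For context, here is a minimal sketch (hypothetical shapes and names, not exllama's or TE's actual code) of the kind of KV caching fast autoregressive inference depends on: keys and values from earlier tokens are stored so each generated token costs one attention step instead of re-encoding the whole sequence.

```python
import torch

class KVCache:
    """Keys/values from all previous positions, appended one step at a time."""
    def __init__(self):
        self.k = None  # (batch, heads, seq_so_far, head_dim)
        self.v = None

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q, k_new, v_new, cache):
    # One generated token: attend over every cached position without
    # recomputing keys/values for the prompt.
    k, v = cache.append(k_new, v_new)
    scores = (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# Hypothetical shapes: batch 1, 8 heads, head_dim 64, one token per step.
cache = KVCache()
for _ in range(4):
    q = torch.randn(1, 8, 1, 64)
    out = decode_step(q, torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64), cache)
```

Without something like this, every generated token requires a full forward pass over the entire sequence, which is what makes a training-oriented layer a poor fit for inference.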
I see, thank you very much for the detailed evaluation of TE; that really helps me gain insight into the topic. And thank you for your work: 151 tokens/s is a really mind-bending number!
The README only mentions the H100, but Ada (the 4000 series) is compute capability 8.9, which includes a number of the same improvements, including FP8. A capability check like the sketch below could distinguish these cards.
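As a hedged illustration (assuming PyTorch is available; the threshold comes from NVIDIA's published compute capabilities, not from this repo), detecting FP8-capable hardware at runtime could look like this:

```python
import torch

def fp8_capable(device=0):
    # FP8 tensor cores arrived with compute capability 8.9 (Ada, e.g. 4090)
    # and 9.0 (Hopper, H100); Ampere cards like the 3090 are 8.6.
    major, minor = torch.cuda.get_device_capability(device)
    return (major, minor) >= (8, 9)

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0), "-> FP8 capable:", fp8_capable(0))
```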
It does look like the 4090 has some FP8 support, despite NVIDIA's announcements, but I don't really want to write too much code that's incompatible with the 3090, at least for the time being. It's also questionable where FP8 would be useful, since FP16 already has a small (but measurable) impact on overall accuracy. Closing this for now.