[BugFix][Codegen, CUDA] Fix faulty codegen for FP8 #17673

Draft
AntonMoberg wants to merge 2 commits into main

Conversation

@AntonMoberg commented Feb 24, 2025

Fixed a bug where CUDA codegen produces faulty code when a vectorizable BufferLoadNode contains a Float8 type.

Codegen generated the invalid expression "make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y])", where "param_0" is of type "__nv_fp8_e5m2* __restrict__".

This commit adds a missing "is_float8()" check to CodeGenCUDA::PrintVecElemLoadExpr, which is called for vectorizable BufferLoadNodes. With that check in place, codegen instead correctly emits "__nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]), static_cast<float>(param_0[v_.y])))".

Additionally, this commit removes the "make_" prefix added for float8 in CodeGenCUDA::PrintVecConstructor, since the correct way to instantiate an __nv_fp8x2_[e5m2/e4m3] is through the __nv_fp8x2_[e5m2/e4m3] constructor itself.
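
For illustration, here is a minimal sketch of the generated load before and after the fix. It reuses the placeholder names from the description ("param_0", "v_") inside a hypothetical kernel wrapper; it is not the exact TVM output.

```cuda
#include <cuda_fp8.h>

// Hypothetical wrapper kernel; "param_0" and "v_" are the placeholder names
// from the description above, not actual TVM-generated identifiers.
__global__ void vec_load_fp8(const __nv_fp8_e5m2* __restrict__ param_0,
                             __nv_fp8x2_e5m2* __restrict__ out, int2 v_) {
  // Before the fix (does not compile): cuda_fp8.h defines no such helper.
  //   out[0] = make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y]);

  // After the fix: widen each 8-bit element to float, then repack the pair
  // through the __nv_fp8x2_e5m2(float2) constructor.
  out[0] = __nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]),
                                       static_cast<float>(param_0[v_.y])));
}
```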

@tqchen (Member) commented Feb 24, 2025

Thanks @AntonMoberg. @MasterJH5574, it would be great if we could validate this PR.

@MasterJH5574 (Contributor)

Thank you @AntonMoberg! Would you mind providing an example which can reproduce the error?

@AntonMoberg (Author)

Hi @tqchen & @MasterJH5574! I am trying to produce a minimal reproducible example, but it is proving a bit challenging, as the error only occurs in some specific scenarios. In the meantime, however, I have encountered more faulty codegen related to FP8. I'll get back to you with updates ASAP :)

@AntonMoberg (Author)

I am converting this PR to a draft while I work on fleshing it out to cover more cases. I will provide basic tests and suggested fixes along the way!

@MasterJH5574 (Contributor)

Thank you so much @AntonMoberg!

FP8 values are stored as __nv_[fp8/fp8x2/fp8x4]_[e5m2/e4m3] types (8-, 16-, and 32-bit storage, respectively). These types do not provide overloaded binary operators (such as *). This commit adds support for such operations by extracting the high and low lanes, statically casting them to float, performing the operation, and then repacking the results into the dual-lane type.
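
A rough sketch of that extract / cast / operate / repack sequence is shown below; the helper name "mul_fp8x2_e5m2" and the standalone-function form are illustrative assumptions, not the literal code the commit emits.

```cuda
#include <cuda_fp8.h>

// Illustrative device helper (name and helper form are assumptions); it
// mirrors the lane-wise lowering described above for a multiply on the
// 16-bit dual-lane e5m2 type.
__device__ __nv_fp8x2_e5m2 mul_fp8x2_e5m2(__nv_fp8x2_e5m2 a,
                                           __nv_fp8x2_e5m2 b) {
  // Unpack the two 8-bit lanes from the 16-bit storage word.
  __nv_fp8_e5m2 a_lo, a_hi, b_lo, b_hi;
  a_lo.__x = static_cast<__nv_fp8_storage_t>(a.__x & 0xFF);
  a_hi.__x = static_cast<__nv_fp8_storage_t>(a.__x >> 8);
  b_lo.__x = static_cast<__nv_fp8_storage_t>(b.__x & 0xFF);
  b_hi.__x = static_cast<__nv_fp8_storage_t>(b.__x >> 8);

  // Widen to float, multiply per lane, and repack through the
  // __nv_fp8x2_e5m2(float2) constructor.
  float2 r = make_float2(static_cast<float>(a_lo) * static_cast<float>(b_lo),
                         static_cast<float>(a_hi) * static_cast<float>(b_hi));
  return __nv_fp8x2_e5m2(r);
}
```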