[BugFix][Codegen, CUDA] Fix faulty codegen for FP8 #17673

AntonMoberg · 2025-02-24T16:05:40Z

Fixed bug where CUDA codegen produces faulty code when a vectorizable BufferLoadNode contains a Float8 type.

Codegen generated the invalid call
make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y]) where "param_0" is of type __nv_fp8_e5m2* __restrict__.

This commit adds the missing is_float8() check to CodeGenCUDA::PrintVecElemLoadExpr, which is called for vectorizable BufferLoadNodes. With the check in place, codegen instead generates the valid expression __nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]), static_cast<float>(param_0[v_.y])))

Additionally, this commit removes the "make_" prefix that was added for float8 in CodeGenCUDA::PrintVecConstructor, as the correct way to instantiate an __nv_fp8x2_[e5m2/e4m3] is through the __nv_fp8x2_[e5m2/e4m3] constructor itself.
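To make the intended output concrete, here is a minimal sketch of how such an expression string could be assembled. MakeFp8x2LoadExpr is a hypothetical helper written for illustration only; it is not the code in CodeGenCUDA::PrintVecElemLoadExpr, and it assumes the element expressions are already printed as strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not TVM's actual code): builds the CUDA expression that
// constructs a packed FP8x2 value from two scalar FP8 loads.
std::string MakeFp8x2LoadExpr(const std::string& vec_type,
                              const std::vector<std::string>& elems) {
  // The packed type is constructed from a float2, so each scalar FP8 load is
  // first widened to float and the pair is wrapped in make_float2.
  // Note there is no "make_" prefix in front of the vector type itself.
  std::string expr = vec_type + "(make_float2(";
  for (std::size_t i = 0; i < elems.size(); ++i) {
    if (i != 0) expr += ", ";
    expr += "static_cast<float>(" + elems[i] + ")";
  }
  expr += "))";
  return expr;
}
```

Calling it with "__nv_fp8x2_e5m2" and the two loads param_0[v_.x] and param_0[v_.y] yields the constructor-based expression described above, with every static_cast closed before the next argument.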

tqchen · 2025-02-24T19:35:02Z

Thanks @AntonMoberg. @MasterJH5574, it would be great if we could validate this PR.

MasterJH5574 · 2025-02-25T14:19:10Z

Thank you @AntonMoberg! Would you mind providing an example which can reproduce the error?

AntonMoberg · 2025-02-27T08:53:24Z

Hi @tqchen & @MasterJH5574! I am trying to produce a minimal reproducible example but it is proving a bit challenging as the error only occurs in some specific scenarios. However, during this time I have encountered more faulty Codegen related to FP8. I'll get back to you with updates ASAP :)

AntonMoberg · 2025-02-28T14:35:12Z

I am converting this PR to draft while I work on fleshing it out for more cases. Will provide basic tests and suggested fixes along the way!

MasterJH5574 · 2025-03-03T16:21:27Z

Thank you so much @AntonMoberg!

FP8 values are stored as __nv_[fp8/fp8x2/fp8x4]_[e5m2/e4m3] (e.g. the fp8x2 variants occupy a 16-bit register). These types do not have overloaded binary operators (such as *). This commit adds that ability by extracting the high and low 8-bit lanes, statically casting them to float, performing the operation, then repacking the results into the dual-lane type.
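The unpack, operate, repack pattern can be sketched on the host as follows. This is an emulation under stated assumptions, not the CUDA code the commit emits: the function names are hypothetical, the low byte is assumed to hold the first lane, and a nearest-value search over all finite e5m2 encodings stands in for the hardware conversion (its tie-breaking may differ from the real rounding mode).

```cpp
#include <cmath>
#include <cstdint>

// Decode one e5m2 byte: 1 sign bit, 5 exponent bits (bias 15), 2 mantissa bits.
float DecodeE5M2(uint8_t b) {
  int sign = b >> 7, exp = (b >> 2) & 0x1F, man = b & 0x3;
  float mag;
  if (exp == 0) {
    mag = std::ldexp(static_cast<float>(man), -16);          // subnormal
  } else if (exp == 31) {
    mag = man ? NAN : INFINITY;                              // NaN / Inf
  } else {
    mag = std::ldexp(static_cast<float>(4 + man), exp - 17); // (1 + man/4) * 2^(exp-15)
  }
  return sign ? -mag : mag;
}

// Encode by brute force: pick the finite e5m2 code closest to x.
// Assumption: saturates to the largest finite value; NaN input maps to 0.
uint8_t EncodeE5M2(float x) {
  uint8_t best = 0;
  float best_err = INFINITY;
  for (int c = 0; c < 256; ++c) {
    if (((c >> 2) & 0x1F) == 31) continue;  // skip Inf/NaN encodings
    float err = std::fabs(DecodeE5M2(static_cast<uint8_t>(c)) - x);
    if (err < best_err) {
      best_err = err;
      best = static_cast<uint8_t>(c);
    }
  }
  return best;
}

// Multiply two packed e5m2x2 values lane by lane in float, then repack.
uint16_t MulE5M2x2(uint16_t a, uint16_t b) {
  float lo = DecodeE5M2(static_cast<uint8_t>(a & 0xFF)) *
             DecodeE5M2(static_cast<uint8_t>(b & 0xFF));
  float hi = DecodeE5M2(static_cast<uint8_t>(a >> 8)) *
             DecodeE5M2(static_cast<uint8_t>(b >> 8));
  return static_cast<uint16_t>(EncodeE5M2(lo) | (EncodeE5M2(hi) << 8));
}
```

For example, 0x4240 packs the lanes (2.0, 3.0) and 0x4038 packs (0.5, 2.0), so MulE5M2x2(0x4240, 0x4038) produces the packed lanes (1.0, 6.0).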

AntonMoberg force-pushed the main branch from d1495ee to 6daafd1 · February 28, 2025 14:32

AntonMoberg marked this pull request as draft · February 28, 2025 14:33

AntonMoberg force-pushed the main branch from 6daafd1 to 1c8f666 · February 28, 2025 15:25

AntonMoberg added 2 commits · March 4, 2025 16:11

AntonMoberg force-pushed the main branch from 1c8f666 to 37de872 · March 4, 2025 15:11
