-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR][CUDA] Generate registration function (Part 1) #1415
Conversation
a411dee
to
17ceb0b
Compare
return ("__" + prefix + name).str(); | ||
}; | ||
|
||
// MAC OS X needs special care, but we haven't supported that ABI in CIR yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert on missing feature?
// HIP has to create an external symbol. | ||
auto cudaBinaryHandleAttr = | ||
theModule->getAttr(CIRDialect::getCUDABinaryHandleAttrName()); | ||
if (!cudaBinaryHandleAttr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the program still valid if this returns false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's valid for CUDA, because we can compile without providing a -fcuda-include-gpubinary
argument, and CIRGen will not generate this attribute. In this case we don't do anything and nor does OG.
For HIP we need to generate a new symbol (I don't know the details), but if we block it here then the existing HIP/simple.cpp
will break, as it didn't supply that argument. I'm not sure what to do in this case. Perhaps the MissingFeatures::hipModuleCtor()
also accounts for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding another entry for MissingFeatures::hipModuleCtor()
right here so both paths can trigger while adding HIP support.
624beeb
to
11b05e8
Compare
I added LLVM lowering checks, and found two deviations from OG:
cir.global @__cuda_fatbin_wrapper = #cir.const_struct<{
#cir.int<1180844977>,
#cir.int<1>,
#cir.ptr<null>,
#cir.ptr<null>
}
cir.func @__cuda_module_ctor() {
%0 = cir.get_global @__cuda_fatbin_wrapper
%1 = cir.get_global @__cuda_fatbin_str
%2 = cir.get_member %0[2]
%3 = cir.cast(bitcast, %2)
cir.store %1, %3
// ...
} In OG, the value is directly stored in the initializer: @1 = private constant [61 x i8] c"..." ; Fat binary raw content
@__cuda_fatbin_wrapper = internal constant { i32 1180844977, i32 1, ptr @1, ptr null } I'm not sure if it is possible to achieve the same in CIR as in OG, as I didn't find a suitable attribute to put in for |
Are you comparing with OG from the same clangir repo or from upstream? It's possible things have changed and we didn't catch up. If you annotate the discrepancies with |
Next steps to address so we can merge this: (a) add the extra assert on missing feature and (b) add the COM lines explaining the current differences against OG (so that fixes can come in later PRs). |
I'm comparing with this clangir repo. I've checked upstream commit history though. It seems the only change from last rebase is for By the way, how should I implement fix for the second discrepancy? Is there an equivalent structure in CIR? |
Cool! Thanks for double checking.
It's fine to do it this way for the first version, you can fix this in a later PR (just add a COM in the test and file a issue here on incubator's github). Are you building the OG version with |
@AdUhTkJm if you import the OG assembly with |
Found
I can't find |
If you |
@AdUhTkJm any change remaining here? Let me know and I'll land |
Currently no changes for this PR, I'm now preparing the next one |
The generation is quite complicated so I plan to separate it into several parts.
The registration function should be like:
In this PR, we generate everything except the
__cuda_register_globals
function.OG doesn't give a name to
__cuda_fatbin_str
, which isn't allowed for cir::GlobalOp, so I invented a name for it. Other names are kept consistent with OG.