Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CUDA] Generate registration function (Part 1) #1415

Merged
merged 1 commit into from
Mar 5, 2025

Conversation

AdUhTkJm
Copy link
Contributor

@AdUhTkJm AdUhTkJm commented Feb 26, 2025

The generation is quite complicated so I plan to separate it into several parts.

The registration function should be like:

const char *__cuda_fatbin_str = /* Raw content of file in -fcuda-include-gpubinary */;
struct {
  int magicNum, version;
  void *binaryData, *unused;
} __cuda_fatbin_wrapper = { /*CUDA Magic Num*/, 1, __cuda_fatbin_str, nullptr };

void __cuda_module_ctor() {
  handle = __cudaRegisterFatBinary(&wrapper);
  __cuda_register_globals();
}

In this PR, we generate everything except the __cuda_register_globals function.

OG doesn't give a name to __cuda_fatbin_str, which isn't allowed for cir::GlobalOp, so I invented a name for it. Other names are kept consistent with OG.

@AdUhTkJm AdUhTkJm force-pushed the ctordtor branch 3 times, most recently from a411dee to 17ceb0b Compare February 26, 2025 20:03
return ("__" + prefix + name).str();
};

// MAC OS X needs special care, but we haven't supported that ABI in CIR yet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert on missing feature?

// HIP has to create an external symbol.
auto cudaBinaryHandleAttr =
theModule->getAttr(CIRDialect::getCUDABinaryHandleAttrName());
if (!cudaBinaryHandleAttr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the program still valid if this returns false?

Copy link
Contributor Author

@AdUhTkJm AdUhTkJm Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid for CUDA, because we can compile without providing a -fcuda-include-gpubinary argument, and CIRGen will not generate this attribute. In this case we don't do anything and nor does OG.

For HIP we need to generate a new symbol (I don't know the details), but if we block it here then the existing HIP/simple.cpp will break, as it didn't supply that argument. I'm not sure what to do in this case. Perhaps the MissingFeatures::hipModuleCtor() also accounts for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding another entry for MissingFeatures::hipModuleCtor() right here so both paths can trigger while adding HIP support.

@AdUhTkJm AdUhTkJm force-pushed the ctordtor branch 2 times, most recently from 624beeb to 11b05e8 Compare February 28, 2025 15:06
@AdUhTkJm
Copy link
Contributor Author

I added LLVM lowering checks, and found two deviations from OG:

  1. OG does not emit the registration functions when there's nothing to register.
    To ensure this test case (an empty file) still works, I'll delay the implementation of it in the next PR.

  2. When we initialize the _fatbin_wrapper, we first initialize the third field to nullptr in CIR, then fill it later in module ctor:

cir.global @__cuda_fatbin_wrapper = #cir.const_struct<{
  #cir.int<1180844977>,
  #cir.int<1>,
  #cir.ptr<null>,
  #cir.ptr<null>
}

cir.func @__cuda_module_ctor() {
    %0 = cir.get_global @__cuda_fatbin_wrapper
    %1 = cir.get_global @__cuda_fatbin_str
    %2 = cir.get_member %0[2]
    %3 = cir.cast(bitcast, %2)
    cir.store %1, %3
    // ...
}

In OG, the value is directly stored in the initializer:

@1 = private constant [61 x i8] c"..." ; Fat binary raw content
@__cuda_fatbin_wrapper = internal constant { i32 1180844977, i32 1, ptr @1, ptr null }

I'm not sure if it is possible to achieve the same in CIR as in OG, as I didn't find a suitable attribute to put in for setInitialValueAttr.

@bcardosolopes
Copy link
Member

Are you comparing with OG from the same clangir repo or from upstream? It's possible things have changed and we didn't catch up. If you annotate the discrepancies with COM and check the current ones with CHECK, it's fine to update these things later in incremental PRs.

@bcardosolopes
Copy link
Member

Next steps to address so we can merge this: (a) add the extra assert on missing feature and (b) add the COM lines explaining the current differences against OG (so that fixes can come in later PRs).

@AdUhTkJm
Copy link
Contributor Author

AdUhTkJm commented Mar 1, 2025

Are you comparing with OG from the same clangir repo or from upstream? It's possible things have changed and we didn't catch up. If you annotate the discrepancies with COM and check the current ones with CHECK, it's fine to update these things later in incremental PRs.

I'm comparing with this clangir repo. I've checked upstream commit history though. It seems the only change from last rebase is for langOpts.OffloadViaLLVM which we haven't implemented yet.

By the way, how should I implement fix for the second discrepancy? Is there an equivalent structure in CIR?

@bcardosolopes
Copy link
Member

I'm comparing with this clangir repo. I've checked upstream commit history though. It seems the only change from last rebase is for langOpts.OffloadViaLLVM which we haven't implemented yet.

Cool! Thanks for double checking.

By the way, how should I implement fix for the second discrepancy? Is there an equivalent structure in CIR? ... When we initialize the _fatbin_wrapper, we first initialize the third field to nullptr in CIR, then fill it later in module ctor:

It's fine to do it this way for the first version, you can fix this in a later PR (just add a COM in the test and file a issue here on incubator's github). Are you building the OG version with -O0? The LLVM optimizers are pretty good and might be transforming it.

@bcardosolopes
Copy link
Member

@AdUhTkJm if you import the OG assembly with llvm-translate --import-llvm x.ll you will see what kind of llvm.mlir.global is generates, and from there you can see what can be improved in your lowering approach, see mlir/test/Target/LLVMIR/Import/global-variables.ll for an example.

@AdUhTkJm
Copy link
Contributor Author

AdUhTkJm commented Mar 4, 2025

Found cir.global_view to do the job for the 2nd one. I'll delay it to the second PR and wait for this to be merged.

if you import the OG assembly with llvm-translate --import-llvm x.ll

I can't find llvm-translate in my build folder though. Is it not built by default in CMake?

@bcardosolopes
Copy link
Member

I can't find llvm-translate in my build folder though. Is it not built by default in CMake?

If you ninja llvm-translate it should work because you need MLIR as part of building CIR. ninja mlir-check will run the tests

@bcardosolopes
Copy link
Member

@AdUhTkJm any change remaining here? Let me know and I'll land

@AdUhTkJm
Copy link
Contributor Author

AdUhTkJm commented Mar 5, 2025

@AdUhTkJm any change remaining here? Let me know and I'll land

Currently no changes for this PR, I'm now preparing the next one

@bcardosolopes bcardosolopes merged commit 1184b8d into llvm:main Mar 5, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants