[MLIR][DISC] legalize disc to llvm #50434
Conversation
This PR provides conversion patterns to convert `disc_ral.dispatch_op` and `gpu.launch_op` to its disc-compatible llvm format.
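As a rough, hypothetical sketch of what the lowering produces (the runtime entry point `disc_ral_call` and the pointer types are taken from the test snippets quoted later in this thread; the exact operand packing is an assumption):

```mlir
// Hypothetical before/after sketch; not the literal output of the pass.

// Before: a dispatch op carrying the RAL context and its operands.
//   "disc_ral.dispatch"(%ctx, %arg0) {call_target_name = "some_target"} : (...) -> ()

// After: operands are packed into an i8** array and forwarded to a single
// runtime entry point; the target name is materialized as a global string.
//   llvm.call @disc_ral_call(%ctx, %target_name, %packed_args)
//       : (!llvm.ptr<i8>, !llvm.ptr<i8>, !llvm.ptr<ptr<i8>>) -> ()
```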
```cpp
      rewriter.getStringAttr(value), /*alignment=*/0);

  rewriter.restoreInsertionPoint(ip);
}
```
Should we check that, if there is an existing globalOp, it also matches the provided value?
Thanks! I'll add an assert here.
```cpp
Value loadOrCreateGlobalString(PatternRewriter& rewriter, Operation* op,
                               StringRef name, StringRef value) {
  ModuleOp module = op->getParentOfType<ModuleOp>();
  GlobalOp globalOp = module.lookupSymbol<GlobalOp>(name);
```
These lookups will be expensive. Can you instead build a SymbolTable at the beginning of the pass execution and update it throughout? (Patterns would have to take it in their constructor and keep a reference to it.)
Thanks! I'm not sure I fully understand what you mean. I guess you're saying that module.lookupSymbol is expensive because it needs to create a new SymbolTable every time it's called.
It's worth noting that RalToLLVMPass is a function-level rather than a module-level pass, to enable maximum parallelism, so multiple instances of RalToLLVMPass may run simultaneously. Therefore, we have to re-create a SymbolTable every time we need to look up a symbol, since other threads may insert symbols as well.
Hmm, that seems like a race condition: a function-level pass actually can't create symbols in the outer module like that.
Now that there is no threading involved, can we look into this?
I was waiting on a reply from you here.
Sorry, I missed the updates here. I have fixed it.
```cpp
    PatternRewriter& rewriter, Operation* op) const {
  ModuleOp module = op->getParentOfType<ModuleOp>();
  LLVMFuncOp func = module.lookupSymbol<LLVMFuncOp>(kRalDispatchFunctionName);
  if (!func) {
```
In general, prefer early exit if possible:

```diff
-  if (!func) {
+  if (func) return func;
```
Done
```mlir
  llvm.func @the_kernel() attributes {gpu.kernel} {
    llvm.return
  }
}
```
It seems like a precondition of this pass is that we already have a gpu.binary_blob produced. In that case, it seems to me the module shouldn't also carry the IR content: not only is it redundant, but the two could also get out of sync.
Thanks! The module does not need to contain the actual IR, but we still need to keep the function declaration inside it, since the gpu.launch_op needs to refer to it. So I'll remove the body of the_kernel while keeping the declaration.
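The resulting shape is roughly the following (the attribute spelling is an assumption based on the gpu.binary_blob mentioned above): the kernel keeps only a declaration so gpu.launch_func can still resolve the symbol, while the actual code lives in the compiled blob.

```mlir
gpu.module @kernel_module attributes {gpu.binary_blob = "<compiled blob>"} {
  // Declaration only: the body was removed, the code lives in the blob.
  llvm.func @the_kernel()
}
```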
```mlir
// CHECK: %[[T23:.*]] = llvm.getelementptr %[[T22]]{{.*}} : (!llvm.ptr<array<90 x i8>>, i64, i64) -> !llvm.ptr<i8>
// CHECK: llvm.call @disc_ral_call(%[[CTX]], %[[T23]], %[[KPACK]]) : (!llvm.ptr<i8>, !llvm.ptr<i8>, !llvm.ptr<ptr<i8>>) -> ()

gpu.launch_func @kernel_module::@the_kernel blocks in (%c1, %c1, %c1) threads in (%c1, %c1, %c1) args(%0 : memref<?x?xf32>)
```
Can you split these patterns into different functions and use CHECK-LABEL to match the functions? CHECK-LABEL delimits sections, which makes FileCheck matching a bit more robust and debugging easier.
Done!
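For illustration (function names here are hypothetical), the split looks like this; CHECK-LABEL anchors matching at each function, so a failed match in one test cannot silently consume lines belonging to the next:

```mlir
// CHECK-LABEL: @test_dispatch
// CHECK: llvm.call @disc_ral_call
func @test_dispatch(%ctx: !disc_ral.context) {
  ...
}

// CHECK-LABEL: @test_launch_func
// CHECK: llvm.call @disc_ral_call
func @test_launch_func(%ctx: !disc_ral.context) {
  ...
}
```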
```cpp
class RalToLLVMPass : public RalToLLVMPassBase<RalToLLVMPass> {
  void getDependentDialects(DialectRegistry& registry) const override {
    registry.insert<LLVM::LLVMDialect>();
  }
```
You can put this information in the .td file and it won't be needed here.
Thanks! Putting this dialect dependency in the .td file would introduce an LLVM dialect dependency into a header file (e.g. PassDetail.h), since several passes may share one .td file.
OK, looks fine to leave it there.
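For reference, the .td alternative the reviewer mentioned would look roughly like this (pass name, anchor op, and base class are assumptions); the trade-off discussed above is that the dependency then lands in the shared generated header:

```tablegen
def RalToLLVMPass : Pass<"disc-ral-to-llvm", "FuncOp"> {
  let summary = "Legalize disc_ral and gpu ops to the LLVM dialect";
  let dependentDialects = ["LLVM::LLVMDialect"];
}
```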
@joker-eph Hi, could you help to review the updated version? Thanks!
@joker-eph Hi, could you help to take a look? Thanks!
@joker-eph Hi, could you help to take a look? Thanks!
You may have missed my comment inline: #50434 (comment)?
Imported from GitHub PR tensorflow/tensorflow#50434

This PR provides conversion patterns to convert `disc_ral.dispatch_op` and `gpu.launch_op` to its disc-compatible llvm format.

Copybara import of the project:

-- 410262f963c6ff25875355d1b7885472fc89f84e by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] legalize disc to llvm

This PR provides conversion patterns to convert `disc_ral.dispatch_op` and `gpu.launch_op` to its disc-compatible llvm format.

-- 3838e485168ce10569b3022f33225a9722073efb by Wenyi Zhao <reyizero@gmail.com>:

fix

-- a9e6891cfdfd317e0e7ec7001287ad87316f987d by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 383282261