i64 index def in build_defs #53769
Conversation
Thanks, Kushan! Looks like you tamed bazel. The main change would be to target the jit_i64_indexed_for_large_tensors option and have i64 indexing only for large tensors.
all_precomp_kernels = zip(types, output_types, false_jits)
false_use_i64_indices = [False for i in range(len(types))]
all_precomp_kernels = zip(
    types, output_types, false_jits, false_use_i64_indices)
all_kernels = all_precomp_kernels
if if_mlir_generated_experimental_kernels_enabled(True, False):
This if statement was removed in I7cae80562ff3cd229724487be21c44e69c282b68. Can you rebase?
rebased.
Merge artifact: if_mlir_generated_experimental_kernels_enabled should go away.
@@ -360,7 +384,7 @@ def _gen_kernel_library(
        type = type,
        output_type = output_type,
    )
    for (type, output_type, jit) in all_kernels
    for (type, output_type, jit, index_64bit) in all_kernels
] + ["//tensorflow/compiler/mlir/tools/kernel_gen:tf_framework_c_interface"]
Should be jit_i64_indexed_for_large_tensors
Name changed
        len(jit_i64_indexed_for_large_tensors_types))]
all_jit_kernels = zip(jit_types, output_jit_types,
                      true_jits, use_i64_indices_for_jit)
all_partial_jit_kernels = zip(jit_i64_indexed_for_large_tensors_types,
For consistency, maybe call this all_jit_i64_indexed_for_large_tensors_kernels
Name changed
        test_size = "medium"):
        test_size = "medium",
        jit_i64_indexed_for_large_tensors_types = None,
        output_partial_jit_types = None,):
For consistency, maybe call it output_jit_i64_indexed_for_large_tensors_types
done!
For consistency, the two names should be jit_i64_indexed_for_large_tensors_types and output_jit_i64_indexed_for_large_tensors_types. This is analogous to jit_types and output_jit_types.
@@ -162,6 +162,7 @@ def _gen_kernel_bin_impl(ctx):
        "--output=%s" % gpu_bin.path,
        "--enable_ftz=%s" % (ctx.attr.data_type == "f32"),
        "--cpu_codegen=%s" % ctx.attr.cpu_codegen,
        "--index_64bit=%s" % ctx.attr.index_64bit,
Here, we do not want to enforce i64 indexing for everything but only for large tensors. The right option to use here would be jit-i64-indexed-for-large-tensors, which may need to be added to the kernel generator CLI.
The argument is changed and also added to the CLI.
@@ -195,6 +196,7 @@ _gen_kernel_bin_rule = rule(
        "gpu_archs": attr.string_list(),
        "jit": attr.bool(mandatory = False),
        "cpu_codegen": attr.bool(mandatory = False),
        "index_64bit": attr.bool(mandatory = False),
This should also be something like jit_i64_indexed_for_large_tensors.
done!
@@ -260,6 +264,10 @@ def _gen_kernel_library(
      extra_args: Extra arguments to pass to the generator tool.
      test_tags: The tags to pass to the generated test.
      test_size: The "size" argument to pass to the test.
      jit_i64_indexed_for_large_tensors_types: The type to apply to the Kernel's
nit: The option is not very self-explanatory. I think a more detailed description could help people to understand this better.
@@ -260,6 +264,10 @@ def _gen_kernel_library(
      extra_args: Extra arguments to pass to the generator tool.
      test_tags: The tags to pass to the generated test.
      test_size: The "size" argument to pass to the test.
      jit_i64_indexed_for_large_tensors_types: The type to apply to the Kernel's
        indexing.
      output_partial_jit_types: The output types for which a partial jit kernel should
nit: same
"Whether to" -> "The (input|output) types for which to enable..."
if cuda_gpu_architectures() or rocm_gpu_architectures() or enable_cpu:
    for (type, output_type, jit) in all_kernels:
    for (type, output_type, jit, index_64bit) in all_kernels:
index_64bit should be something like jit_i64_indexed_for_large_tensors.
Force-pushed from 97b95dc to 77844fc
Thanks for renaming and adding the flag.
@@ -389,7 +391,9 @@ Status LowerLoopsToGPUorCPU(mlir::ModuleOp module, bool embed_memref_prints,
  // Take launches to launches with kernels.
  if (!cpu_codegen) {
    const std::string gpuDataLayoutSpec =
        "#dlti.dl_spec<#dlti.dl_entry<index,32:i32>>";
        jit_i64_indexed_for_large_tensors
At this point, you enforce i64 indexing (independently of tensor size). The better name would be index_64bit.
name changed.
@@ -529,7 +533,8 @@ StatusOr<mlir::OwningOpRef<mlir::ModuleOp>> GenerateKernelForTfCode(
                            index_64bit, cpu_codegen,
                            jit_i64_indexed_for_large_tensors));
  TF_RETURN_IF_ERROR(
      LowerLoopsToGPUorCPU(module.get(), embed_memref_prints, cpu_codegen));
      LowerLoopsToGPUorCPU(module.get(), embed_memref_prints, cpu_codegen,
                           jit_i64_indexed_for_large_tensors));
Same
addressed
all_kernels = aot_kernels + jit_kernels

if not output_types:
    output_types =
Bazel will complain about this: The [] are missing
fixed.
    jit_types = []
if not output_jit_types:
    output_jit_types = jit_types
true_jits = [True for i in range(len(jit_types))]
Can you reorganize this code segment so that we collect the different kinds of kernels one after another? I think this could make it easier to read.
- the AOT kernels
- the fully JIT-compiled kernels
- the partially JIT-compiled kernels
Organized.
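For readers following along, the reorganization the review asks for can be sketched in plain Python (Starlark's zip and list comprehensions behave the same way here). The type lists and names below are illustrative, not the exact build_defs code: each kernel is a (type, output_type, jit, jit_i64_indexed_for_large_tensors) tuple, collected group by group.

```python
# Hypothetical type lists, for illustration only.
types = ["f16", "f32"]   # AOT-compiled kernels
jit_types = ["c64"]      # fully JIT-compiled kernels
jit_i64_types = ["i8"]   # partially JIT-compiled (i64-indexed for large tensors)

# The AOT kernels: jit=False, i64-for-large-tensors=False.
false_jits = [False for _ in types]
aot_kernels = list(zip(types, types, false_jits, false_jits))

# The fully JIT-compiled kernels: jit=True, i64-for-large-tensors=False.
jit_kernels = list(zip(jit_types, jit_types,
                       [True for _ in jit_types],
                       [False for _ in jit_types]))

# The partially JIT-compiled kernels: jit=False (not fully JIT-compiled,
# as the review notes below), i64-for-large-tensors=True.
partial_jit_kernels = list(zip(jit_i64_types, jit_i64_types,
                               [False for _ in jit_i64_types],
                               [True for _ in jit_i64_types]))

all_kernels = aot_kernels + jit_kernels + partial_jit_kernels
```

Collecting each kind one after another, then concatenating once at the end, keeps the three flag combinations visible at a glance.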
Thanks, it's only small changes now. Let me know when it passes the tests locally and I can trigger the CI tests.
@@ -189,6 +184,10 @@ int main(int argc, char** argv) {
      "unroll_factors",
      llvm::cl::desc("factors to unroll by, separated by commas"),
      llvm::cl::ZeroOrMore, llvm::cl::CommaSeparated);
  llvm::cl::opt<bool> jit_compile(
jit_compile -> jit_i64_indexed_for_large_tensors. Does this compile?
fixed.
output_jit_i64_indexed_for_large_tensors_types: The (input|output) types for which to enable.
                                                JIT compilation of i64-indexed kernels for
                                                large inputs.
output_partial_jit_types: The output types for which a partial jit kernel should
This is not what I meant. There should be two arguments here:
jit_i64_indexed_for_large_tensors_types: The input types for which to enable JIT compilation of i64-indexed kernels for large inputs.
output_jit_i64_indexed_for_large_tensors_types: The output types for which to enable JIT compilation of i64-indexed kernels for large inputs.
Aah, fixed that.
    output_jit_i64_indexed_for_large_tensors_types = []
if not output_partial_jit_types:
    output_partial_jit_types = output_jit_i64_indexed_for_large_tensors_types
# ully JIT-compiled kernels
nit: ully -> Fully
Fixed.
    output_partial_jit_types = output_jit_i64_indexed_for_large_tensors_types
# ully JIT-compiled kernels
true_jits = [True for i in range(len(jit_types))]
use_i64_indices_for_jit = [False for i in jit_types]
use_i64_indices_for_jit may be a bit misleading. What about false_i64jits?
Name changed.
all_jit_kernels = zip(jit_types, output_jit_types,
                      true_jits, use_i64_indices_for_jit)
# Partially JIT-compiled kernels
use_i64_indices_for_partial_jit = [True for i in
same: What about true_i64jits
name changed.
# Partially JIT-compiled kernels
use_i64_indices_for_partial_jit = [True for i in
                                   output_jit_i64_indexed_for_large_tensors_types]
true_partial_jits = [True for i in
Here, we want to pass False for the --jit flag because it is not fully JIT-compiled. You can call it false_jits.
set to False
                                   output_jit_i64_indexed_for_large_tensors_types]
true_partial_jits = [True for i in
                     output_jit_i64_indexed_for_large_tensors_types]
all_paaratial_jit_kernels = zip(output_jit_i64_indexed_for_large_tensors_types,
nit: paaratial -> paratial
Corrected.
There's one left over
@@ -284,16 +289,47 @@ def _gen_kernel_library(
      extra_args: Extra arguments to pass to the generator tool.
      test_tags: The tags to pass to the generated test.
      test_size: The "size" argument to pass to the test.
      jit_i64_indexed_for_large_tensors_types: The (input|output) types for which to enable.
There are some left overs here.
Fixed.
        test_size = "medium",
        jit_i64_indexed_for_large_tensors_types = None,
        output_jit_i64_indexed_for_large_tensors_types = None,
):
nit: There is a redundant comma here.
fixed.
@@ -284,16 +289,43 @@ def _gen_kernel_library(
      extra_args: Extra arguments to pass to the generator tool.
      test_tags: The tags to pass to the generated test.
      test_size: The "size" argument to pass to the test.
      jit_i64_indexed_for_large_tensors_types: The (input|output) types for which to enable.
Concretize "(input|output)" here. Also, there is a redundant . in the line.
jit_i64_indexed_for_large_tensors_types: The (input|output) types for which to enable.
                                         JIT compilation of i64-indexed kernels for
                                         large inputs.
output_jit_i64_indexed_for_large_tensors_types: The output types for which a partial jit kernel should
nit: ...for which to enable JIT compilation of i64-indexed kernels for large inputs.
fixed.
@frgossen can you look again?
Nice!
@@ -284,16 +289,44 @@ def _gen_kernel_library(
      extra_args: Extra arguments to pass to the generator tool.
      test_tags: The tags to pass to the generated test.
      test_size: The "size" argument to pass to the test.
      jit_i64_indexed_for_large_tensors_types: The (input|output) types for which to enable
(input|output) -> input
This could be the issue. Fingers crossed.
        test_size = "medium"):
        test_size = "medium",
        jit_i64_indexed_for_large_tensors_types = None,
        output_jit_i64_indexed_for_large_tensors_types = None
Should these (jit_i64_indexed_for_large_tensors_types and output_jit_i64_indexed_for_large_tensors_types) be initialized to []?
changed.
""" | ||
|
||
enable_cpu = bool(platform == "cpu") | ||
|
||
aot_kernels = zip(types, output_types or types, [False for t in types]) |
This line and others like it might be what is causing your current issue. Before, there were things like output_types or types, meaning it will use types if output_types is empty or None. Now, you assign output_types = [] if they are not set, which is different from before.
Changed.
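The difference the reviewer points out can be seen in a few lines of plain Python (the type lists are made up for illustration): `output_types or types` falls back to the input types, whereas a `[]` default makes every downstream zip empty.

```python
types = ["f32", "f64"]  # hypothetical input types

# Old behavior: fall back to the input types when output_types is unset.
output_types = None
kernels = list(zip(types, output_types or types))
# kernels pairs each input type with itself.

# New (broken) behavior: defaulting to [] makes the zip empty, so every
# kernel silently disappears instead of defaulting to the input types.
output_types = []
empty = list(zip(types, output_types))
```

Because zip stops at the shortest sequence, the empty default drops all kernels without any error, which is exactly the kind of silent breakage Bazel builds make hard to debug.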
all_kernels = aot_kernels + jit_kernels

if not output_types:
    output_types = []
Should be: if not output_types then output_types = types
changed.
if not output_types:
    output_types = []
if not jit_types:
    jit_types = []
This one is redundant
removed
if not output_jit_types:
    output_jit_types = jit_types
if not jit_i64_indexed_for_large_tensors_types:
    jit_i64_indexed_for_large_tensors_types = []
Believe this one is redundant also. At least if you update the default to be [] (see comment above)
removed also.
aot_kernels = zip(types, output_types, false_jits, false_jits)
all_kernels = aot_kernels
all_kernels += all_jit_kernels
all_kernels += all_paratial_jit_kernels
nit: This could probably go in one line: all_kernels = aot_kernels + all_jit_kernels + all_paratial_jit_kernels
changed.
Thank you Frederik, that actually fixed the ambiguous error message. However, it seems we have to decide between gpu_atanh_kernels and gpu_atanh_kernels_experimental. Bazel doesn't like them to co-exist!
Error in _gen_mlir_op_rule: _gen_mlir_op_rule rule 'generate_atanh_gpu_f32_f32_mlir' in package 'tensorflow/core/kernels/mlir_generated' conflicts with existing _gen_mlir_op_rule rule, defined at /workspace/sw3/LLVM/tensorflow/tensorflow-source/tensorflow/core/kernels/mlir_generated/BUILD:791:19
However, we have conditioned the choice based on if_mlir_generated_experimental_kernels_enabled. What do you think?
Changed the naming contract, so hopefully it doesn't have any side effects.
Maybe even include the {op} in all the names. It cannot hurt to have the names even more specific.
_gen_mlir_op_rule(
    name = "generate_{op}_{platform}_{type}_{output_type}_mlir".format(
        op = op,
Maybe even include the {op} in these names
included
        platform = platform,
        type = type,
        output_type = output_type,
    ),
    out = "{op}_{platform}_{type}_{output_type}.mlir".format(
        op = op,
Maybe even include the {op} in these names
added back
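The naming fix can be sketched in plain Python: including the op in the rule-name template gives each (op, platform, type, output_type) combination a distinct name, which is what resolves the conflict Bazel reported above. The example values match the generate_atanh_gpu_f32_f32_mlir name from the error message; they are illustrative, not the full macro.

```python
# Sketch of the rule-name template from the build macro above.
name = "generate_{op}_{platform}_{type}_{output_type}_mlir".format(
    op = "atanh",
    platform = "gpu",
    type = "f32",
    output_type = "f32",
)
# name is "generate_atanh_gpu_f32_f32_mlir"
```

Without the {op} component, two different ops generating an f32->f32 GPU kernel would produce the same rule name, and Bazel rejects duplicate rule definitions in a package.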
Force-pushed from 2ca162d to cc094d9
Force-pushed from ca617b0 to 1b7db45
Roll back broken change. PiperOrigin-RevId: 433002716
Enabling JIT indexing within the build pipeline.
This PR is a continuation of pull request 53622.
CC: @frgossen :)