
[ROCm] Fixes for ROCm CSB breakage - 210414 #48530

Merged

Conversation

deven-amd (Contributor):
Copy-pasting the individual commit messages here as the description.

1

Adding the `no_rocm` tag to the test //tensorflow/compiler/xla/service/gpu:nvptx_compiler_test

This is a CUDA-specific test, and should not be enabled on the ROCm platform.

2

ROCm-specific workaround for an MLIR unittest that results in a call to `ThenBlasGemmWithAlgorithm`

The ROCm platform does not yet have autotuning support for the rocBLAS GEMM API. The `GemmAlgorithmPicker` pass is not run on the ROCm platform ( `amdgpu_compiler.cc` ), so the algorithm field does not get populated, and hence the `ThenBlasGemmWithAlgorithm` routine does not get called.

However, the MLIR unit-tests introduced by the following commit ( 15e1036) have the algorithm field pre-populated, leading to a failure on the ROCm platform:

```
...
2021-04-14 23:22:14.718495: I tensorflow/compiler/xla/service/gpu/gemm_thunk.cc:63] Running GEMM thunk
2021-04-14 23:22:14.718504: I tensorflow/compiler/xla/service/gpu/gemm_thunk.cc:179] Executing a GemmThunk
2021-04-14 23:22:14.718541: I tensorflow/stream_executor/stream.cc:3487] [stream=0x5597e86929e0,impl=0x5597e868b170] Called Stream::ThenBlasGemmWithAlgorithm(transa=NoTranspose, transb=NoTranspose, m=2, n=2, k=2, alpha=1, a=0x7fe5d4201000, lda=2, b=0x7fe5d4200000, ldb=2, beta=0, c=0x7fe5d420d000, ldc=2, computation_type=f32, algorithm=7)
2021-04-14 23:22:14.718555: I tensorflow/stream_executor/plugin_registry.cc:246] Selecting default BLAS plugin, rocBLAS
2021-04-14 23:22:14.747078: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library librocblas.so
2021-04-14 23:22:14.747200: E tensorflow/stream_executor/rocm/rocm_blas.cc:1883] rocBLAS does not currently support the GEMMwithAlgorithm operation for the "float" datatype
...
```

This commit updates the code in `gemm_thunk.cc` to always skip the path that calls the `ThenBlasGemmWithAlgorithm` routine on the ROCm platform.
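The gating described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `gemm_thunk.cc` code: `RunGemm` is an invented name, and the return value just marks which path ran (1 = plain GEMM, i.e. the `ThenBlasGemm`-style path; 2 = the `ThenBlasGemmWithAlgorithm`-style path). The point is that in ROCm builds the algorithm branch is compiled out entirely, so the algorithm field is ignored even if it is populated.

```cpp
#include <optional>

// Sketch only (hypothetical names): returns 1 if the plain GEMM path ran,
// 2 if the algorithm-selection path ran.
int RunGemm(std::optional<int> algorithm) {
#if TENSORFLOW_USE_ROCM
  // ROCm: GEMM autotuning is unavailable, so the algorithm field is
  // ignored and the *WithAlgorithm path is compiled out entirely.
  (void)algorithm;
  return 1;
#else
  if (algorithm.has_value()) {
    return 2;  // non-ROCm builds may use the pre-selected algorithm
  }
  return 1;
#endif
}
```

In a build without `TENSORFLOW_USE_ROCM`, `RunGemm(7)` takes the algorithm path; in a ROCm build both calls take the plain path.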

3

Skipping unit-tests (within `segment_reduction_ops_deterministic_test_gpu`) that test complex types.

Complex type support has not yet been enabled for `segment_reduction_ops` on the ROCm platform.


/cc @cheshire @chsigg @sanjoy

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Apr 15, 2021
@google-cla google-cla bot added the cla: yes label Apr 15, 2021
@gbaned gbaned self-assigned this Apr 15, 2021
@gbaned gbaned added the comp:gpu GPU related issues label Apr 15, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Apr 15, 2021
```
@@ -124,7 +124,15 @@ static bool DoGemmWithAlgorithm(
          : se::blas::Transpose::kNoTranspose;
  auto k = lhs_matrix.transpose ? lhs_matrix.num_rows : lhs_matrix.num_cols;

  if (algorithm) {
    // Ignore the "algorithm" field on the ROCm platform. This is because
    // autotuning for GEMM is not yet available on the ROCm platform
```
Contributor:
Can we fix the unit test instead?

Contributor Author (deven-amd):

@sanjoy

I thought about going that route, but decided against it for the following reasons.

If changing the unit-test is preferable, then I can amend the PR to make that change. Let me know.

Contributor Author (deven-amd):

@sanjoy gentle ping

Member:

Actually instead of doing that, WDYT about changing the ROCm-BLAS implementation to simply forward the call from ThenBlasGemmWithAlgo to ThenBlasGemm?
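The forwarding idea suggested here could look roughly like the sketch below. The struct and its counters are hypothetical (this is not the actual StreamExecutor `BlasSupport` API): the `*WithAlgorithm` entry point simply drops the algorithm hint and delegates to the plain GEMM call, which matches the observation that rocBLAS currently exposes only one algorithm choice anyway.

```cpp
// Hypothetical sketch of forwarding *WithAlgorithm to the plain GEMM call.
struct RocmBlas {
  int gemm_calls = 0;  // counts how often the single GEMM entry point runs

  bool DoBlasGemm(int m, int n, int k) {
    ++gemm_calls;  // stand-in for the real rocBLAS GEMM invocation
    return m > 0 && n > 0 && k > 0;
  }

  bool DoBlasGemmWithAlgorithm(int m, int n, int k, int /*algorithm*/) {
    // Forward, ignoring the algorithm hint entirely.
    return DoBlasGemm(m, n, k);
  }
};
```

With this shape, a caller that populated the algorithm field still succeeds; the hint is just silently discarded on the ROCm side.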

Contributor:

Thanks Deven, either keeping this check as is or George's suggestion makes sense to me then.

However, I'm not sure this check is sound -- it just checks that ROCm is linked in, right? Not that we are compiling for ROCm? I know that for TensorFlow they are equivalent, but e.g. for JAX it might not be.
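The distinction raised here can be illustrated with a small sketch (hypothetical names throughout): a runtime check of which platforms are linked into the binary answers a different question than a compile-time build macro. A binary can have the ROCm platform linked in while a given computation targets CUDA, whereas the macro is fixed at build time.

```cpp
#include <set>
#include <string>

// Runtime question: "is the ROCm platform plugin linked into this binary?"
// This can be true even when the current computation targets another platform.
bool RocmLinkedIn(const std::set<std::string>& registered_platforms) {
  return registered_platforms.count("ROCM") > 0;
}

// Compile-time question: "was this code built for ROCm?"
// Fixed when the binary is compiled, independent of what is linked in.
constexpr bool RocmBuild() {
#if TENSORFLOW_USE_ROCM
  return true;
#else
  return false;
#endif
}
```

A binary with `{"CUDA", "ROCM"}` registered reports `RocmLinkedIn` as true even in a non-ROCm build, which is the unsoundness being pointed out.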

Contributor Author (deven-amd):

@cheshire I looked into your suggestion, and I would like to implement it as a separate PR. Even though ROCm only supports one algorithm choice right now, we do have all the hooks in place to properly implement support for `DoBlasGemmWithAlgorithm`.

@sanjoy I updated the implementation to make the ROCm exclusion a compile-time decision.

Please re-review.

Contributor Author (deven-amd):

@cheshire @sanjoy gentle ping

Member:

I'm still not sure I like that approach, sorry. I'm trying to minimize the places where we have `#ifdef`; having it here seems like the wrong level of abstraction. Fixing the unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

Contributor Author (deven-amd):

> Fixing unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

Done... please re-review.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Apr 20, 2021
@gbaned gbaned added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 22, 2021
@deven-amd deven-amd force-pushed the google_upstream_rocm_fixes_210414 branch 2 times, most recently from 176fc0d to bc79e3b Compare May 7, 2021 18:26
@gbaned gbaned requested a review from sanjoy May 11, 2021 13:53
…:parameter_server_training_test

The initial attempt was to just skip the failing subtest (using the `test.disable_with_predicate` decorator), but that causes the following failure in non-ROCm builds:

```
raise RuntimeError('You appear to be running a parameterized test case '
RuntimeError: You appear to be running a parameterized test case without having inherited from parameterized.TestCase. This is bad because none of your test cases are actually being run. You may also be using another decorator before the parameterized one, in which case you should reverse the order.
```

Reversing the decorator order results in a different error:

```
ValueError: The test does not take parameters that were passed : {'use_adapt'} .
```

So, abandoning that approach and adding a `no_rocm` tag to skip the test completely.

The subtest failing on the ROCm platform is `KPLTest.testTrainAndServe`.

The error does not seem to be ROCm specific, and I am hoping it will be root-caused and fixed upstream.
Skipping these subtests for now to get the ROCm unit-tests passing again.

The error we see is as follows:
```
...
...
[ RUN      ] KPLTest.testTrainAndServe_test_mode_eager_useadapt_False
...
...
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.910420 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.912209 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.912928 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.093631 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.097349 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.099127 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).

ERROR:tensorflow:Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}
Additional GRPC error information from remote target /job:worker/replica:0/task:1:
:{"created":"@1620394843.215581984","description":"Error received from peer ipv6:[::1]:18570","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"failed to connect to all addresses\nAdditional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:\n:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}","grpc_status":14} [Op:__inference_dataset_fn_471]
E0507 13:40:43.216140 139850348185344 cluster_coordinator.py:680] Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}
Additional GRPC error information from remote target /job:worker/replica:0/task:1:
:{"created":"@1620394843.215581984","description":"Error received from peer ipv6:[::1]:18570","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"failed to connect to all addresses\nAdditional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:\n:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}","grpc_status":14} [Op:__inference_dataset_fn_471]
INFO:tensorflow:Cluster now being recovered.
I0507 13:40:43.216533 139850306221824 cluster_coordinator.py:720] Cluster now being recovered.
2021-05-07 13:40:43.217208: W tensorflow/core/common_runtime/eager/context_distributed_manager.cc:671] Device filters can only be specified when initializing the cluster. Any changes in device filters are ignored when updating the server def.
2021-05-07 13:40:43.217310: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job ps -> {0 -> localhost:17995, 1 -> localhost:21944}
2021-05-07 13:40:43.217331: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:19198, 1 -> localhost:18570, 2 -> localhost:19968}
2021-05-07 13:40:43.217345: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job chief -> {0 -> localhost:57681}
2021-05-07 13:40:43.220807: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job ps -> {0 -> localhost:17995, 1 -> localhost:21944}
2021-05-07 13:40:43.220841: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:19198, 1 -> localhost:18570, 2 -> localhost:19968}
2021-05-07 13:40:43.220854: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job chief -> {0 -> localhost:57681}
INFO:tensorflow:Cluster successfully recovered.

ERROR:tensorflow:Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
...
...

<above error repeats 100+ times and we eventually get a stack overflow>
```
…/gpu/tests:element_wise_row_vectorization.hlo.test

This test does a PTX IR comparison, whose equivalent is not available (yet) on the ROCm platform.
…ute:saved_model_mixed_api_test_gpu

Three subtests started failing (as a consequence of some change within the last week or so?). We still need to root-cause the failures, but are adding the `no_rocm` tag for now to get the ROCm unit-tests passing.

The error messages we get are:

```
======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False(distribution_for_restoring=Mirrored1GPU, distribution_for_saving=MirroredCPUAndGPU, mode='eager', model_and_input=SimpleSequentialModel, save_in_scope=False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1145896960
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
	 [[SGD/AddN_1/_52]]
  (1) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1145896960
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_195971]

Function call stack:
train_function -> train_function

======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True(distribution_for_restoring=Mirrored1GPU, distribution_for_saving=OneDeviceGPU, mode='eager', model_and_input=SimpleSubclassModel, save_in_scope=True)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Input to reshape is a tensor with 10 values, but the requested shape has 0
	 [[node gradient_tape/mean_squared_error/Reshape (defined at root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py:78) ]] [Op:__inference_train_function_216342]

Function call stack:
train_function

======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False(distribution_for_restoring=OneDeviceCPU, distribution_for_saving=MirroredCPUAndGPU, mode='eager', model_and_input=SimpleSubclassModel, save_in_scope=False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1120437399847525866
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
  (1) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1120437399847525866
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
	 [[SGD/AddN_1/_52]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_300901]

Function call stack:
train_function -> train_function

----------------------------------------------------------------------
Ran 691 tests in 683.294s

FAILED (errors=3, skipped=421)
```
@deven-amd deven-amd force-pushed the google_upstream_rocm_fixes_210414 branch from bc79e3b to ffb210f on May 14, 2021 20:27
@@ -31,7 +31,7 @@ TEST_F(GemmTest, SimpleCase1) {
%arg2: memref<2x2xf32> {lmhlo.output_index = dense<[0]> : tensor<1xindex>}) attributes {
result_xla_shape = "(f32[4]) "
} {
-        "lmhlo_gpu.gemm"(%arg0, %arg1, %arg2) {algorithm = 7 : i64, alpha_imag = 0.000000e+00 : f64, alpha_real = 1.000000e+00 : f64, batch_size = 1 : i64, dot_dimension_numbers = {lhs_batching_dimensions = dense<> : tensor<0xi64>, lhs_contracting_dimensions = dense<1> : tensor<1xi64>, rhs_batching_dimensions = dense<> : tensor<0xi64>, rhs_contracting_dimensions = dense<0> : tensor<1xi64>}} : (memref<2x2xf32>, memref<2x2xf32>, memref<2x2xf32>) -> ()
+        "lmhlo_gpu.gemm"(%arg0, %arg1, %arg2) {alpha_imag = 0.000000e+00 : f64, alpha_real = 1.000000e+00 : f64, batch_size = 1 : i64, dot_dimension_numbers = {lhs_batching_dimensions = dense<> : tensor<0xi64>, lhs_contracting_dimensions = dense<1> : tensor<1xi64>, rhs_batching_dimensions = dense<> : tensor<0xi64>, rhs_contracting_dimensions = dense<0> : tensor<1xi64>}} : (memref<2x2xf32>, memref<2x2xf32>, memref<2x2xf32>) -> ()
Member

Should this diff be here?

Contributor Author

Yes — I thought you said:

> Fixing unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

and hence the change above

Contributor Author

@cheshire gentle ping

@sanjoy sanjoy requested review from cheshire and removed request for sanjoy May 18, 2021 05:35
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels May 19, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 19, 2021
@gbaned gbaned removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 20, 2021
@copybara-service copybara-service bot merged commit be65d1d into tensorflow:master May 20, 2021
PR Queue automation moved this from Reviewer Requested Changes to Merged May 20, 2021