
[ROCm] Fixes for ROCm CSB breakage - 210414 #48530

Merged

Conversation

deven-amd (Contributor):
Copy-pasting the individual commit messages here as the description.

1

Adding the `no_rocm` tag to the test //tensorflow/compiler/xla/service/gpu:nvptx_compiler_test

This is a CUDA-specific test, and should not be enabled on the ROCm platform.

2

ROCm-specific workaround for an MLIR unittest that results in a call to `ThenBlasGemmWithAlgorithm`

The ROCm platform does not yet have autotuning support for the rocBLAS GEMM API. The `GemmAlgorithmPicker` pass is not run on the ROCm platform ( `amdgpu_compiler.cc` ), so the algorithm field does not get populated, and hence the `ThenBlasGemmWithAlgorithm` routine does not get called.

However, the MLIR unit-tests introduced by the following commit ( 15e1036) have the algorithm field pre-populated, leading to a failure on the ROCm platform:

```
...
2021-04-14 23:22:14.718495: I tensorflow/compiler/xla/service/gpu/gemm_thunk.cc:63] Running GEMM thunk
2021-04-14 23:22:14.718504: I tensorflow/compiler/xla/service/gpu/gemm_thunk.cc:179] Executing a GemmThunk
2021-04-14 23:22:14.718541: I tensorflow/stream_executor/stream.cc:3487] [stream=0x5597e86929e0,impl=0x5597e868b170] Called Stream::ThenBlasGemmWithAlgorithm(transa=NoTranspose, transb=NoTranspose, m=2, n=2, k=2, alpha=1, a=0x7fe5d4201000, lda=2, b=0x7fe5d4200000, ldb=2, beta=0, c=0x7fe5d420d000, ldc=2, computation_type=f32, algorithm=7)
2021-04-14 23:22:14.718555: I tensorflow/stream_executor/plugin_registry.cc:246] Selecting default BLAS plugin, rocBLAS
2021-04-14 23:22:14.747078: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library librocblas.so
2021-04-14 23:22:14.747200: E tensorflow/stream_executor/rocm/rocm_blas.cc:1883] rocBLAS does not currently support the GEMMwithAlgorithm operation for the "float" datatype
...
```

This commit updates the code in `gemm_thunk.cc` to always skip the path that calls the `ThenBlasGemmWithAlgorithm` routine on the ROCm platform.
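The gating described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `gemm_thunk.cc` code: `RunGemm` is an invented name, and the return value just marks which path ran (1 = plain GEMM, i.e. the `ThenBlasGemm`-style path; 2 = the `ThenBlasGemmWithAlgorithm`-style path). The point is that in ROCm builds the algorithm branch is compiled out entirely, so the algorithm field is ignored even if it is populated.

```cpp
#include <optional>

// Sketch only (hypothetical names): returns 1 if the plain GEMM path ran,
// 2 if the algorithm-selection path ran.
int RunGemm(std::optional<int> algorithm) {
#if TENSORFLOW_USE_ROCM
  // ROCm: GEMM autotuning is unavailable, so the algorithm field is
  // ignored and the *WithAlgorithm path is compiled out entirely.
  (void)algorithm;
  return 1;
#else
  if (algorithm.has_value()) {
    return 2;  // non-ROCm builds may use the pre-selected algorithm
  }
  return 1;
#endif
}
```

In a build without `TENSORFLOW_USE_ROCM`, `RunGemm(7)` takes the algorithm path; in a ROCm build both calls take the plain path.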

3

Skipping unit-tests (within `segment_reduction_ops_deterministic_test_gpu`) that test complex types.

Complex type support has not yet been enabled for `segment_reduction_ops` on the ROCm platform.


/cc @cheshire @chsigg @sanjoy

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Apr 15, 2021
@google-cla google-cla bot added the cla: yes label Apr 15, 2021
@gbaned gbaned self-assigned this Apr 15, 2021
@gbaned gbaned added the comp:gpu GPU related issues label Apr 15, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Apr 15, 2021
```
@@ -124,7 +124,15 @@ static bool DoGemmWithAlgorithm(
          : se::blas::Transpose::kNoTranspose;
  auto k = lhs_matrix.transpose ? lhs_matrix.num_rows : lhs_matrix.num_cols;

  if (algorithm) {
    // Ignore the "algorithm" field on the ROCm platform. This is because
    // autotuning for GEMM is not yet available on the ROCm platform
```
Contributor:
Can we fix the unit test instead?

Contributor Author (deven-amd):

@sanjoy

I thought about going that route, but decided against it for the following reasons.

If changing the unit-test is preferable, then I can amend the PR to make that change. Let me know.

Contributor Author (deven-amd):

@sanjoy gentle ping

Member:

Actually instead of doing that, WDYT about changing the ROCm-BLAS implementation to simply forward the call from ThenBlasGemmWithAlgo to ThenBlasGemm?
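The forwarding idea suggested here could look roughly like the sketch below. The struct and its counters are hypothetical (this is not the actual StreamExecutor `BlasSupport` API): the `*WithAlgorithm` entry point simply drops the algorithm hint and delegates to the plain GEMM call, which matches the observation that rocBLAS currently exposes only one algorithm choice anyway.

```cpp
// Hypothetical sketch of forwarding *WithAlgorithm to the plain GEMM call.
struct RocmBlas {
  int gemm_calls = 0;  // counts how often the single GEMM entry point runs

  bool DoBlasGemm(int m, int n, int k) {
    ++gemm_calls;  // stand-in for the real rocBLAS GEMM invocation
    return m > 0 && n > 0 && k > 0;
  }

  bool DoBlasGemmWithAlgorithm(int m, int n, int k, int /*algorithm*/) {
    // Forward, ignoring the algorithm hint entirely.
    return DoBlasGemm(m, n, k);
  }
};
```

With this shape, a caller that populated the algorithm field still succeeds; the hint is just silently discarded on the ROCm side.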

Contributor:

Thanks Deven, either keeping this check as is or George's suggestion makes sense to me then.

However, I'm not sure this check is sound -- it just checks that ROCm is linked in, right? Not that we are compiling for ROCm? I know that for TensorFlow they are equivalent, but e.g. for JAX it might not be.
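The distinction raised here can be illustrated with a small sketch (hypothetical names throughout): a runtime check of which platforms are linked into the binary answers a different question than a compile-time build macro. A binary can have the ROCm platform linked in while a given computation targets CUDA, whereas the macro is fixed at build time.

```cpp
#include <set>
#include <string>

// Runtime question: "is the ROCm platform plugin linked into this binary?"
// This can be true even when the current computation targets another platform.
bool RocmLinkedIn(const std::set<std::string>& registered_platforms) {
  return registered_platforms.count("ROCM") > 0;
}

// Compile-time question: "was this code built for ROCm?"
// Fixed when the binary is compiled, independent of what is linked in.
constexpr bool RocmBuild() {
#if TENSORFLOW_USE_ROCM
  return true;
#else
  return false;
#endif
}
```

A binary with `{"CUDA", "ROCM"}` registered reports `RocmLinkedIn` as true even in a non-ROCm build, which is the unsoundness being pointed out.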

Contributor Author (deven-amd):

@cheshire I looked into your suggestion, and I would like to implement it as a separate PR. Even though ROCm only supports one algorithm choice right now, we do have all the hooks in place to properly implement support for `DoBlasGemmWithAlgorithm`.

@sanjoy I updated the implementation to make the ROCm exclusion a compile-time decision.

Please re-review.

Contributor Author (deven-amd):

@cheshire @sanjoy gentle ping

Member:

I'm still not sure I like that approach, sorry. I'm trying to minimize the places where we have `#ifdef`; having it here seems like the wrong level of abstraction. Fixing the unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

Contributor Author (deven-amd):

> Fixing unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

Done... please re-review.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Apr 20, 2021
@gbaned gbaned added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 22, 2021
@deven-amd deven-amd force-pushed the google_upstream_rocm_fixes_210414 branch 2 times, most recently from 176fc0d to bc79e3b Compare May 7, 2021 18:26
@gbaned gbaned requested a review from sanjoy May 11, 2021 13:53
…:parameter_server_training_test

The initial attempt was to just skip the failing subtest (using the `test.disable_with_predicate` decorator), but that causes the following failure in non-ROCm builds:

```
raise RuntimeError('You appear to be running a parameterized test case '
RuntimeError: You appear to be running a parameterized test case without having inherited from parameterized.TestCase. This is bad because none of your test cases are actually being run. You may also be using another decorator before the parameterized one, in which case you should reverse the order.
```

Reversing the decorator order results in a different error:

```
ValueError: The test does not take parameters that were passed : {'use_adapt'} .
```

So, abandoning that approach and adding a `no_rocm` tag to skip the test completely.

The subtest failing on the ROCm platform is `KPLTest.testTrainAndServe`.

The error does not seem to be ROCm specific, and I am hoping it will be root-caused and fixed upstream.
Skipping these subtests for now to get the ROCm unit-tests passing again.

The error we see is as follows:
```
...
...
[ RUN      ] KPLTest.testTrainAndServe_test_mode_eager_useadapt_False
...
...
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.910420 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.912209 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:41.912928 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.093631 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.097349 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).
I0507 13:40:42.099127 139923812845376 cross_device_ops.py:621] Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).

ERROR:tensorflow:Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}
Additional GRPC error information from remote target /job:worker/replica:0/task:1:
:{"created":"@1620394843.215581984","description":"Error received from peer ipv6:[::1]:18570","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"failed to connect to all addresses\nAdditional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:\n:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}","grpc_status":14} [Op:__inference_dataset_fn_471]
E0507 13:40:43.216140 139850348185344 cluster_coordinator.py:680] Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}
Additional GRPC error information from remote target /job:worker/replica:0/task:1:
:{"created":"@1620394843.215581984","description":"Error received from peer ipv6:[::1]:18570","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"failed to connect to all addresses\nAdditional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:\n:{"created":"@1620394843.214809804","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1620394843.214789270","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}","grpc_status":14} [Op:__inference_dataset_fn_471]
INFO:tensorflow:Cluster now being recovered.
I0507 13:40:43.216533 139850306221824 cluster_coordinator.py:720] Cluster now being recovered.
2021-05-07 13:40:43.217208: W tensorflow/core/common_runtime/eager/context_distributed_manager.cc:671] Device filters can only be specified when initializing the cluster. Any changes in device filters are ignored when updating the server def.
2021-05-07 13:40:43.217310: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job ps -> {0 -> localhost:17995, 1 -> localhost:21944}
2021-05-07 13:40:43.217331: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:19198, 1 -> localhost:18570, 2 -> localhost:19968}
2021-05-07 13:40:43.217345: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job chief -> {0 -> localhost:57681}
2021-05-07 13:40:43.220807: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job ps -> {0 -> localhost:17995, 1 -> localhost:21944}
2021-05-07 13:40:43.220841: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:19198, 1 -> localhost:18570, 2 -> localhost:19968}
2021-05-07 13:40:43.220854: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job chief -> {0 -> localhost:57681}
INFO:tensorflow:Cluster successfully recovered.

ERROR:tensorflow:Worker /job:worker/replica:0/task:1 failed with UnavailableError():failed to connect to all addresses
Additional GRPC error information from remote target /job:chief/replica:0/task:0/device:CPU:0:
...
...

<above error repeats 100+ times and we eventually get a stack overflow>
```
…/gpu/tests:element_wise_row_vectorization.hlo.test

This test does a PTX IR comparison, whose equivalent is not available (yet) on the ROCm platform.
…ute:saved_model_mixed_api_test_gpu

Three subtests started failing (as a consequence of some change within the last week or so?). We still need to root-cause the failures, but are adding the `no_rocm` tag for now to get the ROCm unit-tests passing.

The error messages we get are:

```
======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSequentialModel_saveinscope_False(distribution_for_restoring=Mirrored1GPU, distribution_for_saving=MirroredCPUAndGPU, mode='eager', model_and_input=SimpleSequentialModel, save_in_scope=False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1145896960
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
	 [[SGD/AddN_1/_52]]
  (1) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1145896960
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_195971]

Function call stack:
train_function -> train_function

======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_Mirrored1GPU_distributionforsaving_OneDeviceGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_True(distribution_for_restoring=Mirrored1GPU, distribution_for_saving=OneDeviceGPU, mode='eager', model_and_input=SimpleSubclassModel, save_in_scope=True)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Input to reshape is a tensor with 10 values, but the requested shape has 0
	 [[node gradient_tape/mean_squared_error/Reshape (defined at root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py:78) ]] [Op:__inference_train_function_216342]

Function call stack:
train_function

======================================================================
ERROR: test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False (__main__.SavedModelSaveAndLoadTest)
test_save_strategy_restore_strategy_test_distributionforrestoring_OneDeviceCPU_distributionforsaving_MirroredCPUAndGPU_mode_eager_modelandinput_SimpleSubclassModel_saveinscope_False(distribution_for_restoring=OneDeviceCPU, distribution_for_saving=MirroredCPUAndGPU, mode='eager', model_and_input=SimpleSubclassModel, save_in_scope=False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/testing_utils.py", line 997, in decorated
    f(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 366, in decorated
    execute_test_method()
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/framework/test_combinations.py", line 349, in execute_test_method
    test_method(**kwargs_to_pass)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/distribute/combinations.py", line 517, in decorator
    test_method(self, **kwargs)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_mixed_api_test.py", line 78, in test_save_strategy_restore_strategy
    save_in_scope)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 248, in run_test_save_strategy_restore_strategy
    self._train_model(model, x_train, y_train, batch_size)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/distribute/saved_model_test_base.py", line 170, in _train_model
    model.fit(x=training_dataset, epochs=1, steps_per_epoch=100)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/keras/engine/training.py", line 1188, in fit
    tmp_logs = self.train_function(iterator)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/def_function.py", line 913, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 3033, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 1957, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/distribute/saved_model_mixed_api_test_gpu.runfiles/org_tensorflow/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1120437399847525866
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
  (1) Invalid argument:  Input to reshape is a tensor with 5 values, but the requested shape has 1120437399847525866
	 [[node gradient_tape/mean_squared_error/Reshape (defined at usr/lib/python3.6/threading.py:916) ]]
	 [[SGD/AddN_1/_52]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_300901]

Function call stack:
train_function -> train_function

----------------------------------------------------------------------
Ran 691 tests in 683.294s

FAILED (errors=3, skipped=421)
```
@deven-amd deven-amd force-pushed the google_upstream_rocm_fixes_210414 branch from bc79e3b to ffb210f on May 14, 2021 20:27
@@ -31,7 +31,7 @@ TEST_F(GemmTest, SimpleCase1) {
%arg2: memref<2x2xf32> {lmhlo.output_index = dense<[0]> : tensor<1xindex>}) attributes {
result_xla_shape = "(f32[4]) "
} {
-        "lmhlo_gpu.gemm"(%arg0, %arg1, %arg2) {algorithm = 7 : i64, alpha_imag = 0.000000e+00 : f64, alpha_real = 1.000000e+00 : f64, batch_size = 1 : i64, dot_dimension_numbers = {lhs_batching_dimensions = dense<> : tensor<0xi64>, lhs_contracting_dimensions = dense<1> : tensor<1xi64>, rhs_batching_dimensions = dense<> : tensor<0xi64>, rhs_contracting_dimensions = dense<0> : tensor<1xi64>}} : (memref<2x2xf32>, memref<2x2xf32>, memref<2x2xf32>) -> ()
+        "lmhlo_gpu.gemm"(%arg0, %arg1, %arg2) {alpha_imag = 0.000000e+00 : f64, alpha_real = 1.000000e+00 : f64, batch_size = 1 : i64, dot_dimension_numbers = {lhs_batching_dimensions = dense<> : tensor<0xi64>, lhs_contracting_dimensions = dense<1> : tensor<1xi64>, rhs_batching_dimensions = dense<> : tensor<0xi64>, rhs_contracting_dimensions = dense<0> : tensor<1xi64>}} : (memref<2x2xf32>, memref<2x2xf32>, memref<2x2xf32>) -> ()
Member

Should this diff be here?

Contributor Author

Yes — I thought you said:

> Fixing unit test indeed seems preferable, since it's wrong to select an algorithm for the ROCm platform.

and hence the change above

Contributor Author

@cheshire gentle ping

@sanjoy sanjoy requested review from cheshire and removed request for sanjoy May 18, 2021 05:35
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels May 19, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 19, 2021
@gbaned gbaned removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 20, 2021
@copybara-service copybara-service bot merged commit be65d1d into tensorflow:master May 20, 2021
PR Queue automation moved this from Reviewer Requested Changes to Merged May 20, 2021