Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROCm] Skipping subtests that check support for float64 type in the NN ops #30500

Conversation

deven-amd
Copy link
Contributor

ROCm platform currently does not support the float64/double type in the NN ops

This commit skips subtests (within python unit-tests) that test this functionality. The "skip" is guarded by the call to "is_built_with_rocm()", and hence these unit-tests will not be affected in any way when running with TF which was not built with ROCm support (i.e. --config=rocm)


@tatianashp @whchung @chsigg

@tensorflow-bot tensorflow-bot bot added the size:M CL Change Size: Medium label Jul 8, 2019
Copy link
Contributor

@whchung whchung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deven-amd minor cosmetic changes are required.

@deven-amd deven-amd force-pushed the google_upstream_skip_double_dtyp_subtests branch from c8ee4e7 to d315f3b Compare July 8, 2019 18:21
whchung
whchung previously approved these changes Jul 8, 2019
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 8, 2019
@whchung whchung requested a review from chsigg July 8, 2019 18:27
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 8, 2019
@deven-amd deven-amd force-pushed the google_upstream_skip_double_dtyp_subtests branch from d315f3b to 7276d27 Compare July 8, 2019 18:30
@rthadur rthadur self-assigned this Jul 9, 2019
@rthadur rthadur added this to Assigned Reviewer in PR Queue via automation Jul 9, 2019
@rthadur rthadur added the comp:gpu GPU related issues label Jul 9, 2019
chsigg
chsigg previously approved these changes Jul 9, 2019
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Jul 9, 2019
@tensorflow-bot tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Jul 9, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 9, 2019
@deven-amd
Copy link
Contributor Author

I looked at the CI failure logs, and they do not seem related to the changes in this PR.

The one failure in Linux GPU is in a test that was modified by this PR. However the changes in this PR should be a no-op for the CUDA / Linux GPU run, and hence it does not seem likely that the change in this PR is the cause for the failure.

@rthadur
Copy link
Contributor

rthadur commented Jul 10, 2019

@deven-amd here is the internal error :
InternalError: 2 root error(s) found. (0) Internal: cuDNN Backward Data function launch failure : input shape([1,5,8,7,2]) filter shape([1,2,3,2,3]) [[node gradients_5/conv_5_grad/Conv3DBackpropInputV2 (defined at /third_party/py/absl/third_party/unittest3_backport/case.py:162) ]] (1) Internal: cuDNN Backward Data function launch failure : input shape([1,5,8,7,2]) filter shape([1,2,3,2,3]) [[node gradients_5/conv_5_grad/Conv3DBackpropInputV2 (defined at /third_party/py/absl/third_party/unittest3_backport/case.py:162) ]] [[gradients_5/conv_5_grad/Conv3DBackpropInputV2/_19]]

@deven-amd
Copy link
Contributor Author

@rthadur

I saw the error in the invocation log, but cannot see how the change in this PR can cause it.

For the CUDA run, the only thing that will change with this PR is the ordering the dtypes in the list, it will go from [dtypes.float64, dtypes.float32, dtypes.float16] to [dtypes.float32, dtypes.float16, dtypes.float64]. The two are functionally equivalent and should not cause any failure (I would think)

Let me push out a change that keeps the above mentioned ordering intact, and see it that fixes the error!

deven

…N ops

ROCm platform currently does not support the float64/double type in the NN ops

This commit skips subtests (within python unit-tests) that test this functionality. The "skip" is guarded by the call to "is_built_with_rocm()", and hence these unit-tests will not be affected in any way when running with TF which was not built with ROCm support (i.e. `--config=rocm`)
@deven-amd deven-amd force-pushed the google_upstream_skip_double_dtyp_subtests branch from 7276d27 to ca79b8d Compare July 10, 2019 19:48
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Jul 10, 2019
@deven-amd
Copy link
Contributor Author

@whchung , please re-approve this to kick-off the CI runs.
I need to figure out whether or not my last change fixes the failure in the LInux GPU CI run.

Thanks
deven

@whchung whchung added the kokoro:force-run Tests on submitted change label Jul 11, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 11, 2019
@deven-amd
Copy link
Contributor Author

@rthadur , the CI errors are gone :)

@rthadur rthadur requested review from chsigg and whchung July 11, 2019 21:29
@rthadur rthadur removed the ready to pull PR ready for merge process label Jul 11, 2019
@rthadur
Copy link
Contributor

rthadur commented Jul 11, 2019

@deven-amd thank you , @chsigg can you please review this again.

@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 11, 2019
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Jul 11, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 11, 2019
@rthadur rthadur removed the ready to pull PR ready for merge process label Jul 11, 2019
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 16, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 16, 2019
@tensorflow-copybara tensorflow-copybara merged commit ca79b8d into tensorflow:master Jul 18, 2019
PR Queue automation moved this from Approved by Reviewer to Merged Jul 18, 2019
tensorflow-copybara pushed a commit that referenced this pull request Jul 18, 2019
…kip_double_dtyp_subtests

PiperOrigin-RevId: 258814196
@deven-amd deven-amd deleted the google_upstream_skip_double_dtyp_subtests branch August 2, 2019 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:gpu GPU related issues ready to pull PR ready for merge process size:M CL Change Size: Medium
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

7 participants