[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471

deven-amd · 2020-10-30T21:47:39Z

~~PR #43636 is a pre-requisite for this PR.~~

~~For the time being, this PR includes commits from it's pre-req as well. Once the pre-req PR is merged, I will rebase this PR to remove those commits.~~

update : PR #43636 has been merged, and I have rebased this PR to pull it in

/cc @cheshire @chsigg @nvining-work

…ocblas version header file (in ROCm 3.8)

Now that we are way past the switch to use ROCm 3.5 and above (i.e. hip-clang), the codes within `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` are always enabled, and the codes within the corresponding `#else` blocks are deadcodes. This commit removes the references to `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` and their corresponding `#else` blocks

…flags

gbaned · 2020-11-09T13:36:53Z

@deven-amd Can you please resolve conflicts? Thanks!

deven-amd · 2020-11-09T14:38:01Z

@gbaned done

deven-amd · 2020-11-09T15:21:09Z

I see the cause of the errors in Linux GPU and Ubuntu CPU ... will push out a fix soon

… in ROCm 3.9 The location of the ROCm device lib files is changing in ROCm 3.9 Current (ROCm 3.8 and before) location is $ROCM_PATH/lib ``` root@ixt-rack-04:/opt/rocm-3.8.0# find . -name *.bc ./lib/oclc_isa_version_701.amdgcn.bc ./lib/ocml.amdgcn.bc ./lib/oclc_daz_opt_on.amdgcn.bc ./lib/oclc_isa_version_700.amdgcn.bc ./lib/oclc_isa_version_810.amdgcn.bc ./lib/oclc_unsafe_math_off.amdgcn.bc ./lib/oclc_wavefrontsize64_off.amdgcn.bc ./lib/oclc_isa_version_803.amdgcn.bc ./lib/oclc_isa_version_1011.amdgcn.bc ./lib/oclc_isa_version_1012.amdgcn.bc ./lib/opencl.amdgcn.bc ./lib/oclc_unsafe_math_on.amdgcn.bc ./lib/oclc_isa_version_1010.amdgcn.bc ./lib/oclc_finite_only_off.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_on.amdgcn.bc ./lib/oclc_daz_opt_off.amdgcn.bc ./lib/oclc_isa_version_802.amdgcn.bc ./lib/ockl.amdgcn.bc ./lib/oclc_isa_version_906.amdgcn.bc ./lib/oclc_isa_version_1030.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_off.amdgcn.bc ./lib/hip.amdgcn.bc ./lib/oclc_isa_version_908.amdgcn.bc ./lib/oclc_isa_version_900.amdgcn.bc ./lib/oclc_isa_version_702.amdgcn.bc ./lib/oclc_wavefrontsize64_on.amdgcn.bc ./lib/hc.amdgcn.bc ./lib/oclc_isa_version_902.amdgcn.bc ./lib/oclc_isa_version_801.amdgcn.bc ./lib/oclc_finite_only_on.amdgcn.bc ./lib/oclc_isa_version_904.amdgcn.bc ``` New (ROCm 3.9 and above) location is $ROCM_PATH/amdgcn/bitcode ``` root@ixt-hq-99:/opt/rocm-3.9.0-3703# find -name *.bc ./amdgcn/bitcode/oclc_isa_version_700.bc ./amdgcn/bitcode/ocml.bc ./amdgcn/bitcode/oclc_isa_version_1030.bc ./amdgcn/bitcode/oclc_isa_version_1010.bc ./amdgcn/bitcode/oclc_isa_version_904.bc ./amdgcn/bitcode/hip.bc ./amdgcn/bitcode/hc.bc ./amdgcn/bitcode/oclc_daz_opt_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_on.bc ./amdgcn/bitcode/oclc_isa_version_900.bc ./amdgcn/bitcode/oclc_isa_version_1012.bc ./amdgcn/bitcode/oclc_isa_version_702.bc ./amdgcn/bitcode/oclc_daz_opt_on.bc ./amdgcn/bitcode/oclc_unsafe_math_off.bc ./amdgcn/bitcode/ockl.bc ./amdgcn/bitcode/oclc_isa_version_803.bc ./amdgcn/bitcode/oclc_isa_version_908.bc ./amdgcn/bitcode/oclc_isa_version_802.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc ./amdgcn/bitcode/oclc_finite_only_on.bc ./amdgcn/bitcode/oclc_isa_version_701.bc ./amdgcn/bitcode/oclc_unsafe_math_on.bc ./amdgcn/bitcode/oclc_isa_version_902.bc ./amdgcn/bitcode/oclc_finite_only_off.bc ./amdgcn/bitcode/opencl.bc ./amdgcn/bitcode/oclc_isa_version_906.bc ./amdgcn/bitcode/oclc_isa_version_810.bc ./amdgcn/bitcode/oclc_isa_version_801.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc ./amdgcn/bitcode/oclc_isa_version_1011.bc ``` Also not the change in the filename(s) This commit updates the XLA code, that has the device lib path + filename(s) hardcoded, to account for the change in location / filename

Prior to ROCm 3.8, hipcc (hipclang) flushed denormal values to zero by default. Starting with ROCm 3.8 that is no longer true, denormal values are kept as is. TF expects denormals to be flushed to zero. This is enforced on the CUDA side by explicitly passing the "-fcuda-flush-denormals-to-zero" (see tensorflow.bzl). This commit does the same for the ROCm side. Also removing the no_rocm tag from the corresponding unit test - //tensorflow/python/kernel_tests:denormal_test_gpu

…uous) When building TF with ROCm 3.9, we are running into the following compile error ``` In file included from tensorflow/core/kernels/reduction_ops_half_mean_sum.cu.cc:20: ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:430:9: error: call to 'min' is ambiguous min(blockDim.y, num_rows - blockIdx.y * blockDim.y); ^~~ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1183:23: note: candidate function __DEVICE__ inline int min(int __arg1, int __arg2) { ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1197:14: note: candidate function inline float min(float __x, float __y) { return fminf(__x, __y); } ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1200:15: note: candidate function inline double min(double __x, double __y) { return fmin(__x, __y); } ^ 1 error generated when compiling for gfx803. ``` The build error seems to be because ROCm 3.9 uses llvm header files from `llvm/lib/clang/12.0.0/include` (ROCm 3.8 uses the `11.0.0` version). `12.0.0` has a new `__clang_hip_math.h` file, which is not present in `11.0.0`. This file has the `min` function overloaded for the `float` and `double` types. The first argument in the call to `min` (which leads to the error) is `blockDim.y` which has a `uint` type, and hence the compiler gets confused as to which overloaded type to resole to. Previously (i.e. ROCm 3.8 and before) there was only one option (`int`), with ROCm 3.9 there are three (`int`, `float`, and `double`) and hence the error. The "fix" is to explicitly cast the first argument to `int` to remove the ambiguity (the second argument is already an `int` type).

akuegel · 2020-11-10T08:23:41Z

I have actually already rebased the earlier version of your PR and fixed a few issues, hopefully it will be merged soon.

@cheshire

…fig=rocm) Imported from GitHub PR #44471 PR #43636 is a pre-requisite for this PR. For the time being, this PR includes commits from it's pre-req as well. Once the pre-req PR is merged, I will rebase this PR to remove those commits. -------------------------------------- /cc @cheshire @chsigg @nvining-work Copybara import of the project: -- 3f0d378 by Deven Desai <deven.desai.amd@gmail.com>: Adding #defines for ROCm / MIOpen / HIP Runtime version numbers This PR/commit introduces the following #defines in the `rocm/rocm_config.h` file ``` #define TF_ROCM_VERSION <Version Number of ROCm install> #define TF_MIOPEN_VERSION <Verion Number of MIOpen in ROCm install> #define TF_HIPRUNTIME_VERSION <Version Number of HIP Runtinme in ROCm install> ``` These #defines should be used within TF code to add ROCm/MIOpen/HIp Runtime version specific code. Details on how we go about determining these version numbers can found on the following wiki-page https://github.com/ROCmSoftwarePlatform/tensorflow-internal/wiki/How-to-add-ROCm-version-specific-code-changes-in-the-TensorFlow-code%3F A new script `find_rocm_config.py` is being added by this commit. This script does all the work of determining the version number information and it is pretty to extend it to query more information about the ROCM install. The information collected by the script is available to `rocm_configure.bzl` and hence can be used to add version specific code in `rocm_configure.bzl` as well. -- 922e0e5 by Deven Desai <deven.desai.amd@gmail.com>: Updating Dockerfile.rocm to use ROCm 3.9 -- cc0b4ae by Deven Desai <deven.desai.amd@gmail.com>: Changing CI scripts to use ROCm 3.9 -- fbfdb64 by Deven Desai <deven.desai.amd@gmail.com>: Updating rocm_config.py to account for the new location of the rocblas version header file (in ROCm 3.8) -- 3f191fa by Deven Desai <deven.desai.amd@gmail.com>: Removing references to TENSORFLOW_COMPILER_IS_HIP_CLANG Now that we are way past the switch to use ROCm 3.5 and above (i.e. hip-clang), the codes within `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` are always enabled, and the codes within the corresponding `#else` blocks are deadcodes. This commit removes the references to `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` and their corresponding `#else` blocks -- 9a4841c by Deven Desai <deven.desai.amd@gmail.com>: Removing -DTENSORFLOW_COMPILER_IS_HIP_CLANG from the list of compile flags -- 745e2ad by Deven Desai <deven.desai.amd@gmail.com>: Removing deadcode for the ROCm platform within the third_party/gpus dir -- c96dc03 by Deven Desai <deven.desai.amd@gmail.com>: Updating XLA code to account for the device lib files location change in ROCm 3.9 The location of the ROCm device lib files is changing in ROCm 3.9 Current (ROCm 3.8 and before) location is $ROCM_PATH/lib ``` root@ixt-rack-04:/opt/rocm-3.8.0# find . -name *.bc ./lib/oclc_isa_version_701.amdgcn.bc ./lib/ocml.amdgcn.bc ./lib/oclc_daz_opt_on.amdgcn.bc ./lib/oclc_isa_version_700.amdgcn.bc ./lib/oclc_isa_version_810.amdgcn.bc ./lib/oclc_unsafe_math_off.amdgcn.bc ./lib/oclc_wavefrontsize64_off.amdgcn.bc ./lib/oclc_isa_version_803.amdgcn.bc ./lib/oclc_isa_version_1011.amdgcn.bc ./lib/oclc_isa_version_1012.amdgcn.bc ./lib/opencl.amdgcn.bc ./lib/oclc_unsafe_math_on.amdgcn.bc ./lib/oclc_isa_version_1010.amdgcn.bc ./lib/oclc_finite_only_off.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_on.amdgcn.bc ./lib/oclc_daz_opt_off.amdgcn.bc ./lib/oclc_isa_version_802.amdgcn.bc ./lib/ockl.amdgcn.bc ./lib/oclc_isa_version_906.amdgcn.bc ./lib/oclc_isa_version_1030.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_off.amdgcn.bc ./lib/hip.amdgcn.bc ./lib/oclc_isa_version_908.amdgcn.bc ./lib/oclc_isa_version_900.amdgcn.bc ./lib/oclc_isa_version_702.amdgcn.bc ./lib/oclc_wavefrontsize64_on.amdgcn.bc ./lib/hc.amdgcn.bc ./lib/oclc_isa_version_902.amdgcn.bc ./lib/oclc_isa_version_801.amdgcn.bc ./lib/oclc_finite_only_on.amdgcn.bc ./lib/oclc_isa_version_904.amdgcn.bc ``` New (ROCm 3.9 and above) location is $ROCM_PATH/amdgcn/bitcode ``` root@ixt-hq-99:/opt/rocm-3.9.0-3703# find -name *.bc ./amdgcn/bitcode/oclc_isa_version_700.bc ./amdgcn/bitcode/ocml.bc ./amdgcn/bitcode/oclc_isa_version_1030.bc ./amdgcn/bitcode/oclc_isa_version_1010.bc ./amdgcn/bitcode/oclc_isa_version_904.bc ./amdgcn/bitcode/hip.bc ./amdgcn/bitcode/hc.bc ./amdgcn/bitcode/oclc_daz_opt_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_on.bc ./amdgcn/bitcode/oclc_isa_version_900.bc ./amdgcn/bitcode/oclc_isa_version_1012.bc ./amdgcn/bitcode/oclc_isa_version_702.bc ./amdgcn/bitcode/oclc_daz_opt_on.bc ./amdgcn/bitcode/oclc_unsafe_math_off.bc ./amdgcn/bitcode/ockl.bc ./amdgcn/bitcode/oclc_isa_version_803.bc ./amdgcn/bitcode/oclc_isa_version_908.bc ./amdgcn/bitcode/oclc_isa_version_802.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc ./amdgcn/bitcode/oclc_finite_only_on.bc ./amdgcn/bitcode/oclc_isa_version_701.bc ./amdgcn/bitcode/oclc_unsafe_math_on.bc ./amdgcn/bitcode/oclc_isa_version_902.bc ./amdgcn/bitcode/oclc_finite_only_off.bc ./amdgcn/bitcode/opencl.bc ./amdgcn/bitcode/oclc_isa_version_906.bc ./amdgcn/bitcode/oclc_isa_version_810.bc ./amdgcn/bitcode/oclc_isa_version_801.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc ./amdgcn/bitcode/oclc_isa_version_1011.bc ``` Also not the change in the filename(s) This commit updates the XLA code, that has the device lib path + filename(s) hardcoded, to account for the change in location / filename -- 6f981a9 by Deven Desai <deven.desai.amd@gmail.com>: Adding "-fcuda-flush-denormals-to-zero" as a default hipcc option Prior to ROCm 3.8, hipcc (hipclang) flushed denormal values to zero by default. Starting with ROCm 3.8 that is no longer true, denormal values are kept as is. TF expects denormals to be flushed to zero. This is enforced on the CUDA side by explicitly passing the "-fcuda-flush-denormals-to-zero" (see tensorflow.bzl). This commit does the same for the ROCm side. Also removing the no_rocm tag from the corresponding unit test - //tensorflow/python/kernel_tests:denormal_test_gpu -- 7481043 by Deven Desai <deven.desai.amd@gmail.com>: Fix for TF build failure with ROCm 3.9 (error: call to 'min' is ambiguous) When building TF with ROCm 3.9, we are running into the following compile error ``` In file included from tensorflow/core/kernels/reduction_ops_half_mean_sum.cu.cc:20: ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:430:9: error: call to 'min' is ambiguous min(blockDim.y, num_rows - blockIdx.y * blockDim.y); ^~~ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1183:23: note: candidate function __DEVICE__ inline int min(int __arg1, int __arg2) { ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1197:14: note: candidate function inline float min(float __x, float __y) { return fminf(__x, __y); } ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1200:15: note: candidate function inline double min(double __x, double __y) { return fmin(__x, __y); } ^ 1 error generated when compiling for gfx803. ``` The build error seems to be because ROCm 3.9 uses llvm header files from `llvm/lib/clang/12.0.0/include` (ROCm 3.8 uses the `11.0.0` version). `12.0.0` has a new `__clang_hip_math.h` file, which is not present in `11.0.0`. This file has the `min` function overloaded for the `float` and `double` types. The first argument in the call to `min` (which leads to the error) is `blockDim.y` which has a `uint` type, and hence the compiler gets confused as to which overloaded type to resole to. Previously (i.e. ROCm 3.8 and before) there was only one option (`int`), with ROCm 3.9 there are three (`int`, `float`, and `double`) and hence the error. The "fix" is to explicitly cast the first argument to `int` to remove the ambiguity (the second argument is already an `int` type). COPYBARA_INTEGRATE_REVIEW=#44471 from ROCmSoftwarePlatform:google_upstream_rocm_switch_to_rocm39 7481043 PiperOrigin-RevId: 341569721 Change-Id: Ia614893881bf8db1ef8901034c35cc585a82dba8

deven-amd · 2020-11-10T12:43:01Z

@akuegel

This PR does seem to have gotten merged...thank you for rebasing it and seeing it through.

The merged PR seems to have dropped one of the commits ( fbfdb64 ) even though it is mentioned in PR message here (312e6ba).

Do you want me to close out this PR and file another one to pick up the missing change?

thanks

akuegel · 2020-11-10T14:16:20Z

Please create another one. I am not sure why this was not included, but in any case I was just happy that I could get this merged at all. It should be easier now to apply smaller fixes on top of that.

…n merging PR tensorflow#44471 ( tensorflow#44471 ) See the following comments (in PR # 44471 for further details) tensorflow#44471 (comment) tensorflow#44471 (comment)

google-ml-butler bot added the size:L CL Change Size: Large label Oct 30, 2020

google-ml-butler bot requested a review from joker-eph October 30, 2020 21:47

google-cla bot added the cla: yes label Oct 30, 2020

cheshire approved these changes Oct 30, 2020

View reviewed changes

google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 30, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 30, 2020

gbaned self-assigned this Oct 31, 2020

gbaned added comp:gpu GPU related issues ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Oct 31, 2020

deven-amd added 6 commits November 9, 2020 12:32

Updating Dockerfile.rocm to use ROCm 3.9

0aae91e

Changing CI scripts to use ROCm 3.9

50b55e9

Updating find_rocm_config.py to account for the new location of the r…

5ea061b

…ocblas version header file (in ROCm 3.8)

Removing -DTENSORFLOW_COMPILER_IS_HIP_CLANG from the list of compile …

9fdc6d2

…flags

Removing deadcode for the ROCm platform within the third_party/gpus dir

5738815

gbaned removed the ready to pull PR ready for merge process label Nov 9, 2020

deven-amd force-pushed the google_upstream_rocm_switch_to_rocm39 branch from 7481043 to 33bbfa9 Compare November 9, 2020 14:33

deven-amd added 3 commits November 9, 2020 15:27

deven-amd force-pushed the google_upstream_rocm_switch_to_rocm39 branch from 33bbfa9 to 92a2f6d Compare November 9, 2020 18:41

akuegel closed this Nov 10, 2020

deven-amd mentioned this pull request Nov 24, 2020

[ROCm] Updating find_rocm_config.py to … #45154

Merged

deven-amd deleted the google_upstream_rocm_switch_to_rocm39 branch January 4, 2021 14:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471

[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471

Uh oh!

deven-amd commented Oct 30, 2020 •

edited

Loading

Uh oh!

gbaned commented Nov 9, 2020

Uh oh!

deven-amd commented Nov 9, 2020

Uh oh!

deven-amd commented Nov 9, 2020

Uh oh!

akuegel commented Nov 10, 2020

Uh oh!

deven-amd commented Nov 10, 2020

Uh oh!

akuegel commented Nov 10, 2020

Uh oh!

Uh oh!

[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471

[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471

Uh oh!

Conversation

deven-amd commented Oct 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gbaned commented Nov 9, 2020

Uh oh!

deven-amd commented Nov 9, 2020

Uh oh!

deven-amd commented Nov 9, 2020

Uh oh!

akuegel commented Nov 10, 2020

Uh oh!

deven-amd commented Nov 10, 2020

Uh oh!

akuegel commented Nov 10, 2020

Uh oh!

Uh oh!

deven-amd commented Oct 30, 2020 •

edited

Loading