-
Notifications
You must be signed in to change notification settings - Fork 74.8k
[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ROCm] Update to use ROCm 3.9 (when building TF with --config=rocm) #44471
Conversation
…ocblas version header file (in ROCm 3.8)
Now that we are way past the switch to use ROCm 3.5 and above (i.e. hip-clang), the codes within `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` are always enabled, and the codes within the corresponding `#else` blocks are deadcodes. This commit removes the references to `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` and their corresponding `#else` blocks
@deven-amd Can you please resolve conflicts? Thanks! |
7481043
to
33bbfa9
Compare
@gbaned done |
I see the cause of the errors in |
… in ROCm 3.9 The location of the ROCm device lib files is changing in ROCm 3.9 Current (ROCm 3.8 and before) location is $ROCM_PATH/lib ``` root@ixt-rack-04:/opt/rocm-3.8.0# find . -name *.bc ./lib/oclc_isa_version_701.amdgcn.bc ./lib/ocml.amdgcn.bc ./lib/oclc_daz_opt_on.amdgcn.bc ./lib/oclc_isa_version_700.amdgcn.bc ./lib/oclc_isa_version_810.amdgcn.bc ./lib/oclc_unsafe_math_off.amdgcn.bc ./lib/oclc_wavefrontsize64_off.amdgcn.bc ./lib/oclc_isa_version_803.amdgcn.bc ./lib/oclc_isa_version_1011.amdgcn.bc ./lib/oclc_isa_version_1012.amdgcn.bc ./lib/opencl.amdgcn.bc ./lib/oclc_unsafe_math_on.amdgcn.bc ./lib/oclc_isa_version_1010.amdgcn.bc ./lib/oclc_finite_only_off.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_on.amdgcn.bc ./lib/oclc_daz_opt_off.amdgcn.bc ./lib/oclc_isa_version_802.amdgcn.bc ./lib/ockl.amdgcn.bc ./lib/oclc_isa_version_906.amdgcn.bc ./lib/oclc_isa_version_1030.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_off.amdgcn.bc ./lib/hip.amdgcn.bc ./lib/oclc_isa_version_908.amdgcn.bc ./lib/oclc_isa_version_900.amdgcn.bc ./lib/oclc_isa_version_702.amdgcn.bc ./lib/oclc_wavefrontsize64_on.amdgcn.bc ./lib/hc.amdgcn.bc ./lib/oclc_isa_version_902.amdgcn.bc ./lib/oclc_isa_version_801.amdgcn.bc ./lib/oclc_finite_only_on.amdgcn.bc ./lib/oclc_isa_version_904.amdgcn.bc ``` New (ROCm 3.9 and above) location is $ROCM_PATH/amdgcn/bitcode ``` root@ixt-hq-99:/opt/rocm-3.9.0-3703# find -name *.bc ./amdgcn/bitcode/oclc_isa_version_700.bc ./amdgcn/bitcode/ocml.bc ./amdgcn/bitcode/oclc_isa_version_1030.bc ./amdgcn/bitcode/oclc_isa_version_1010.bc ./amdgcn/bitcode/oclc_isa_version_904.bc ./amdgcn/bitcode/hip.bc ./amdgcn/bitcode/hc.bc ./amdgcn/bitcode/oclc_daz_opt_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_on.bc ./amdgcn/bitcode/oclc_isa_version_900.bc ./amdgcn/bitcode/oclc_isa_version_1012.bc ./amdgcn/bitcode/oclc_isa_version_702.bc ./amdgcn/bitcode/oclc_daz_opt_on.bc ./amdgcn/bitcode/oclc_unsafe_math_off.bc ./amdgcn/bitcode/ockl.bc ./amdgcn/bitcode/oclc_isa_version_803.bc ./amdgcn/bitcode/oclc_isa_version_908.bc ./amdgcn/bitcode/oclc_isa_version_802.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc ./amdgcn/bitcode/oclc_finite_only_on.bc ./amdgcn/bitcode/oclc_isa_version_701.bc ./amdgcn/bitcode/oclc_unsafe_math_on.bc ./amdgcn/bitcode/oclc_isa_version_902.bc ./amdgcn/bitcode/oclc_finite_only_off.bc ./amdgcn/bitcode/opencl.bc ./amdgcn/bitcode/oclc_isa_version_906.bc ./amdgcn/bitcode/oclc_isa_version_810.bc ./amdgcn/bitcode/oclc_isa_version_801.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc ./amdgcn/bitcode/oclc_isa_version_1011.bc ``` Also not the change in the filename(s) This commit updates the XLA code, that has the device lib path + filename(s) hardcoded, to account for the change in location / filename
Prior to ROCm 3.8, hipcc (hipclang) flushed denormal values to zero by default. Starting with ROCm 3.8 that is no longer true, denormal values are kept as is. TF expects denormals to be flushed to zero. This is enforced on the CUDA side by explicitly passing the "-fcuda-flush-denormals-to-zero" (see tensorflow.bzl). This commit does the same for the ROCm side. Also removing the no_rocm tag from the corresponding unit test - //tensorflow/python/kernel_tests:denormal_test_gpu
…uous) When building TF with ROCm 3.9, we are running into the following compile error ``` In file included from tensorflow/core/kernels/reduction_ops_half_mean_sum.cu.cc:20: ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:430:9: error: call to 'min' is ambiguous min(blockDim.y, num_rows - blockIdx.y * blockDim.y); ^~~ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1183:23: note: candidate function __DEVICE__ inline int min(int __arg1, int __arg2) { ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1197:14: note: candidate function inline float min(float __x, float __y) { return fminf(__x, __y); } ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1200:15: note: candidate function inline double min(double __x, double __y) { return fmin(__x, __y); } ^ 1 error generated when compiling for gfx803. ``` The build error seems to be because ROCm 3.9 uses llvm header files from `llvm/lib/clang/12.0.0/include` (ROCm 3.8 uses the `11.0.0` version). `12.0.0` has a new `__clang_hip_math.h` file, which is not present in `11.0.0`. This file has the `min` function overloaded for the `float` and `double` types. The first argument in the call to `min` (which leads to the error) is `blockDim.y` which has a `uint` type, and hence the compiler gets confused as to which overloaded type to resole to. Previously (i.e. ROCm 3.8 and before) there was only one option (`int`), with ROCm 3.9 there are three (`int`, `float`, and `double`) and hence the error. The "fix" is to explicitly cast the first argument to `int` to remove the ambiguity (the second argument is already an `int` type).
33bbfa9
to
92a2f6d
Compare
I have actually already rebased the earlier version of your PR and fixed a few issues, hopefully it will be merged soon. |
…fig=rocm) Imported from GitHub PR #44471 PR #43636 is a pre-requisite for this PR. For the time being, this PR includes commits from it's pre-req as well. Once the pre-req PR is merged, I will rebase this PR to remove those commits. -------------------------------------- /cc @cheshire @chsigg @nvining-work Copybara import of the project: -- 3f0d378 by Deven Desai <deven.desai.amd@gmail.com>: Adding #defines for ROCm / MIOpen / HIP Runtime version numbers This PR/commit introduces the following #defines in the `rocm/rocm_config.h` file ``` #define TF_ROCM_VERSION <Version Number of ROCm install> #define TF_MIOPEN_VERSION <Verion Number of MIOpen in ROCm install> #define TF_HIPRUNTIME_VERSION <Version Number of HIP Runtinme in ROCm install> ``` These #defines should be used within TF code to add ROCm/MIOpen/HIp Runtime version specific code. Details on how we go about determining these version numbers can found on the following wiki-page https://github.com/ROCmSoftwarePlatform/tensorflow-internal/wiki/How-to-add-ROCm-version-specific-code-changes-in-the-TensorFlow-code%3F A new script `find_rocm_config.py` is being added by this commit. This script does all the work of determining the version number information and it is pretty to extend it to query more information about the ROCM install. The information collected by the script is available to `rocm_configure.bzl` and hence can be used to add version specific code in `rocm_configure.bzl` as well. -- 922e0e5 by Deven Desai <deven.desai.amd@gmail.com>: Updating Dockerfile.rocm to use ROCm 3.9 -- cc0b4ae by Deven Desai <deven.desai.amd@gmail.com>: Changing CI scripts to use ROCm 3.9 -- fbfdb64 by Deven Desai <deven.desai.amd@gmail.com>: Updating rocm_config.py to account for the new location of the rocblas version header file (in ROCm 3.8) -- 3f191fa by Deven Desai <deven.desai.amd@gmail.com>: Removing references to TENSORFLOW_COMPILER_IS_HIP_CLANG Now that we are way past the switch to use ROCm 3.5 and above (i.e. hip-clang), the codes within `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` are always enabled, and the codes within the corresponding `#else` blocks are deadcodes. This commit removes the references to `#ifdef TENSORFLOW_COMPILER_IS_HIP_CLANG` and their corresponding `#else` blocks -- 9a4841c by Deven Desai <deven.desai.amd@gmail.com>: Removing -DTENSORFLOW_COMPILER_IS_HIP_CLANG from the list of compile flags -- 745e2ad by Deven Desai <deven.desai.amd@gmail.com>: Removing deadcode for the ROCm platform within the third_party/gpus dir -- c96dc03 by Deven Desai <deven.desai.amd@gmail.com>: Updating XLA code to account for the device lib files location change in ROCm 3.9 The location of the ROCm device lib files is changing in ROCm 3.9 Current (ROCm 3.8 and before) location is $ROCM_PATH/lib ``` root@ixt-rack-04:/opt/rocm-3.8.0# find . -name *.bc ./lib/oclc_isa_version_701.amdgcn.bc ./lib/ocml.amdgcn.bc ./lib/oclc_daz_opt_on.amdgcn.bc ./lib/oclc_isa_version_700.amdgcn.bc ./lib/oclc_isa_version_810.amdgcn.bc ./lib/oclc_unsafe_math_off.amdgcn.bc ./lib/oclc_wavefrontsize64_off.amdgcn.bc ./lib/oclc_isa_version_803.amdgcn.bc ./lib/oclc_isa_version_1011.amdgcn.bc ./lib/oclc_isa_version_1012.amdgcn.bc ./lib/opencl.amdgcn.bc ./lib/oclc_unsafe_math_on.amdgcn.bc ./lib/oclc_isa_version_1010.amdgcn.bc ./lib/oclc_finite_only_off.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_on.amdgcn.bc ./lib/oclc_daz_opt_off.amdgcn.bc ./lib/oclc_isa_version_802.amdgcn.bc ./lib/ockl.amdgcn.bc ./lib/oclc_isa_version_906.amdgcn.bc ./lib/oclc_isa_version_1030.amdgcn.bc ./lib/oclc_correctly_rounded_sqrt_off.amdgcn.bc ./lib/hip.amdgcn.bc ./lib/oclc_isa_version_908.amdgcn.bc ./lib/oclc_isa_version_900.amdgcn.bc ./lib/oclc_isa_version_702.amdgcn.bc ./lib/oclc_wavefrontsize64_on.amdgcn.bc ./lib/hc.amdgcn.bc ./lib/oclc_isa_version_902.amdgcn.bc ./lib/oclc_isa_version_801.amdgcn.bc ./lib/oclc_finite_only_on.amdgcn.bc ./lib/oclc_isa_version_904.amdgcn.bc ``` New (ROCm 3.9 and above) location is $ROCM_PATH/amdgcn/bitcode ``` root@ixt-hq-99:/opt/rocm-3.9.0-3703# find -name *.bc ./amdgcn/bitcode/oclc_isa_version_700.bc ./amdgcn/bitcode/ocml.bc ./amdgcn/bitcode/oclc_isa_version_1030.bc ./amdgcn/bitcode/oclc_isa_version_1010.bc ./amdgcn/bitcode/oclc_isa_version_904.bc ./amdgcn/bitcode/hip.bc ./amdgcn/bitcode/hc.bc ./amdgcn/bitcode/oclc_daz_opt_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_off.bc ./amdgcn/bitcode/oclc_wavefrontsize64_on.bc ./amdgcn/bitcode/oclc_isa_version_900.bc ./amdgcn/bitcode/oclc_isa_version_1012.bc ./amdgcn/bitcode/oclc_isa_version_702.bc ./amdgcn/bitcode/oclc_daz_opt_on.bc ./amdgcn/bitcode/oclc_unsafe_math_off.bc ./amdgcn/bitcode/ockl.bc ./amdgcn/bitcode/oclc_isa_version_803.bc ./amdgcn/bitcode/oclc_isa_version_908.bc ./amdgcn/bitcode/oclc_isa_version_802.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc ./amdgcn/bitcode/oclc_finite_only_on.bc ./amdgcn/bitcode/oclc_isa_version_701.bc ./amdgcn/bitcode/oclc_unsafe_math_on.bc ./amdgcn/bitcode/oclc_isa_version_902.bc ./amdgcn/bitcode/oclc_finite_only_off.bc ./amdgcn/bitcode/opencl.bc ./amdgcn/bitcode/oclc_isa_version_906.bc ./amdgcn/bitcode/oclc_isa_version_810.bc ./amdgcn/bitcode/oclc_isa_version_801.bc ./amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc ./amdgcn/bitcode/oclc_isa_version_1011.bc ``` Also not the change in the filename(s) This commit updates the XLA code, that has the device lib path + filename(s) hardcoded, to account for the change in location / filename -- 6f981a9 by Deven Desai <deven.desai.amd@gmail.com>: Adding "-fcuda-flush-denormals-to-zero" as a default hipcc option Prior to ROCm 3.8, hipcc (hipclang) flushed denormal values to zero by default. Starting with ROCm 3.8 that is no longer true, denormal values are kept as is. TF expects denormals to be flushed to zero. This is enforced on the CUDA side by explicitly passing the "-fcuda-flush-denormals-to-zero" (see tensorflow.bzl). This commit does the same for the ROCm side. Also removing the no_rocm tag from the corresponding unit test - //tensorflow/python/kernel_tests:denormal_test_gpu -- 7481043 by Deven Desai <deven.desai.amd@gmail.com>: Fix for TF build failure with ROCm 3.9 (error: call to 'min' is ambiguous) When building TF with ROCm 3.9, we are running into the following compile error ``` In file included from tensorflow/core/kernels/reduction_ops_half_mean_sum.cu.cc:20: ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:430:9: error: call to 'min' is ambiguous min(blockDim.y, num_rows - blockIdx.y * blockDim.y); ^~~ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1183:23: note: candidate function __DEVICE__ inline int min(int __arg1, int __arg2) { ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1197:14: note: candidate function inline float min(float __x, float __y) { return fminf(__x, __y); } ^ /opt/rocm-3.9.0-3805/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:1200:15: note: candidate function inline double min(double __x, double __y) { return fmin(__x, __y); } ^ 1 error generated when compiling for gfx803. ``` The build error seems to be because ROCm 3.9 uses llvm header files from `llvm/lib/clang/12.0.0/include` (ROCm 3.8 uses the `11.0.0` version). `12.0.0` has a new `__clang_hip_math.h` file, which is not present in `11.0.0`. This file has the `min` function overloaded for the `float` and `double` types. The first argument in the call to `min` (which leads to the error) is `blockDim.y` which has a `uint` type, and hence the compiler gets confused as to which overloaded type to resole to. Previously (i.e. ROCm 3.8 and before) there was only one option (`int`), with ROCm 3.9 there are three (`int`, `float`, and `double`) and hence the error. The "fix" is to explicitly cast the first argument to `int` to remove the ambiguity (the second argument is already an `int` type). COPYBARA_INTEGRATE_REVIEW=#44471 from ROCmSoftwarePlatform:google_upstream_rocm_switch_to_rocm39 7481043 PiperOrigin-RevId: 341569721 Change-Id: Ia614893881bf8db1ef8901034c35cc585a82dba8
This PR does seem to have gotten merged...thank you for rebasing it and seeing it through. The merged PR seems to have dropped one of the commits ( fbfdb64 ) even though it is mentioned in PR message here (312e6ba). Do you want me to close out this PR and file another one to pick up the missing change? thanks |
Please create another one. I am not sure why this was not included, but in any case I was just happy that I could get this merged at all. It should be easier now to apply smaller fixes on top of that. |
…n merging PR tensorflow#44471 ( tensorflow#44471 ) See the following comments (in PR # 44471 for further details) tensorflow#44471 (comment) tensorflow#44471 (comment)
PR #43636 is a pre-requisite for this PR.For the time being, this PR includes commits from it's pre-req as well. Once the pre-req PR is merged, I will rebase this PR to remove those commits.update : PR #43636 has been merged, and I have rebased this PR to pull it in
/cc @cheshire @chsigg @nvining-work