
[ROCm] Fix for ROCm CSB Breakage - 200630 #40954

Conversation

deven-amd
Contributor

The following commit (which switched G's internal CI to use ROCm 3.5) breaks the ROCm CSB build (which still uses ROCm 3.3):

22def20

This PR/commit simply puts back a couple of code blocks that were removed by the previous commit, and makes them conditional on ROCm 3.5.

Note that the ROCm CSB build will be switching to ROCm 3.5 or higher in the near future, at which point all code within the true block for #if TENSORFLOW_COMPILER_IS_HIP_CLANG will become the default, and the code in the false / #else block will be removed.
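The guard described above follows the usual versioned-API pattern. A minimal sketch of the shape of the change, with hypothetical stand-in functions (`OccupancyNew`/`OccupancyOld` are illustrative stubs, not the actual TensorFlow or HIP code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two HIP occupancy signatures: ROCm 3.5+
// reports sizes through int*, while ROCm 3.3 used uint32_t*.
void OccupancyNew(int* blocks, int* threads) {
  *blocks = 4;
  *threads = 256;
}
void OccupancyOld(uint32_t* blocks, uint32_t* threads) {
  *blocks = 4;
  *threads = 256;
}

// The #if keeps both call sites side by side; the ROCm 3.3 branch is the
// one this PR restores, and it is expected to disappear once the CSB build
// moves to ROCm 3.5+.
int PickThreadsPerBlock() {
#if TENSORFLOW_COMPILER_IS_HIP_CLANG
  int blocks = 0, threads = 0;
  OccupancyNew(&blocks, &threads);
  return threads;
#else
  uint32_t blocks = 0, threads = 0;
  OccupancyOld(&blocks, &threads);
  return static_cast<int>(threads);
#endif
}
```

When TENSORFLOW_COMPILER_IS_HIP_CLANG is undefined the preprocessor treats it as 0, so the #else (ROCm 3.3) branch is compiled, matching the CSB configuration.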


/cc @chsigg @cheshire @nvining-work

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Jun 30, 2020
@gbaned gbaned self-assigned this Jul 1, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 1, 2020
@gbaned gbaned requested a review from chsigg July 1, 2020 05:12
@gbaned gbaned added the comp:gpu GPU related issues label Jul 1, 2020
@deven-amd
Contributor Author

@chsigg gentle ping

@gbaned gbaned added the awaiting review Pull request awaiting review label Jul 6, 2020
@deven-amd
Contributor Author

@chsigg gentle ping

Need this PR to fix the broken ROCm CSB

@gbaned gbaned requested a review from cheshire July 8, 2020 12:09
@deven-amd
Contributor Author

@chsigg gentle ping

uint32_t thread_per_block_uint = 0;
CHECK_GE(block_size_limit, 0);
uint32_t block_size_limit_uint = static_cast<uint32_t>(block_size_limit);
hipOccupancyMaxPotentialBlockSize(&block_count_uint, &thread_per_block_uint,
Contributor

I think it would be cleaner to provide a forward-compatible overload of hipOccupancyMaxPotentialBlockSize.
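One way to read that suggestion is an overload that papers over the signature difference (a sketch with a stubbed ROCm 3.3 entry point; the real hipOccupancyMaxPotentialBlockSize also takes the kernel function and shared-memory/block-size limits, omitted here for brevity):

```cpp
#include <cassert>
#include <cstdint>

using hipError_t = int;
constexpr hipError_t hipSuccess = 0;

// Stub standing in for the ROCm 3.3 API, which reports sizes via uint32_t*.
hipError_t hipOccupancyMaxPotentialBlockSize(uint32_t* min_grid_size,
                                             uint32_t* block_size) {
  *min_grid_size = 8;
  *block_size = 512;
  return hipSuccess;
}

// Forward-compatible overload: accepts the int* signature that newer ROCm
// releases use and delegates to the uint32_t* version, so call sites would
// not need an #if at all.
hipError_t hipOccupancyMaxPotentialBlockSize(int* min_grid_size,
                                             int* block_size) {
  uint32_t grid = 0, block = 0;
  const hipError_t err = hipOccupancyMaxPotentialBlockSize(&grid, &block);
  *min_grid_size = static_cast<int>(grid);
  *block_size = static_cast<int>(block);
  return err;
}
```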

Contributor Author

Just trying to restore the code to its pre-existing state (for ROCm 3.3). The code within the #else block will have a short shelf life: it will be gone with the switch to ROCm 3.5+, which should happen soon (within the next month).

Can we leave it as is for now?

hipError_t err = hipOccupancyMaxActiveBlocksPerMultiprocessor(
&block_count, func, fixed_block_size, dynamic_shared_memory_size);
CHECK_EQ(err, hipSuccess);
#else
// Apply the heuristic in GetGpuLaunchConfig(int, const Eigen::GpuDevice&)
Contributor

Would it be cleaner to provide a forward-compatible implementation of hipOccupancyMaxActiveBlocksPerMultiprocessor?

Contributor Author

Same as above.

inc_dirs = []

# Add HSA headers (needs to match $HSA_PATH)
inc_dirs.append(rocm_config.rocm_toolkit_path + "/hsa/include")
inc_dirs.append(rocm_toolkit_path + "/hsa/include")
Contributor

IIRC, this mix of symlinked and realpath was intentional, but I don't remember the details.
:-(

Contributor Author

Can we kick off the internal G build for ROCm on this change and see if it breaks anything?
If nothing breaks, then no harm no foul!
Otherwise we can revert it back to what it was.

Contributor

Unfortunately, it does break our CI build.

ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:4136:1: Couldn't build file tensorflow/core/kernels/_objs/cwise_op_gpu/cwise_op_gpu_not_equal_to.cu.pic.o: undeclared inclusion(s) in rule '//tensorflow/core/kernels:cwise_op_gpu':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/kernels/cwise_op_gpu_not_equal_to.cu.cc':
  '/opt/rocm/hip/include/hip/hip_runtime.h'
  '/opt/rocm/hip/include/hip/hip_version.h'
  '/opt/rocm/hip/include/hip/hip_common.h'
...

I think what's happening is that the cxx_builtin_include_directories don't match the directories where clang finds those headers (realpath vs symlink path).
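The mismatch being described can be reproduced in isolation. A sketch using a throwaway directory standing in for the /opt/rocm -> /opt/rocm-3.5.0 symlink (paths are illustrative):

```python
import os
import tempfile

# Build a miniature version of the ROCm layout: a real install directory
# plus a symlink pointing at it, like /opt/rocm -> /opt/rocm-3.5.0.
base = os.path.realpath(tempfile.mkdtemp())
real_install = os.path.join(base, "rocm-3.5.0")
os.makedirs(os.path.join(real_install, "hip", "include"))
link = os.path.join(base, "rocm")  # plays the role of /opt/rocm
os.symlink(real_install, link)

declared = os.path.join(link, "hip", "include")   # what bazel declares
resolved = os.path.realpath(declared)             # what the compiler reports

# The layering check compares these strings, so if the compiler emits the
# resolved path while bazel declared the symlinked one, the build fails
# with "undeclared inclusion(s)".
assert declared != resolved
```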

Contributor Author

This looks very much like the error I filed the following issue for:
bazelbuild/bazel#11163

The way we worked around it on our end was:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/Dockerfile.rocm#L76-L88

Can you please check whether applying the above workaround works for your build as well?

Contributor

We do use ROCm from the default '/opt/rocm' location. I'm not entirely sure what is happening, but my theory is that in our environment, /opt/rocm is a symlink, and the -isystem paths that are added for hcc are:

  • $HIP_PATH, as-is (potentially symlinked)
  • $ROCM_PATH/lib/clang/x.y/include, real-path'ed

I don't understand how an empty version file changes any of this though.

Contributor Author

/opt/rocm/ is a soft-link in our env too.

ROCm is typically installed in /opt/rocm-x.x.x (e.g. /opt/rocm-3.5.0/), and /opt/rocm is simply a softlink to the ROCm install you want to use.

I don't understand how an empty version file changes any of this though.

That is because of a quirk in the hipcc implementation.

hipcc needs to determine the value of ROCM_PATH. If the env var is not explicitly set (which we suspect happens sometimes due to the bazel issue I mentioned earlier), then it checks for the presence of the <path_to_invoked_hipcc>/../.info/version file, and if that file exists, it will use <path_to_invoked_hipcc>/.. as the ROCM_PATH.

<path_to_invoked_hipcc> will always be $ROCM_PATH/bin/ and is not dependent upon bazel passing the ROCM_PATH env var correctly.
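The lookup described above can be sketched as follows (a simplified Python rendering of the behavior, not hipcc's actual Perl/shell code; `guess_rocm_path` is an illustrative name):

```python
import os

def guess_rocm_path(env, hipcc_path):
    """Sketch of hipcc's ROCM_PATH resolution as described above."""
    # 1) An explicitly set ROCM_PATH env var wins.
    if env.get("ROCM_PATH"):
        return env["ROCM_PATH"]
    # 2) Otherwise fall back to <path_to_invoked_hipcc>/.. , but only if the
    #    marker file .info/version exists there. With an empty/absent marker
    #    file check failing, no fallback is available.
    candidate = os.path.dirname(os.path.dirname(hipcc_path))
    if os.path.exists(os.path.join(candidate, ".info", "version")):
        return candidate
    return None
```

Since hipcc always lives under $ROCM_PATH/bin/, step 2 recovers the install prefix even when bazel fails to propagate the env var.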

Are you setting the value of ROCM_PATH before running the configure script? All of our CI scripts do set that value now (before running the configure script), because it needs to be passed down correctly to rocm_configure.bzl:

https://github.com/tensorflow/tensorflow/blob/master/configure.py#L1441-L1443
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/xla/linux/rocm/run_py3.sh#L33
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/linux/rocm/run_cc_core.sh#L33
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/linux/rocm/run_csb_tests.sh#L33
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/linux/rocm/run_py3_core.sh#L33

Contributor

Our docker sets ROCM_PATH to /opt/rocm, but until this PR ROCM_PATH was not used. We do not run ./configure, but use --config=rbe_linux_rocm_py3.6. Without running ./configure, there is no --action_env=ROCM_PATH either, and rocm_configure.bzl looks (looked) at ROCM_TOOLKIT_PATH.

So what seems to be happening in our internal build is that <path_to_invoked_hipcc> is /opt/rocm/bin/ and hcc will add -isystem /opt/rocm/hip/include, which is a symlink directory. If we change the implementation here to add the realpath to cxx_builtin_include_directories, we will get /opt/rocm-3.5.0/hip/include instead. That violates the bazel layering check.

Shouldn't we support ROCM_PATH being a symlink? I think we just need to keep the existing mix of rocm_config.rocm_toolkit_path and realpath for that, and probably add a comment explaining why it's handled differently.
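Keeping that mix could look roughly like the following (a Python sketch of the repository-rule logic under the assumption that declaring both spellings satisfies the layering check; `rocm_include_dirs` is an illustrative name, not the actual rocm_configure.bzl code):

```python
import os

def rocm_include_dirs(rocm_toolkit_path):
    """Sketch: declare both the as-configured and the resolved include
    paths, so the layering check passes whichever form the compiler emits."""
    inc_dirs = []
    # As configured (possibly a symlink like /opt/rocm): matches include
    # paths that hipcc passes through unchanged.
    inc_dirs.append(rocm_toolkit_path + "/hsa/include")
    # Resolved (e.g. /opt/rocm-3.5.0): matches include paths the compiler
    # reports after realpath'ing them.
    resolved = os.path.realpath(rocm_toolkit_path)
    if resolved != rocm_toolkit_path:
        inc_dirs.append(resolved + "/hsa/include")
    return inc_dirs
```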

Contributor Author

I undid the changes as per your suggestion. I do not fully understand why you are running into the build error that you are, but we can debug that after we get the ROCm CSB up and running again, so I am reverting that part of my change.

Also, I added a new commit to the PR to update Dockerfile.rocm to use Ubuntu 18.04.
When I tested my changes in the (default) Ubuntu 16.04 based container, I got a compile error which seemed to be caused by some newer C++ syntax that the older gcc version in Ubuntu 16.04 does not support.
Our internal CI has already switched to using Ubuntu 18.04 based containers, and I believe you have done the same on your end.

This is the error I was running into with Ubuntu 16.04:

tensorflow/stream_executor/tpu/tpu_executable_interface.cc: In member function 'virtual xla::StatusOr<xla::ExecutionOutput> xla::TpuExecutableInterface::ExecuteAsyncOnStream(const xla::ServiceExecutableRunOptions*, std::vector<xla::ExecutionInput>, xla::HloExecutionProfile*)':
tensorflow/stream_executor/tpu/tpu_executable_interface.cc:179:22: error: expected unqualified-id before '[' token
     for (const auto& [parameter, index] :
                      ^
tensorflow/stream_executor/tpu/tpu_executable_interface.cc:179:22: error: expected ';' before '[' token
tensorflow/stream_executor/tpu/tpu_executable_interface.cc:179:23: error: 'parameter' was not declared in this scope
     for (const auto& [parameter, index] :
                       ^
tensorflow/stream_executor/tpu/tpu_executable_interface.cc:179:34: error: capture by copy of incomplete type '<unresolved overloaded function type>'
     for (const auto& [parameter, index] :
                                  ^
tensorflow/stream_executor/tpu/tpu_executable_interface.cc: In lambda function:
tensorflow/stream_executor/tpu/tpu_executable_interface.cc:179:41: error: expected '{' before ':' token
     for (const auto& [parameter, index] :
                                          ^

the same source code builds fine with gcc on Ubuntu 18.04
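The failing line uses C++17 structured bindings, which the default gcc on Ubuntu 16.04 (gcc 5.x) does not support. A hedged illustration of the two spellings (a standalone example, not the TensorFlow code; the first function requires -std=c++17):

```cpp
#include <cassert>
#include <map>
#include <string>

// C++17 structured bindings: the form that trips up older gcc.
int sum_with_bindings(const std::map<std::string, int>& m) {
  int total = 0;
  for (const auto& [key, value] : m) {  // needs C++17
    (void)key;  // unused in this example
    total += value;
  }
  return total;
}

// Pre-C++17 equivalent: compiles with the older gcc in Ubuntu 16.04.
int sum_pre_cxx17(const std::map<std::string, int>& m) {
  int total = 0;
  for (const auto& kv : m) {
    total += kv.second;
  }
  return total;
}
```

Both loops are equivalent; only the binding syntax differs, which is why upgrading the container (rather than changing the source) resolves the error.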

Contributor Author

@chsigg please review / approve...thanks again

@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Jul 12, 2020
@gbaned gbaned added the kokoro:force-run Tests on submitted change label Jul 12, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 12, 2020
@deven-amd
Contributor Author

@chsigg I do not have visibility into the results for the internal ROCm build at your end. Let me know if it passed and whether we can take this PR.

@deven-amd
Contributor Author

@chsigg gentle ping

@gbaned gbaned added the awaiting review Pull request awaiting review label Jul 17, 2020
@deven-amd
Contributor Author

@chsigg gentle ping

@deven-amd deven-amd force-pushed the google_upstream_rocm_platform_fix_200630 branch from b2aa121 to ddafc33 Compare July 24, 2020 03:06
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Jul 24, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 24, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 24, 2020
@gbaned gbaned removed the awaiting review Pull request awaiting review label Jul 24, 2020
@tensorflow-copybara tensorflow-copybara merged commit cbc87da into tensorflow:master Jul 24, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Jul 24, 2020
@deven-amd deven-amd deleted the google_upstream_rocm_platform_fix_200630 branch August 26, 2020 14:04