Compiling 1.14 with MPI support #30703
Comments
@berniekirby Please provide the exact sequence of commands / steps that you executed before running into the problem. Thanks!
Well, it's slightly complicated as it's a cluster system that is near the end of its life (CentOS 6). Then just run `./configure`:

```
Found possible Python library paths:
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
Do you wish to build TensorFlow with ROCm support? [y/N]:
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]:
Could not find any cuda.h matching version '' in any subdirectory:
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]: 2.3
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda/10.0.130
Found CUDA 10.0 in:
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
Do you want to use clang as CUDA compiler? [y/N]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/local/gcc/4.9.3/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
```

Now build: after much output, the build fails with the errors given above.
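For anyone trying to reproduce this, the interactive prompts above can be pre-answered through environment variables that TensorFlow's `configure` script reads. A minimal sketch, assuming the TF 1.14 `configure.py` (the variable names below are taken from that script and may differ between versions):

```shell
# Hedged sketch: pre-answer ./configure non-interactively.
# These variables are read by TF 1.14's configure.py when set.
export TF_NEED_CUDA=1          # "build TensorFlow with CUDA support? y"
export TF_NEED_MPI=1           # "build TensorFlow with MPI support? y" (community supported)
export TF_CUDA_VERSION=10.0    # CUDA found under /usr/local/cuda/10.0.130
export TF_NEED_ROCM=0
export TF_NEED_OPENCL_SYCL=0

# Then, in the tensorflow source tree:
#   ./configure
#   bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

echo "TF_NEED_MPI=$TF_NEED_MPI TF_NEED_CUDA=$TF_NEED_CUDA"
```

This keeps the answers reproducible across rebuilds, which helps when bisecting a configuration-dependent failure like this one.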
I think this is a related issue: As far as I know, MPI with TF is community supported.

It looks to me as though `tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc` needs to pick up the definition of `CudaLaunchKernel` from somewhere via an `#include`. If it's community supported, then I suppose we'll just have to wait. Thank you for your time.
I will work on a fix.

@byronyi Any updates on this? I'm hitting the same problem...

I got this working by adding

I can confirm that adding an
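For completeness, a sketch of the kind of one-line workaround the two (truncated) comments above describe. The header path here is an assumption based on where TF 1.14 keeps its GPU launch helpers, so treat it as a starting point rather than the exact fix from #29673:

```cpp
// Hypothetical patch near the top of
// tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc:
// pull in the header that declares the CudaLaunchKernel wrapper,
// which nvcc otherwise reports as an undefined identifier.
#include "tensorflow/core/util/gpu_kernel_helper.h"  // assumed location
```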
Fixed by #29673 |
System information
Compiling with MPI support gives the following build errors:
```
INFO: From Compiling tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc:
external/com_google_absl/absl/strings/string_view.h(495): warning: expression has no effect
tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc(109): error: identifier "CudaLaunchKernel" is undefined
tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc(110): error: identifier "CudaLaunchKernel" is undefined
tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc(111): error: identifier "CudaLaunchKernel" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_00038d5b_00000000-6_ring.cu.cpp1.ii".
```
Standard `./configure`, but answering yes to MPI support.

Compiles fine without MPI. Have tried with both OpenMPI 3.1.3 and CUDA-enabled OpenMPI 3.1.3.