
Build from source with MPI support #25677

@jw447

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: v1.12
  • Python version: 3.6
  • Installed using virtualenv? pip? conda?:
  • Bazel version (if compiling from source): 0.21.0
  • GCC/Compiler version (if compiling from source): 4.8.5
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the problem

Bazel fails to build TensorFlow with MPI support. The error message is "ImportError: libmpi.so.40: cannot open shared object file: No such file or directory", even though my $LD_LIBRARY_PATH is set correctly.

I'm using openmpi-3.0.3. I previously tried openmpi-1.8.1, and that didn't work either.
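
One way to double-check the $LD_LIBRARY_PATH claim is to look for the exact file name from the ImportError in each directory on that path. The sketch below does only that; the helper name `find_shared_library` is made up for illustration. It does not consult the ldconfig cache or any RPATH, and it inspects the environment of the shell that runs it, which is not necessarily the environment Bazel gives to a build action (that difference is an assumption worth testing, not a confirmed cause).

```python
import os
from pathlib import Path


def find_shared_library(lib_name, search_path):
    """Return every file named lib_name in the directories of search_path.

    search_path uses the LD_LIBRARY_PATH format: directories separated by
    os.pathsep (":" on Linux). Empty entries are skipped.
    """
    matches = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry) / lib_name
        # is_file() follows symlinks, so a versioned symlink also counts.
        if candidate.is_file():
            matches.append(str(candidate))
    return matches


if __name__ == "__main__":
    # The library name is taken from the ImportError in this report.
    hits = find_shared_library(
        "libmpi.so.40", os.environ.get("LD_LIBRARY_PATH", "")
    )
    print(hits if hits else "libmpi.so.40 not found in LD_LIBRARY_PATH")
```

An empty result from the interactive shell would mean the path is not set the way the report assumes; a non-empty result would point toward the build step running with a different environment.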

PS: The tested TensorFlow build configurations list Bazel 0.15.0, but when I ran configure with that version I got the error "Please upgrade your bazel installation to version 0.19.0 or higher to build TensorFlow!". I then switched to Bazel 0.21.0.
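
The version gate quoted above amounts to comparing dotted version numbers component by component. A minimal sketch of that comparison follows; `parse_version` and `meets_minimum` are hypothetical helpers, not code from TensorFlow's configure script, and they assume every component is a plain integer.

```python
def parse_version(text):
    """Turn a dotted version such as "0.21.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True when installed is at least minimum."""
    # Tuples compare element by element, so (0, 21, 0) >= (0, 19, 0).
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("0.15.0", "0.21.0"):
        print(version, meets_minimum(version, "0.19.0"))
```

Comparing tuples of integers rather than strings matters here: as strings, "0.9.0" would sort after "0.19.0".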

Provide the exact sequence of commands / steps that you executed before running into the problem

jon@jon-OptiPlex-3050:~/local_build/tensorflow$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: a65fa066-3efc-406a-aba3-d8f7e8c10bfa
You have bazel 0.21.0 installed.
Please specify the location of python. [Default is /home/jon/anaconda3/bin/python]:

Found possible Python library paths:
/home/jon/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/jon/anaconda3/lib/python3.6/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]:
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]:
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [Y/n]: Y
MPI support will be enabled for TensorFlow.

Please specify the MPI toolkit folder. [Default is /home/jon/openmpi-3.0.3]:

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished


bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 26351414-5c95-4a85-91f0-54b998cf6311
WARNING: /home/jon/local_build/tensorflow/tensorflow/python/BUILD:3140:1: in py_library rule //tensorflow/python:standard_ops: target '//tensorflow/python:standard_ops' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of tf.distributions to tfp.distributions.
WARNING: /home/jon/local_build/tensorflow/tensorflow/python/BUILD:100:1: in py_library rule //tensorflow/python:no_contrib: target '//tensorflow/python:no_contrib' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of tf.distributions to tfp.distributions.
WARNING: /home/jon/local_build/tensorflow/tensorflow/contrib/metrics/BUILD:16:1: in py_library rule //tensorflow/contrib/metrics:metrics_py: target '//tensorflow/contrib/metrics:metrics_py' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of tf.distributions to tfp.distributions.
WARNING: /home/jon/local_build/tensorflow/tensorflow/contrib/learn/BUILD:17:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
WARNING: /home/jon/local_build/tensorflow/tensorflow/contrib/learn/BUILD:17:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
WARNING: /home/jon/local_build/tensorflow/tensorflow/contrib/bayesflow/BUILD:17:1: in py_library rule //tensorflow/contrib/bayesflow:bayesflow_py: target '//tensorflow/contrib/bayesflow:bayesflow_py' depends on deprecated target '//tensorflow/contrib/distributions:distributions_py': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.contrib.distributions are unmaintained, unsupported, and will be removed by late 2018. You should update all usage of tf.contrib.distributions to tfp.distributions.
WARNING: /home/jon/local_build/tensorflow/tensorflow/contrib/BUILD:13:1: in py_library rule //tensorflow/contrib:contrib_py: target '//tensorflow/contrib:contrib_py' depends on deprecated target '//tensorflow/contrib/distributions:distributions_py': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.contrib.distributions are unmaintained, unsupported, and will be removed by late 2018. You should update all usage of tf.contrib.distributions to tfp.distributions.
INFO: Analysed target //tensorflow/tools/pip_package:build_pip_package (387 packages loaded, 23360 targets configured).
INFO: Found 1 target...
ERROR: /home/jon/local_build/tensorflow/tensorflow/BUILD:606:1: Executing genrule //tensorflow:tf_python_api_gen_v1 failed (Exit 1)
Traceback (most recent call last):
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/jon/anaconda3/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/jon/anaconda3/lib/python3.6/imp.py", line 347, in load_dynamic
return _load(spec)
ImportError: libmpi.so.40: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 27, in <module>
from tensorflow.python.tools.api.generator import doc_srcs
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/jon/anaconda3/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/jon/anaconda3/lib/python3.6/imp.py", line 347, in load_dynamic
return _load(spec)
ImportError: libmpi.so.40: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
ModuleSpec(name='_pywrap_tensorflow_internal', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f059de53c18>, origin='/home/jon/.cache/bazel/_bazel_jon/7144622773f131b4a531d14f5da5d2cf/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so')
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 28.562s, Critical Path: 4.92s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
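
The nested tracebacks above repeat the same root cause twice. When triaging a long build log like this one, a small filter can pull out just the shared objects the loader could not open. This is a sketch under the assumption that the log uses the "ImportError: <name>: cannot open shared object file" wording seen here; the function name `missing_shared_libraries` is made up for illustration.

```python
import re

# Matches the loader message as it appears in the log above.
MISSING_LIB = re.compile(
    r"ImportError: (\S+): cannot open shared object file"
)


def missing_shared_libraries(log_text):
    """Return the distinct library names reported missing, in first-seen order."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "return _load(spec)\n"
        "ImportError: libmpi.so.40: cannot open shared object file: "
        "No such file or directory\n"
    )
    print(missing_shared_libraries(sample))
```

Applied to the log in this report, the only name it would pick up is libmpi.so.40, so the failure comes down to that one library not being found when the genrule imports TensorFlow.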

Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
