Tensorflow v2.2.0 build failure at the final step on macOS 10.13.6 with CUDA enabled #39262

Closed
TomHeaven opened this issue May 7, 2020 · 9 comments
Labels: stale, stat:awaiting response, subtype:macOS, TF 2.2, type:build/install

Comments

@TomHeaven

TomHeaven commented May 7, 2020

I've been building TensorFlow wheel packages with CUDA support for macOS for a while. Usually TF builds successfully after applying some patches to the sources and Bazel config files.

However, I hit a troublesome issue at the final step of building TensorFlow v2.2.0: an API-generation genrule failed with Symbol not found: __ZN10tensorflow4data12experimental14SnapshotReader33kSnappyReaderInputBufferSizeBytesE when loading _pywrap_tensorflow_internal.so. The details are as follows:

System information

  • OS: macOS 10.13.6
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.7.0
  • Bazel version (if compiling from source): 2.0.0
  • GCC/Compiler version (if compiling from source): AppleClang++ 9.0
  • CUDA/cuDNN version: 10.0/7.4
  • GPU model and memory: Nvidia Titan V
  • Exact command to reproduce:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 --config=nonccl --config=monolithic --verbose_failures //tensorflow/tools/pip_package:build_pip_package
  • Build Failure Info:
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Volumes/Data/libraries/tensorflow/tensorflow/python/keras/api/BUILD:117:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1 failed (Exit 1)
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Symbol not found: __ZN10tensorflow4data12experimental14SnapshotReader33kSnappyReaderInputBufferSizeBytesE
  Referenced from: /private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
  Expected in: flat namespace
 in /private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 27, in <module>
    from tensorflow.python.tools.api.generator import doc_srcs
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 50, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 69, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Symbol not found: __ZN10tensorflow4data12experimental14SnapshotReader33kSnappyReaderInputBufferSizeBytesE
  Referenced from: /private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
  Expected in: flat namespace
 in /private/var/tmp/_bazel_tomheaven/561821a038e9c8d51ab53646fb4bd33f/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /Volumes/Data/libraries/tensorflow/tensorflow/python/tools/BUILD:82:1 Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1 failed (Exit 1)
INFO: Elapsed time: 0.967s, Critical Path: 0.31s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
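
The mangled name in the Symbol not found line can be decoded to see which C++ entity the loader is missing. The following is a minimal sketch, assuming a GCC or Clang toolchain that ships <cxxabi.h>; the helper name Demangle is hypothetical and is not part of TensorFlow or of the build.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Demangle an Itanium-ABI C++ symbol name.
// macOS prepends one extra underscore to symbol names, so the "__ZN..." form
// printed by dyld is reduced to "_ZN..." before it is passed to the demangler.
// If demangling fails, the (possibly stripped) input is returned unchanged.
std::string Demangle(std::string symbol) {
  if (symbol.rfind("__Z", 0) == 0) {
    symbol.erase(0, 1);  // drop the macOS-specific leading underscore
  }
  int status = 0;
  // __cxa_demangle returns a malloc'ed buffer, so it is released with free().
  std::unique_ptr<char, decltype(&std::free)> demangled(
      abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status),
      &std::free);
  if (status == 0 && demangled) {
    return std::string(demangled.get());
  }
  return symbol;
}
```

Applied to the symbol from the log, this yields tensorflow::data::experimental::SnapshotReader::kSnappyReaderInputBufferSizeBytes, that is, a static data member of SnapshotReader rather than a function.
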
@google-ml-butler

From the template it looks like you are installing TensorFlow (TF) prebuilt binaries:

  • For TF-GPU - See point 1
  • For TF-CPU - See point 2

1. Installing TensorFlow-GPU (TF) prebuilt binaries

TF Version >= 1.13 requires CUDA 10.0 and TF Version < 1.13 (till TF 1.5) requires CUDA 9.0.

  • If you have above configuration and using Windows platform -
    • Try adding the CUDA, CUPTI, and cuDNN installation directories to the %PATH% environment variable.
    • Refer windows setup guide.
  • If you have above configuration and using Ubuntu/Linux platform -
    • Try adding the CUDA, CUPTI, and cuDNN installation directories to the $LD_LIBRARY_PATH environment variable.
    • Refer linux setup guide.
  • If error still persists then, apparently your CPU model does not support AVX instruction sets.

2. Installing TensorFlow (TF) CPU prebuilt binaries

TensorFlow release binaries version 1.6 and higher are prebuilt with AVX instruction sets.

Therefore on any CPU that does not have these instruction sets, either CPU or GPU version of TF will fail to load.
Apparently, your CPU model does not support AVX instruction sets. You can still use TensorFlow with the alternatives given below:

  • Try Google Colab to use TensorFlow.
    • The easiest way to use TF is to switch to Google Colab. You get the latest stable TF version pre-installed, and you can use pip install to install any other preferred TF version.
    • It has the added advantage that you can easily switch between hardware accelerators (CPU, GPU, TPU) depending on the task.
    • All you need is a good internet connection and you are all set.
  • Try to build TF from sources by changing CPU optimization flags.

Please let us know if this helps.

Saduf2019 added the type:build/install and subtype:macOS labels on May 7, 2020
@Saduf2019
Contributor

@TomHeaven
Could you please have a look at below issues with similar error.

#35584 #34429 #36945 #34117

Saduf2019 added the stat:awaiting response label on May 7, 2020
@TomHeaven
Author

TomHeaven commented May 8, 2020

@TomHeaven
Could you please have a look at below issues with similar error.

#35584 #34429 #36945 #34117

Thanks for the reply. The issue seems most similar to #35584, but the cause is different.

I've figured it out. The problem is in tensorflow/core/kernels/data/experimental/snapshot_util.h:73-77:

class SnapshotReader {
 public:
  // The reader input buffer size is deliberately large because the input reader
  // will throw an error if the compressed block length cannot fit in the input
  // buffer.
  static constexpr const int64 kSnappyReaderInputBufferSizeBytes =
      1 << 30;  // 1 GiB
  // TODO(b/148804377): Set this in a smarter fashion.
  static constexpr const int64 kSnappyReaderOutputBufferSizeBytes =
      32 << 20;  // 32 MiB
...

Two constants, kSnappyReaderInputBufferSizeBytes and kSnappyReaderOutputBufferSizeBytes, are declared inside class SnapshotReader and referenced in tensorflow/core/kernels/data/experimental/snapshot_util.cc. Somehow, when building with AppleClang++ on macOS, references to these in-class constants are emitted as undefined symbols that the loader cannot resolve in the flat namespace.
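
One likely explanation, which is an assumption and not confirmed anywhere in this thread, is the pre-C++17 rule for static constexpr data members: the in-class line is only a declaration, and if the member is odr-used (for example, bound to a const reference), some translation unit must also provide a namespace-scope definition. Whether the missing definition shows up as an error can depend on the compiler and optimization level. The sketch below uses a hypothetical Reader class, not the real SnapshotReader, to show such an odr-use together with the standard-conforming out-of-class definitions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for SnapshotReader; the names are illustrative only.
class Reader {
 public:
  static constexpr std::int64_t kInputBufferSizeBytes = 1 << 30;    // 1 GiB
  static constexpr std::int64_t kOutputBufferSizeBytes = 32 << 20;  // 32 MiB
};

// Out-of-class definitions. Before C++17 these are required once the members
// are odr-used; without them the references below may stay undefined at link
// or load time. Since C++17 the members are implicitly inline, so these lines
// are redundant (deprecated, but still legal).
constexpr std::int64_t Reader::kInputBufferSizeBytes;
constexpr std::int64_t Reader::kOutputBufferSizeBytes;

// std::min takes its arguments by const reference, so both constants are
// odr-used here and the compiler may emit references to their symbols.
std::int64_t SmallerBuffer() {
  return std::min(Reader::kInputBufferSizeBytes,
                  Reader::kOutputBufferSizeBytes);
}
```

Calling SmallerBuffer() returns 33554432 (32 MiB). Compiling as C++17 or later would be another way to avoid the problem under this assumption, since the in-class declarations then count as definitions.
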

The solution is quite simple. I moved those two declarations from snapshot_util.h into snapshot_util.cc (lines 32-40) as namespace-scope constants, like this:

namespace tensorflow {
namespace data {
namespace experimental {
// Tom Added to solve symbol not found error on macOS
static constexpr const int64 kSnappyReaderInputBufferSizeBytes =
    1 << 30;  // 1 GiB
// TODO(b/148804377): Set this in a smarter fashion.
static constexpr const int64 kSnappyReaderOutputBufferSizeBytes =
    32 << 20;  // 32 MiB

Then recompile TensorFlow.

I'll try to make a pull request to close this issue.

Saduf2019 added the TF 2.2 label and removed the stat:awaiting response label on May 8, 2020
@TomHeaven
Author

I created a pull request #39297 to solve this issue and multiple other compilation errors on macOS with CUDA enabled.

@jkyl
Contributor

jkyl commented Jun 4, 2020

Thank you, @TomHeaven. I will be eagerly awaiting the result of this PR.

@Saduf2019
Contributor

@TomHeaven

Please confirm whether we may close this issue, since there is a PR that tracks it.

Saduf2019 added the stat:awaiting response label on Jun 5, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler (bot) added the stale label on Jun 12, 2020
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
