_pywrap_tensorflow_internal.so: undefined symbol: _ZTIN10tensorflow8OpKernelE #36691

Closed
artemry-nv opened this issue Feb 12, 2020 · 8 comments
Labels
subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues TF 2.1 for tracking issues in 2.1 release type:build/install Build and install issues

Comments

@artemry-nv

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.6
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 2.1.0 (master)
  • Python version: Python 3.5.2
  • Installed using virtualenv? pip? conda?: apt
  • Bazel version (if compiling from source): 1.2.1
  • GCC/Compiler version (if compiling from source): gcc 5.4.0
  • CUDA/cuDNN version: CUDA 10.2 / cuDNN 7.6.5.32-1+cuda10.2
  • GPU model and memory: NVIDIA Tesla V100

Describe the problem
Observed after commit 00befcd.
The TensorFlow build fails with the following error:

05:24:13  ./tensorflow/c/eager/tape.h: In instantiation of 'tensorflow::Status tensorflow::eager::{anonymous}::InitialGradients(const tensorflow::eager::VSpace<Gradient, BackwardFunction, TapeTensor>&, tensorflow::gtl::ArraySlice<long long int>, const std::unordered_map<long long int, TapeTensor>&, tensorflow::gtl::ArraySlice<Gradient*>, const TensorTape&, tensorflow::eager::OpTape<BackwardFunction, TapeTensor>&, std::unordered_map<long long int, std::vector<LhsScalar*> >*) [with Gradient = _object; BackwardFunction = std::function<_object*(_object*, const std::vector<long long int>&)>; TapeTensor = PyTapeTensor; tensorflow::gtl::ArraySlice<long long int> = absl::Span<const long long int>; tensorflow::gtl::ArraySlice<Gradient*> = absl::Span<_object* const>; tensorflow::eager::TensorTape = std::unordered_map<long long int, long long int>; tensorflow::eager::OpTape<BackwardFunction, TapeTensor> = std::unordered_map<long long int, tensorflow::eager::OpTapeEntry<std::function<_object*(_object*, const std::vector<long long int>&)>, PyTapeTensor>, std::hash<long long int>, std::equal_to<long long int>, std::allocator<std::pair<const long long int, tensorflow::eager::OpTapeEntry<std::function<_object*(_object*, const std::vector<long long int>&)>, PyTapeTensor> > > >]':
05:24:13  ./tensorflow/c/eager/tape.h:663:30:   required from 'tensorflow::Status tensorflow::eager::GradientTape<Gradient, BackwardFunction, TapeTensor>::ComputeGradient(const tensorflow::eager::VSpace<Gradient, BackwardFunction, TapeTensor>&, tensorflow::gtl::ArraySlice<long long int>, tensorflow::gtl::ArraySlice<long long int>, const std::unordered_map<long long int, TapeTensor>&, tensorflow::gtl::ArraySlice<Gradient*>, std::vector<LhsScalar*>*) [with Gradient = _object; BackwardFunction = std::function<_object*(_object*, const std::vector<long long int>&)>; TapeTensor = PyTapeTensor; tensorflow::gtl::ArraySlice<long long int> = absl::Span<const long long int>; tensorflow::gtl::ArraySlice<Gradient*> = absl::Span<_object* const>]'
05:24:13  tensorflow/python/eager/pywrap_tfe_src.cc:2566:27:   required from here
05:24:13  ./tensorflow/c/eager/tape.h:576:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
05:24:13     for (int i = 0; i < target_tensor_ids.size(); ++i) {
05:24:13                       ^
05:24:13  ./tensorflow/c/eager/tape.h:588:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
05:24:13           for (int j = 0; j < op_it->second.output_tensor_info.size(); ++j) {
05:24:13                             ^
05:24:52  ERROR: /scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/tensorflow/python/keras/api/BUILD:130:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Exit 1)
05:24:52  Traceback (most recent call last):
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
05:24:52      from tensorflow.python.pywrap_tensorflow_internal import *
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
05:24:52      _pywrap_tensorflow_internal = swig_import_helper()
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
05:24:52      _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
05:24:52    File "/usr/lib/python3.5/imp.py", line 242, in load_module
05:24:52      return load_dynamic(name, filename, file)
05:24:52    File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
05:24:52      return _load(spec)
05:24:52  ImportError: /home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
05:24:52  
05:24:52  During handling of the above exception, another exception occurred:
05:24:52  
05:24:52  Traceback (most recent call last):
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 27, in <module>
05:24:52      from tensorflow.python.tools.api.generator import doc_srcs
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 50, in <module>
05:24:52      from tensorflow.python import pywrap_tensorflow
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 69, in <module>
05:24:52      raise ImportError(msg)
05:24:52  ImportError: Traceback (most recent call last):
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
05:24:52      from tensorflow.python.pywrap_tensorflow_internal import *
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
05:24:52      _pywrap_tensorflow_internal = swig_import_helper()
05:24:52    File "/home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
05:24:52      _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
05:24:52    File "/usr/lib/python3.5/imp.py", line 242, in load_module
05:24:52      return load_dynamic(name, filename, file)
05:24:52    File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
05:24:52      return _load(spec)
05:24:52  ImportError: /home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
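
The undefined symbol in the ImportError above is a mangled C++ name. As a minimal sketch (not part of the original report), the snippet below decodes the narrow family of Itanium-ABI names of the form `_ZT?N<len><name>...E`, which is enough to show that `_ZTIN10tensorflow8OpKernelE` is the typeinfo (RTTI) object for `tensorflow::OpKernel`. The function name `demangle_nested` is hypothetical; in practice `c++filt` handles the general case.

```python
# Special-name prefixes from the Itanium C++ ABI mangling scheme.
_SPECIAL = {
    "TI": "typeinfo for ",
    "TV": "vtable for ",
    "TS": "typeinfo name for ",
}


def demangle_nested(symbol):
    """Decode _ZT{I,V,S}N<len><name>...E symbols; return None for anything else."""
    if not symbol.startswith("_ZT") or len(symbol) < 6:
        return None
    prefix = _SPECIAL.get(symbol[2:4])
    if prefix is None or symbol[4] != "N" or not symbol.endswith("E"):
        return None

    body = symbol[5:-1]  # strip "_ZT?N" and the closing "E"
    parts = []
    i = 0
    while i < len(body):
        # Each component is a decimal length followed by that many characters.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            return None  # expected a length prefix
        length = int(body[i:j])
        name = body[j:j + length]
        if len(name) != length:
            return None  # truncated component
        parts.append(name)
        i = j + length

    if not parts:
        return None
    return prefix + "::".join(parts)


print(demangle_nested("_ZTIN10tensorflow8OpKernelE"))
# prints: typeinfo for tensorflow::OpKernel
```

A missing typeinfo symbol at load time usually means the shared object that defines the class was not linked in or was not loaded before `_pywrap_tensorflow_internal.so`. That is a general observation, not a diagnosis confirmed in this thread.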

Provide the exact sequence of commands / steps that you executed before running into the problem
.tf_configure.bazelrc:

build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.5/dist-packages"
build --python_path="/usr/bin/python3"
build:xla --define with_xla_support=true
build --action_env TF_CUDA_VERSION="10.2"
build --action_env TF_CUDNN_VERSION="7"
build --action_env TF_NCCL_VERSION="2.6.0"
build --action_env TF_CUDA_PATHS="/hpc/local/oss/cuda10.2/cuda-toolkit,/usr,/usr/local/cuda"
build --action_env CUDA_TOOLKIT_PATH="/hpc/local/oss/cuda10.2/cuda-toolkit"
build --action_env CUDNN_INSTALL_PATH="/usr"
build --action_env NCCL_INSTALL_PATH="<cut>/nccl/stable"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="7.0"
build --action_env LD_LIBRARY_PATH="<cut>/nccl/stable/lib:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/nccl_rdma_sharp_plugin/lib:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/ucx/lib/ucx:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/ucx/lib:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/sharp/lib:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/hcoll/lib:<cut>/ci_tools_do_not_remove/hpcx-v2.6.pre-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64/ompi/lib:/hpc/local/oss/cuda10.2/cuda-toolkit/lib64:/hpc/local/oss/cuda10.2/cuda-toolkit/lib64/stubs:/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc-5"
build --config=cuda
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_env=LD_LIBRARY_PATH
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu,-v1only
build --action_env TF_CONFIGURE_IOS="0"
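
When comparing a failing CI configuration like the one above against a working one, it can help to extract just the `--action_env` settings. The following is a rough sketch, assuming shell-like tokenization is close enough to Bazel's own rc-file parsing for this purpose; the function name `action_env_settings` is hypothetical, and only unconditional `build` lines are considered (lines such as `build:opt` are skipped).

```python
import shlex


def action_env_settings(bazelrc_text):
    """Collect NAME=VALUE pairs passed via --action_env on plain `build` lines."""
    settings = {}
    for line in bazelrc_text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens or tokens[0] != "build":
            continue
        i = 1
        while i < len(tokens):
            token = tokens[i]
            assignment = None
            if token == "--action_env" and i + 1 < len(tokens):
                # Space-separated form: --action_env NAME="VALUE"
                assignment = tokens[i + 1]
                i += 1
            elif token.startswith("--action_env="):
                # Equals form: --action_env=NAME=VALUE
                assignment = token[len("--action_env="):]
            if assignment and "=" in assignment:
                name, _, value = assignment.partition("=")
                settings[name] = value
            i += 1
    return settings


sample = '''
build --action_env TF_CUDA_VERSION="10.2"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc-5"
build:opt --copt=-march=native
'''
for name, value in sorted(action_env_settings(sample).items()):
    print(name, "=", value)
```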

CC @av8ramit

@av8ramit av8ramit self-assigned this Feb 12, 2020
@gadagashwini-zz gadagashwini-zz added TF 2.1 for tracking issues in 2.1 release subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues labels Feb 13, 2020
@gadagashwini-zz gadagashwini-zz removed their assignment Feb 13, 2020
@av8ramit

Can you check if c4c8b17 fixes it for you? I've been able to reproduce and fix that issue on an 18.04 docker container with gcc5.

@artemry-nv
Author

Thanks, I'll try it, but it seems there's another build regression (I need to double-check it):

23:10:11  DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
23:10:11  DEBUG: Call stack for the definition of repository 'io_bazel_rules_docker' which is a git_repository (rule definition at /home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18):
23:10:11   - /home/swx-jenkins/.cache/bazel/_bazel_swx-jenkins/2c3650f385da66a5652cb752baf8a83c/external/bazel_toolchains/repositories/repositories.bzl:37:9
23:10:11   - /scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/WORKSPACE:37:1
23:10:12  Loading: 0 packages loaded
23:10:12      currently loading: tensorflow/tools/pip_package
23:10:19  INFO: Call stack for the definition of repository 'local_config_cuda' which is a cuda_configure (rule definition at /scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl:1183:18):
23:10:19   - /scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/tensorflow/workspace.bzl:87:5
23:10:19   - /scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/WORKSPACE:19:1
23:10:19  Loading: 0 packages loaded
23:10:19      currently loading: tensorflow/tools/pip_package
23:10:19  ERROR: An error occurred during the fetch of repository 'local_config_cuda':
23:10:19     Traceback (most recent call last):
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 1181
23:10:19  		_create_local_cuda_repository(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _create_local_cuda_repository
23:10:19  		cuda_lib_outs.append(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in cuda_lib_outs.append
23:10:19  		_basename(repository_ctx, <1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _basename
23:10:19  		path.basename
23:10:19  object of type 'string' has no field 'basename'
23:10:19  ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 1181
23:10:19  		_create_local_cuda_repository(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _create_local_cuda_repository
23:10:19  		cuda_lib_outs.append(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in cuda_lib_outs.append
23:10:19  		_basename(repository_ctx, <1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _basename
23:10:19  		path.basename
23:10:19  object of type 'string' has no field 'basename'
23:10:19  WARNING: Target pattern parsing failed.
23:10:19  ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 1181
23:10:19  		_create_local_cuda_repository(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _create_local_cuda_repository
23:10:19  		cuda_lib_outs.append(<1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in cuda_lib_outs.append
23:10:19  		_basename(repository_ctx, <1 more arguments>)
23:10:19  	File "/scrap/jenkins/workspace/_ML_DevOps_team/ml-tensorflow-ci-pipeline/tensorflow/third_party/gpus/cuda_configure.bzl", line 930, in _basename
23:10:19  		path.basename
23:10:19  object of type 'string' has no field 'basename'

@av8ramit

Sorry for the inconvenience, @artemry-mlnx. I'll see what I can do about that right away.

tensorflow-copybara pushed a commit that referenced this issue Feb 13, 2020
This was a leftover from switching from path.basename to _basename(path).

External issue: #36691

PiperOrigin-RevId: 295007256
Change-Id: I7ecc21f6a5c15aeee65eff05aa580a7aae9df9a6
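
The `object of type 'string' has no field 'basename'` error is what happens when code written for a path object is handed a plain string. The snippet below is a Python analogue of that mistake and of the helper-function style the commit message describes. It is not the Starlark code from `cuda_configure.bzl`: the real `_basename` also takes a `repository_ctx` argument and its body is not shown in this thread, so `basename_of` and the example library path are purely illustrative assumptions.

```python
def basename_of(path_str):
    """Hypothetical string-based helper: return the last component of a '/' path."""
    return path_str.rstrip("/").rpartition("/")[2]


# Illustrative path only; not taken from the build log.
lib = "/usr/local/cuda/lib64/libcudart.so.10.2"

# A plain string has no `basename` attribute, mirroring the failure in the log.
try:
    lib.basename
except AttributeError as err:
    error_text = str(err)

print(error_text)
print(basename_of(lib))  # prints: libcudart.so.10.2
```
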
@av8ramit

That should be fixed now as well: a15fe8f

@artemry-nv
Author

@av8ramit
Thanks a lot!
TensorFlow builds successfully now, but installing Horovod with pip install horovod now fails (we use it for the TF CNN benchmark test). I'm investigating the issue. Could the commits above have caused this error?

06:26:48    2020-02-15 03:26:38.528006: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
06:26:48    INFO: Unable to build TensorFlow plugin, will skip it.
06:26:48    
06:26:48    Traceback (most recent call last):
06:26:48      File "/tmp/pip-install-e8noyyjq/horovod/setup.py", line 254, in get_tf_flags
06:26:48        return tf.sysconfig.get_compile_flags(), tf.sysconfig.get_link_flags()
06:26:48      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/sysconfig.py", line 65, in get_compile_flags
06:26:48        flags.append('-I%s' % get_include())
06:26:48      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/sysconfig.py", line 43, in get_include
06:26:48        return _os_path.join(_os_path.dirname(tf.__file__), 'include')
06:26:48    AttributeError: module 'tensorflow_core' has no attribute '__file__'
06:26:48    
06:26:48    During handling of the above exception, another exception occurred:
06:26:48    
06:26:48    Traceback (most recent call last):
06:26:48      File "/tmp/pip-install-e8noyyjq/horovod/setup.py", line 1408, in build_extensions
06:26:48        build_tf_extension(self, options)
06:26:48      File "/tmp/pip-install-e8noyyjq/horovod/setup.py", line 869, in build_tf_extension
06:26:48        build_ext, options['COMPILE_FLAGS'])
06:26:48      File "/tmp/pip-install-e8noyyjq/horovod/setup.py", line 257, in get_tf_flags
06:26:48        tf_include_dirs = get_tf_include_dirs()
06:26:48      File "/tmp/pip-install-e8noyyjq/horovod/setup.py", line 182, in get_tf_include_dirs
06:26:48        tf_inc = tf.sysconfig.get_include()
06:26:48      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/sysconfig.py", line 43, in get_include
06:26:48        return _os_path.join(_os_path.dirname(tf.__file__), 'include')
06:26:48    AttributeError: module 'tensorflow_core' has no attribute '__file__'
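
The traceback above fails because `get_include()` reads `tf.__file__`, and the imported `tensorflow_core` module object has no such attribute. One situation where a module lacks `__file__` is a namespace-style package, though whether that was the cause here is an assumption. The sketch below reproduces the condition with a synthetic module and shows one defensive way to locate an `include` directory. It is not the fix that landed in TensorFlow, and `get_include_dir` and the example directory are hypothetical.

```python
import os
import types


def get_include_dir(module):
    """Locate <package dir>/include, tolerating modules without __file__."""
    file_attr = getattr(module, "__file__", None)
    if file_attr:
        return os.path.join(os.path.dirname(file_attr), "include")
    # Packages without __file__ may still expose their directories via __path__.
    first_entry = next(iter(getattr(module, "__path__", [])), None)
    if first_entry:
        return os.path.join(first_entry, "include")
    raise RuntimeError("cannot locate package directory for %s" % module.__name__)


# Synthetic stand-in: a freshly created module object has no __file__.
fake = types.ModuleType("tensorflow_core")
fake.__path__ = ["/opt/example/tensorflow_core"]  # illustrative directory

print(hasattr(fake, "__file__"))
print(get_include_dir(fake))
```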

@av8ramit

That error seems unrelated. I'd recommend filing a new issue or seeing if another exists. I'll also keep a lookout. Closing this for now.


@artemry-nv
Author

Thanks, it looks like the error related to pip install horovod has been fixed.
