Build failure: undefined reference to protobuf symbols #34117
Comments
@dbonner, Did you run |
https://www.tensorflow.org/install/source#linux Tested build configuration differs from the OP. Tested build uses: |
@gadagashwini
Found possible Python library paths:
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]: y
Found CUDA 10.0 in:
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details. |
I've also not been able to compile TF from source for the last 4 days because of this. |
I see the same errors too. |
I also see the same problem. I tried bazel 0.27.1 and 0.27.2. It seems related to protobuf. What version of protobuf is TF using? |
Issue title is misleading, first error is not
In any case, can you try building again from a fresh clone and attach the entire log of |
The first error I see is this one:
|
@mihaimaruseac |
First error is
|
@pkanwar23 , can you please help to drive this one? It is blocking several people on our team. |
Just tried this command on the latest master, and got the same error. The first error seems to be this one:
|
That's the exact error I'm seeing. I'm compiling merely with: |
Just noticed this: if I checkout a random commit from Oct 2, I see the linker errors when I build with bazel 0.27.1, but not when I build with bazel 0.24.1. |
I've come to the same conclusion. |
It seems we're observing a similar issue after the recent Bazel version upgrade. |
I also have problems with this; trying bazel 0.24.1 now. |
Yes, 0.27.1 still fails with the patch. |
And I call this a solution! Here is a simple patch that can be used to lower the bazel requirements down to 0.26.1. Then bazel 0.26.1 worked for me. Hope we'll find a more clever fix.
FYI I'm running configure with the following setting:
|
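The workaround above amounts to widening the range of Bazel versions the build accepts. A minimal sketch of such a range check follows, assuming the bounds mentioned in this thread (0.26.1 as the lowered minimum, 1.1.0 as the maximum). The function names are hypothetical and this is not the code in configure.py.

```python
def parse_version(text):
    """Turn '0.26.1' into (0, 26, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def bazel_version_ok(current, minimum, maximum):
    """Return True if minimum <= current <= maximum, compared part by part."""
    return parse_version(minimum) <= parse_version(current) <= parse_version(maximum)


if __name__ == "__main__":
    # Bounds taken from this thread; adjust to whatever your checkout requires.
    for candidate in ("0.24.1", "0.26.1", "0.27.1", "1.1.0"):
        print(candidate, bazel_version_ok(candidate, "0.26.1", "1.1.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9.0" would sort after "0.27.1". Passing the version check says nothing about whether a given Bazel release links correctly, as the 0.27.1 reports in this thread show.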
We have a fix for this issue on TF side. However given that the build succeeds with 0.26.1 and is broken with 0.27.1, we should locate the change in Bazel that caused this behavior. |
@dbonner can you please clarify -- was the build with 0.26.1 successful before a73d7ac? |
building at a73d7ac with bazel 0.27.2 with --noincompatible_do_not_split_linking_cmdline produces the same linker errors. |
@bmzhao has a fix for this which would be incoming soon. |
Bazel's change to legacy_whole_archive behavior is not the cause of TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed via nm that protobuf symbols are still present in libtensorflow_framework.so.

After examining the linker flags that bazel passes to gcc, https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080, I discovered that the order of the linker flags was what caused the undefined reference. See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically, linkers discard the objects they've been asked to link if those objects do not export any symbols that the linker is currently tracking as "undefined". To prove this was the issue, I was able to link successfully after moving the shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order and manually invoking g++.

This change uses cc_import to link against a .so in the "deps" of tf_cc_binary, rather than as the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48

Successfully built on vanilla Ubuntu 18.04 VM:
bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 2067.380s, Critical Path: 828.19s
INFO: 12942 processes: 51 remote cache hit, 12891 local.
INFO: Build completed successfully, 14877 total actions

The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation.

PiperOrigin-RevId: 281341817
Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
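The ordering effect described in the commit message can be illustrated with a toy model. The sketch below is a deliberately simplified simulation of a left-to-right link pass, not how gcc or ld is actually implemented: object files are always kept, while a library is kept only if it defines a symbol that is already pending as undefined. The symbol name and `main.o` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkInput:
    name: str
    defines: frozenset = frozenset()
    needs: frozenset = frozenset()
    is_library: bool = False  # libraries are kept only if they satisfy a pending symbol


def link(inputs):
    """Simulate one left-to-right pass; return (kept input names, unresolved symbols)."""
    defined = set()
    undefined = set()
    kept = []
    for item in inputs:
        if item.is_library and not (item.defines & undefined):
            # Nothing seen so far needs this library, so it is dropped.
            continue
        kept.append(item.name)
        defined |= item.defines
        undefined |= item.needs
        undefined -= defined
    return kept, undefined


# Hypothetical inputs: one object file that needs a protobuf symbol,
# and the shared library that provides it.
SYMBOL = "protobuf_symbol"  # placeholder, not a real mangled name
main_obj = LinkInput("main.o", needs=frozenset({SYMBOL}))
framework = LinkInput(
    "libtensorflow_framework.so.2", defines=frozenset({SYMBOL}), is_library=True
)

if __name__ == "__main__":
    print(link([framework, main_obj]))  # library listed first
    print(link([main_obj, framework]))  # library listed last
```

With the library listed first, the model drops it before any symbol is pending and the reference stays unresolved. With the library listed last, the reference is resolved, which mirrors the manual experiment of moving -l:libtensorflow_framework.so.2 to the end of the flag list.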
Hello! 5caa9e8 is now in master. I've manually tested building with it using bazel 1.1, with the following command:
@dbonner can you confirm if Tensorflow head now builds for you as well? |
Hey @bas-aarts, After double checking with our buildcop, it looks like the current Windows and Mac breakages' root causes are other commits (not 5caa9e8). Therefore, I don't expect this change to be rolled back. |
Is this error related?
|
@alanpurple could you file a separate issue including all relevant information to reproduce the error? Please see https://github.com/tensorflow/tensorflow/blob/master/ISSUE_TEMPLATE.md#system-information |
I am still in the process of building with bazel 1.1.0 and will report back how it goes. |
I believe the undefined symbols errors are caused by 2 different Bazel flags: @bmzhao confirmed that The I am working on setting the default for @dbonner configure.py already sets the max version to 1.1.0. Or is there another configure.py somewhere? |
JFYI |
@mihaimaruseac |
cc @hlopko to deal with this from Bazel side. |
I also hit a related issue where I need the flag For example: If I do not add the flag, I hit the error below due to the linkage change.
|
Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template
System information
Describe the problem
The build fails most of the way through.
Provide the exact sequence of commands / steps that you executed before running into the problem
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout -b mybranch (make up a branch to checkout head)
bazel build --config=opt --config=cuda --config=v2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package
Here is the last part of the terminal's output (attached text file):
tensorflow_build_fails.txt
Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.