Undefined reference to google::protobuf::FileDescriptor::DebugString() #32

Closed
grwlf opened this issue Mar 5, 2019 · 23 comments
Labels: bug (Something isn't working); stat:awaiting response (Awaiting response from user)

Comments

@grwlf

grwlf commented Mar 5, 2019

I just installed lingvo and ran into what looks like a protobuf linking problem. My tf-nightly install seems to be up to date:

mironov@70e0b410070b:~/lingvo$ python -c "import tensorflow as tf;print(tf.__version__)"
1.14.1-dev20190305

The exact error from bazel build is

mironov@70e0b410070b:~/lingvo$ bazel build -c opt //lingvo:trainer
INFO: Analysed target //lingvo:trainer (22 packages loaded).
INFO: Found 1 target...
ERROR: /workspace/lingvo/lingvo/tools/BUILD:98:1: Linking of rule '//lingvo/tools:generate_proto_def' failed (Exit 1)
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.746s, Critical Path: 2.20s
INFO: 3 processes: 3 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

Could you please check?

@grwlf grwlf changed the title from "Undefined reference to google::protobuf::FileDescriptor::DebugStrin()" to "Undefined reference to google::protobuf::FileDescriptor::DebugString()" Mar 5, 2019
@kdatta

kdatta commented Mar 5, 2019

Since DebugString() is there to print warning/error info, I commented that line out in generate_proto_def.cc. Other errors then follow, with protoc unable to find the tensorflow proto definitions:
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.

@drpngx
Contributor

drpngx commented Mar 5, 2019

Can you re-run with --print_actions --verbose_failures?

@kdatta

kdatta commented Mar 5, 2019

bazel build --verbose_failures -c opt //lingvo:trainer
WARNING: Output base '/ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c' is on NFS. This may lead to surprising failures and undetermined behavior.
INFO: Analysed target //lingvo:trainer (0 packages loaded).
INFO: Found 1 target...
ERROR: /ec/site/disks/aipg_lab_home_pool_01/kdatta1/TensorFlow/lingvo/lingvo/core/BUILD:339:1: Executing genrule //lingvo/core:inference_graph_py_pb2_genpy failed (Exit 1): bash failed: error executing command
  (cd /ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c/execroot/__main__ && \
  exec env - \
    LD_LIBRARY_PATH=/nfs/pdx/home/kdatta1/MKL-DNN/mklml_lnx_2019.0.3.20190220/lib:/usr/lib64:/nfs/pdx/home/kdatta1/openmpi/lib \
    PATH=/opt/intel/compilers_and_libraries_2018.3.222/linux/bin/intel64:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/intel64/bin:/nfs/pdx/home/kdatta1/anaconda2/envs/anaconda2-python-tf-1.12/bin:/nfs/pdx/home/kdatta1/anaconda2/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/nfs/pdx/home/kdatta1/openmpi/bin:/nfs/pdx/home/kdatta1/openmpi/bin \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh;
          mkdir -p bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$;
          tar -C bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ -xf bazel-out/host/genfiles/lingvo/tf_protos.tar;
          external/protobuf_protoc/bin/protoc --proto_path=bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ --proto_path=. --python_out=bazel-out/k8-opt/genfiles lingvo/core/inference_graph.proto;
          rm -rf bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$
        ')

Use --sandbox_debug to see verbose messages from the sandbox
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/graph.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/types.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/protobuf/saver.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/protobuf/saver.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/types.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/graph.proto but not used.
Target //lingvo:trainer failed to build

@drpngx
Contributor

drpngx commented Mar 5, 2019

That's very strange. Could you modify the tool to add this line:

std::cout << "File: " << output_filepath << " = " << dot_proto->DebugString() << std::endl;

@kdatta

kdatta commented Mar 5, 2019

How will that help? The build with my current toolchain fails now because it can't find DebugString().

@drpngx
Contributor

drpngx commented Mar 5, 2019

The tool first generates the protos. For some reason, we suspect that the .proto files generated are empty. We want to know if it gets there and why they would be empty. You can check the tarball at bazel-out/host/genfiles/lingvo/tf_protos.tar.
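
One way to check that tarball for empty files is sketched below. This is only an illustration: the helper name `empty_protos` is hypothetical and not part of lingvo; it uses the standard-library `tarfile` module and reports any `.proto` member whose size is zero.

```python
import tarfile


def empty_protos(tar_path):
    """Return the names of .proto members in the tarball that are zero bytes long."""
    with tarfile.open(tar_path) as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile()
            and member.name.endswith(".proto")
            and member.size == 0
        )
```

Run from the lingvo checkout, `empty_protos("bazel-out/host/genfiles/lingvo/tf_protos.tar")` would return an empty list if every generated .proto file has content.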

@drpngx
Contributor

drpngx commented Mar 5, 2019

OK, reading more carefully, I see what you mean. You can't uncomment that line. Can you go back to the original version, then run bazel with --print_actions --verbose_failures?

@zh794390558
Contributor

#23 has the same problem.

@drpngx
Contributor

drpngx commented Mar 6, 2019

Can you run with --verbose_failures --print_actions? I need to see the command that was used to link. Also add --link_opts=-vv. The next step is then to run nm on generate_proto_def.o and on the library to find out which symbol is defined and why they don't match.

@drpngx drpngx added the bug Something isn't working label Mar 6, 2019
@zh794390558
Contributor

zh794390558 commented Mar 6, 2019

(tf1.12_py3.5) [luban@luban-351 lingvo]$ bazel print_action -c opt //lingvo:trainer_test --verbose_failures                                                                                                        
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer_test (31 packages loaded).
INFO: Found 1 target...
ERROR: /nfs/project/zhanghui/lingvo/lingvo/tools/BUILD:98:1: Linking of rule '//lingvo/tools:generate_proto_def' failed (Exit 1): gcc failed: error executing command 
  (cd /home/luban/.cache/bazel/_bazel_luban/b5ef85f1c360696308ba7ab9000cfd03/execroot/__main__ && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/lib:/nfs/project/tools/anaconda3/pkgs/cudnn-7.2.1-cuda9.2_0/lib:/nfs/project/tools/anaconda3/pkgs/cudatoolkit-9.2-0/lib:/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/lib/:/usr/local/cuda-9.0/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64: \
    PATH=/nfs/project/tools/openfst1.6.2/bin/:/nfs/project/tools/packages/kaldi-master/src/bin:/nfs/project/tools/packages/kaldi-master/src/fstbin/:/nfs/project/tools/packages/kaldi-master/src/gmmbin/:/nfs/project/tools/packages/kaldi-master/src/featbin/:/nfs/project/tools/packages/kaldi-master/src/lm/:/nfs/project/tools/packages/kaldi-master/src/sgmmbin/:/nfs/project/tools/packages/kaldi-master/src/sgmm2bin/:/nfs/project/tools/packages/kaldi-master/src/fgmmbin/:/nfs/project/tools/packages/kaldi-master/src/latbin/:/nfs/project/tools/packages/kaldi-master/src/nnetbin:/nfs/project/tools/packages/kaldi-master/src/nnet2bin/:/nfs/project/tools/packages/kaldi-master/src/kwsbin:/nfs/project/tools/packages/kaldi-master/tools/sph2pipe_v2.5:/nfs/project/tools/packages/kaldi-master/src/ivectorbin:/tools/kaldi-io/build/bin:/nfs/project/tools/anaconda3/envs/tf1.12_py3.5/bin:/nfs/project/tools/anaconda3/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/luban/.local/bin:/home/luban/bin:/home/luban/.local/bin:/home/luban/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -o bazel-out/host/bin/lingvo/tools/generate_proto_def '-Wl,-rpath,$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib' -Lbazel-out/host/bin/_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -B/usr/bin -pass-exit-codes -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params)

Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer_test failed to build
INFO: Elapsed time: 94.812s, Critical Path: 7.43s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
(tf1.12_py3.5) [luban@luban-351 lingvo]$ nm bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o | grep U          
                 U _Unwind_Resume
                 U _ZN10tensorflow19DataType_descriptorEv
                 U _ZN10tensorflow8GraphDef10descriptorEv
                 U _ZN10tensorflow8SaverDef10descriptorEv
                 U _ZNK6google8protobuf14FileDescriptor10dependencyEi
                 U _ZNK6google8protobuf14FileDescriptor11DebugStringEv
                 U _ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm
                 U _ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm
                 U _ZNSs4_Rep10_M_destroyERKSaIcE
                 U _ZNSs4_Rep10_M_disposeERKSaIcE
                 U _ZNSs4_Rep20_S_empty_rep_storageE
                 U _ZNSs6appendEPKcm
                 U _ZNSs6appendERKSs
                 U _ZNSsC1EPKcRKSaIcE
                 U _ZNSsC1ERKSs
                 U _ZNSt12__basic_fileIcED1Ev
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEE5closeEv
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEEC1Ev
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEED1Ev
                 U _ZNSt14basic_ofstreamIcSt11char_traitsIcEED1Ev
                 U _ZNSt6localeD1Ev
                 U _ZNSt8ios_base4InitC1Ev
                 U _ZNSt8ios_base4InitD1Ev
                 U _ZNSt8ios_baseC2Ev
                 U _ZNSt8ios_baseD2Ev
                 U _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E
                 U _ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate
                 U _ZSt11_Hash_bytesPKvmm
                 U _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
                 U _ZSt17__throw_bad_allocv
                 U _ZTTSt14basic_ofstreamIcSt11char_traitsIcEE
                 U _ZTVSt13basic_filebufIcSt11char_traitsIcEE
                 U _ZTVSt14basic_ofstreamIcSt11char_traitsIcEE
                 U _ZTVSt15basic_streambufIcSt11char_traitsIcEE
                 U _ZTVSt9basic_iosIcSt11char_traitsIcEE
                 U _ZdlPv
                 U _Znwm
                 U __cxa_atexit
                 U __cxa_begin_catch
                 U __cxa_end_catch
                 U __cxa_rethrow
                 U __dso_handle
                 U __gxx_personality_v0
                 U __stack_chk_fail
                 U memcmp
                 U memset

@drpngx
Contributor

drpngx commented Mar 9, 2019

So, it's trying to link:

$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib

which maps to

_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib/libtensorflow_framework.so

which should be a symlink to something like

/usr/local/lib/python2.7/dist-packages/tensorflow/libtensorflow_framework.so

If you run nm on this, you should find the symbol marked with a T (defined in the text section):

<address> T _ZNK6google8protobuf14FileDescriptor11DebugStringEv
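
The comparison described above (symbols the object file needs versus symbols the library defines) can be sketched in a few lines of Python. This is a rough illustration, not a lingvo tool: the function names and the `contains` filter are hypothetical, and the inputs are assumed to be the plain-text output of `nm` on the object file and of `nm -D --defined-only` on the shared library.

```python
def nm_symbols(nm_output, kinds):
    """Collect symbol names from `nm` text output whose type letter is in `kinds`."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in kinds:
            # Undefined symbols have no address column: "U name".
            names.add(parts[1])
        elif len(parts) == 3 and parts[1] in kinds:
            # Defined symbols look like "<address> T name".
            names.add(parts[2])
    return names


def unresolved(object_nm, library_nm, contains=""):
    """Return symbols the object needs (U) that the library does not define (T or W).

    `contains` narrows the report to names with that substring, e.g. "protobuf",
    so that symbols expected from other libraries (libstdc++, libc) are skipped.
    """
    needed = nm_symbols(object_nm, {"U"})
    provided = nm_symbols(library_nm, {"T", "W"})
    return sorted(name for name in needed - provided if contains in name)
```

Any name this returns for `contains="protobuf"` is a candidate for the link failure; comparing it by eye against similarly named symbols in the library output shows how the two differ.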

@drpngx drpngx mentioned this issue Mar 9, 2019
@drpngx drpngx added the stat:awaiting response Awaiting response from user label Mar 9, 2019
@nim456

nim456 commented Mar 13, 2019

For how many steps will the model be trained?

@Raviteja1996

Was the problem resolved?

@grwlf
Author

grwlf commented Mar 27, 2019

Was the problem resolved?

Dear @Raviteja1996, it looks like the problem was resolved. The build log below is for reference:

mironov@23ba9b0d756c:~/lingvo$ python -c "import tensorflow as tf;print(tf.__version__)"
1.14.1-dev20190327
mironov@23ba9b0d756c:~/lingvo$ bazel build -c opt //lingvo:trainer
WARNING: detected http_proxy set in env, setting no_proxy for localhost.
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer (35 packages loaded).
INFO: Found 1 target...
Target //lingvo:trainer up-to-date:
  bazel-bin/lingvo/trainer
INFO: Elapsed time: 14.628s, Critical Path: 8.28s
INFO: 22 processes: 22 processwrapper-sandbox.
INFO: Build completed successfully, 29 total actions

Thank you.

@drpngx drpngx closed this as completed Mar 29, 2019
@iamxiaoyubei

iamxiaoyubei commented May 7, 2019

@drpngx @grwlf @zh794390558 I ran into a similar problem, but after reading your discussion I still have no idea how to solve it. Could you please describe the steps in more detail? Thank you so much!
Here is my log:
[Screenshot: WXWorkCapture_15572025722301]

@fangelyuan

@iamxiaoyubei I have the same problem. Can you tell me how to resolve it?

bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::GraphDef::descriptor()'
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::SaverDef::descriptor()'

@iamxiaoyubei

iamxiaoyubei commented May 30, 2019

@fangelyuan I hit the "undefined reference to tensorflow..." error because I had installed both tensorflow and tf-nightly. Just uninstall tensorflow and install tf-nightly.

In addition, I am using tf-nightly-gpu version 1.14.1-dev20190426, and I ran into some other problems when installing the latest version, so I suggest you install that version.

Hope this helps.
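
A quick way to see whether more than one TensorFlow distribution is installed in the current environment is sketched below. It is only an illustration: the function name and the `TF_DISTRIBUTIONS` list are assumptions (the list is not exhaustive), and it relies on the standard-library `importlib.metadata` module available in Python 3.8 and later.

```python
from importlib import metadata

# Distribution names assumed to ship the same `tensorflow` Python package.
# Illustrative only; extend as needed.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu", "tf-nightly", "tf-nightly-gpu")


def installed_tf_distributions(names=None):
    """Return the TensorFlow distributions found among `names`.

    When `names` is None, the distributions visible to this interpreter are used.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    normalized = {name.lower().replace("_", "-") for name in names if name}
    return sorted(normalized & set(TF_DISTRIBUTIONS))
```

If `installed_tf_distributions()` returns more than one entry, for example both `tensorflow` and `tf-nightly`, the environment is in the mixed state described in this comment.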

@fangelyuan

@iamxiaoyubei Can I add you on WeChat?

@fangelyuan

@iamxiaoyubei Lingvo is based on tensorflow. After you uninstall tensorflow, does it still work normally?

@iamxiaoyubei

@fangelyuan Sorry, I'd rather not add people on WeChat, and I rarely check it except after work. If you have any questions, please ask on GitHub or send an email. I will reply when I see it and have time.

Yes, it works. tf-nightly is the nightly build of tensorflow. You can read the introduction to tf-nightly online.

@fangelyuan

@iamxiaoyubei Thanks, I succeeded in building the trainer. I am now testing the transformer model, and I hope you can help me if I run into problems.
Thanks.

@cranehuang

@Raviteja1996 I have the same problem. I built tensorflow (v1.15.0, commit 590d6ee) from source with gcc 5.4 and bazel 0.25.2, then built lingvo (commit 8926ece), and the problem occurred. I found that the file lingvo/lingvo/lingvo.bzl sets the flag "-D_GLIBCXX_USE_CXX11_ABI=0", so the linker looks for the symbol "_ZNK6google8protobuf14FileDescriptor11DebugStringEv" in libtensorflow_framework.so, while the library actually exports "_ZNK6google8protobuf14FileDescriptor11DebugStringB5cxx11Ev". Changing "-D_GLIBCXX_USE_CXX11_ABI=0" to "-D_GLIBCXX_USE_CXX11_ABI=1" solves the problem. Hope it helps.
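
The two mangled names quoted above differ only by the `B5cxx11` ABI tag. A crude check for that pattern is sketched below; the function names are hypothetical, and the heuristic only catches the tag itself, not other ways the two libstdc++ ABIs change mangling (for example `std::string` parameters).

```python
ABI_TAG = "B5cxx11"


def strip_abi_tags(mangled):
    """Remove every cxx11 ABI tag from a mangled symbol name (crude text match)."""
    return mangled.replace(ABI_TAG, "")


def abi_mismatch(wanted, exported):
    """Return exported symbols that match `wanted` only once ABI tags are ignored."""
    return [
        symbol
        for symbol in exported
        if symbol != wanted and strip_abi_tags(symbol) == strip_abi_tags(wanted)
    ]
```

A non-empty result suggests that the object file and the library were compiled with different `_GLIBCXX_USE_CXX11_ABI` settings.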

@zhangqiaorjc
Member

I can confirm that this problem still exists at HEAD, but it probably only happens in specific build environments.

The following one-line change will fix it:

zhangqiaorjc@xxx:~/lingvo/lingvo$ git diff
diff --git a/lingvo/lingvo.bzl b/lingvo/lingvo.bzl
index 01928bbc..eb69faa3 100644
--- a/lingvo/lingvo.bzl
+++ b/lingvo/lingvo.bzl
@@ -4,7 +4,7 @@ load("@subpar//:subpar.bzl", "par_binary")

 def tf_copts():
     # TODO(drpng): autoconf this.
-    return ["-D_GLIBCXX_USE_CXX11_ABI=0", "-Wno-sign-compare", "-mavx"] + select({
+    return ["-D_GLIBCXX_USE_CXX11_ABI=1", "-Wno-sign-compare", "-mavx"] + select({
         "//lingvo:cuda": ["-DGOOGLE_CUDA=1"],
         "//conditions:default": [],
     })
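
Which value is right depends on how the installed TensorFlow was built. As a hedged sketch (the helper names are hypothetical and nothing here is wired into lingvo's Bazel rules), the value can be read from `tf.sysconfig.get_compile_flags()`, which reports the compiler flags TensorFlow recommends for custom ops:

```python
def cxx11_abi_value(compile_flags):
    """Extract the value of -D_GLIBCXX_USE_CXX11_ABI from a list of compiler flags."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None  # the flag is not present in the list


def installed_tf_abi():
    """Ask the installed TensorFlow which ABI setting it reports (requires tensorflow)."""
    import tensorflow as tf  # imported lazily so cxx11_abi_value works without TF

    return cxx11_abi_value(tf.sysconfig.get_compile_flags())
```

If `installed_tf_abi()` returns 1, the diff above matches the installed library; if it returns 0, the original flag in `tf_copts()` is the consistent one.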
