TF build fails on Serving Docker image. #410

Closed
therealmitchconnors opened this issue Apr 19, 2017 · 7 comments

@therealmitchconnors

commented Apr 19, 2017

I am attempting to follow the steps at https://tensorflow.github.io/serving/docker to get an instance of TF Serving up and running. On the first pass, the 'bazel test' command returns:
ERROR: /root/.cache/bazel/_bazel_root/01a289b7faaf5ec651fb0e4e35f862a1/external/org_tensorflow/tensorflow/core/kernels/BUILD:2042:1: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels:svd_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 98 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.

A quick Google search suggests this is likely due to gcc being at version 4.8, while TF requires 4.9. After upgrading to gcc-4.9, I get:
ERROR: /root/.cache/bazel/_bazel_root/01a289b7faaf5ec651fb0e4e35f862a1/external/org_tensorflow/tensorflow/core/BUILD:1353:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/core:framework_internal': this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/core/framework/tracking_allocator.cc': '/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdarg.h' '/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h' '/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdint.h' '/usr/lib/gcc/x86_64-linux-gnu/4.9/include/mmintrin.h' '/usr/lib/gcc/x86_64-linux-gnu/4.9/include/emmintrin.h' ...

I also tried running the build from the gcr.io/tensorflow/tensorflow:latest-devel image, which should theoretically have all TF dependencies taken care of, and even there I get:
ERROR: /root/.cache/bazel/_bazel_root/01a289b7faaf5ec651fb0e4e35f862a1/external/org_tensorflow/tensorflow/core/kernels/BUILD:1988:1: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels:cholesky_grad' failed: gcc failed: error executing command (cd /root/.cache/bazel/_bazel_root/01a289b7faaf5ec651fb0e4e35f862a1/execroot/serving && \ exec env - \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \ /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF bazel-out/local-fastbuild/bin/external/org_tensorflow/tensorflow/core/kernels/_objs/cholesky_grad/external/org_tensorflow/tensorflow/core/kernels/cholesky_grad.pic.d '-frandom-seed=bazel-out/local-fastbuild/bin/external/org_tensorflow/tensorflow/core/kernels/_objs/cholesky_grad/external/org_tensorflow/tensorflow/core/kernels/cholesky_grad.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DSNAPPY -iquote external/org_tensorflow -iquote bazel-out/local-fastbuild/genfiles/external/org_tensorflow -iquote external/bazel_tools -iquote bazel-out/local-fastbuild/genfiles/external/bazel_tools -iquote external/protobuf -iquote bazel-out/local-fastbuild/genfiles/external/protobuf -iquote external/eigen_archive -iquote bazel-out/local-fastbuild/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local-fastbuild/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/local-fastbuild/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local-fastbuild/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local-fastbuild/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local-fastbuild/genfiles/external/farmhash_archive -iquote external/highwayhash -iquote bazel-out/local-fastbuild/genfiles/external/highwayhash -iquote external/png_archive -iquote 
bazel-out/local-fastbuild/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local-fastbuild/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/local-fastbuild/genfiles/external/snappy -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/protobuf/src -isystem bazel-out/local-fastbuild/genfiles/external/protobuf/src -isystem external/eigen_archive -isystem bazel-out/local-fastbuild/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/local-fastbuild/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local-fastbuild/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local-fastbuild/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local-fastbuild/genfiles/external/zlib_archive -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions -msse3 -pthread -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/org_tensorflow/tensorflow/core/kernels/cholesky_grad.cc -o bazel-out/local-fastbuild/bin/external/org_tensorflow/tensorflow/core/kernels/_objs/cholesky_grad/external/org_tensorflow/tensorflow/core/kernels/cholesky_grad.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4. gcc: internal compiler error: Killed (program cc1plus)
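
The final line of this log, `gcc: internal compiler error: Killed (program cc1plus)`, usually means the compiler process was killed from outside, most often by the kernel's out-of-memory killer, rather than pointing to a genuine compiler bug. A minimal sketch for spotting that signature in a saved build log follows; the function name and the sample text are illustrative and not part of Bazel or TensorFlow Serving:

```python
import re

# Signature gcc prints when one of its subprocesses (here cc1plus) is killed
# by a signal, which on Linux is commonly the out-of-memory killer.
KILLED_PATTERN = re.compile(r"internal compiler error: Killed \(program (\w+)\)")


def find_killed_compilers(log_text):
    """Return the names of compiler subprocesses reported as killed in a log."""
    return KILLED_PATTERN.findall(log_text)


# Shortened excerpt of the error quoted above (hypothetical saved log content).
sample = (
    "com.google.devtools.build.lib.shell.BadExitStatusException: "
    "Process exited with status 4. "
    "gcc: internal compiler error: Killed (program cc1plus)"
)
print(find_killed_compilers(sample))
```

If this returns a non-empty list, memory available to the build is worth checking before changing compiler versions.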

Builds of TensorFlow Serving seem to be passing in Jenkins, so I'm a bit stumped. Is anyone else having these issues?

@markusnagel

commented Apr 19, 2017

I have exactly the same issues :(
After upgrading to gcc-4.9 I also get your second error, and if I reinstall bazel after the gcc upgrade, I'm back to the initial error.
I run Docker on macOS, but I guess that should not make any difference. Any help would be appreciated.

@therealmitchconnors

Author

commented Apr 21, 2017

I was able to get further along in the build process by raising Docker's memory to 10 GB of RAM, but it still fails eventually. Considering that Serving's binaries don't seem to be available for download anywhere, I would think the dev team would be interested in keeping it buildable...

@mountaintom

Contributor

commented Apr 21, 2017

Hi, you might want to look at this issue; it may be similar to the problem you are having.
#379

Additionally, when there are network issues, not all of the dependencies are always loaded when ./configure is run before the build. You may want to run ./configure again and check whether the all-dependencies-loaded message is displayed.

@therealmitchconnors

Author

commented Apr 21, 2017

Thanks, @mountaintom. That's where I got the idea to raise Docker's RAM to 10 GB. I have also tried limiting the overall RAM usage, and am running one job at a time. I have re-run the ./configure script about six times now, and still cannot get a successful build...
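
For reference, limiting RAM usage and running one job at a time might look like the sketch below. It assumes a Bazel release of that era, in which `--local_resources` took a comma-separated triple of memory in MB, CPU cores, and I/O capacity (later releases changed this flag, so check the Bazel version in the image). The specific numbers, the helper name, and the `//tensorflow_serving/...` target pattern are illustrative assumptions, not values taken from this thread:

```python
import shlex


def bazel_test_command(ram_mb, cpu_cores, jobs, target="//tensorflow_serving/..."):
    """Build a `bazel test` argument list that caps local resource usage.

    Assumption: the installed Bazel accepts the older
    --local_resources=<ram MB>,<cpu cores>,<io> form.
    """
    return [
        "bazel",
        "test",
        f"--jobs={jobs}",  # number of concurrent build actions
        f"--local_resources={ram_mb},{cpu_cores},1.0",  # RAM (MB), CPU, I/O
        target,
    ]


# Hypothetical limits: 2 GB of RAM, one core, one job at a time.
cmd = bazel_test_command(ram_mb=2048, cpu_cores=1.0, jobs=1)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell inside the container, or the list can be passed to `subprocess.run` directly.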

@mountaintom

Contributor

commented Apr 24, 2017

Quick update: I just did a new Docker build from scratch. The build on my Mac ended with the error "Linking of rule '//tensorflow_serving/servables/tensorflow:regressor_test' failed: gcc failed: error executing command".

However, on a cloud machine with 32 GB of RAM, the build works. RAM usage peaked at about 16 GB during the build. No additional local resource parameters were needed.
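
Given the roughly 16 GB peak reported here, it can save time to check how much memory the build environment actually sees before starting. A rough sketch that parses Linux `/proc/meminfo`-style text follows; the 16 GiB threshold comes from this comment rather than from any official requirement, the helper names are made up for illustration, and inside a container this file reflects the host or VM memory rather than any cgroup limit:

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and convert it to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def enough_memory(meminfo_text, required_gib=16.0):
    """Return True if total memory meets the assumed build requirement."""
    return mem_total_gib(meminfo_text) >= required_gib


# Hypothetical sample resembling a Docker VM configured with about 10 GB.
sample = "MemTotal:       10238976 kB\nMemFree:          512000 kB\n"
print(enough_memory(sample))

# On a Linux machine or container, the real file can be read instead:
#   with open("/proc/meminfo") as f:
#       print(enough_memory(f.read()))
```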

@therealmitchconnors

Author

commented Apr 24, 2017

Hey @mountaintom, thanks for the research! Given that TensorFlow can no longer be built on the average dev box, is there a way we can prioritize #409? If I want to use just TensorFlow, it's easy enough to install it from pip or other package managers without building from source. If I want to use the Serving library, though, it appears there is no way to use it without building TF from source first, which is something of a deal-breaker.

This could be resolved either by distributing Serving binaries through standard package management, or by allowing Serving to build against an existing TF binary, which in turn is already distributed through package managers.

@gautamvasudevan

Collaborator

commented Jul 23, 2018

This should be resolved with the latest Docker images, available binaries, and instructions.
