Fail to build from source with gcc 7.3.1 #25323

Closed
mrodozov opened this issue Jan 30, 2019 · 20 comments

@mrodozov

commented Jan 30, 2019

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.5.1804 run as image under Singularity and/or Docker on Scientific Linux CERN SLC release 6.10 (Carbon)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: master, on commit 8eb3cbc
  • Python version: 2.7
  • Installed using virtualenv? pip? conda?: manually compiling using bazel
  • Bazel version (if compiling from source): 0.19.2
  • GCC/Compiler version (if compiling from source): GCC 7.3.1
  • CUDA/cuDNN version: NA
  • GPU model and memory: NA

Describe the problem
When compiling with gcc 7.3.1 rev 257125 (from gcc.gnu.org/svn), the build fails with an internal compiler error:

ERROR: /tensorflow/tensorflow/core/kernels/BUILD:3206:1: C++ compilation of rule '//tensorflow/core/kernels:reduction_ops' failed (Exit 1)
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:124:0,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/core/kernels/reduction_ops_common.h:27,
                 from tensorflow/core/kernels/reduction_ops_sum.cc:16:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes&& ...) [with _Functor = Eigen::internal::TensorExecutor<Expression, Eigen::ThreadPoolDevice, Vectorizable, Tileable>::run(const Expression&, const Eigen::ThreadPoolDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >; bool Vectorizable = true; bool Tileable = false]::<lambda(Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex, Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex)>; _ArgTypes = {long int, long int}]':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h:801:9: internal compiler error: in emit_move_insn, at expr.c:3698
         values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
         ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 74.923s, Critical Path: 51.44s
INFO: 2018 processes: 2018 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

exactly as described in sylabs/singularity#2536.
When we changed gcc to 8.2.0 (tag 2d79333765b691fa27d82c1737cb2f00ec6a4499, from https://github.com/gcc-mirror/gcc) or to 7.4.0 (rev 268351, from gcc.gnu.org/svn), the build went fine.
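
A pre-build guard based on these observations could look like the sketch below. It assumes that only the 7.3.x series needs flagging, which is what this report narrows the failure down to; the function name `gcc_ice_risk` and its labels are made up for illustration, and other toolchains may behave differently.

```shell
# Classify a GCC version string against the versions named in this report.
# Assumption: only 7.3.x is flagged; 7.4.0 and 8.2.0 built fine here.
gcc_ice_risk() {
  case "$1" in
    7.3|7.3.*) echo "reported-failing" ;;
    *)         echo "not-flagged" ;;
  esac
}

# Check the compiler on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
  # -dumpfullversion exists from GCC 7 on; older releases only know -dumpversion.
  detected="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
  echo "gcc ${detected}: $(gcc_ice_risk "${detected}")"
fi
```

Distribution builds configured with `--with-gcc-major-version-only` print just the major number for `-dumpversion`, which is why the sketch tries `-dumpfullversion` first.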

Provide the exact sequence of commands / steps that you executed before running into the problem

export CXX_OPT_FLAGS=-std=c++11
BAZEL_OPTS="--output_user_root ../build build -s --verbose_failures -c opt --cxxopt=${CXX_OPT_FLAGS}"
BAZEL_EXTRA_OPTS="--action_env PYTHONPATH=${PYTHON27PATH} --distinct_host_configuration=false"
bazel $BAZEL_OPTS $BAZEL_EXTRA_OPTS //tensorflow/tools/pip_package:build_pip_package

Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

@mihaimaruseac

Contributor

commented Jan 30, 2019

This might be gcc's fault, not Tensorflow's.

@lhelontra

commented Feb 18, 2019

It works fine with TF 1.12.0, but with TF 1.13.0-rc2 I get the same issue in my project, tensorflow-on-arm.

@mrodozov

Author

commented Feb 19, 2019

@lhelontra

commented Feb 20, 2019

@mrodozov It works with gcc-8.2 only.

@mrodozov

Author

commented Feb 22, 2019

@lhelontra for me it was working with 8.2 also without the extra flags, so local resources flags didn't help. it's something else (gcc 7.3.x related)

@lissyx

Contributor

commented Mar 13, 2019

@lhelontra for me it was working with 8.2 also without the extra flags, so local resources flags didn't help. it's something else (gcc 7.3.x related)

It looks like we are reproducing that with GCC 7.2 from Linaro as well. Looks to be confirmed as an upstream GCC issue: https://bugzilla.redhat.com/show_bug.cgi?id=1570308

@eLvErDe

commented Mar 19, 2019

Hello,

I can confirm TensorFlow cannot be built anymore on Debian Stable (Stretch) and Ubuntu LTS (Bionic) due to this "internal compiler error":

external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes&& ...) [with _Functor = Eigen::internal::TensorExecutor<Expression, Eigen::ThreadPoolDevice, Vectorizable, Tileable>::run(const Expression&, const Eigen::ThreadPoolDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0l> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >; bool Vectorizable = true; bool Tileable = false]::<lambda(Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0l> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex, Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0l> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex)>; _ArgTypes = {long int, long int}]':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h:801:9: internal compiler error: in emit_move_insn, at expr.c:3547
         values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
         ^~~~~~

Debian Stable comes with GCC 6.3 while Ubuntu LTS uses 7.3.

Regards,

@petertorelli

commented Apr 2, 2019

@mrodozov @lhelontra ... what flavor compiler are you using in the module list? I'm using gcc/g++ 8.2 from the ARM website and /opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/gcc and I still get the error but on a different file/function (internal compiler error: in emit_move_insn), whilst on branch v1.13.1 (and bazel 0.19.0)

external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralBlockPanelKernel.h: In member function 'void Eigen::internal::gebp_kernel<LhsScalar, RhsScalar, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>::operator()(const DataMapper&, const LhsScalar*, const RhsScalar*, Index, Index, Index, Eigen::internal::gebp_kernel<LhsScalar, RhsScalar, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>::ResScalar, Index, Index, Index, Index) [with LhsScalar = Eigen::half; RhsScalar = Eigen::half; Index = long int; DataMapper = Eigen::internal::blas_data_mapper<Eigen::half, long int, 0, 0>; int mr = 2; int nr = 4; bool ConjugateLhs = false; bool ConjugateRhs = false]':
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralBlockPanelKernel.h:1879:3: internal compiler error: in emit_move_insn, at expr.c:3722
   }
   ^
Please submit a full bug report,
with preprocessed source if appropriate.

@mrodozov

Author

commented Apr 2, 2019

7.4 and 8.3 would do

@npanpaliya

Contributor

commented Apr 2, 2019

Even with gcc 7.2, TF 2.0.0-alpha0 seems to be working. Not sure which TF commit has fixed the issue.

@petertorelli

commented Apr 2, 2019

@mrodozov I rebuilt gcc 8.3.0 from sources this morning (6hr compile time!) and still the same error.

So now I've seen the error on TF v1.13.1 on my Ubuntu 16.04 AWS a4.xlarge compiling with native gcc 7.3.0, Linaro's generic gcc-8.2.0 (_Generic-AArch64_Ubuntu-16.04_aarch64-linux), and GCC's natively built 8.3.0. (And yes, I verified that the failing command is, in fact, using the correct compiler path). They all generate the same compiler error on the same line of code.

I suppose I could compile gcc 7.4.0, but I'm beginning to think this is not GCC issue. Would be great if you could make your 1.13.1 aarch64 wheel file available if you've been successful on multiple compilers.

EDIT: I just found @lhelontra's wheel files for 1.13.1 at
https://github.com/lhelontra/tensorflow-on-arm/releases

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 

$ /usr/local/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/aarch64-unknown-linux-gnu/8.3.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ./configure
Thread model: posix
gcc version 8.3.0 (GCC) 

$ /opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/gcc
COLLECT_LTO_WRAPPER=/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/../libexec/gcc/aarch64-linux-gnu/8.2.0/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../configure --prefix /tmp/plgbuild/abs_build/761279_27578/trunk/rel_work/arm_builder/components/builder/work/current_tree/AArch64/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux --enable-languages=c,c++,go,fortran --disable-multilib --enable-lto --disable-werror --enable-stage1-checking --enable-checking=release --with-build-config=bootstrap-debug --enable-plugins --enable-gnu-indirect-function --target aarch64-linux-gnu --host aarch64-linux-gnu --enable-multiarch --with-build-sysroot=/tmp/plgbuild/abs_build/761279_27578/trunk/rel_work/arm_builder/components/builder/roots/armada/SysRoots/Ubuntu/Ubuntu_16.04_aarch64.rootfs --with-pkgversion=ARM-build-7
Thread model: posix
gcc version 8.2.0 (ARM-build-7) 

@ariesadel

commented Apr 4, 2019

We also ran into the same problem trying to compile TensorFlow on CentOS 7 with devtoolset-7. Does anyone have a solution for this?

@byronyi

Contributor

commented Apr 8, 2019

It might be because the usable memory of your system is too low compared to the number of parallel Bazel compilation tasks.

You could try setting it to a smaller value using: bazel build --jobs 4 //... or bazel build --ram_utilization_factor 50.

EDIT: wrong solution. Adding --config=opt seems to solve the problem with my gcc 6.3. Not sure why though.
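
For reference, the variants mentioned above can be spelled out against the pip package target from the original report. This is only a sketch: the helper name `tf_pip_build_cmd` is hypothetical, it prints the command instead of running it, and `--config=opt` is assumed to be defined by TensorFlow's `./configure` step.

```shell
# Print (do not run) a bazel invocation for the pip package target,
# with whatever extra flags are passed in.
tf_pip_build_cmd() {
  echo "bazel build $* //tensorflow/tools/pip_package:build_pip_package"
}

# The flag that reportedly avoided the error with gcc 6.3:
tf_pip_build_cmd --config=opt

# The earlier suggestion, limiting parallel compile jobs:
tf_pip_build_cmd --jobs 4
```

Printing first makes it easy to compare the exact flag set between a failing and a passing build before spending time on a full compile.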

@ymodak ymodak removed their assignment Apr 8, 2019

@sathyamoorthyrr

commented Apr 15, 2019

Using Ubuntu 18.04, GCC-7.3.0, faced the same issue on building TF Serving 1.13 with
bazel build -c opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow_serving/...
and also
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow_serving/...

Similar to @byronyi's comment, adding --config=nativeopt does not give any error.

However, I have also used --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to let the build be compatible with older ABI. Refer here

My final bazel build command,
bazel build --config=nativeopt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow_serving/...

Build completed successfully.

I installed the .deb file generated under tensorflow_serving/model_servers/ on another system with the same configuration (Ubuntu 18.04, GCC-7.3.0). That worked fine.

Installing the same file on Ubuntu 16.04 with GCC-5.4.0 fails with 'CXXABI_1.3.11' not found (required by tensorflow_model_server).

Doesn't --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" in the build command have any effect?
Or is it being used incorrectly together with --config=nativeopt in the final build command?

@gunan

Member

commented May 11, 2019

I think there is a known gcc 7 issue. It is fixed in the later versions of gcc 7 and 8.

The CXXABI issue is different. If you build TF on Ubuntu 18, that binary won't run on Ubuntu 16 or 14. Using -D_GLIBCXX_USE_CXX11_ABI=0 will only make it possible for TF to work properly with binaries that are compiled using older libstdc++ versions.
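
One way to see this on the target machine is to list the `CXXABI_*` version tags its libstdc++ exports. As far as I know, `CXXABI_1.3.11` first shipped with GCC 7's libstdc++, while the GCC 5.4 runtime on Ubuntu 16.04 stops at `CXXABI_1.3.9`. The sketch below assumes binutils `strings` is installed and uses the usual x86_64 Debian/Ubuntu library path; the helper names are hypothetical.

```shell
# List the CXXABI_<version> tags found in a libstdc++ shared object.
list_cxxabi() {
  strings "$1" | grep -o 'CXXABI_[0-9][0-9.]*' | sort -u
}

# Succeed if the tag given as $1 appears, as a whole line, on stdin.
has_cxxabi() {
  grep -qxF "$1"
}

# Assumed path: the usual libstdc++ location on x86_64 Debian/Ubuntu.
lib=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
if [ -r "$lib" ] && command -v strings >/dev/null 2>&1; then
  if list_cxxabi "$lib" | has_cxxabi CXXABI_1.3.11; then
    echo "CXXABI_1.3.11 is available"
  else
    echo "CXXABI_1.3.11 is missing"
  fi
fi
```

If the tag is missing, the binary was linked against a newer runtime than the machine provides, and no compile-time define changes that.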

As the main issue reported is a gcc issue, and fixed with patch releases in gcc, I will close this bug.

@gunan gunan closed this May 11, 2019

@zhuangli1987

commented May 23, 2019

@mrodozov I rebuilt gcc 8.3.0 from sources this morning (6hr compile time!) and still the same error.

So now I've seen the error on TF v1.13.1 on my Ubuntu 16.04 AWS a4.xlarge compiling with native gcc 7.3.0, Linaro's generic gcc-8.2.0 (_Generic-AArch64_Ubuntu-16.04_aarch64-linux), and GCC's natively built 8.3.0. (And yes, I verified that the failing command is, in fact, using the correct compiler path). They all generate the same compiler error on the same line of code.

I suppose I could compile gcc 7.4.0, but I'm beginning to think this is not GCC issue. Would be great if you could make your 1.13.1 aarch64 wheel file available if you've been successful on multiple compilers.

EDIT: I just found @lhelontra's wheel files for 1.13.1 at https://github.com/lhelontra/tensorflow-on-arm/releases

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 

$ /usr/local/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/aarch64-unknown-linux-gnu/8.3.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ./configure
Thread model: posix
gcc version 8.3.0 (GCC) 

$ /opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/gcc
COLLECT_LTO_WRAPPER=/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux/bin/../libexec/gcc/aarch64-linux-gnu/8.2.0/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../configure --prefix /tmp/plgbuild/abs_build/761279_27578/trunk/rel_work/arm_builder/components/builder/work/current_tree/AArch64/opt/arm/gcc-8.2.0_Generic-AArch64_Ubuntu-16.04_aarch64-linux --enable-languages=c,c++,go,fortran --disable-multilib --enable-lto --disable-werror --enable-stage1-checking --enable-checking=release --with-build-config=bootstrap-debug --enable-plugins --enable-gnu-indirect-function --target aarch64-linux-gnu --host aarch64-linux-gnu --enable-multiarch --with-build-sysroot=/tmp/plgbuild/abs_build/761279_27578/trunk/rel_work/arm_builder/components/builder/roots/armada/SysRoots/Ubuntu/Ubuntu_16.04_aarch64.rootfs --with-pkgversion=ARM-build-7
Thread model: posix
gcc version 8.2.0 (ARM-build-7) 

Did you resolve this issue? I cannot compile it with 7.4.0 or 7.3.0 either. I believe it is not a gcc problem. Please do reopen this ticket. @gunan

@npanpaliya

Contributor

commented May 24, 2019

Try with gcc 5.4. It worked for me.

@wormwang

commented Jun 12, 2019

Ubuntu 18.04.2, TF 1.13
bazel 0.19.2
gcc 7.4 and gcc 8.3 both hit this same error:

external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralBlockPanelKernel.h:1879:3: internal compiler error: in emit_move_insn, at expr.c:3723
}
^

@mrodozov

Author

commented Jun 12, 2019

yes we are building with 7.4.1
@npanpaliya sure, I might as well try assembler or something 😆

@zhuangli1987

commented Jun 12, 2019
