bazel build failure of tensorflow with mkl and specific eigen3 flags #10157

Closed
drormeir opened this Issue May 24, 2017 · 9 comments

drormeir commented May 24, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Don't have code yet

  • OS Platform and Distribution:
    Linux Ubuntu 16.04

  • TensorFlow installed from (source or binary):
    git clone latest revision (TensorFlow 1.1)

  • TensorFlow version (use command below):

  • Bazel version (if compiling from source):
    Build label: 0.4.5
    Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Mar 16 12:19:38 2017 (1489666778)
    Build timestamp: 1489666778
    Build timestamp as int: 1489666778

  • CUDA/cuDNN version:
    None

  • GPU model and memory:
    None

  • Exact command to reproduce:
    I don't understand what this item refers to.

Describe the problem

Bazel failed to build/compile TensorFlow with MKL support.
I added the following compiler flags during the configure phase, and they caused the compilation error:

-DEIGEN_USE_MKL_ALL -DMKL_ILP64

Source code / logs

Configure phase:


drormeirovich@drormeirovich-xps-13-9360:~/projects/tensorflow$ sudo ./configure 
Please specify the location of python. [Default is /usr/bin/python]: 
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] n
Please specify the location where MKL is installed. [Default is /opt/intel/mklml]: /opt/intel/mkl
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -O3 -DNDEBUG -fPIC -DEIGEN_USE_MKL_ALL -DMKL_ILP64 -fopenmp -m64 -v -I/opt/intel/mkl/include
Do you wish to use jemalloc as the malloc implementation? [Y/n] 
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] 
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] 
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] 
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] 
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] 
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] 
No CUDA support will be enabled for TensorFlow
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished

Build command:

sudo bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
Error output:

ERROR: /home/drormeirovich/projects/tensorflow/tensorflow/core/kernels/BUILD:998:1: C++ compilation of rule '//tensorflow/core/kernels:gather_functor' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG ... (remaining 64 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.2.0-3ubuntu11~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.2.0 20160901 (Ubuntu 6.2.0-3ubuntu11~16.04) 
COLLECT_GCC_OPTIONS='-U' '_FORTIFY_SOURCE' '-fstack-protector' '-Wall' '-B' '/usr/bin' '-B' '/usr/bin' '-Wunused-but-set-parameter' '-Wno-free-nonheap-object' '-fno-omit-frame-pointer' '-g0' '-O2' '-D' '_FORTIFY_SOURCE=1' '-D' 'NDEBUG' '-ffunction-sections' '-fdata-sections' '-O3' '-D' 'NDEBUG' '-D' 'EIGEN_USE_MKL_ALL' '-D' 'MKL_ILP64' '-v' '-I' '/opt/intel/mkl/include' '-std=c++11' '-O3' '-D' 'NDEBUG' '-D' 'EIGEN_USE_MKL_ALL' '-D' 'MKL_ILP64' '-fopenmp' '-m64' '-v' '-I' '/opt/intel/mkl/include' '-MD' '-MF' 'bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.d' '-frandom-seed=bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.o' '-fPIC' '-D' 'EIGEN_MPL2_ONLY' '-iquote' '.' '-iquote' 'bazel-out/local-opt/genfiles' '-iquote' 'external/bazel_tools' '-iquote' 'bazel-out/local-opt/genfiles/external/bazel_tools' '-iquote' 'external/eigen_archive' '-iquote' 'bazel-out/local-opt/genfiles/external/eigen_archive' '-iquote' 'external/local_config_sycl' '-iquote' 'bazel-out/local-opt/genfiles/external/local_config_sycl' '-isystem' 'external/bazel_tools/tools/cpp/gcc3' '-isystem' 'external/eigen_archive' '-isystem' 'bazel-out/local-opt/genfiles/external/eigen_archive' '-D' 'EIGEN_AVOID_STL_ARRAY' '-I' 'external/gemmlowp' '-Wno-sign-compare' '-fno-exceptions' '-msse3' '-pthread' '-fno-canonical-system-headers' '-Wno-builtin-macro-redefined' '-D' '__DATE__="redacted"' '-D' '__TIMESTAMP__="redacted"' '-D' '__TIME__="redacted"' '-c' '-o' 'bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.o' '-mtune=generic' '-march=x86-64' '-pthread'
 /usr/lib/gcc/x86_64-linux-gnu/6/cc1plus -quiet -v -v -I /opt/intel/mkl/include -I /opt/intel/mkl/include -I external/gemmlowp -imultiarch x86_64-linux-gnu -MD bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.d -MF bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.d -MQ bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.o -D_GNU_SOURCE -D_REENTRANT -U _FORTIFY_SOURCE -D _FORTIFY_SOURCE=1 -D NDEBUG -D NDEBUG -D EIGEN_USE_MKL_ALL -D MKL_ILP64 -D NDEBUG -D EIGEN_USE_MKL_ALL -D MKL_ILP64 -D EIGEN_MPL2_ONLY -D EIGEN_AVOID_STL_ARRAY -D __DATE__="redacted" -D __TIMESTAMP__="redacted" -D __TIME__="redacted" -iquote . -iquote bazel-out/local-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local-opt/genfiles/external/bazel_tools -iquote external/eigen_archive -iquote bazel-out/local-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local-opt/genfiles/external/local_config_sycl -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/eigen_archive -isystem bazel-out/local-opt/genfiles/external/eigen_archive tensorflow/core/kernels/gather_functor.cc -quiet -dumpbase gather_functor.cc -m64 -msse3 -mtune=generic -march=x86-64 -auxbase-strip bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.o -g0 -O2 -O3 -O3 -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -Wno-sign-compare -Wno-builtin-macro-redefined -std=c++11 -version -fstack-protector -fno-omit-frame-pointer -ffunction-sections -fdata-sections -fopenmp -frandom-seed=bazel-out/local-opt/bin/tensorflow/core/kernels/_objs/gather_functor/tensorflow/core/kernels/gather_functor.pic.o -fPIC -fno-exceptions -fno-canonical-system-headers -Wformat-security -o /tmp/ccloY0qb.s
GNU C++11 (Ubuntu 6.2.0-3ubuntu11~16.04) version 6.2.0 20160901 (x86_64-linux-gnu)
	compiled by GNU C version 6.2.0 20160901, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version 0.15
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "external/bazel_tools/tools/cpp/gcc3"
ignoring nonexistent directory "bazel-out/local-opt/genfiles/external/eigen_archive"
ignoring duplicate directory "/usr/include/x86_64-linux-gnu/c++/6"
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/6/../../../../x86_64-linux-gnu/include"
ignoring duplicate directory "/opt/intel/mkl/include"
ignoring nonexistent directory "bazel-out/local-opt/genfiles/external/bazel_tools"
ignoring duplicate directory "external/eigen_archive"
  as it is a non-system directory that duplicates a system directory
ignoring nonexistent directory "bazel-out/local-opt/genfiles/external/eigen_archive"
ignoring nonexistent directory "bazel-out/local-opt/genfiles/external/local_config_sycl"
#include "..." search starts here:
 .
 bazel-out/local-opt/genfiles
 external/bazel_tools
 external/local_config_sycl
#include <...> search starts here:
 /opt/intel/mkl/include
 external/gemmlowp
 external/eigen_archive
 /usr/include/c++/6
 /usr/include/x86_64-linux-gnu/c++/6
 /usr/include/c++/6/backward
 /usr/lib/gcc/x86_64-linux-gnu/6/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/6/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C++11 (Ubuntu 6.2.0-3ubuntu11~16.04) version 6.2.0 20160901 (x86_64-linux-gnu)
	compiled by GNU C version 6.2.0 20160901, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version 0.15
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 23988a38771f71e4676d56931fe884f7
In file included from external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:522:0,
                 from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/core/kernels/gather_functor.h:19,
                 from tensorflow/core/kernels/gather_functor.cc:52:
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralMatrixMatrix_BLAS.h: In static member function 'static void Eigen::internal::general_matrix_matrix_product<Index, double, LhsStorageOrder, ConjugateLhs, double, RhsStorageOrder, ConjugateRhs, 0>::run(Index, Index, Index, const double*, Index, const double*, Index, double*, Index, double, Eigen::internal::level3_blocking<double, double>&, Eigen::internal::GemmParallelInfo<Index>*)':
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralMatrixMatrix_BLAS.h:106:1: error: cannot convert 'Eigen::BlasIndex* {aka long long int*}' to 'const int*' for argument '3' to 'int dgemm_(const char*, const char*, const int*, const int*, const int*, const double*, const double*, const int*, const double*, const int*, const double*, double*, const int*)'
 GEMM_SPECIALIZATION(double,   d,  double, d)
 ^

drormeir changed the title from bazel build failure of tensorflow with mkl to bazel build failure of tensorflow with mkl and specific eigen3 flags May 24, 2017

jart (Member) commented May 26, 2017

See also PointCloudLibrary/pcl#1266

Is the -DEIGEN_USE_MKL_ALL flag necessary? That's probably turning on code that wouldn't otherwise be enabled.

Did you modify the workspace.bzl file to upgrade Eigen?

drormeir commented May 28, 2017

Hi,
The link you mentioned is about a compilation problem in Eigen, but it seems to be related to a different kind of conflict, which I don't have. Moreover, it says there that the problem is solved by a newer version of Eigen, while in my case I'm using the latest version (even newer than the one in that link).

I removed the MKL flag and the compilation now finishes. However, I want to work with MKL support, and I don't know how to confirm that Eigen's code generation and the TensorFlow build integrate correctly with my current project (which already uses MKL and 64-bit integers as indexes).

I guess for now I have no better way to verify this than setting breakpoints inside the Eigen source code and checking whether it calls the correct MKL functions.

Regarding the Bazel workspace, I didn't modify any file. Do I need to?

Thanks!

jmchen-g (Contributor) commented May 30, 2017

@jart Could you take a look please? Thanks.

gogo40 commented Jun 29, 2017

@drormeir I fixed this problem by editing the file Eigen/src/Core/util/MKL_support.h

I added the following code at line 116:

+#ifdef EIGEN_BLAS_INDEX
+typedef EIGEN_BLAS_INDEX BlasIndex;
+#elif defined(EIGEN_USE_MKL)
 typedef MKL_INT BlasIndex;
 #else
 typedef int BlasIndex;
 #endif

In my CMakeLists.txt I did:

if (USE_MKL)
    find_package(MKL REQUIRED)
    include_directories(${MKL_INCLUDE_DIRS})
    set(EXTRA_LIBRARIES ${EXTRA_LIBRARIES} ${MKL_LIBRARIES})
    add_definitions(-DENABLE_MKL)
    add_definitions(-DMKL_LP64)
    add_definitions(-DEIGEN_BLAS_INDEX=int)
endif (USE_MKL)

This problem is a conflict between MKL_INT and the BLAS index type. In my case, BLAS uses int as its index type, while MKL_INT is a long long. With this workaround everything is working fine.

I also submitted a pull request to the Eigen repository and am waiting for it to be accepted.

https://bitbucket.org/eigen/eigen/pull-requests/320/add-macro-to-define-blas-index-type/diff#comment-39775997

drormeir commented Jun 29, 2017

Thanks!

ljanyst commented Jul 14, 2017

It is not a bug in TensorFlow; it's a problem with Eigen.

When used in MKL mode, Eigen defines BlasIndex to be whatever MKL_INT is:

typedef MKL_INT BlasIndex;

It is utterly confusing because the definition of MKL_INT varies depending on the index size you want to use in your program. However, instead of using MKL's BLAS interface, which takes the size of MKL_INT into account, Eigen declares its own interface (https://bitbucket.org/eigen/eigen/src/e7027de735d6450c8ede3ce2f65166714c6aef50/Eigen/src/misc/blas.h?at=default&fileviewer=file-view-default) using plain 32-bit ints. This is the reason for the compilation errors you see.

@gogo40 Your patch is unnecessary if you use short indices (-DMKL_LP64), and it makes a significant chunk of Eigen's unit tests fail if you use long indices (-DMKL_ILP64 -DEIGEN_BLAS_INDEX=int). This is because the BLAS implementation in libmkl_intel_ilp64.so expects 64-bit indices.

@drormeir if you use -DMKL_LP64 instead of -DMKL_ILP64, TensorFlow will compile fine. I would not expect to see performance improvements though.
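
To make the index-width conflict above concrete, here is a minimal, self-contained sketch. Everything with a _demo suffix is invented for illustration; only the ILP64 behaviour of MKL_INT and the plain-int prototypes in Eigen's bundled blas.h are taken from the discussion above.

// Illustrative only; the _demo names are invented, not the real Eigen/MKL headers.
// Compile with:  g++ -std=c++11 blas_index_demo.cc              (LP64: 32-bit index)
//           or:  g++ -std=c++11 -DMKL_ILP64 blas_index_demo.cc  (ILP64: 64-bit index)

#if defined(MKL_ILP64)
typedef long long MKL_INT_demo;   // Intel's headers make MKL_INT 64-bit under ILP64
#else
typedef int MKL_INT_demo;         // LP64 interface: MKL_INT is a plain 32-bit int
#endif

// Eigen's MKL_support.h (simplified) aliases its BLAS index to MKL_INT:
typedef MKL_INT_demo BlasIndex_demo;

// Eigen's bundled blas.h, however, declares the BLAS entry points with plain
// int pointers for the dimension arguments (heavily trimmed stub here):
extern "C" void dgemm_demo_(const int* m, const int* n, const int* k) {
  (void)m; (void)n; (void)k;
}

int main() {
  BlasIndex_demo m = 4, n = 4, k = 4;
#if defined(MKL_ILP64)
  // This is the conversion gcc rejects in GeneralMatrixMatrix_BLAS.h:
  //   cannot convert 'long long int*' to 'const int*'
  // dgemm_demo_(&m, &n, &k);
  (void)m; (void)n; (void)k;
#else
  dgemm_demo_(&m, &n, &k);        // fine: BlasIndex_demo is int here
#endif
  return 0;
}

Building with -DMKL_LP64 (or simply dropping -DMKL_ILP64) keeps the index type at 32 bits, which is why the build goes through in that configuration.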

drormeir commented Jul 15, 2017

gogo40 commented Jul 28, 2017

Thank you @ljanyst

gunan (Member) commented Sep 21, 2017

Looks like this issue was resolved, so I will close it.

It is also obsolete now, as you can just use --config=mkl to build with MKL support.
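
For reference, using the same pip-package target as the original report, the MKL build would now look something like:

bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package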

gunan closed this Sep 21, 2017
