Support for Redhat, Centos and many superclusters #110

Closed
trungnt13 opened this Issue Nov 11, 2015 · 60 comments

trungnt13 commented Nov 11, 2015

Many cluster systems use environment modules on Red Hat or CentOS < 7, which ship glibc 2.12.

Bazel requires glibc 2.14, and the prebuilt version for Linux requires glibc 2.17, so it is hopeless to make TensorFlow run on these clusters.

See the related issue reported on Bazel: bazelbuild/bazel#583
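
To see where a given node stands against the two thresholds quoted here (2.14 and 2.17), something like the following Python sketch may help. It is only an illustration: the helper names are made up, and it assumes a Linux host where either os.confstr or platform.libc_ver can report the C library.

```python
import os
import platform


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.12'."""
    version = text.strip().split()[-1]
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_requirement(found, required):
    """Compare two (major, minor) tuples."""
    return found >= required


def detect_glibc():
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, OSError, ValueError, TypeError):
        # Fall back to platform.libc_ver() when confstr is unavailable.
        name, version = platform.libc_ver()
        if name == "glibc" and version:
            return parse_glibc_version(version)
        return None


if __name__ == "__main__":
    found = detect_glibc()
    # 2.14 and 2.17 are the requirements quoted in this issue.
    for label, required in (("bazel", (2, 14)), ("prebuilt Linux version", (2, 17))):
        ok = found is not None and meets_requirement(found, required)
        print(label, "needs", required, "found", found, "ok" if ok else "too old")
```

On a CentOS 6 node this would be expected to report 2.12 as too old for both.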

vrv Nov 11, 2015

Contributor

Since we depend on bazel, this sounds like a bazel issue.

Feel free to re-open if bazel ends up supporting 2.12 or lower, and we can see what we can do.

@vrv vrv closed this Nov 11, 2015

alantus Nov 30, 2015

Am I right that you depend on bazel only at build-time? If this is true then it can be viewed as something you could do something about too... You could also release static-linked packages that would be very useful to people stuck on clusters with old libraries...

urimerhav Dec 17, 2015

So did anyone find some way past this problem? I'm using redhat 6.4, as is my entire corporation. We're stuck on redhat 6.4. I'm not sure how to end up running tensorflow on such a machine...

ttrouill Jan 20, 2016

I managed to have it running on a CentOS 6.7 : http://stackoverflow.com/a/34897674/1990516 :)
Tell me if it works for you.

Edit: I proposed an alternative solution also: http://stackoverflow.com/a/34900471/1990516

urimerhav Jan 20, 2016

Thanks man! I'll look into it as soon as I can.

altaetran Jan 30, 2016

Could you let me know if this worked? I can't seem to get any of these other solutions working.

urimerhav Feb 22, 2016

@ttrouill only says he got it working on 6.7, so I haven't actually checked whether this works on 6.4...

rdipietro Feb 29, 2016

Contributor

Both solutions seem to work, but they're not optimal. TensorFlow and Python seem to run okay, but if I try and run IPython, then with the first solution I get an Invalid ELF error, and with the second solution there is a memory leak and IPython continues to absorb all memory with time. I believe that this can also happen with other Python imports that rely on libraries that were compiled using the older libc.

I'd love to see a straightforward how-to-compile-bazel-with-old-glibc guide, but I haven't come across one yet.

rdipietro Feb 29, 2016

Contributor

Also bazelbuild/bazel#760 is relevant, but it's far from straightforward and my attempt to build bazel using this guide failed. Hopefully within the next few weeks I can give it some more time and continue that thread with the errors I end up getting.

ilblackdragon added a commit to ilblackdragon/tensorflow that referenced this issue Mar 9, 2016

rdipietro Mar 26, 2016

Contributor

Compiling on CentOS still isn't all that straightforward, but I figured I'd give an overview here for now. This works for me with CentOS 6.7 and gcc 4.8.2, with GPU support (Cuda 7.0, cuDNN 4.0.7). A bazel modification for building with a custom gcc is in the works (bazelbuild/bazel#760) and should help streamline this later on.

The instructions here are specific to my base gcc path of /cm/shared/apps/gcc/4.8.2, but they should work for other configurations just by modifying the base path.

Paths for reference:
gcc path: /cm/shared/apps/gcc/4.8.2/bin/gcc
cpp path: /cm/shared/apps/gcc/4.8.2/bin/cpp
lib64 path: /cm/shared/apps/gcc/4.8.2/lib64
include1 dir: /cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include
include2 dir: /cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include-fixed
include3 dir: /cm/shared/apps/gcc/4.8.2/include/c++/4.8.2

Bazel

  1. git clone https://github.com/bazelbuild/bazel.git && cd bazel
  2. Edit tools/cpp/CROSSTOOL
    • Replace all occurrences of /usr/bin/gcc with gcc path
    • Replace all occurrences of /usr/bin/cpp with cpp path
    • After the toolpath containing gcc path, add the lines
      • linker_flag: "-Wl,-Rlib64 path"
      • cxx_builtin_include_directory: "include1 dir"
      • cxx_builtin_include_directory: "include2 dir"
      • cxx_builtin_include_directory: "include3 dir"
  3. Edit scripts/bootstrap/buildenv.sh
    • Comment out atexit "rm -fr ${DIR}"
  4. export EXTRA_BAZEL_ARGS='-s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone --jobs 8'
  5. ./compile.sh

TensorFlow

  1. git clone --recurse-submodules https://github.com/tensorflow/tensorflow && cd tensorflow
  2. Edit third_party/gpus/crosstool/CROSSTOOL, making the same changes we made for Bazel. (/usr/bin/gcc etc. likely won't need to be replaced, though.)
  3. Edit third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
    • Replace all /usr/bin/gcc with gcc path.
    • Undo the temporary "fix" to find as by commenting out the line cmd = 'PATH=' + PREFIX_DIR + ' ' + cmd. (For me, this is necessary to find as.)
  4. ./configure
  5. export EXTRA_BAZEL_ARGS='-s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone --jobs 8'
  6. bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package
    • Why the strange flags? Because otherwise, after building with the older libc, we'll get an error about secure_getenv.
  7. bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
  8. pip install ~/tensorflow_pkg/*
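
The CROSSTOOL edits in the steps above can also be scripted. The following is a rough Python sketch, not the author's method: patch_crosstool and extra_lines are hypothetical helpers, the directory layout under the gcc base path is taken from the paths listed in this comment, and it assumes the tool path entry for gcc sits on a single line. It adds the extra lines once, after the first line that mentions the gcc binary.

```python
GCC_ROOT = "/cm/shared/apps/gcc/4.8.2"  # base path from the comment above


def extra_lines(gcc_root, triple="x86_64-unknown-linux-gnu", version="4.8.2"):
    """Build the four lines that the steps add after the gcc tool path."""
    internal = "%s/lib/gcc/%s/%s" % (gcc_root, triple, version)
    return [
        'linker_flag: "-Wl,-R%s/lib64"' % gcc_root,
        'cxx_builtin_include_directory: "%s/include"' % internal,
        'cxx_builtin_include_directory: "%s/include-fixed"' % internal,
        'cxx_builtin_include_directory: "%s/include/c++/%s"' % (gcc_root, version),
    ]


def patch_crosstool(text, gcc_root=GCC_ROOT):
    """Return CROSSTOOL text with the gcc/cpp paths swapped and flags added."""
    text = text.replace("/usr/bin/gcc", gcc_root + "/bin/gcc")
    text = text.replace("/usr/bin/cpp", gcc_root + "/bin/cpp")
    out, inserted = [], False
    for line in text.splitlines():
        out.append(line)
        # Assumption: the tool path entry names the gcc binary on one line.
        if not inserted and gcc_root + "/bin/gcc" in line:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + extra for extra in extra_lines(gcc_root))
            inserted = True
    return "\n".join(out) + "\n"
```

It works on the file contents as a string, so the result can be inspected before being written back to tools/cpp/CROSSTOOL or third_party/gpus/crosstool/CROSSTOOL.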

rdipietro May 17, 2016

Contributor

Update: Previous process was for a commit after release 7.

Here are necessary changes for commit 1d4fd06, which is after release 8:

  1. You need Bazel 0.2.x. As of this writing, with appropriate environment variables, Bazel at HEAD compiles simply with ./compile.sh. Thank you @damienmg !
  2. You still need to make the above changes to the TensorFlow files, including the changes to CROSSTOOL etc. (For some reason the bazel auto config doesn't work here.)
  3. Edit third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
    and replace #!/usr/bin/env python2.7 with
    #!/usr/bin/env /full/path/to/python2.7. This is a hack to keep bazel's confined environment from failing to pick up our custom Python location.
  4. Edit bazel-out/host/bin/tensorflow/swig and add
    export LD_LIBRARY_PATH=custom:paths:$LD_LIBRARY_PATH
    before swig is run. Otherwise swig won't find libraries that exist in our LD_LIBRARY_PATH. This is another hack to get around the confined environment.
  5. Use the same bazel build command from above: bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package
  6. cd bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles, then run cp -r __main__/* . (note the trailing dot, meaning the current directory). This is a hack associated with #2040.
  7. Finally, run bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg, and
  8. pip install ~/tensorflow_pkg/*
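
The shebang change in step 3 is easy to get wrong by hand, so here is a small Python sketch of it. rewrite_shebang is a hypothetical helper, and /full/path/to/python2.7 stays a placeholder for whatever interpreter is actually in use on the cluster.

```python
def rewrite_shebang(text, python_path):
    """Point a '#!/usr/bin/env python2.7' first line at an explicit interpreter."""
    lines = text.split("\n")
    if lines and lines[0].strip() == "#!/usr/bin/env python2.7":
        lines[0] = "#!/usr/bin/env " + python_path
    return "\n".join(lines)
```

Files whose first line is anything else are returned unchanged, so running it twice does no harm.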

trungnt13 May 18, 2016

Our administrator managed to run the pip-installed TensorFlow package on a RHEL 6.7 server (without building Bazel or TensorFlow from source). The core idea is to use a separate, newer copy of glibc.

Quick test:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))

Note: this approach only works for running Python scripts. Remember that every time you add $libcroot to your path, all shell commands break (e.g. you cannot use ls, cd, ...). You might start bash -l, screen, or byobu before you try this so you don't mess up your own session.
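
One way to keep the current shell intact is to apply the library path only to a child process. The Python sketch below shows that idea and nothing more: child_env is a hypothetical helper, the lib subdirectory under $libcroot is an assumed layout, and whether the child actually starts depends on how the separate glibc is wired up, which this comment does not show.

```python
import os


def child_env(libcroot, base_env=None):
    """Copy the environment and prepend an assumed glibc lib dir to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(libcroot, "lib")  # assumed layout, adjust as needed
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    return env
```

The returned dictionary can be passed as env= to subprocess.call, so ls, cd and the rest of the parent session keep using the system glibc.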

rdipietro May 18, 2016

Contributor

Yeah that was described here a while back, but as you mention, it's not ideal. For example if you run Jupyter it'll lead to a memory leak / crash (at least on the system I tried it with).

kskp Jun 23, 2016

@rdipietro

Edit tools/cpp/CROSSTOOL
After the toolpath containing gcc path, add the lines
linker_flag: "-Wl,-Rlib64 path"
cxx_builtin_include_directory: "include1 dir"
cxx_builtin_include_directory: "include2 dir"
cxx_builtin_include_directory: "include3 dir"

Should these lines be added after every occurrence of the toolpath containing the gcc path, i.e. twice, wherever I changed /usr/bin/gcc?

rdipietro Jun 23, 2016

Contributor

I don't know what you mean by twice. I'm pretty sure I only inserted those lines once, although if you were to insert them in multiple places it probably wouldn't do any harm.

damienmg Jun 24, 2016

Member

@kskp @rdipietro: is that still needed with the latest version of Bazel? If yes, then we have an issue in the C++ detection code.

rdipietro Jun 24, 2016

Contributor

Bazel compiles out of the box as long as I set CC correctly. I haven't tried with TensorFlow 0.9, but as of 0.8, I still had to make manual changes on CentOS.

damienmg Jun 24, 2016

Member

You mean changes to the CUDA crosstool file?

rdipietro Jun 24, 2016

Contributor

Yes. My May 17 comment above includes everything I needed to do. Specifically, I needed to edit CROSSTOOL and introduce two hacks to get bazel to find things outside of its isolated environment.

kskp Jun 24, 2016

@rdipietro Thanks for your reply. Sorry for my ignorance, but could you please tell me what toolpath is? I am assuming it is the block of code where the gcc path had to be changed. I did that twice in the entire file (since it said to replace all occurrences of /usr/bin/gcc). So do I have to add those lines after each block of code where I changed the /usr/bin/gcc path?

kskp Jun 24, 2016

@rdipietro @damienmg I am not using the latest version of Bazel. I need the 0.2.2b version. I ultimately have to run SyntaxNet on CentOS 6.7.

damienmg Jun 24, 2016

Member

0.2.2b should work too.

kskp Jun 24, 2016

Oh, I tried a couple of weeks ago but it did not work. Will do it again today. Thanks for your reply.

damienmg Jun 24, 2016

Member

Note that you still have to make the CUDA CROSSTOOL modification to build with --config cuda.

kskp Jun 24, 2016

Oops, I am not configuring it with CUDA support. Is it a must?

damienmg Jun 24, 2016

Member

You need to update tensorflow's CROSSTOOL for CUDA support. @davidzchen is
making the change to TF to have the same support but it has not yet landed.

davidzchen Jun 27, 2016

Member

FYI Here is the tracking bug for CUDA autoconfiguration: #2873.

It is partially working, but I still need to fix the remaining path issues, such as getting the Python SWIG wrapper to find the tensorflow library correctly.

jdoerrie added a commit to jdoerrie/tensorflow that referenced this issue Jun 28, 2016

Remove explicit dependency on Python 2.7 from crosstool_wrapper_driver_is_not_gcc

Many superclusters need to compile TensorFlow from source due to an outdated glibc version (see #110). In @rdipietro's excellent workaround post (tensorflow#110 (comment)) he mentions issues with the referenced Python version in this file. I have issues as well, but of a different nature. In my case the build script is unable to find `libpython2.7.so.1.0`, since only Python 3 is present on my machine. The issue originates from `crosstool_wrapper_driver_is_not_gcc` where the only Python 2.7 exclusive feature is the `print` statement. By `import`ing `print_function from __future__` the explicit dependency can be dropped and both versions of Python are supported.

vrv added a commit that referenced this issue Jun 28, 2016

Remove explicit dependency on Python 2.7 from crosstool_wrapper_driver_is_not_gcc (#3077)

kskp Jun 30, 2016

@damienmg @rdipietro Bazel still does not compile.

Just for your information, my system info:

[sree@ds1 bazel]$ gcc -v
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)

[sree@ds1 bazel]$ ldd --version
ldd (GNU libc) 2.12

[sree@ds1 bazel]$ which gcc
/usr/bin/gcc

[sree@ds1 bazel]$ g++ -v
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)

[sree@ds1 bazel]$ which g++
/usr/bin/g++

To build bazel, I do the following:

  1. git clone https://github.com/bazelbuild/bazel.git
  2. cd bazel
  3. git tag -l
  4. git checkout tags/0.2.2b
  5. ./compile.sh

./compile.sh gives:
[sree@ds1 bazel]$ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel
🍃 Building Bazel from scratch......
🍃 Building Bazel with Bazel.
INFO: Found 1 target...
ERROR: /home/sree/bazel/src/main/cpp/util/BUILD:24:1: C++ compilation of rule '//src/main/cpp/util:md5' failed: gcc failed: error executing command
(cd /tmp/bazel.NO5ObMNe/out/bazel &&
exec env -
PATH=/home/sree/anaconda2/bin:/home/sree/bazel:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/sree/bin
/usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -iquote . -iquote bazel-out/local-fastbuild/genfiles -iquote external/bazel_tools -iquote bazel-out/local-fastbuild/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-frandom-seed=bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.o' -MD -MF bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.d -fPIC -c src/main/cpp/util/md5.cc -o bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Target //src:bazel failed to build
INFO: Elapsed time: 3.147s, Critical Path: 0.07s

Building output/bazel

Am I even doing it right? I did not make any changes to tools/cpp/CROSSTOOL file.
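
The "error trying to exec 'cc1plus'" line suggests that the gcc being invoked cannot find its C++ compiler proper. A quick way to confirm this, sketched in Python below, is to ask gcc where it thinks cc1plus lives using -print-prog-name. The helper names are hypothetical, and the sketch relies on gcc echoing the bare program name when the lookup fails.

```python
import os
import subprocess


def resolved(prog_name_output):
    """True if gcc printed an absolute path rather than the bare program name."""
    return os.path.isabs(prog_name_output.strip())


def find_cc1plus(gcc="gcc"):
    """Ask gcc for the location of cc1plus; return None if it is missing."""
    out = subprocess.check_output([gcc, "-print-prog-name=cc1plus"])
    path = out.decode().strip()
    return path if resolved(path) and os.path.exists(path) else None
```

Calling find_cc1plus("/usr/bin/gcc") and getting None back would point at an incomplete C++ installation for that particular gcc rather than at the CROSSTOOL file.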

damienmg Jun 30, 2016

Member

What does echo | gcc -E -xc++ - -v return?

kskp Jun 30, 2016

@damienmg

Using built-in specs.
COLLECT_GCC=gcc
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/rh/devtoolset-2/root/usr --mandir=/opt/rh/devtoolset-2/root/usr/share/man --infodir=/opt/rh/devtoolset-2/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,fortran,lto --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-isl=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/isl-install --with-cloog=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/cloog-install --with-mpc=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/mpc-install --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)
COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64'
cc1plus -E -quiet -v -iprefix /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.2/ -D_GNU_SOURCE - -mtune=generic -march=x86-64
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
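
The output above shows the gcc driver failing to locate its C++ front end. A minimal sketch of a quick check, assuming a gcc driver that supports the -print-prog-name option; the function name has_cc1plus is hypothetical:

```shell
# has_cc1plus: report whether a gcc driver can locate its C++ front end.
# Usage: has_cc1plus [compiler]   (default: gcc)
has_cc1plus() {
    compiler="${1:-gcc}"
    # -print-prog-name prints a full path when the driver finds the
    # subprogram, and only the bare name when it does not.
    found="$("$compiler" -print-prog-name=cc1plus 2>/dev/null)"
    case "$found" in
        /*) [ -x "$found" ] && { echo "cc1plus found at $found"; return 0; } ;;
    esac
    echo "cc1plus not found by $compiler" >&2
    return 1
}
```

Running has_cc1plus /usr/bin/gcc before starting ./compile.sh would show the problem without waiting for the Bazel build to fail.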

kskp Jun 30, 2016

Also, I installed gcc 4.8.2 following the instructions at http://superuser.com/questions/381160/how-to-install-gcc-4-7-x-4-8-x-on-centos.
Since that changed nothing, I did the following:

sudo mv /usr/bin/gcc /usr/bin/gcc.bak
sudo cp /opt/rh/devtoolset-2/root/usr/bin/gcc /usr/bin/gcc
sudo mv /usr/bin/g++ /usr/bin/g++.bak
sudo cp /opt/rh/devtoolset-2/root/usr/bin/g++ /usr/bin/g++
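
Copying only the driver binaries leaves gcc looking for cc1plus relative to /usr/bin (note the -iprefix /usr/bin/../lib/gcc/... in the gcc -v output). A hedged alternative sketch that leaves /usr/bin untouched and puts the toolchain's own bin directory first on PATH instead. The helper name use_toolchain is hypothetical; the devtoolset-2 path is the one from the commands above.

```shell
# use_toolchain: prepend a toolchain's bin directory to PATH and point
# CC/CXX at its compilers, instead of copying binaries into /usr/bin.
use_toolchain() {
    root="$1"
    if [ ! -d "$root/bin" ]; then
        echo "no bin directory under $root" >&2
        return 1
    fi
    PATH="$root/bin:$PATH"
    CC="$root/bin/gcc"
    CXX="$root/bin/g++"
    export PATH CC CXX
}

# Example (path taken from the thread):
# use_toolchain /opt/rh/devtoolset-2/root/usr && gcc -print-prog-name=cc1plus
```

On systems with Software Collections, scl enable devtoolset-2 bash has a similar effect. Whether Bazel then picks up the new compiler depends on the Bazel version; the log above shows this one invoking /usr/bin/gcc directly.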

trungnt13 Jul 1, 2016

TensorFlow builds successfully for CPU; however, the GPU build fails.

I keep getting this error, even though I changed all paths in CROSSTOOL and crosstool_wrapper... from /usr/bin to my gcc path:

ERROR: /homeappl/home/trungnt/.cache/bazel/_bazel_trungnt/07601e513c2336fd42387644d3f95e2b/external/protobuf/BUILD:331:1: Linking of rule '@protobuf//:protoc' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /homeappl/home/trungnt/.cache/bazel/_bazel_trungnt/07601e513c2336fd42387644d3f95e2b/execroot/tensorflow && \
  exec env - \
  third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/external/protobuf/protoc bazel-out/host/bin/external/protobuf/_objs/protoc/external/protobuf/src/google/protobuf/compiler/main.o bazel-out/host/bin/external/protobuf/libprotoc_lib.a bazel-out/host/bin/external/protobuf/libprotobuf.a bazel-out/host/bin/external/protobuf/libprotobuf_lite.a -lpthread -lstdc++ -B/appl/opt/gcc/4.9.1/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,--gc-sections): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
collect2: fatal error: cannot find 'ld'
compilation terminated.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 71.231s, Critical Path: 56.80s
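
The failing link step runs under exec env -, that is, with an empty environment, so the compiler cannot fall back on PATH to find ld. A minimal sketch that reproduces that lookup outside Bazel, assuming a gcc-compatible driver that supports -print-prog-name; the helper name tool_in_clean_env is hypothetical, and the -B directory in the example is the one from the log above:

```shell
# tool_in_clean_env: ask a compiler driver where it would find a tool
# (ld, as, nm, ...) when run with an empty environment, as Bazel does.
# Usage: tool_in_clean_env /absolute/path/to/gcc ld [extra driver flags...]
tool_in_clean_env() {
    compiler="$1"
    tool="$2"
    shift 2
    # "env -" clears the environment, so PATH is not available to the driver.
    found="$(env - "$compiler" "$@" -print-prog-name="$tool")"
    case "$found" in
        /*) echo "$found" ;;
        *)  echo "$tool not found without PATH" >&2; return 1 ;;
    esac
}

# Example:
# tool_in_clean_env /appl/opt/gcc/4.9.1/bin/gcc ld -B/appl/opt/gcc/4.9.1/bin/
```

If the helper reports that ld is not found, the -B directory passed by the crosstool does not contain a linker, which matches the collect2 error.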

mukul1992 Jul 13, 2016

Hello,
@rdipietro: I am trying to install TensorFlow 0.9.0 on a cluster running CentOS 6.7. I already have Bazel installed. Here is the error I am getting:
ERROR: /gpfs_home/mdave/.cache/bazel/_bazel_mdave/541ff47a1a214f62e91d090e1e816e43/external/highwayhash/BUILD:17:1: C++ compilation of rule '@highwayhash//:sip_hash' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 36 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127.
/gpfs/runtime/opt/python/2.7.3/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build

I suppose the fix for this, as mentioned by you in the step-wise directions, is:
4. Edit bazel-out/host/bin/tensorflow/swig and add export LD_LIBRARY_PATH=custom:paths:$LD_LIBRARY_PATH before swig is run. Otherwise swig won't find libraries that exist in our LD_LIBRARY_PATH. This is another hack to get around the confined environment.

This should add the Python library path while setting up the build, but I cannot find a file named bazel-out/host/bin/tensorflow/swig in the source tree, although the bazel-out/host/bin/tensorflow directory does exist. If I create a file named swig myself and add the command to export the paths, it still does not work. Any ideas? I have followed all the other steps as mentioned.

Thank you for the help. Your responses here have already been very helpful. :)

rdipietro Jul 14, 2016

Contributor

Hi @mukul1992

Sorry, I'm still working with 0.8, so haven't battled with the 0.9 changes yet.

Here is a suggestion:

Use --verbose_failures with bazel, so that error messages aren't truncated. Then sift through the failure to find out which script ends up causing the issue. Then try putting export LD_LIBRARY_PATH=your:custom:paths:$LD_LIBRARY_PATH at the top of that file.

Hopefully that might help. I don't think I'll have the time to get around to compiling 0.9 for a while. If that doesn't work, I suggest shooting back to 0.8 for now (assuming you don't need something that's cutting edge?).
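
A minimal sketch of the suggestion above, assuming the failing script is a shell script that can be edited; the helper name prepend_ld_library_path and the example directory are hypothetical:

```shell
# prepend_ld_library_path: put one directory at the front of
# LD_LIBRARY_PATH, skipping it if it is already listed.
prepend_ld_library_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example line for the top of the script that fails
# (the directory is a placeholder):
# prepend_ld_library_path /path/to/python/lib
```

The guard against duplicates only keeps the variable tidy when the script is invoked many times during one build; a plain export line works as well.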

mukul1992 Jul 14, 2016

Hi @rdipietro , thanks for replying.

So, I switched back to 0.8. I am now using Bazel 0.3.0 (is there an earlier version that would work better?).
Here is the output; I am including only the part with the ERROR. Again, I did complete the other steps. I cannot figure out where to add the LD_LIBRARY_PATH export so that it picks up the libpython library.

Output:

[mdave@login001 tensorflow]$ bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package -s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone
Warning: ignoring LD_PRELOAD in environment.

INFO: Found 1 target...

@re2//:re2 [action 'Compiling external/re2/re2/compile.cc [for host]']

.(cd /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/execroot/tensorflow &&
exec env -
PATH=/gpfs/runtime/opt/git/2.2.1/bin:/gpfs/runtime/opt/gcc/4.9.2/bin:/gpfs/runtime/opt/java/8u66/bin:/gpfs/runtime/opt/bazel/0.3.0/bin:/gpfs/runtime/opt/matlab/R2014a/bin:/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/mdave/bin
third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 '-std=c++11' '-frandom-seed=bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o' -iquote external/re2 -iquote bazel-out/host/genfiles/external/re2 -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/re2 -isystem bazel-out/host/genfiles/external/re2 -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.d -c external/re2/re2/compile.cc -o bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o)

ERROR: /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/external/re2/BUILD:9:1: C++ compilation of rule '@re2//:re2' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/execroot/tensorflow &&
exec env -
PATH=/gpfs/runtime/opt/git/2.2.1/bin:/gpfs/runtime/opt/gcc/4.9.2/bin:/gpfs/runtime/opt/java/8u66/bin:/gpfs/runtime/opt/bazel/0.3.0/bin:/gpfs/runtime/opt/matlab/R2014a/bin:/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/mdave/bin
third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 '-std=c++11' '-frandom-seed=bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o' -iquote external/re2 -iquote bazel-out/host/genfiles/external/re2 -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/re2 -isystem bazel-out/host/genfiles/external/re2 -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.d -c external/re2/re2/compile.cc -o bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127.
/gpfs/runtime/opt/python/2.7.3/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory

Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 13.883s, Critical Path: 5.22s
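
The log shows that python2.7 itself cannot load libpython2.7.so.1.0 once Bazel has cleared the environment. A minimal sketch for locating the directory that holds the library, so that it can be added to LD_LIBRARY_PATH in whichever script ends up launching Python. The helper name find_lib_dir is hypothetical, and the search root in the example is only a guess based on the paths in the log:

```shell
# find_lib_dir: print the directory of the first file named $1 found
# under the given search roots.
# Usage: find_lib_dir libpython2.7.so.1.0 /some/root [/another/root...]
find_lib_dir() {
    name="$1"
    shift
    hit="$(find "$@" -name "$name" 2>/dev/null | head -n 1)"
    if [ -z "$hit" ]; then
        echo "$name not found" >&2
        return 1
    fi
    dirname "$hit"
}

# Example:
# find_lib_dir libpython2.7.so.1.0 /gpfs/runtime/opt/python/2.7.3
```

Running ldd on the python2.7 binary in the same empty environment lists every library that fails to resolve, which helps confirm that libpython is the only one missing.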

kskp Jul 21, 2016

@rdipietro Hi, I have tried everything you suggested here (changing the CROSSTOOL files and so on), but it does not work. I started fresh again and believe I have Bazel working. Can you please look at my description in tensorflow/models#276 and suggest something? Thanks a lot!

rdipietro Jul 21, 2016

Contributor

I really don't know what to suggest, other than perhaps asking TensorFlow to build binaries for CentOS 6.7. I think this would save a lot of people a lot of trouble with each new release, but I don't know if they're willing to do it.

On Thu, Jul 21, 2016 at 11:21 AM, kskp wrote:

> @rdipietro Hi, I have tried everything you gave here- changed the CROSSTOOL files and everything but it does not work. I started fresh again and believe I have bazel working. Can you please look at my description here (tensorflow/models#276) and suggest something. Thanks a lot!

kskp Jul 21, 2016

@rdipietro Sorry, but didn't you mention you had TensorFlow running on CentOS 6.7 with gcc 4.8.2? Were you able to run SyntaxNet as well? I am stuck with a CentOS 6.6 cluster and need to get SyntaxNet running on it. It works fine on CentOS 7. :(

cirocavani Aug 25, 2016

@kskp I created a Dockerfile that compiles TensorFlow 0.9 (CPU) for CentOS 6; I tested it on CentOS 6 and Red Hat EL 6.5. You can use a standalone machine to generate the TensorFlow package and then test it at your site. (The standalone machine needs Docker; I tested on Linux and on macOS with Docker for Mac installed.)

https://github.com/cirocavani/tensorflow-poc/tree/master/tensorflow_centos6

(main.sh is the procedure script)

I also made an installer for TensorFlow with Miniconda2 that runs on Red Hat 6.5 without any prerequisite software.

https://github.com/cirocavani/tensorflow-poc/tree/master/tensorflow_installer

(main.sh is the procedure script)

This procedure creates the installer file tensorflow.sh with Miniconda2, TensorFlow 0.9, dependencies, and a Python program (executing this file will install Miniconda, install TensorFlow, and run the training script).

My main use case is running TensorFlow in Hadoop (Red Hat EL 6.5); there is another POC for this:

https://github.com/cirocavani/tensorflow-poc/tree/master/yarn_training

With this setup, I am running the TF Learn's Wide and Deep Example in Hadoop.
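
Building inside a CentOS 6 container matters because the resulting binaries then depend only on symbols that the old glibc provides. A minimal sketch for checking which glibc version a built shared object needs, assuming binutils' objdump and a sort that supports -V (GNU coreutils); the helper name max_glibc_version is hypothetical:

```shell
# max_glibc_version: read `objdump -T` output on stdin and print the
# highest GLIBC_x.y symbol version it references.
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -u -V | tail -n 1
}

# Example (the .so path is a placeholder):
# objdump -T /path/to/some_extension.so | max_glibc_version
```

The printed version can be compared with the output of ldd --version on the target machine, which this thread reports as 2.12 for CentOS 6.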

zym1010 Sep 30, 2016

I have succeeded in compiling a GPU, Python 3.5 version of TensorFlow 0.10.0 in a CentOS 6 Docker container, and it ran well on our university's CentOS 6 cluster. Check https://github.com/leelabcnbc/DevOps/tree/master/Docker/tensorflow/0.10.0/centos6/py35. Basically, it comes down to replacing some hardcoded lines in CROSSTOOL-related files and adding -lm to everything to prevent errors like #2291. I think Google could make compiling TensorFlow on CentOS less frustrating by making the hardcoded paths point to the correct locations.

i3v Dec 7, 2016

I've just managed to build TensorFlow 0.12rc0 on CentOS 6.5, which only had the gcc 4.4.7 compiler by default, without root privileges. (At least, it successfully passes most simple tests, like this one.)

In short, I had to:

  1. Build a newer gcc, hardcoding the paths to as, ld, and nm (a workaround for gcc: error trying to exec 'as': execvp: No such file or directory)

  2. Since I used a gcc installed in my own $HOME, explicitly specify the correct linker library directories here (a workaround for version 'GLIBCXX_3.4.20' not found (required by bazel-out/host/bin/external/protobuf/protoc))

  3. Add the -lrt and -lm linker flags in the same place (just as suggested by @zym1010)

Same story, with few more details.
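
The GLIBCXX_3.4.20 error in step 2 means that the libstdc++ found at run time is older than the one the new gcc linked against. A minimal sketch for checking whether a given libstdc++.so.6 provides a version tag, assuming a grep that supports -a (treat binary files as text); the helper name has_glibcxx and the example path are hypothetical:

```shell
# has_glibcxx: succeed if the library file contains the exact version
# tag GLIBCXX_<version>.
# Usage: has_glibcxx 3.4.20 /path/to/libstdc++.so.6
has_glibcxx() {
    version="$1"
    lib="$2"
    # -F: fixed string, -w: whole word, so 3.4.2 does not match 3.4.20.
    LC_ALL=C grep -a -q -F -w -- "GLIBCXX_$version" "$lib"
}

# Example (the path is a placeholder for a gcc installed under $HOME):
# has_glibcxx 3.4.20 "$HOME/gcc/lib64/libstdc++.so.6" && echo "new enough"
```

Running the check against both the system library and the one from the self-built gcc shows which of the two needs to come first in the linker library directories.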

yliu120 Dec 17, 2016

I built the latest TensorFlow (GitHub master branch) with GPU support at a supercomputing center (CentOS 6.7 with gcc 4.9.2, and more generally with a customized cc toolchain). I have written up the environment variable settings that are necessary for a successful build. Documenting it here for future reference:

http://biophysics.med.jhmi.edu/~yliu120/tensorflow.html

VittalP Jan 23, 2017

Thanks @rdipietro ! I have been able to successfully install r0.12 with Bazel 0.4.3 on a cluster. Some of your suggestions needed to be modified to cater to the changes in the new version of TF and Bazel. But, your suggestions provided a solid starting point. When I get the time, I will write up the changes that I had to make.

rdipietro Jan 23, 2017

Contributor

You're welcome @VittalP :)

I have an updated set of notes that works as of 1.0.0 alpha:

First of all, Bazel finally just works. You can download the newest 0.4.x source code (dist zip version), run ./compile.sh, then add the printed output path to PATH.

TensorFlow unfortunately still doesn't just work. So (replacing my paths with yours):

  1. In configure, replace bazel clean --expunge with bazel clean --expunge_async

  2. In third_party/gpus/crosstool/CROSSTOOL.tpl, replace all occurrences of /usr/bin/cpp with /cm/shared/apps/gcc/4.8.2/bin/cpp

  3. In third_party/gpus/crosstool/CROSSTOOL.tpl, after the line -B/usr/bin/, add the lines

linker_flag: "-Wl,-R/cm/shared/apps/gcc/4.8.2/lib64"
cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include"
cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include-fixed"
cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/include/c++/4.8.2"

  4. In third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl, replace NVCC_PATH = CURRENT_DIR + '/../../../cuda/bin/nvcc' with NVCC_PATH = ('/cm/shared/apps/cuda/7.5/bin/nvcc')

  5. In third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl, replace LLVM_HOST_COMPILER_PATH = ('/usr/bin/gcc') with LLVM_HOST_COMPILER_PATH = ('/cm/shared/apps/gcc/4.8.2/bin/gcc')

  6. In third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl, comment out the line cmd = 'PATH=' + PREFIX_DIR + ' ' + cmd

I configured with CUDA 7.5, cuDNN 5, and compute capability 3.5, and built with bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
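
For anyone who would rather script the two CROSSTOOL template edits than make them by hand, here is a rough sketch of how they could be automated. It is not part of the original notes: it runs against a small stand-in file created with mktemp (the two sample lines are only shaped like the real template), and the GCC_ROOT and GCC_VER variable names are my own. Point it at your own copy of the template and your own toolchain prefix at your own risk.

```shell
#!/bin/sh
# Sketch only: operates on a stand-in file, not on the real TensorFlow tree.
set -eu

GCC_ROOT=/cm/shared/apps/gcc/4.8.2   # toolchain prefix from the notes above; replace with yours
GCC_VER=4.8.2                        # hypothetical helper variable for the versioned subdirectories
TPL=$(mktemp)                        # hypothetical stand-in for the CROSSTOOL template

# Two lines shaped like the ones the instructions refer to.
cat > "$TPL" <<'EOF'
  tool_path { name: "cpp" path: "/usr/bin/cpp" }
  linker_flag: "-B/usr/bin/"
EOF

# First rule: point every /usr/bin/cpp at the custom gcc's cpp and print the line.
# Second rule: right after the -B/usr/bin/ line, emit the rpath flag and the
# builtin include directories.
awk -v root="$GCC_ROOT" -v ver="$GCC_VER" '
  { gsub("/usr/bin/cpp", root "/bin/cpp"); print }
  /-B\/usr\/bin\// {
    inc = root "/lib/gcc/x86_64-unknown-linux-gnu/" ver
    printf "  linker_flag: \"-Wl,-R%s/lib64\"\n", root
    printf "  cxx_builtin_include_directory: \"%s/include\"\n", inc
    printf "  cxx_builtin_include_directory: \"%s/include-fixed\"\n", inc
    printf "  cxx_builtin_include_directory: \"%s/include/c++/%s\"\n", root, ver
  }
' "$TPL" > "$TPL.patched"

cat "$TPL.patched"
```

awk is used instead of sed because appending lines after a match behaves differently between GNU and BSD sed. Diff the patched file against the original before copying it back over the template.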

yliu120 Jan 23, 2017

@rdipietro @VittalP I wrote an explanation of how to install the latest TensorFlow right before @VittalP's post, but you guys simply ignored it... As a fellow JHUer, I kindly note that I have sent my instructions to the MARCC staff, and there is already a tensorflow module on MARCC.

If you would like to read my post to see where it differs: http://biophysics.med.jhmi.edu/~yliu120/tensorflow.html

If something needs to be updated, please inform me of that.

rdipietro Jan 23, 2017

Contributor

Sorry! I didn't notice that you had posted here. But note that you are making changes that I didn't need to make. Probably depends on specific versions of TF / cuda / gcc / whatever.

Side note: I still compile on MARCC because they only installed TF for Python 2.x, whereas I'm using 3.x.

yliu120 Jan 23, 2017

I have updated my webpage for building tensorflow 1.0.0 with python 3.5.2. I provided two wheels on the webpage as well.

Please refer to:
http://biophysics.med.jhmi.edu/~yliu120/tensorflow.html

fraudies Mar 10, 2017

For whoever wants to compile TensorFlow 1.0 on RedHat 6 and with Python 2.7, I provide a detailed step-by-step guide here: https://www.linkedin.com/pulse/compiling-tensorflow-10-python-27-redhat-6-florian-raudies

rdipietro May 25, 2017

Contributor

And here we go again for r1.2. (Note: since r1.0, the Bazel configuration file organization has been mucked with.)

Bazel: A newish version is needed. 0.4.3 did not work, 0.4.5 did. Again, Bazel now compiles easily even with older CentOS / glibc, so this is straightforward.

Required edits for TensorFlow:

vim third_party/gpus/crosstool/CROSSTOOL_nvcc.tpl
:%s~/usr/bin/cpp~/cm/shared/apps/gcc/4.8.2/bin/cpp~g
Then, after the line linker_flag: "-B/usr/bin/", add

  linker_flag: "-Wl,-R/cm/shared/apps/gcc/4.8.2/lib64"
  cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include"
  cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include-fixed"
  cxx_builtin_include_directory: "/cm/shared/apps/gcc/4.8.2/include/c++/4.8.2"

vim third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
NVCC_PATH = '/cm/shared/apps/cuda/7.5/bin/nvcc'

Final notes: The build did not work with CUDA 7.5 and cuDNN 5 (CUDA compilation errors). It succeeded with CUDA 8.0 and cuDNN 5.

@zym1010 zym1010 referenced this issue May 25, 2017

Closed

tf 1.1 on centos #26

lukeiwanski pushed a commit to codeplaysoftware/tensorflow that referenced this issue Oct 26, 2017

[OpenCL] Provides atomic free MaxPool gradients (#110)
* [OpenCL] Provides atomic free MaxPool3DGrad

Atomic support in SYCL is not designed in a way that plays nicely with
Tensorflow and Eigen. Here we provide a new implementation for
MaxPool3DGrad which does not rely on atomics, and so avoids any such
problems.

* [OpenCL] Provides atomic free MaxPoolGrad

Atomic support in SYCL is not designed in a way that plays nicely with
Tensorflow and Eigen. Here we provide a new implementation for
MaxPoolGrad which does not rely on atomics, and so avoids any such
problems.

* [OpenCL] Changes expected NaN behaviour in test

The new SYCL kernels provide the same behaviour as the CUDA and cuDNN
kernels when an input tensor only contains NaN and the test needs to
reflect this.

As NaN cannot be compared to any other float value, it makes little
sense to decide which of the NaNs is the maximum, and so which NaN
should have the error propagated to it.

* [OpenCL] Removes unneeded SYCL atomic functions

* [OpenCL] Tidies SYCL MaxPoolGrad kernels

Some tidying up and also adds a local accumulator value which will be
written to memory at the end of the kernel, to decrease the number of
memory writes in the kernel.

JoyChopra1298 Feb 24, 2018

I am working on a CentOS 6 cluster that uses the Lustre filesystem. I cannot get Bazel to work on it, because Bazel cannot use file locking there. Refer to this issue. So would it be possible for TensorFlow to support other build tools?

Edit: The error is: unexpected result from F_SETLK: Function not implemented. Also refer to the hyperlink above.

yliu120 Feb 24, 2018

@JoyChopra1298
Further up in this thread, lots of people built Bazel and TF on CentOS 6, so I am sure it can be built. Since you didn't paste any error message, I am not sure what your problem is. But if the issue is that Bazel can't work on Lustre, you can move Bazel's output_user_root to /tmp/bazel. Usually /tmp is a filesystem mounted locally on a single node.
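
A minimal sketch of what that looks like on the command line, assuming /tmp on your node is local storage that supports file locking (check with your admins if unsure). The BAZEL_ROOT and BAZEL_CMD variable names are just for illustration, and the build target is the pip package target used earlier in this thread. The script only prints the command, so it is safe to try on a machine without Bazel.

```shell
#!/bin/sh
# Sketch only: prints the Bazel invocation instead of running it.
set -eu

# Assumption: /tmp is node-local and supports file locking, unlike the Lustre mount.
BAZEL_ROOT="/tmp/bazel"
mkdir -p "$BAZEL_ROOT"

# --output_user_root is a Bazel startup option, so it goes before the
# command name (build), not after it.
BAZEL_CMD="bazel --output_user_root=$BAZEL_ROOT build -c opt //tensorflow/tools/pip_package:build_pip_package"
echo "$BAZEL_CMD"
```

To avoid typing the option every time, the same setting can go in a bazelrc file as a line reading startup --output_user_root=/tmp/bazel. Keep in mind that node-local /tmp is often wiped between jobs, so copy the built wheel somewhere persistent.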

JoyChopra1298 Feb 24, 2018

@yliu120 Thank you, using Bazel's output_user_root option worked.

owenyoung75 Jul 15, 2018

I have a similar problem here, specifically in step 2 for TF.
My campus cluster uses modules on Red Hat, which has glibc 2.12.
I successfully installed Bazel 0.15.0, but when I moved on to building TF with bazel build, I got a long log, part of which reads:

/home2/my_name/.cache/bazel/_bazel_my_name/b9c3b9594c932d1e804df44467c1c0d2/external/boringssl/BUILD:115:1: C++ compilation of rule '@boringssl//:crypto' failed (Exit 1)
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S: Assembler messages:
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:37: Error: suffix or operands invalid for `vpxor'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:80: Error: no such instruction: `vpbroadcastq .Land_mask(%rip),%ymm15'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:91: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:92: Error: no such instruction: `vpbroadcastq 0-128(%rsi),%ymm10'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:93: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:95: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:97: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:99: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:101: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:103: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:105: Error: suffix or operands invalid for `vpaddq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:107: Error: suffix or operands invalid for `vpxor'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:110: Error: suffix or operands invalid for `vpmuludq'
external/boringssl/linux-x86_64/crypto/fipsmodule/rsaz-avx2.S:111: Error: no such instruction: `vpbroadcastq 32-128(%rsi),%ymm11'
...

When I used --verbose_failures to monitor the build, I collected the output in the attached error_records.txt.

Can anyone help with this issue?
