tensorflow for Nvidia TX1 #851

Closed
jmtatsch opened this Issue Jan 23, 2016 · 85 comments

@jmtatsch
Contributor

jmtatsch commented Jan 23, 2016

Hello,

@maxcuda recently got TensorFlow running on the TK1, as documented in the blog post http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html, but has since been unable to build it reproducibly. I am now trying to get TensorFlow running on a TX1 Tegra platform and need some support.

Much of the trouble seems to come from Eigen's variadic templates and C++11 initializer lists, both of which should work according to http://devblogs.nvidia.com/parallelforall/cplusplus-11-in-cuda-variadic-templates/.
In theory -std=c++11 should be set according to crosstool. Nevertheless, nvcc happily crashes on all of them. This smells as if the "-std=c++11" flag is not properly set.
How can I verify/enforce this?

Also, in tensorflow.bzl the variadic templates in Eigen are said to be disabled ("We have to disable variadic templates in Eigen for NVCC even though std=c++11 is enabled"). Is that still necessary?
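One way I could imagine verifying that the flag actually reaches nvcc (a minimal compile-time probe I'd add myself, not part of any official build file) is a static_assert on __cplusplus in a throwaway .cu file compiled with the same build rules:

```cpp
// cxx11_probe.cu -- hypothetical probe file; compile it with the same
// crosstool/bazel rules as the failing targets. If -std=c++11 never
// reaches the compiler, the static_assert fires at compile time.
static_assert(__cplusplus >= 201103L,
              "-std=c++11 did not reach the compiler");
```

If the probe compiles cleanly, the flag is being passed and the crashes have another cause.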

Here is my build workflow:

git clone --recurse-submodules git@github.com:jmtatsch/tensorflow.git
cd tensorflow
grep -Rl "lib64"| xargs sed -i 's/lib64/lib/g' # no lib64 for tx1 yet 
./configure
bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
@bhack

Contributor

bhack commented Jan 23, 2016

Are you using jetpack 2?

@jmtatsch

Contributor

jmtatsch commented Jan 23, 2016

No, JetPack does not support running directly on the L4T platform.

@bhack

Contributor

bhack commented Jan 23, 2016

I meant if you have flashed the board with jetpack 2 to have cuda 7 support.

@jmtatsch jmtatsch changed the title from tensorflow for Jetson TX1 to tensorflow for Nvidia TX1 Jan 23, 2016

@jmtatsch

Contributor

jmtatsch commented Jan 23, 2016

Ah, yes, I have CUDA 7 support and used JetPack 2. To be more precise, the target is not actually the Jetson TX1 but a repurposed Nvidia Shield TV flashed with L4T 23.1 for Jetson.

@vincentvanhoucke

Member

vincentvanhoucke commented Jan 23, 2016

@Yangqing FYI

@benoitsteiner

Contributor

benoitsteiner commented Feb 4, 2016

I think there is a TX1 that I could use to take a look. I'll see what I can do.

@benoitsteiner benoitsteiner self-assigned this Feb 4, 2016

@robagar

robagar commented Feb 6, 2016

In theory, can TensorFlow run usefully on the TK1? Or is the 2G memory too small for, say, face verification?

@benoitsteiner

Contributor

benoitsteiner commented Feb 8, 2016

@robagar It all depends on how large your network is and whether you intend to train the model on TK1 or just run inference. Two GB of memory is plenty to run inference on almost any model.

@benoitsteiner

Contributor

benoitsteiner commented Feb 10, 2016

I have worked around an issue that prevented nvcc from compiling the Eigen codebase on Tegra X1 (https://bitbucket.org/eigen/eigen/commits/d0950ac79c0404047379eb5a927a176dbb9d12a5).
However, so far I haven't succeeded in setting up bazel on the Tegra X1, so I haven't been able to start working on the other issues reported in http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html

@jmtatsch

Contributor

jmtatsch commented Feb 11, 2016

That's good news ;) What's the problem with bazel? maxcuda's instructions for building bazel worked quite well for me.

@jmtatsch

Contributor

jmtatsch commented Feb 13, 2016

For building bazel I had to use a special Java build which can cope with the 32-bit rootfs on a 64-bit machine:

wget http://www.java.net/download/jdk8u76/archive/b02/binaries/jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz
sudo tar -zxvf jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz -C /usr/lib/jvm
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_76/bin/java" 1
sudo update-alternatives --config java

There seems to be one eigen issue I can't get around:

bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From Compiling tensorflow/core/kernels/cross_op_gpu.cu.cc:
At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

./tensorflow/core/lib/strings/strcat.h(195): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r346/r346_00/drivers/compiler/edg/EDG_4.9/src/decl_inits.c", line 3251


1 catastrophic error detected in the compilation of "/tmp/tmpxft_0000682d_00000000-8_cross_op_gpu.cu.cpp4.ii".
Compilation aborted.
Aborted
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: output 'tensorflow/core/_objs/gpu_kernels/tensorflow/core/kernels/cross_op_gpu.cu.o' was not created.
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: not all outputs were created.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 2271.358s, Critical Path: 2260.25s

Can you have a look at TensorEvaluator.h please?

@benoitsteiner

Contributor

benoitsteiner commented Feb 23, 2016

I still haven't been able to install bazel. That said, the assertion you're facing seems to be triggered by the variadic template at line 195 of ./tensorflow/core/lib/strings/strcat.h. I would just comment that code out and see how it goes.

@ggaabe

ggaabe commented Feb 24, 2016

When you say maxcuda has "been unable to repeatedly build it" since then, does that mean that tensorflow is no longer working on the TK1? Because I just ordered the TK1 with the express purpose of being able to run tensorflow :-/

@maxcuda

maxcuda commented Feb 24, 2016

Yes, I have been unable to recompile the latest versions. The wheel I built around Thanksgiving should still work but it is quite an old version.

@jmtatsch

Contributor

jmtatsch commented Feb 27, 2016

Commenting out the variadic template at line 195 helps a little, but at line 234 there is another template that seems to be required. Any hints on how to rewrite that in an nvcc-friendly manner?

@jmtatsch

Contributor

jmtatsch commented Mar 11, 2016

@benoitsteiner
Any suggestions how this could be rewritten in an nvcc-compatible manner?

// Support 5 or more arguments
template <typename... AV>
inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AV &... args) {
  internal::AppendPieces(dest,
                         {a.Piece(), b.Piece(), c.Piece(), d.Piece(), e.Piece(),
                          static_cast<const AlphaNum &>(args).Piece()...});
}
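One possible shape for an nvcc-friendly rewrite, sketched here with simplified stand-ins for AlphaNum and internal::AppendPieces (the real TensorFlow types differ), is to replace the variadic tail with fixed-arity overloads, one per argument count that call sites actually use:

```cpp
#include <initializer_list>
#include <string>

// Simplified stand-ins for TensorFlow's AlphaNum and internal::AppendPieces,
// used here only to illustrate the shape of the rewrite.
using AlphaNum = std::string;

inline void AppendPieces(std::string *dest,
                         std::initializer_list<AlphaNum> pieces) {
  for (const auto &p : pieces) dest->append(p);
}

// Fixed-arity overloads in place of the variadic template: one overload per
// argument count needed (TensorFlow reportedly needs up to 11 arguments).
inline void StrAppend(std::string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e) {
  AppendPieces(dest, {a, b, c, d, e});
}

inline void StrAppend(std::string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AlphaNum &f) {
  AppendPieces(dest, {a, b, c, d, e, f});
}
```

It is verbose, but it removes the parameter pack entirely, which is what trips up the nvcc front end.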
@martinwicke

Member

martinwicke commented Mar 16, 2016

@damienmg FYI

@jas0n1ee

jas0n1ee commented Mar 18, 2016

Hi folks, I'm also working on building everything from scratch on the TX1. There are lots of discussions here and on the Nvidia developer forums, but so far I haven't seen any well-summarized instructions besides the TK1's. Can we start another repo or script file so people can work on this more efficiently?

@jmtatsch

Contributor

jmtatsch commented Mar 20, 2016

Imho we first have to solve the fundamental issue of the variadic templates not working with nvcc. Either the developers would have to do without those templates, which is backwards and probably not going to happen, or Nvidia has to step up and make nvcc more compatible. In theory nvcc should already be able to deal with your own variadic templates, but external (e.g. STL) headers won't "just work" because of the need to annotate all functions called on the device with __host__ __device__. Maybe someone knows a good way to get around this issue...

@benoitsteiner

Contributor

benoitsteiner commented Mar 21, 2016

@jmtatsch At the moment, the version of cuda that is shipped with the Tegra X1 has problems with variadic templates. Nvidia is aware of this and working on a fix. I updated Eigen a few weeks ago to disable the use of variadic templates when compiling on Tegra X1, and that seems to fix the bulk of the problem. However, StrCat and StrAppend still rely on variadic templates. Until Nvidia releases a fix, the best solution is to comment out the variadic versions of StrCat and StrAppend and create non-variadic versions with up to 11 arguments (since that's what TensorFlow currently needs).
There are a couple of ways to avoid the STL issues: a brittle solution is to only compile optimized kernels. The compiler then inlines the STL code, at which point the lack of __host__ __device__ annotations doesn't matter since there is no function call to resolve. A better solution is to replace all the STL functionality with custom code. We've started to do this in Eigen by reimplementing most of the STL functions we need in the Eigen::numext namespace. This is tedious but much more reliable than relying on inlining to bypass the problem.
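As a rough illustration of the numext approach (a simplified sketch; Eigen's actual implementations differ), the idea is to provide tiny replacements for the std:: helpers a kernel needs, annotated for both host and device:

```cpp
// Sketch of the Eigen::numext idea: reimplement small STL-style helpers
// with host/device annotations instead of calling std:: functions that
// lack __host__ __device__ markers. The guard keeps the same source
// compiling with a plain host compiler.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

namespace numext_sketch {

template <typename T>
HOST_DEVICE inline T mini(const T &a, const T &b) { return a < b ? a : b; }

template <typename T>
HOST_DEVICE inline T maxi(const T &a, const T &b) { return a < b ? b : a; }

}  // namespace numext_sketch
```

Device code then calls numext_sketch::mini instead of std::min, so no un-annotated function call ever has to be resolved on the device.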

@maxcuda

maxcuda commented May 17, 2016

I have a build of TF 0.8, but it requires a new 7.0 compiler that is not yet available to the general public.
I am building a wheel on a Jetson TK1 and will make it available after some testing.
I will update the instructions on how to build from source on cudamusing.

@robagar

robagar commented May 17, 2016

Good work @maxcuda! Will it build on the TX1 too?

@maxcuda

maxcuda commented May 17, 2016

Yes, it will build on the TX1 too. I fixed a problem with the new memory allocator to take into account the 32-bit OS. Some basic tests are passing, but the label_image test is giving the wrong results, so there may be some other places with 32-bit issues.

@maxcuda

maxcuda commented May 17, 2016

@benoitsteiner, with the new compiler your change to Eigen is not required anymore (and it forces editing a bunch of files). Could you please remove the check and re-enable variadic templates?

@benoitsteiner

Contributor

benoitsteiner commented May 17, 2016

@maxcuda Where can I download the new cuda compiler? I'd like to make sure that I don't introduce new problems when I enable variadic templates again.

@girving girving added the triaged label Jun 6, 2016

@tylerfox

tylerfox commented Jun 17, 2016

@maxcuda is the new 7.0 compiler you were referencing part of Jetpack 2.2 that was just released?

@dwightcrow

dwightcrow commented Nov 10, 2016

@ShawnXuan - these are files in the cloned bazel repo. The change proposed on StackOverflow for example would be made to CPU.java as shown in the diff. You can see which files elirex changed in addition by looking at their diff. Hope that helps

@piotrchmiel

piotrchmiel commented Nov 12, 2016

@elirex Did you manage to compile?

@elirex

Contributor

elirex commented Nov 14, 2016

@piotrchmiel Yes, I successfully completed the compilation. I added 8 GB of swap space and ran bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

While compiling, I used the free -h and top commands to watch memory usage. TensorFlow needs about 8 GB of memory to compile.

@piotrchmiel

piotrchmiel commented Nov 14, 2016

Thank you 👍 I will try to repeat your steps :-)

@MattKleinsmith

MattKleinsmith commented Nov 25, 2016

Question:

For those that compiled TensorFlow 0.9 on the Jetson TX1, which options did you use during the TensorFlow ./configure step?

Error 1:

I received Error: unexpected EOF from Bazel server after following the steps from this StackOverflow guide from a fresh install of JetPack 2.3.

Two bazel issue responders (1, 2) suggested people use the --jobs 4 or --jobs 20 option when receiving this error, in case the error was due to a lack of memory.

I ran bazel again, this time with --jobs 4; however, I received a new error ("Error 2", below).

The remainder of the error said Contents of '/home/ubuntu/.cache/bazel/_bazel_ubuntu/(xxxx)/server/jvm.out': with no further output.

Error 2:

ERROR: /home/ubuntu/tensorflow/tensorflow/core/kernels/BUILD:309:1: C++ compilation of rule '//tensorflow/core/kernels:mirror_pad_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 105 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4. gcc: internal compiler error: Killed (program cc1plus)

I didn't use bazel clean --expunge before the second attempt. Maybe that caused the error.

Plan:

  • Run bazel clean --expunge
  • Rerun bazel to create the cache folder
  • Re-add config.guess and config.sub to the cache folder
  • Create 8GB of swap space
  • Try bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package because @elirex had success with it.
@MattKleinsmith

MattKleinsmith commented Nov 25, 2016

It worked!

Following this StackOverflow guide, but with an 8 GB swap file and the following command, I successfully built TensorFlow 0.9 on the Jetson TX1 from a fresh install of JetPack 2.3:

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

I used the default settings for TensorFlow's ./configure script except to enable GPU support.

My build took at least 6 hours. It'll be faster if you use an SSD instead of a USB drive.

Thanks to Dwight Crow, @elirex, @tylerfox, everyone that helped them, and everyone in this thread for spending time on this problem.

Creating a swap file

# Create an 8 GB swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s

Adapted from JetsonHack's gist.

I used this USB drive to store my swap file.

The most memory I saw my system use was 7.7 GB (3.8 GB on Mem and 3.9 GB on Swap). The most swap memory I saw used at once was 4.4 GB. I used free -h to view memory usage.

Creating the pip package and installing

Adapted from the TensorFlow docs:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
@elirex

Contributor

elirex commented Nov 29, 2016

I used bazel build -c opt --local_resources 1024,4.0,1.0 --jobs 4 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package, without allocating swap, to build TensorFlow r0.9 on the TX1 with JetPack 2.3, and it passed compilation.

@corenel

corenel commented Dec 4, 2016

Could anyone build TF r0.11 on TX1 yet?

@zxwind

zxwind commented Dec 5, 2016

Thanks for all the information here; I got tensorflow r0.11.0 installed with JetPack 2.3.1 on the TX1. Follow @elirex's steps and make sure to use the exact versions of protobuf, grpc and bazel. I built tensorflow r0.11.0 instead of v0.11.0.rc2. When compiling, follow @MatthewKleinsmith's step to add a swap file; you need a big one. I tried 6G but it failed in the middle with an out-of-memory error; trying again with a 10G swap file worked. Compiling took me about 5 hours with the swapfile allocated on a USB drive.

@loliverhennigh

Contributor

loliverhennigh commented Dec 7, 2016

Is tensorflow working correctly on the TX1, i.e. able to run inference and get good results? When I installed tensorflow on a TK1 it ran just fine, however the convolutional layers were producing bad results. I could train fully connected models on MNIST just fine, but when I tried to use conv layers it stopped converging. Does this problem persist in the TX1 build?

@tugwitt

tugwitt commented Dec 8, 2016

I continually get this when running ./compile.sh for Bazel:
Building Bazel from scratch
gPRC Java plugin not found in

If I pull 0.2.3 I don't get the error, only with 0.3.x

@ArekSredzki

ArekSredzki commented Dec 14, 2016

@zxwind How is TF 0.11 performance working for you on the TX1?

@rwightman

rwightman commented Jan 17, 2017

FYI, I've got a branch off r1.0 with some hacks to build the r1.0 release on TX1 with Jetpack 2.3.1.

In addition to the previously mentioned issues, there is a change in Eigen after the revision used on the TF r0.11 branch that causes the CUDA compiler to crash with an internal error. I changed workspace.bzl on r1.0 branch to point to the older Eigen revision. In order for that to build I had to remove the EXPM1 op that was added after r0.11. It's all rather ugly but got me up and running.

Interestingly, with the r1.0.0a build I'm able to run inference on a Resnet50-based network at 128x96 resolution that was running out of memory on r0.11. For anyone curious about benchmark numbers, I was getting approximately 15 fps with single-frame batches.

Link to a tag on my clone of TF with binary wheels for anyone interested. The wheels will likely only work on a Jetpack 2.3.1 (L4T 24.2.1). No guarantees there aren't some serious issues but I've verified results on the networks I'm using right now.
https://github.com/rwightman/tensorflow/releases/tag/v1.0.0-alpha-tegra-ugly_hack

@drpngx

Member

drpngx commented Jan 24, 2017

Closing since the @rwightman / @MatthewKleinsmith solution seems to work, though it's not quite a seamless out-of-the-box experience. Feel free to reopen.

@drpngx drpngx closed this Jan 24, 2017

@sunsided

sunsided commented Feb 19, 2017

@rwightman May I humbly ask you to provide another wheel for the r1.0 stable version?

@sumitkamath

sumitkamath commented Feb 23, 2017

@rwightman How were you able to build tensorflow without gRPC? Thanks!

Edit: never mind, I saw your repo : https://github.com/jetsonhacks/installTensorFlowTX1/

Thanks for setting that up.

@bfreskura

bfreskura commented Mar 16, 2017

@sunsided Here's the Python 3.5.2 version for TF 1.0.1 that @dkopljar and I managed to build: https://drive.google.com/open?id=0B2jw9AHXtUJ_OFJDV19TWTEyaWc

@syed-ahmed

Contributor

syed-ahmed commented Mar 22, 2017

Hello all, I was able to install TensorFlow v1.0.1 on the new Jetson TX2. I had to follow similar process as mentioned above in this thread (protobuf, grpc, swapfile etc). For bazel, I downloaded bazel-0.4.5-dist.zip and applied @dtrebbien's change. Here is the pip wheel of my installation if it helps anyone. It's for Python 2.7: https://drive.google.com/file/d/0Bxl-G9VJ61mBYmZPY0hLSlFaUDg/view?usp=sharing
And here the step by step procedure: https://syed-ahmed.gitbooks.io/nvidia-jetson-tx2-recipes/content/first-question.html

bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Apr 7, 2017

Add "aarch64" to the set of ARM CPU archs
This change, suggested by @tylerfox at
tensorflow/tensorflow#851 (comment)
allows Bazel 0.4.5 to be built on a Jetson TX1 with JetPack 3.0.

The other of @tylerfox's suggested changes was made in 7c4afb6.

Refs #1264

Closes #2703.
PiperOrigin-RevId: 152498304

tarasglek pushed a commit to tarasglek/tensorflow that referenced this issue Jun 20, 2017

Merge pull request tensorflow#851 from tae-jun/patch-2
slim: Typos at datasets/flowers.py
@MoAbd

MoAbd commented Jul 3, 2017

Hello all, I was able to install TensorFlow v1.0.1 on the Tegra X1 using the build by @barty777.
Is there a build available for TensorFlow v1.2?

@gvoysey

gvoysey commented Aug 8, 2017

@barty777 you wouldn't happen to have 3.6 wheels, would you? 🙏

@bfreskura

bfreskura commented Aug 8, 2017

@gvoysey Unfortunately no. :(

@pantor

pantor commented Aug 23, 2017

Here is the wheel file for TensorFlow 1.2, Nvidia TX1 and Python 2.7: https://drive.google.com/file/d/0B-Ljdh8jFZRbTnVNdGtGMHA2Ymc/view?usp=sharing

@gvoysey

gvoysey commented Aug 23, 2017

I've been able to build a TensorFlow wheel for Python 3.6 on the TX1, but I cannot successfully build TensorFlow with GPU support. See https://stackoverflow.com/questions/45825708/error-building-tensorflow-gpu-1-1-0-on-nvidia-jetson-tx1-aarch64 for details.

@saisankargochhayat

saisankargochhayat commented Jan 27, 2018

Sorry for the late comment; can anyone please help me with setting up TensorFlow on the Nvidia TK1?
