tensorflow for Nvidia TX1 #851

Closed
jmtatsch opened this Issue Jan 23, 2016 · 85 comments

@jmtatsch
Contributor

jmtatsch commented Jan 23, 2016

Hello,

@maxcuda recently got TensorFlow running on the TK1, as documented in the blog post http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html, but has since been unable to build it reproducibly. I am now trying to get TensorFlow running on a TX1 Tegra platform and need some support.

Much of the trouble seems to come from Eigen's variadic templates and C++11 initializer lists, both of which should work according to http://devblogs.nvidia.com/parallelforall/cplusplus-11-in-cuda-variadic-templates/.
In theory -std=c++11 should be set according to crosstool. Nevertheless, nvcc happily crashes on all of them. This smells as if the "-std=c++11" flag is not properly set.
How can I verify/enforce this?

Also, in tensorflow.bzl the variadic templates in Eigen are said to be disabled ("We have to disable variadic templates in Eigen for NVCC even though std=c++11 is enabled"). Is that still necessary?
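One way I could imagine verifying that the flag actually reaches nvcc (a minimal compile-time probe I'd add myself, not part of any official build file) is a static_assert on __cplusplus in a throwaway .cu file compiled with the same build rules:

```cpp
// cxx11_probe.cu -- hypothetical probe file; compile it with the same
// crosstool/bazel rules as the failing targets. If -std=c++11 never
// reaches the compiler, the static_assert fires at compile time.
static_assert(__cplusplus >= 201103L,
              "-std=c++11 did not reach the compiler");
```

If the probe compiles cleanly, the flag is being passed and the crashes have another cause.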

Here is my build workflow:

git clone --recurse-submodules git@github.com:jmtatsch/tensorflow.git
cd tensorflow
grep -Rl "lib64"| xargs sed -i 's/lib64/lib/g' # no lib64 for tx1 yet 
./configure
bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
@bhack

Contributor

bhack commented Jan 23, 2016

Are you using jetpack 2?

@jmtatsch

Contributor

jmtatsch commented Jan 23, 2016

No, JetPack does not support running directly on the L4T platform.

@bhack

Contributor

bhack commented Jan 23, 2016

I meant if you have flashed the board with jetpack 2 to have cuda 7 support.

@jmtatsch jmtatsch changed the title from tensorflow for Jetson TX1 to tensorflow for Nvidia TX1 Jan 23, 2016

@jmtatsch

Contributor

jmtatsch commented Jan 23, 2016

Ah, yes, I have CUDA 7 support and used JetPack 2. To be more precise, the target is not actually the Jetson TX1 but a repurposed Nvidia Shield TV flashed with L4T 23.1 for Jetson.

@vincentvanhoucke

Member

vincentvanhoucke commented Jan 23, 2016

@Yangqing FYI

@benoitsteiner

Contributor

benoitsteiner commented Feb 4, 2016

I think there is a TX1 that I could use to take a look. I'll see what I can do.

@benoitsteiner benoitsteiner self-assigned this Feb 4, 2016

@robagar

robagar commented Feb 6, 2016

In theory, can TensorFlow run usefully on the TK1? Or is the 2G memory too small for, say, face verification?

@benoitsteiner

Contributor

benoitsteiner commented Feb 8, 2016

@robagar It all depends on how large your network is and whether you intend to train the model on TK1 or just run inference. Two GB of memory is plenty to run inference on almost any model.

@benoitsteiner

Contributor

benoitsteiner commented Feb 10, 2016

I have worked around an issue that prevented nvcc from compiling the Eigen codebase on Tegra X1 (https://bitbucket.org/eigen/eigen/commits/d0950ac79c0404047379eb5a927a176dbb9d12a5).
However, so far I haven't succeeded in setting up bazel on the Tegra X1, so I haven't been able to start working on the other issues reported in http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html

@jmtatsch

Contributor

jmtatsch commented Feb 11, 2016

That's good news ;) What's the problem with bazel? maxcuda's instructions for building bazel worked quite well for me.

@jmtatsch

Contributor

jmtatsch commented Feb 13, 2016

For building bazel I had to use a special Java build which can cope with the 32-bit rootfs on a 64-bit machine:

wget http://www.java.net/download/jdk8u76/archive/b02/binaries/jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz
sudo tar -zxvf jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz -C /usr/lib/jvm
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_76/bin/java" 1
sudo update-alternatives --config java

There seems to be one eigen issue I can't get around:

bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From Compiling tensorflow/core/kernels/cross_op_gpu.cu.cc:
At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

./tensorflow/core/lib/strings/strcat.h(195): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r346/r346_00/drivers/compiler/edg/EDG_4.9/src/decl_inits.c", line 3251


1 catastrophic error detected in the compilation of "/tmp/tmpxft_0000682d_00000000-8_cross_op_gpu.cu.cpp4.ii".
Compilation aborted.
Aborted
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: output 'tensorflow/core/_objs/gpu_kernels/tensorflow/core/kernels/cross_op_gpu.cu.o' was not created.
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: not all outputs were created.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 2271.358s, Critical Path: 2260.25s

Can you have a look at TensorEvaluator.h please?

@benoitsteiner

Contributor

benoitsteiner commented Feb 23, 2016

I still haven't been able to install bazel. That said, the assertion you're facing seems to be triggered by the variadic template at line 195 of ./tensorflow/core/lib/strings/strcat.h. I would just comment that code out and see how it goes.

@ggaabe

ggaabe commented Feb 24, 2016

When you say maxcuda has "been unable to repeatedly build it" since then, does that mean that tensorflow is no longer working on the TK1? Because I just ordered the TK1 with the express purpose of being able to run tensorflow :-/

@maxcuda

maxcuda commented Feb 24, 2016

Yes, I have been unable to recompile the latest versions. The wheel I built around Thanksgiving should still work but it is quite an old version.

@jmtatsch

Contributor

jmtatsch commented Feb 27, 2016

Commenting out the variadic template at line 195 helps a little, but at line 234 there is another template that seems to be required. Any hints on how to rewrite that in an nvcc-friendly manner?

@jmtatsch

Contributor

jmtatsch commented Mar 11, 2016

@benoitsteiner
Any suggestions how this could be rewritten in an nvcc-compatible manner?

// Support 5 or more arguments
template <typename... AV>
inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AV &... args) {
  internal::AppendPieces(dest,
                         {a.Piece(), b.Piece(), c.Piece(), d.Piece(), e.Piece(),
                          static_cast<const AlphaNum &>(args).Piece()...});
}
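One possible shape for an nvcc-friendly rewrite, sketched here with simplified stand-ins for AlphaNum and internal::AppendPieces (the real TensorFlow types differ), is to replace the variadic tail with fixed-arity overloads, one per argument count that call sites actually use:

```cpp
#include <initializer_list>
#include <string>

// Simplified stand-ins for TensorFlow's AlphaNum and internal::AppendPieces,
// used here only to illustrate the shape of the rewrite.
using AlphaNum = std::string;

inline void AppendPieces(std::string *dest,
                         std::initializer_list<AlphaNum> pieces) {
  for (const auto &p : pieces) dest->append(p);
}

// Fixed-arity overloads in place of the variadic template: one overload per
// argument count needed (TensorFlow reportedly needs up to 11 arguments).
inline void StrAppend(std::string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e) {
  AppendPieces(dest, {a, b, c, d, e});
}

inline void StrAppend(std::string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AlphaNum &f) {
  AppendPieces(dest, {a, b, c, d, e, f});
}
```

It is verbose, but it removes the parameter pack entirely, which is what trips up the nvcc front end.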
@martinwicke

Member

martinwicke commented Mar 16, 2016

@damienmg FYI

@jas0n1ee

jas0n1ee commented Mar 18, 2016

Hi folks, I'm also working on building everything from scratch on the TX1. There are lots of discussions here and on the Nvidia developer forums, but so far I haven't seen any well-summarized instructions besides the TK1's. Can we start another repo or script file so people can work on this more efficiently?

@jmtatsch

Contributor

jmtatsch commented Mar 20, 2016

Imho we first have to solve the fundamental issue of the variadic templates not working with nvcc. Either the developers would have to do without those templates, which is backwards and probably not going to happen, or Nvidia has to step up and make nvcc more compatible. In theory nvcc should already be able to deal with your own variadic templates, but external (e.g. STL) headers won't "just work" because of the need to annotate all functions called on the device with __host__ __device__. Maybe someone knows a good way to get around this issue...

@benoitsteiner

Contributor

benoitsteiner commented Mar 21, 2016

@jmtatsch At the moment, the version of cuda that is shipped with the Tegra X1 has problems with variadic templates. Nvidia is aware of this and working on a fix. I updated Eigen a few weeks ago to disable the use of variadic templates when compiling on Tegra X1, and that seems to fix the bulk of the problem. However, StrCat and StrAppend still rely on variadic templates. Until Nvidia releases a fix, the best solution is to comment out the variadic versions of StrCat and StrAppend and create non-variadic versions with up to 11 arguments (since that's what TensorFlow currently needs).
There are a couple of ways to avoid the STL issues: a brittle solution is to only compile optimized kernels. The compiler then inlines the STL code, at which point the lack of __host__ __device__ annotations doesn't matter since there is no function call to resolve. A better solution is to replace all the STL functionality with custom code. We've started to do this in Eigen by reimplementing most of the STL functions we need in the Eigen::numext namespace. This is tedious but much more reliable than relying on inlining to bypass the problem.
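As a rough illustration of the numext approach (a simplified sketch; Eigen's actual implementations differ), the idea is to provide tiny replacements for the std:: helpers a kernel needs, annotated for both host and device:

```cpp
// Sketch of the Eigen::numext idea: reimplement small STL-style helpers
// with host/device annotations instead of calling std:: functions that
// lack __host__ __device__ markers. The guard keeps the same source
// compiling with a plain host compiler.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

namespace numext_sketch {

template <typename T>
HOST_DEVICE inline T mini(const T &a, const T &b) { return a < b ? a : b; }

template <typename T>
HOST_DEVICE inline T maxi(const T &a, const T &b) { return a < b ? b : a; }

}  // namespace numext_sketch
```

Device code then calls numext_sketch::mini instead of std::min, so no un-annotated function call ever has to be resolved on the device.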

@maxcuda

maxcuda commented May 17, 2016

I have a build of TF 0.8, but it requires a new 7.0 compiler that is not yet available to the general public.
I am building a wheel on a Jetson TK1 and will make it available after some testing.
I will update the instructions on how to build from source on cudamusing.

@robagar

robagar commented May 17, 2016

Good work @maxcuda! Will it build on the TX1 too?

@maxcuda

maxcuda commented May 17, 2016

Yes, it will build on the TX1 too. I fixed a problem with the new memory allocator to take into account the 32-bit OS. Some basic tests are passing, but the label_image test is giving the wrong results, so there may be some other places with 32-bit issues.

@maxcuda

maxcuda commented May 17, 2016

@benoitsteiner, with the new compiler your change to Eigen is not required anymore (and it forces editing a bunch of files). Could you please remove the check and re-enable variadic templates?

@benoitsteiner

Contributor

benoitsteiner commented May 17, 2016

@maxcuda Where can I download the new cuda compiler? I'd like to make sure that I don't introduce new problems when I enable variadic templates again.

@girving girving added the triaged label Jun 6, 2016

@tylerfox

tylerfox commented Jun 17, 2016

@maxcuda is the new 7.0 compiler you were referencing part of Jetpack 2.2 that was just released?

@dwightcrow

dwightcrow commented Nov 10, 2016

@ShawnXuan - these are files in the cloned bazel repo. The change proposed on StackOverflow for example would be made to CPU.java as shown in the diff. You can see which files elirex changed in addition by looking at their diff. Hope that helps

@piotrchmiel

piotrchmiel commented Nov 12, 2016

@elirex Did you manage to compile?

@elirex

Contributor

elirex commented Nov 14, 2016

@piotrchmiel Yes, I successfully completed the compilation. I added 8 GB of swap space and ran bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

While compiling, I used the free -h and top commands to watch memory usage. TensorFlow needs about 8 GB of memory to compile.

@piotrchmiel

piotrchmiel commented Nov 14, 2016

Thank you 👍 I will try to repeat your steps :-)

@MattKleinsmith

MattKleinsmith commented Nov 25, 2016

Question:

For those that compiled TensorFlow 0.9 on the Jetson TX1, which options did you use during the TensorFlow ./configure step?

Error 1:

I received Error: unexpected EOF from Bazel server after following the steps from this StackOverflow guide from a fresh install of JetPack 2.3.

Two bazel issue responders (1, 2) suggested people use the --jobs 4 or --jobs 20 option when receiving this error, in case the error was due to a lack of memory.

I ran bazel again, this time with --jobs 4; however, I received a new error ("Error 2", below).

The remainder of the error said Contents of '/home/ubuntu/.cache/bazel/_bazel_ubuntu/(xxxx)/server/jvm.out': with no further output.

Error 2:

ERROR: /home/ubuntu/tensorflow/tensorflow/core/kernels/BUILD:309:1: C++ compilation of rule '//tensorflow/core/kernels:mirror_pad_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 105 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4. gcc: internal compiler error: Killed (program cc1plus)

I didn't use bazel clean --expunge before the second attempt. Maybe that caused the error.

Plan:

  • Run bazel clean --expunge
  • Rerun bazel to create the cache folder
  • Re-add config.guess and config.sub to the cache folder
  • Create 8GB of swap space
  • Try bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package because @elirex had success with it.
@MattKleinsmith

MattKleinsmith commented Nov 25, 2016

It worked!

Following this StackOverflow guide, but with an 8 GB swap file and the following command, I successfully built TensorFlow 0.9 on the Jetson TX1 from a fresh install of JetPack 2.3:

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

I used the default settings for TensorFlow's ./configure script except to enable GPU support.

My build took at least 6 hours. It'll be faster if you use an SSD instead of a USB drive.

Thanks to Dwight Crow, @elirex, @tylerfox, everyone that helped them, and everyone in this thread for spending time on this problem.

Creating a swap file

# Create an 8 GB swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s

Adapted from JetsonHack's gist.

I used this USB drive to store my swap file.

The most memory I saw my system use was 7.7 GB (3.8 GB on Mem and 3.9 GB on Swap). The most swap memory I saw used at once was 4.4 GB. I used free -h to view memory usage.

Creating the pip package and installing

Adapted from the TensorFlow docs:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
@elirex

Contributor

elirex commented Nov 29, 2016

I used bazel build -c opt --local_resources 1024,4.0,1.0 --jobs 4 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package, without allocating swap, to build TensorFlow r0.9 on the TX1 with JetPack 2.3, and it passed compilation.

@corenel

corenel commented Dec 4, 2016

Could anyone build TF r0.11 on TX1 yet?

@zxwind

zxwind commented Dec 5, 2016

Thanks for all the information here; I got tensorflow r0.11.0 installed with JetPack 2.3.1 on the TX1. Follow @elirex's steps and make sure to use the exact versions of protobuf, grpc and bazel. I built tensorflow r0.11.0 instead of v0.11.0.rc2. When compiling, follow @MatthewKleinsmith's step to add a swap file; you need a big one. I tried 6G but it failed in the middle with an out-of-memory error; trying again with a 10G swap file worked. Compiling took me about 5 hours with the swapfile allocated on a USB drive.

@loliverhennigh

Contributor

loliverhennigh commented Dec 7, 2016

Is tensorflow working correctly on the TX1, i.e. able to run inference and get good results? When I installed tensorflow on a TK1 it ran just fine, however the convolutional layers were producing bad results. I could train fully connected models on MNIST just fine, but when I tried to use conv layers it stopped converging. Does this problem persist in the TX1 build?

@tugwitt

tugwitt commented Dec 8, 2016

I continually get this when running ./compile.sh for Bazel:
Building Bazel from scratch
gPRC Java plugin not found in

If I pull 0.2.3 I don't get the error, only with 0.3.x

@ArekSredzki

ArekSredzki commented Dec 14, 2016

@zxwind How is TF 0.11 performance working for you on the TX1?

@rwightman

rwightman commented Jan 17, 2017

FYI, I've got a branch off r1.0 with some hacks to build the r1.0 release on TX1 with Jetpack 2.3.1.

In addition to the previously mentioned issues, there is a change in Eigen after the revision used on the TF r0.11 branch that causes the CUDA compiler to crash with an internal error. I changed workspace.bzl on r1.0 branch to point to the older Eigen revision. In order for that to build I had to remove the EXPM1 op that was added after r0.11. It's all rather ugly but got me up and running.

Interestingly, with the r1.0.0a build I'm able to run inference on a Resnet50-based network at 128x96 resolution that was running out of memory on r0.11. For anyone curious about benchmark numbers, I was getting approximately 15 fps with single-frame batches.

Link to a tag on my clone of TF with binary wheels for anyone interested. The wheels will likely only work on a Jetpack 2.3.1 (L4T 24.2.1). No guarantees there aren't some serious issues but I've verified results on the networks I'm using right now.
https://github.com/rwightman/tensorflow/releases/tag/v1.0.0-alpha-tegra-ugly_hack

@drpngx

Member

drpngx commented Jan 24, 2017

Closing since the @rwightman / @MatthewKleinsmith solution seems to work, though it's not quite a seamless out-of-the-box experience. Feel free to reopen.

@drpngx drpngx closed this Jan 24, 2017

@sunsided

sunsided commented Feb 19, 2017

@rwightman May I humbly ask you to provide another wheel for the r1.0 stable version?

@sumitkamath

sumitkamath commented Feb 23, 2017

@rwightman How were you able to build tensorflow without gRPC? Thanks!

Edit: never mind, I saw your repo : https://github.com/jetsonhacks/installTensorFlowTX1/

Thanks for setting that up.

@bfreskura

bfreskura commented Mar 16, 2017

@sunsided Here's the Python 3.5.2 version for TF 1.0.1 that @dkopljar and I managed to build: https://drive.google.com/open?id=0B2jw9AHXtUJ_OFJDV19TWTEyaWc

@syed-ahmed

Contributor

syed-ahmed commented Mar 22, 2017

Hello all, I was able to install TensorFlow v1.0.1 on the new Jetson TX2. I had to follow similar process as mentioned above in this thread (protobuf, grpc, swapfile etc). For bazel, I downloaded bazel-0.4.5-dist.zip and applied @dtrebbien's change. Here is the pip wheel of my installation if it helps anyone. It's for Python 2.7: https://drive.google.com/file/d/0Bxl-G9VJ61mBYmZPY0hLSlFaUDg/view?usp=sharing
And here the step by step procedure: https://syed-ahmed.gitbooks.io/nvidia-jetson-tx2-recipes/content/first-question.html

bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Apr 7, 2017

Add "aarch64" to the set of ARM CPU archs
This change, suggested by @tylerfox at
tensorflow/tensorflow#851 (comment)
allows Bazel 0.4.5 to be built on a Jetson TX1 with JetPack 3.0.

The other of @tylerfox's suggested changes was made in 7c4afb6.

Refs #1264

Closes #2703.
PiperOrigin-RevId: 152498304

tarasglek pushed a commit to tarasglek/tensorflow that referenced this issue Jun 20, 2017

Merge pull request tensorflow#851 from tae-jun/patch-2
slim: Typos at datasets/flowers.py
@MoAbd

MoAbd commented Jul 3, 2017

Hello all, I was able to install TensorFlow v1.0.1 on the Tegra X1 using the build by @barty777.
Is there a build available for TensorFlow v1.2?

@gvoysey

gvoysey commented Aug 8, 2017

@barty777 you wouldn't happen to have 3.6 wheels, would you? 🙏

@bfreskura

bfreskura commented Aug 8, 2017

@gvoysey Unfortunately no. :(

@pantor

pantor commented Aug 23, 2017

Here is the wheel file for TensorFlow 1.2, Nvidia TX1 and Python 2.7: https://drive.google.com/file/d/0B-Ljdh8jFZRbTnVNdGtGMHA2Ymc/view?usp=sharing

@gvoysey

gvoysey commented Aug 23, 2017

I've been able to build a TensorFlow wheel for Python 3.6 on the TX1, but I cannot successfully build TensorFlow with GPU support. See https://stackoverflow.com/questions/45825708/error-building-tensorflow-gpu-1-1-0-on-nvidia-jetson-tx1-aarch64 for details.

@saisankargochhayat

saisankargochhayat commented Jan 27, 2018

Sorry for the late comment; can anyone please help me with setting up TensorFlow on the Nvidia TK1?
