NEON detection on aarch64 / TX1 #766

Closed
dusty-nv opened this issue Sep 16, 2016 · 20 comments

@dusty-nv

dusty-nv commented Sep 16, 2016

Torch is failing to build on ARMv8 (Tegra X1) under Ubuntu 16.04 aarch64.
During configuration:
-- Could not find hardware support for NEON on this machine.

Then while building:

[ 46%] Building C object lib/TH/CMakeFiles/TH.dir/THVector.c.o
In file included from /tmp/luarocks_torch-scm-1-6670/torch7/lib/TH/THVector.c:2:0:
/tmp/luarocks_torch-scm-1-6670/torch7/lib/TH/generic/THVectorDispatch.c: In function ‘THByteVector_vectorDispatchInit’:
/tmp/luarocks_torch-scm-1-6670/torch7/lib/TH/generic/simd/simd.h:60:3: error: impossible constraint in ‘asm’
   asm volatile ( "cpuid\n\t"
   ^
/tmp/luarocks_torch-scm-1-6670/torch7/lib/TH/generic/simd/simd.h:60:3: error: impossible constraint in ‘asm’
   asm volatile ( "cpuid\n\t"
   ^
lib/TH/CMakeFiles/TH.dir/build.make:350: recipe for target 'lib/TH/CMakeFiles/TH.dir/THVector.c.o' failed
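The "impossible constraint" error is most likely because `cpuid` is an x86 instruction whose inline-asm register constraints have no meaning on aarch64. A minimal sketch of guarding such a query by architecture follows; `query_cpuid` is a hypothetical helper, not the code in simd.h, and 32-bit x86 is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from simd.h.
 * Returns 1 and fills regs[] with EAX, EBX, ECX, EDX when CPUID is
 * available; returns 0 and zeroes regs[] on every other architecture. */
static int query_cpuid(uint32_t leaf, uint32_t regs[4])
{
#if defined(__x86_64__)
    /* The "a", "b", "c", "d" constraints name x86 registers, so this
     * statement must never be seen by a non-x86 compiler. */
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]),
                       "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf), "c"(0));
    return 1;
#else
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    return 0;
#endif
}
```

With a guard of this kind, an aarch64 build never compiles the x86 assembly and can pick its vector path from compile-time macros instead.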

It looks like FindARM.cmake searches /proc/cpuinfo for 'neon'. However, the ARMv8 output is the following:

Features        : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1
Hardware        : jetson_tx1

It appears that other ARMv8 platforms [1][2] may also list asimd in lieu of neon in their /proc/cpuinfo.
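To illustrate the detection issue, here is a minimal sketch of a Features-line check that accepts either spelling. The helper names `has_cpu_flag` and `features_have_neon` are hypothetical; FindARM.cmake itself is CMake, so this only shows the matching logic, not the actual fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears as a whole
 * whitespace-delimited token in `features`, otherwise 0. */
static int has_cpu_flag(const char *features, const char *flag)
{
    size_t flag_len = strlen(flag);
    const char *p = features;

    while (*p != '\0') {
        /* Skip the separators in front of the next token. */
        p += strspn(p, " \t\n");
        size_t tok_len = strcspn(p, " \t\n");
        if (tok_len == flag_len && strncmp(p, flag, flag_len) == 0)
            return 1;
        p += tok_len;
    }
    return 0;
}

/* Hypothetical helper: ARMv7 kernels report "neon", while the
 * AArch64 output above reports "asimd" for the same capability. */
static int features_have_neon(const char *features)
{
    return has_cpu_flag(features, "neon") ||
           has_cpu_flag(features, "asimd");
}
```

Matching whole tokens rather than substrings avoids treating a longer flag name that merely starts with `asimd` as a match.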

@dusty-nv
Author

dusty-nv commented Sep 16, 2016

Note: after manually patching FindARM.cmake to set NEON_TRUE and omitting -mfpu=neon (as mentioned here) from CMakeLists, I encounter the same errors from issue 763. Anyone know a workaround for now?

@soumith
Member

soumith commented Oct 2, 2016

@dusty-nv #763 is now fixed in master

@soumith
Member

soumith commented Oct 2, 2016

if you can send a patch to FindARM.cmake to work for armv8, it would be much appreciated.

@dusty-nv
Author

dusty-nv commented Oct 2, 2016

Thanks Soumith. Try the patches below for FindARM.cmake and lib/TH/CMakeLists. They are a bit crude, because I still define the NEON cflags directive on ARMv8 (but without -mfpu=neon) so that the rest of the code compiles normally:

FindARM-patch.txt
CMakeLists-patch.txt

Let me know if/when you commit to master so I can re-build from master.

@rtarquini

Did a pull today, 10/14/2016. There are still problems on the X1 build.

(1) /pkg/torch/lib/TH/CMakeLists.txt
FIND_PACKAGE(ARM) does not result in NEON_FOUND,
i.e. the default case does not set -D__NEON__.
- I'm only building for the X1 (64-bit), so I just always define this.
(2) /pkg/torch/lib/TH/vector/NEON.c
The assembly instructions are invalid for ARM64.
- I reverted to the generic C functions from the previous THVector.c.
I can post this file if anyone is interested.
(3) Build processes are being killed. (Low memory?)

Not sure how to resolve (3). Ideas?

-Rich
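For item (2), a generic fallback of the kind described could look like the sketch below. It is plain C with no inline assembly, so it compiles on both ARMv7 and ARM64. The name `vec_axpy_generic` and its signature are assumptions made for illustration, not the contents of the reverted file.

```c
#include <stddef.h>

/* Hypothetical generic routine: y[i] += c * x[i] for i in [0, n).
 * Plain C only, so it builds unchanged on any architecture. */
static void vec_axpy_generic(float *y, const float *x, float c, size_t n)
{
    size_t i = 0;

    /* Process four elements per iteration to help the compiler
     * auto-vectorize. */
    for (; i + 4 <= n; i += 4) {
        y[i]     += c * x[i];
        y[i + 1] += c * x[i + 1];
        y[i + 2] += c * x[i + 2];
        y[i + 3] += c * x[i + 3];
    }

    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        y[i] += c * x[i];
}
```

A fallback like this trades some speed for portability, which may be acceptable when the goal is simply to get the build working.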

@rtarquini

Replacing the ../luarocks make with ../luarocks install resolves the memory issues with the problematic rocks such as cutorch. The rock specs in the distribution need to be refreshed.

@soumith
Member

soumith commented Oct 16, 2016

@rtarquini fixed via torch/cutorch@ac40c05

@dusty-nv
Author

@rtarquini can you post your THVector.c? I'm getting invalid assembly instructions too after forcing NEON_FOUND with the latest pull.

@rtarquini

THVector.c includes ./vector/NEON.c. That is the file with the mods.

NEON.c.txt
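As an aside, one way to keep a NEON path that builds on both ARMv7 and AArch64 is to use `arm_neon.h` intrinsics instead of hand-written assembly. The sketch below is not the attached NEON.c; the function name `vec_axpy_neon_or_generic` is hypothetical, and it assumes the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is enabled.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical routine: y[i] += c * x[i] for i in [0, n).
 * Uses NEON intrinsics when the compiler advertises NEON support
 * and falls back to plain C everywhere else. */
static void vec_axpy_neon_or_generic(float *y, const float *x,
                                     float c, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four floats per 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vy = vld1q_f32(y + i);
        float32x4_t vx = vld1q_f32(x + i);
        vy = vmlaq_n_f32(vy, vx, c);   /* vy + vx * c */
        vst1q_f32(y + i, vy);
    }
#endif

    /* Scalar tail, and the whole loop on non-NEON builds. */
    for (; i < n; i++)
        y[i] += c * x[i];
}
```

Because the compiler chooses the instruction encoding for intrinsics, the same source serves 32-bit and 64-bit ARM targets.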

@dusty-nv
Author

Thanks @rtarquini, I got master building on aarch64 again with the file. On TX1 it still requires swap and enough disk space in /tmp for me with jopts=1, when cutorch begins compiling large templates in THCTensorMathPointwise.cu.

@Atcold
Contributor

Atcold commented Nov 3, 2016

Installation on TX1 still breaks (fixed with the two patches and the new NEON.c) and it's taking ages (I've been trying to install cutorch for 4 hours). Installing an older version takes only a minute or two.
For the swap, I just plugged in an SSD as "additional RAM".
This is a nightmare, especially with deadlines coming up too soon.

@hit1001

hit1001 commented Nov 7, 2016

While trying this on an aarch64 platform, I am still getting these errors.

@Atcold
Contributor

Atcold commented Nov 7, 2016

@hit1001, you need to cd into the torch7 repo and run patch <CMakeLists-patch.txt, replace lib/TH/vector/NEON.c with NEON.c, then cd into torch7/lib/TH/cmake/ and run patch <FindARM-patch.txt.
You will also need to create a swap file. See this for a nice tutorial on how to do it.

@soumith
Member

soumith commented Nov 7, 2016

@Atcold if you have a solution to this, can you please send in a PR? I don't have an AArch64 machine to test on, and hence was unable to fix it myself.

@Atcold
Contributor

Atcold commented Nov 7, 2016

Oh, OK. I didn't know, @soumith.
The problem is that it takes one whole night to install cutorch. I'm not sure this is a feasible solution. Moreover, one needs an external SSD drive (swapping on USB is suicide).

@hit1001

hit1001 commented Nov 11, 2016

@Atcold, I am trying to install torch/distro, so the files you mentioned lie in /pkg/torch/TH.
Are torch7 and cutorch additional packages, separate from those in the pkg directory? After patching, it still doesn't build for me.
If yes, how do I install them? @dusty-nv: do I follow CMakePrebuild.sh for torch7?

@Atcold
Contributor

Atcold commented Nov 15, 2016

@hit1001, this is my ~/torch/pkg/torch repo on my TX1.
status
You can simply apply this patch.txt.
Let me know if you need further assistance.

Here is my current setup.
img_20161115_143233

@soumith
Member

soumith commented Nov 15, 2016

@Atcold I would appreciate it if you sent a PR so that folks don't need to manually apply this patch. I can't create the PR, as I cannot test the ARM64 side.
Thanks.

@Atcold
Contributor

Atcold commented Nov 15, 2016

@soumith, I am not the author of the code. I've just used @dusty-nv's and @rtarquini's work. Shall I send a PR citing the authors? I am also not aware whether this breaks anything else.
Please advise.

@soumith
Member

soumith commented Nov 15, 2016

yes, please do. we will catch other issues in contbuild
