New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable building with CUDA support on Mac OS X #664
Conversation
|
Nice! I think this is clearly and improvement over the existing one. I tried on my macbook with cuda and it works. I also published a small tutorial on how to use this patch]. Hope it can be merged soon. @vrv @mrry thoughts ? |
|
Thanks for trying this out @Mistobaan and writing a tutorial! I'd add a step for installing GNU coreutils ("brew install coreutils") to the tutorial - most people probably don't have it installed. |
|
This is very nice! Thanks for this contribution -- we'll have @zheng-xq take a look at this soon. |
|
@ville-k good point. Updated :) |
|
@leary-google, could you review the stream-execuctor portion of this change? |
|
@Mistobaan I followed your instructions, but "brew cask install cuda" defaults to CUDA 7.0, and @ville-k patch is using 7.5 by default. I tried both. With CUDA 7.0, I get a compile error as such: By manually downloading CUDA 7.5 and installing it, it compiles. |
|
@ville-k In testing your change I find that the newly added ALT_PATH doesn't really work. If you have I think having the ALT_PATH stuff is hard to get right. |
|
@fpmchu Thanks for testing the ALT_PATH build scenario for cuDNN! The fix for the problem you discovered turned out to be pretty simple - I have it on a separate branch for now: @vrv What's the project's policy for issues found and fixed during PR review? New commits on the existing PR or open a separate PR? |
|
New commits on existing PR seems fine -- we'll just ask to squash the commits prior to validation and merging. |
|
@ville-k Cool. While the fix looks ok, I'm still not understanding the purpose of adding ALT_PATH. Is it just to allow users to put them in /usr/local/lib? What's wrong with using /usr/local/cuda/lib? |
|
@fpmchu ALT_PATH is there to support existing lib search path functionality for both linux and mac. The original configure and cuda_config.sh scripts look for cudnn.so.6.5 under both "/usr/local/cuda" and "/usr/local/cuda/lib64". Depending on the platform, these locations will now be searched if the user inputs "/usr/local/cuda" as the cudnn install dir:
Mac
|
|
Thanks @fpmchu to try the tutorial out. I think it installed 7.0 because you have an old cuda formula. I updated the tutorial to suggest to update homebrew first and check for the cuda version. |
|
Thanks @Mistobaan. I actually think you mean |
|
Because of peculiarities in our internal build process, we won't be able to merge this right now. I'll leave this open since it may be useful for people. When we find someone to resolve the internal problems, we may be able to absorb it at a later time. I'm sorry about that -- I would love to have this in. |
|
Can one of the admins verify this patch? |
|
Thanks for the update @martinwicke ! Is the main issue causing problems with your internal build process the automatic generation of the "platform.bzl" file? |
|
That, and reuse of shared code elsewhere. The code elsewhere can be updated
|
|
I get a segfault when the cuda libs are not setup properly due to |
|
I eventually did get this working but the version of Eigen referenced in here is very broken. Eigen HEAD (fd9611fa2d9c) does work aside from a nvcc build break in TensorIntDiv.h, Previous failure looked like this:
After a bit of hunting around it turns out that the mutex instances had already been destructed after someone called And it turns out that all the |
|
I'm not able to reproduce the issue with LD_LIBRARY_PATH you mentioned @NathanHowell Which version of OSX and Python are you using? |
|
@ville-k the segfault might have been from an older version of Xcode, I upgraded to 7.2 trying to track down the other issue and I think it's been fixed... but it should use |
|
@NathanHowell I was surprised about LD_LIBRARY_PATH working too when I stumbled into it working by accident. Apple added this at some point for UNIX conformance - they're checking both env vars nowadays: |
|
Worked for me as well with @Mistobaan 's instructions. Note that |
|
@ville-k would you mind terribly rebasing? On linux, updates to bazel have created a number of installation issues where fixes were rolled into the git in the last few weeks. |
| fi | ||
| if [ -e "$CUDNN_INSTALL_PATH/libcudnn.so${TF_CUDNN_EXT}" -o -e "$CUDNN_INSTALL_PATH/lib64/libcudnn.so${TF_CUDNN_EXT}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the error here is:
ERROR: /workspace/third_party/gpus/cuda/BUILD:94:12: in srcs attribute of cc_library rule //third_party/gpus/cuda:cudnn: file '//third_party/gpus/cuda:lib64/libcudnn.so.' is misplaced here (expected .cc, .cpp, .cxx, .c++, .C, .c, .h, .hh, .hpp, .hxx, .inc, .S, .s, .asm, .a, .pic.a, .lo, .pic.lo, .so, .dylib, .o or .pic.o).
Prior to your change, if the user did not set a version, it would default to libcudnn.so, which is a symlink to whatever the latest installed version is (and this is what our tests use). Now it's being set to libcudnn.so.${TF_CUDNN_VERSION} and that variable is empty.
Is it possible to switch back to the old mode of doing things, where the default is the empty version and the user can specify a specific version (OS X users will manually type in 4 and 7.5 for the versions). That will also make sure that existing users who depend on the current behavior aren't broken by this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a change that makes the *_library_path functions in platform.bzl handle an empty version string. I'll take a look tomorrow to see if I can make the configure script work the way it did before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrv My latest commit restores the original behavior of defaulting to the symlinked cuda libraries when version is left empty in configure.
|
Good work on the update. There may still be an issue with cudnn v5 when packaged/installed with pip. For both cudnn v4 and v5, the compile completes successfully. Invoking an example trainer,
However, when packaged and installed with pip, a compile with cudnn 4 works as expected but cudnn 5 will fail on load of the cudnn.5.dylib library:
This is the point at which libcudnn.5.dylib loads. Since cudnn 5 works with the example trainer, the packaging must be breaking something, perhaps a misplaced non-versioned symlink? Odd that cudnn 4 works though. Perhaps a misplaced non-versioned symlink? |
|
@markb729: right now we don't package one binary that works with both cudnn4 and 5, because the APIs are different. At some point I think it would be conceivable to implement the stream executor in such a way that different cudnn versions are different implementations of the same interface, and we dispatch exactly once during initialization to the right one. |
| if PLATFORM == "Darwin": | ||
| return "lib/lib{}.{}.dylib".format(name, version) | ||
| else: | ||
| return "lib64/lib{}.so.{}".format(name, version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is that you are assuming the version is always set -- but we allow people to use the unversioned library (libfoo.so, not libfoo.so.version).
I'll kick off a test just to make sure, but I still think this may not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you were viewing an outdated diff - this was fixed in a previous commit: 0933043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still am terrible at github.
|
@tensorflow-jenkins test this please |
| elif test -e ${CUDNN_INSTALL_PATH}/include/cudnn.h; then | ||
| CUDNN_HEADER_PATH=${CUDNN_INSTALL_PATH}/include | ||
| elif test -e /usr/include/cudnn.h; then | ||
| CUDNN_HEADER_PATH=/usr/include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possibly the cause of the test failure?
ERROR: cannot find cudnn.h under: /usr/lib/x86_64-linux-gnu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet! looks like I rebased that final elif out of existence:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrv should be fixed now .
|
One more try! test this please |
|
Woohoo!! thank you so much for this contribution. We'll try our best to keep it working, though without OS X / GPU test machines, we can't promise too much. |
|
@vrv Awesome! I really appreciate your thoughtful feedback and help in figuring out the build issues! |
|
Finally! Thanks for all the hard work! I will get an external GPU so we can test this and make sure it continues to work. |
|
Nice ! |
|
Hi guys, I'm new here. I was wondering if there shouldn't be a "MacOS GPU Tests" in the lists of tests being run at http://ci.tensorflow.org to make sure the new CUDA/GPU functionality keeps working? |
|
Yes. We're installing hardware for that over the weekend. A test should
|
|
I'm trying to build with CUDA 8.0, CuDNN 5.0, and clang-703.0.31 (CUDA test projects seem to build just fine). I get the following error: INFO: Found 1 target... Any ideas? Edit: Oops! Never mind, I forget to install coreutils. I'm running into more trouble with CUDA 8.0, but it looks like these are known issues not related to this ticket. |
|
Since Apple does not sell any computers with Nvidia GPUs, could you tell us what hardware you are using this with ? Is is some sort of Thunderbolt-attached GPU enclosure with an Nvidia card in it ? |
|
Once-upon-a-time Apple did provide computers with NVIDIA chips. I have a 2012 MacBook Pro with a 650M. |
|
Yes. I have the same the same MacBook pro with 650m card. |
|
I have a 2014 MacBook Pro with a 750m card which is still acceptably recent. |
|
@martin-gorner Apple sells refurbished laptops with NVidia, ie, http://www.apple.com/shop/product/FE294LL/A/refurbished-154-inch-macbook-pro-23ghz-quad-core-intel-i7-with-retina-display |
|
Has anyone tried this with GPU in a Thunderbolt enclosure ? This video seems to imply that CUDA works well in that situation: https://youtu.be/Bsf9lHM8qLk |
|
Our (new) Mac GPU tests run in just such a setup: Mac Pro + Quadro M4000 in
|
i have the same mac pro. do you have it using gpu?i had been trying for 3 days trying to find good tutorials to make it work but nothing. |
native readlink command behaving differently from the GNU version - you can
install it using homebrew: "brew install coreutils"
incompatibility - the toolkit versions (CUDA & cuDNN) are now controlled by
variables set in the configure script
overcome bazel limitations for using "select" to set the platform
specific names and paths of CUDA libraries
is not yet supported by the version of clang shipped by Apple. __thread
supports only primitive types, but is more performant