Compatibility with Cuda 10.1? #26289
@jiapei100 For a faster resolution, please provide the details requested in the issue template here, and check the tested build configurations here. TensorFlow 1.12 was built with CUDA 9.0, so after installation some modules look for CUDA 9.0 paths. If you want to use CUDA 10 or CUDA 10.1, use the latest version of TF and check whether CUDA and cuDNN are referenced from the correct paths. First uninstall Python and TensorFlow, then reinstall following the instructions here, but with the new versions of CUDA and cuDNN. Please let me know how it progresses. Thanks! |
CUDA 10.1 moved cuBLAS into a separate directory (see the cuBLAS issues and the cuBLAS new-features notes), so you may be able to use a soft link to fix it temporarily. |
I hit the same problem with the cuBLAS path. |
So, how do we modify the code and compile with the paths for CUDA 10.1?
Symlinking doesn't work; try building tensorflow master.
Besides, all the libcu***.so.10.1 libraries are missing, not just libcublas; it is a pain to have to create symlinks for all of them even if symlinking works. |
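For anyone creating those links by hand, a small script can automate it. This is only a sketch, not an endorsement that CUDA 10.0-built binaries actually work against 10.1 libraries (comments below suggest they may not); the default library directory is an assumption for a standard CUDA 10.1 install.

```python
# Sketch: for every libcu*.so.10.1 in a directory, create a matching
# .so.10.0 symlink so binaries linked against CUDA 10.0 sonames resolve.
# The default path is an assumption for a standard CUDA 10.1 install.
import glob
import os

def make_10_0_links(libdir="/usr/local/cuda-10.1/targets/x86_64-linux/lib"):
    created = []
    for path in glob.glob(os.path.join(libdir, "libcu*.so.10.1")):
        link = path[: -len("10.1")] + "10.0"  # libfoo.so.10.1 -> libfoo.so.10.0
        if not os.path.exists(link):
            # Relative link target, since source and link live in the same dir.
            os.symlink(os.path.basename(path), link)
            created.append(link)
    return created

if __name__ == "__main__":
    for link in make_10_0_links():
        print("created", link)
```

Run it with root privileges if the library directory is system-owned; it skips names that already resolve, so it is safe to re-run.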
I have the same problem using tensorflow 1.13.1 and CUDA 10.1. Symlinks 'worked' for me insofar as there were no path errors; however, running the tensorflow sanity check results in a core dump, presumably because there are genuine incompatibilities between those versions of tensorflow and CUDA.
EDIT: the following link commands fixed the path errors
|
@beew |
If you are heading down this particular rabbit hole, you will find #26155 relevant. |
Interestingly, I got it working on Windows with an unsupported compiler - #28086 |
This is due to conflicting behaviour at configure/build time and run time. Have a look at the patch in #28093 for a temporary workaround. |
Dear all, I installed tensorflow-gpu using pip3. The error I get when I run it: File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in During handling of the above exception, another exception occurred: Traceback (most recent call last): Failed to load the native TensorFlow runtime. See https://www.tensorflow.org/install/errors for some common reasons and solutions. Include the entire stack trace. Can you please help me? Cheers |
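Errors like this usually mean the dynamic loader cannot find the CUDA 10.0-era sonames the prebuilt wheel was linked against. A quick diagnostic sketch (the soname list below is an assumption, not an official TF requirement list) is to try dlopen-ing each one:

```python
# Diagnostic sketch: try to dlopen the sonames a CUDA 10.0-built TensorFlow
# wheel typically needs, and report which ones the loader cannot resolve.
# The soname list is an assumption, not an official TF requirement list.
import ctypes

def can_load(soname):
    """Return True if the dynamic loader can resolve and open soname."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    for soname in ("libcublas.so.10.0", "libcudart.so.10.0", "libcudnn.so.7"):
        print(soname, "OK" if can_load(soname) else "MISSING")
```

Any line printed as MISSING points at a library that is absent or not on the loader's search path (`LD_LIBRARY_PATH` / `ldconfig`).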
If you want to use TensorFlow with CUDA 10.1, you currently need to build it from source. The binaries we ship are built for CUDA 10.0. |
Closing this issue since chsigg's explanation addresses the issue. Feel free to reopen if you have any further questions. Thanks! |
chsigg's suggestion is not helpful. Most of us are building from source. |
Please re-open this issue. |
|
The error you are seeing is an unrelated bug that will be fixed shortly. Or
you can use a commit from 2 days ago for now.
…On Wed, Jun 12, 2019, 03:16 Lei Mao ***@***.***> wrote:
```
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc: In static member
function 'static void
tflite::StatefulNnApiDelegate::DoFreeBufferHandle(TfLiteContext*,
TfLiteDelegate*, TfLiteBufferHandle*)':
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:2136:31: warning:
comparison between signed and unsigned integer expressions [-Wsign-compare]
if (*handle >= 0 && *handle < delegate_data->tensor_memory_map.size()) {
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR:
/home/leimao/.cache/bazel/_bazel_leimao/b5222491e5f5e6954d481a492cdf3412/external/nccl_archive/BUILD.bazel:54:1:
undeclared inclusion(s) in rule ***@***.***_archive//:device_lib':
this rule is missing dependency declarations for the following files
included by 'external/nccl_archive/src/collectives/device/functions.cu.cc
':
'external/nccl_archive/src/collectives/device/common.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 101.561s, Critical Path: 19.52s
INFO: 513 processes: 513 local.
FAILED: Build did NOT complete successfully
```
|
I agree that it sucks for interfaces to change in a point release, and it's
too bad nvidia hasn't released a working 10.0 on rhel/centos 8. At the
same time, it's been 8 months since 10.1 came out. Some timeline for a TF
update would be nice.
…On Sun, Oct 27, 2019, 12:17 PM Roger Weihrauch ***@***.***> wrote:
Hi all,
Given my own experience of trying and failing installations with many different combinations of Python X.x.x, pip X, Debian 9.x/10.x, and CUDA 10.0.x/10.1.x, I think the solution lies on NVIDIA's side. Since the actual errors with both the latest AND legacy releases of TF (as described by many users above) can only be solved by hands-on symlinking, the quality NVIDIA shows in its heroic CUDA environment is very poor. So, last but not least, I (and maybe some of you too) am stuck on kernel version 4.19 at most (4.9 is best; it also does not run with kernel 5.2.x), CUDA 10.0.x including the patch, cuDNN 7.4.2 (I think the latest for CUDA 10.0.x), and the system pip version. With those versions I do get CUDA up and running, using the GPU in TF; nevertheless, even this installation is missing some libraries: device files that were present in CUDA 10.0.x are missing from CUDA 10.1.x, and the NVIDIA samples will not compile completely because of missing libraries and programs.
So, in my eyes NVIDIA is doing a very bad and unhelpful job here. They are going to scare their users away from CUDA this way.
That is only my opinion.
Regards,
Roger
|
I would also like to point out (in addition to everything already said) that PyTorch officially lists only CUDA 10.1 support. For someone like me who often ends up with TF and PyTorch in the same environment, this makes it very difficult to install them side by side. Fortunately, PyTorch builds have been superbly flexible with CUDA, and I eventually figured out that merely including |
In addition to what @sytelus said, the exact statement is: |
Unfortunately, the issue I'm currently running into is that cuda 10.0
doesn't even exist for centos 8. Perhaps centos / redhat was not the most
convenient linux distro to choose, but switching to a different distro
would be even more inconvenient at this point.
…On Wed, Nov 6, 2019 at 8:57 PM Jeffrey Wardman ***@***.***> wrote:
In addition to what @sytelus <https://github.com/sytelus> said, the exact
statement is:
```
conda install pytorch torchvision cudatoolkit=10.0 python==X -c pytorch
```
where X is your current python version if you don't want to upgrade it.
|
We are working on updating TF builds to CUDA 10.1. |
Release a timeline so that we can adjust accordingly. Simply staying mum for 6 months is unacceptable. We also have jobs. |
Looking at #34327, it seems like the TF team has already solved the compatibility issue. Hopefully the 2.1 release happens soon. |
As an interim workaround, I had no issues symlinking the affected libraries to *.so.10.0. |
A very unprofessional attitude toward managing software. |
Nightly release builds and r2.1 release are built against CUDA 10.1. I'm going to close this issue. @dexception, I agree I could have handled this better. Sorry. |
|
TF 2.0 release will probably stay on CUDA 10.0. You can build TF with CUDA 10.1 from any branch, or use nightly releases before TF 2.1 is released. |
Are you sure about that? I built from source on branch r1.13 with CUDA 10.1, and it obviously does not work.
Could you help me verify that tf-1.* works with CUDA 10.1? To be precise, I compiled with XLA=on (TensorRT=off). If I turn XLA off, everything is fine. Thus I thought the XLA-related problem above was small and solvable. Please check it. |
TF r1.13 does not build with CUDA 10.1. Please use r1.15 or r2.x. |
9 months and TensorFlow is still catching up. Oh boy! Is this library too complicated to work with? |
Version 2.1 was uploaded yesterday. |
@dishkakrauch What about CUDA 10.2? |
You can always compile from source to get compatibility with other CUDA versions. We only provide a subset of the infinite number of combinations that you can build with (Python version, operating system version, CUDA version, compiler version, library/dependency versions, etc.). For all other cases, all we can do is provide instructions on how to build from source and let the community do the builds. |
Just downgraded to |
I have made a script for installing tf 1.12 with CUDA 10.1 on Ubuntu. |
For folks who want to use tensorflow 1.xx with CUDA 10.1, I found that conda has a correct compilation of tensorflow against CUDA 10.1. Just install tensorflow 1.xx with conda within your conda environment. E.g., I installed with the following command |
I "fixed" it by changing `_should_check_soname` in `tensorflow/third_party/gpus/cuda_configure.bzl` (lines 478 to 479 in 5bc7265) to always return false:

```
def _should_check_soname(version, static):
    return False
```
|
I just want to compile tensorflow 1.12 against CUDA 10.1. Any suggestions?