
Ignoring visible gpu device (device: 0, name: GeForce GTX 780M compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5. #46653

Closed
eagle-hub opened this issue Jan 25, 2021 · 12 comments
Assignees
Labels
comp:gpu GPU related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.4 for issues related to TF 2.4 type:build/install Build and install issues

Comments

@eagle-hub

eagle-hub commented Jan 25, 2021

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow installed from (source or binary): From source
  • TensorFlow version: 2.4.1
  • Python version: 3.8.5
  • Installed using virtualenv? pip? conda?: conda
  • Bazel version (if compiling from source): bazel 3.1.0
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • CUDA/cuDNN version: 10.1/8
  • GPU model and memory: dual GeForce GTX 780M with 2 GB each

Describe the problem

I built TensorFlow (version 2.4.1) from source because my old GPU's compute capability is 3.0.
I followed the instructions from https://www.tensorflow.org/install/source
and from https://medium.com/@mccann.matt/compiling-tensorflow-with-cuda-3-0-support-42d8fe0bf3b5

I ran the build process 5 times, changing some parameters in ./configure each time.
I also manually edited .tf_configure.bazelrc (suggested parameters from #24126 (comment)) to turn XLA off by
removing --config=XLA
and
adding the line build --define with_xla_support=false
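The manual edit described above can also be scripted. A minimal sketch (a hypothetical helper, not part of TensorFlow's tooling; the demo operates on a throwaway file rather than a real source tree):

```python
from pathlib import Path

def disable_xla(bazelrc: str) -> None:
    """Strip XLA-enabling lines from a .tf_configure.bazelrc-style file
    and append an explicit define that turns XLA support off."""
    path = Path(bazelrc)
    lines = path.read_text().splitlines()
    # Drop lines that enable XLA ("--config=XLA" or "build:xla ..." rules).
    kept = [ln for ln in lines
            if "--config=XLA" not in ln and not ln.startswith("build:xla")]
    kept.append("build --define with_xla_support=false")
    path.write_text("\n".join(kept) + "\n")

# Demo on a throwaway file instead of a real TensorFlow checkout:
demo = Path("demo_bazelrc")
demo.write_text("build --config=XLA\nbuild --config=cuda\n")
disable_xla("demo_bazelrc")
print(demo.read_text())
```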

Every time I ran Python to check whether TensorFlow was using my GPU, I got False, as shown below.

```
>>> tf.test.is_gpu_available(True,3.0)
2021-01-25 09:21:44.146701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.148484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 780M computeCapability: 3.0 coreClock: 0.797GHz coreCount: 8 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 149.01GiB/s
2021-01-25 09:21:44.148617: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.150416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: pciBusID: 0000:07:00.0 name: GeForce GTX 780M computeCapability: 3.0 coreClock: 0.797GHz coreCount: 8 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 149.01GiB/s
2021-01-25 09:21:44.150466: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-01-25 09:21:44.150508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-01-25 09:21:44.150535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-01-25 09:21:44.150559: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-25 09:21:44.150585: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-25 09:21:44.150611: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-25 09:21:44.150638: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-01-25 09:21:44.150664: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-25 09:21:44.150773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.152515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.154623: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.156318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1816] Ignoring visible gpu device (device: 0, name: GeForce GTX 780M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
2021-01-25 09:21:44.156445: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-25 09:21:44.158101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1816] Ignoring visible gpu device (device: 1, name: GeForce GTX 780M, pci bus id: 0000:07:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
2021-01-25 09:21:44.158151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-25 09:21:44.158161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 1
2021-01-25 09:21:44.158167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N Y
2021-01-25 09:21:44.158173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 1:   Y N
False
```
So I do not know what to do now.

@eagle-hub eagle-hub added the type:build/install Build and install issues label Jan 25, 2021
@Saduf2019 Saduf2019 added TF 2.4 for issues related to TF 2.4 comp:gpu GPU related issues labels Jan 25, 2021
@Saduf2019
Contributor

@eagle-hub
Please share simple, indented, stand-alone code to replicate the issue, or if possible share a Colab gist with the error reported.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Jan 25, 2021
@eagle-hub
Author

eagle-hub commented Jan 25, 2021

@Saduf2019
```python
import tensorflow as tf

tf.test.is_gpu_available()
tf.config.list_physical_devices('GPU')
tf.test.is_gpu_available(True, 3.0)
tf.test.is_gpu_available()
```
I ran the code both in Jupyter and from the terminal. I will attach two files, one for each run.
Please note that I'm running the code on my machine as I mentioned in the first comment.

py_tf
py_tf_2

@eagle-hub
Author

@tensorflowbutler @Saduf2019
Any advice on how to build from source with support for the GeForce GTX 780M (compute capability 3.0)?

@Saduf2019 Saduf2019 removed the stat:awaiting response Status - Awaiting response from author label Jan 28, 2021
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Jan 28, 2021
@ymodak
Contributor

ymodak commented Mar 23, 2021

Unfortunately we don't have an exhaustive guide for building TF with CUDA compute capability 3.0.
I think reaching out to the community (SO) may be your best bet here.
I found one article that may help in your case.

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Mar 23, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Mar 30, 2021
@ymodak ymodak closed this as completed Apr 2, 2021

@alimoche

I have the same problem, but I tried with version 2.3.2, since it supports CUDA 10.1, which works with compute capability 3.0.

In my case I followed #27840 to turn off XLA: after running the ./configure script, I edited .tf_configure.bazelrc and appended --define with_xla_support=false to the build:xla line.

After a successful compilation I ran the example:

python3 -c "import tensorflow as tf; print(\"Num GPUs Available: \", len(tf.config.list_physical_devices('GPU')))"

and the output:

...
... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1812] Ignoring visible gpu device (device: 0, name: Quadro K600, pci bus id: 0000:04:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5. 
Num GPUs Available:  0

I gave it a second try, but this time I removed the build:xla line and added

build --define with_xla_support=false
build --action_env=TF_ENABLE_XLA=0

unfortunately with the same result.

Is there a proper way to disable XLA in v2.x?
Is build --define=with_xla_support=false the same as build --define with_xla_support=false?

Also notice that the ./configure script explicitly says that compute capability 3.0 is not supported:

Please note that each additional compute capability significantly increases your build time and binary size,
and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.0]: 3.0


WARNING: XLA does not support CUDA compute capabilities lower than 3.5. Disable XLA when running on older GPUs.

Does this mean that v2.x can never run on a GPU with compute capability 3.0?

@ymodak's answer does not help, because the article is for v1.x, and v1.x is already known to work with cc 3.0.

@alimoche

alimoche commented Apr 26, 2021

I solved the problem in my case using tensorflow 2.1.3 + cuda 10.1 + cudnn 7.6.5 + bazel 0.27.2:

...
2021-04-26 19:28:16.651806: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
2021-04-26 19:28:16.652966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0 
Num GPUs Available:  1

Looking at the code in tensorflow/core/common_runtime/gpu/gpu_device.cc, it should also work with v2.2; for greater versions it might work by defining TF_EXTRA_CUDA_CAPABILITIES on the bazel command line:

--copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0

@alimoche

This is to confirm that the following recipe solved this problem in my case with tensorflow 2.3.2 + cuda 10.1 + cudnn 7.6.5 + bazel 3.10:

  1. Disable XLA (not supported with compute capability 3.0) by removing the build:xla line in .tf_configure.bazelrc and adding:
build --define=with_xla_support=false 
build --action_env TF_ENABLE_XLA=0
  2. For tensorflow > 2.2 the code changed, and it is necessary to define TF_EXTRA_CUDA_CAPABILITIES on the bazel command line:
--copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0
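For reference, the steps above assembled into one build invocation would look roughly like the following. This is a sketch, not a verified recipe: the pip-package target is the standard one for TF 2.x of that era, and the exact flags depend on your ./configure answers and environment.

```shell
# Step 1: configure, answering 3.0 at the compute capability prompt.
./configure

# Step 2: edit .tf_configure.bazelrc as described above (remove the
# build:xla line, add the with_xla_support=false / TF_ENABLE_XLA=0 lines).

# Step 3: build with the extra-capabilities define (needed for TF > 2.2):
bazel build --config=cuda \
    --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0 \
    //tensorflow/tools/pip_package:build_pip_package
```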

@samhuang-sg

Which version of gcc are you using? Another question: is it bazel 3.1.0? I can't find bazel 3.10

This is to confirm that the following recipe solved this problem in my case with tensorflow 2.3.2 + cuda 10.1 + cudnn 7.6.5 + bazel 3.10:

  1. Disable XLA (not supported with compute capability 3.0) by removing the build:xla line in .tf_configure.bazelrc and adding:
build --define=with_xla_support=false 
build --action_env TF_ENABLE_XLA=0
  2. For tensorflow > 2.2 the code changed, and it is necessary to define TF_EXTRA_CUDA_CAPABILITIES on the bazel command line:
--copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0

@alimoche

Which version of gcc are you using? Another question: is it bazel 3.1.0? I can't find bazel 3.10

gcc (Debian 8.3.0-6) 8.3.0

yes it is a typo, I used bazel 3.1.0

@samhuang-sg

Thanks for the prompt reply!

Which version of gcc are you using? Another question: is it bazel 3.1.0? I can't find bazel 3.10

gcc (Debian 8.3.0-6) 8.3.0

yes it is a typo, I used bazel 3.1.0
