Tensorflow not detecting GPU #64881

Closed
RetusRieben opened this issue Apr 2, 2024 · 17 comments
Labels
stat:awaiting tensorflower, subtype: ubuntu/linux, TF 2.16, type:build/install

Comments

@RetusRieben

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

SUSE Linux Enterprise Server 15 SP3

Mobile device

No response

Python version

3.10.13

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA 12.4

GPU model and memory

NVIDIA A2, 16GB

Current behavior?

TensorFlow is not recognising the NVIDIA A2 GPU or any other CUDA device on my server.

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-04-02 08:37:38.517925: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-02 08:37:38.570940: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-02 08:37:39.624139: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-02 08:37:40.270665: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-04-02 08:37:40.270724: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:134] retrieving CUDA diagnostic information for host: HOST
2024-04-02 08:37:40.270738: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:141] hostname: HOST
2024-04-02 08:37:40.270865: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:165] libcuda reported version is: 550.54.14
2024-04-02 08:37:40.270887: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:169] kernel reported version is: 550.54.14
2024-04-02 08:37:40.270896: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:248] kernel version seems to match DSO: 550.54.14
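
Since the log shows cuInit failing with CUDA_ERROR_NO_DEVICE even though libcuda loads and the kernel driver version matches, a minimal ctypes sketch like the one below can probe the driver directly, independent of TensorFlow (assuming libcuda.so.1 is on the loader path):

```python
# Sketch: call the CUDA driver API directly, bypassing TensorFlow.
# If this also reports no device, the problem is below TensorFlow
# (driver install, container runtime, or CUDA_VISIBLE_DEVICES).
import ctypes

libcuda = ctypes.CDLL("libcuda.so.1")
rc = libcuda.cuInit(0)
print("cuInit returned:", rc)  # 0 = CUDA_SUCCESS, 100 = CUDA_ERROR_NO_DEVICE

count = ctypes.c_int(0)
if rc == 0:
    libcuda.cuDeviceGetCount(ctypes.byref(count))
print("visible CUDA devices:", count.value)
```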
@google-ml-butler bot added the type:build/install label Apr 2, 2024
@Venkat6871 added the TF 2.16 and subtype: ubuntu/linux labels Apr 4, 2024
@Venkat6871

Hi @RetusRieben,

  • Make sure you have installed the appropriate NVIDIA drivers for your GPU. TensorFlow requires compatible NVIDIA drivers to communicate with the GPU.
  • Ensure that the user running TensorFlow has the necessary permissions to access the GPU. Sometimes, permissions issues can prevent GPU detection.
  • Sometimes TensorFlow may be built without GPU support; you can check by running the command below (a short build-info sketch follows it):
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
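
A minimal sketch of that check, using tf.sysconfig.get_build_info() to confirm whether the installed wheel was built with CUDA at all and which CUDA/cuDNN versions it expects (assuming a recent 2.x pip install):

```python
# Sketch: check whether the installed TensorFlow wheel was built with CUDA.
import tensorflow as tf

print("built with CUDA:", tf.test.is_built_with_cuda())
info = tf.sysconfig.get_build_info()
print("CUDA version expected:", info.get("cuda_version"))
print("cuDNN version expected:", info.get("cudnn_version"))
print("GPUs visible to TF:", tf.config.list_physical_devices("GPU"))
```

If is_built_with_cuda() prints False, the wheel itself has no GPU support and no driver fix will help.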

Thank you!

@Venkat6871 added the stat:awaiting response label Apr 4, 2024
@google-ml-butler bot removed the stat:awaiting response label Apr 4, 2024
@RetusRieben (Author)

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-04-04 14:42:07.986529: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-04-04 14:42:08.027002: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-04 14:42:08.826422: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-04 14:42:09.350663: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-04-04 14:42:09.350711: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:134] retrieving CUDA diagnostic information for host: Host
2024-04-04 14:42:09.350722: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:141] hostname: Host
2024-04-04 14:42:09.350809: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:165] libcuda reported version is: 550.54.14
2024-04-04 14:42:09.350834: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:169] kernel reported version is: 550.54.14
2024-04-04 14:42:09.350845: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:248] kernel version seems to match DSO: 550.54.14

The output above is generated when I run:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

My user has root rights.

@mhlhr commented Apr 6, 2024

I tried installing TensorFlow using both conda and pip. My NVIDIA GPU is not detected when I run print(tf.config.list_physical_devices('GPU')). I had installed it using python3 -m pip install tensorflow[and-cuda]. Should I try TensorFlow version 2.15.1?
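
Before switching versions, it may help to check whether the [and-cuda] extra actually pulled in the NVIDIA CUDA wheels. A minimal sketch (package names such as nvidia-cudnn-cu12 are what current tensorflow[and-cuda] releases typically depend on):

```python
# Sketch: list NVIDIA/TensorFlow packages installed in the current environment.
# If tensorflow[and-cuda] worked, entries like nvidia-cudnn-cu12 and
# nvidia-cuda-runtime-cu12 should appear alongside tensorflow itself.
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = dist.metadata["Name"] or ""
    if name.startswith("nvidia-") or name.startswith("tensorflow"):
        print(f"{name}=={dist.version}")
```

If no nvidia-* packages show up, the extra was not resolved at install time (quoting "tensorflow[and-cuda]" avoids bracket expansion in some shells).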

@eabase commented May 4, 2024

Same issue here.
I have an NVIDIA RTX 4060 Laptop GPU, and the command from above shows nothing about the GPU.

# python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

@MysterionRise

I experience the same issue with TF 2.16.1; to be honest, the only workable fix for me was to roll back to 2.15.1.

@eabase commented May 15, 2024

Apparently, TF has not been compiled with CUDA for Windows, which is outrageous in this day and age. All the documentation says to downgrade to some other version or to recompile using WSL. Both are ridiculous suggestions that just show a lack of willingness to dive into the issue. If it could be done before, and can be done in WSL, it should be straightforward to do it in MSYS/MinGW as well.

PS. I don't roll back, I only roll forward!

@RetusRieben (Author)

> Apparently, TF has not been compiled with CUDA for Windows, which is outrageous in this day and age. All the documentation says to downgrade to some other version or to recompile using WSL. Both are ridiculous suggestions that just show a lack of willingness to dive into the issue. If it could be done before, and can be done in WSL, it should be straightforward to do it in MSYS/MinGW as well.
>
> PS. I don't roll back, I only roll forward!

I absolutely agree. I have now also installed it on Win11, and the only way I managed to do so was via Conda. For someone who is not allowed to use Conda, that is a big issue; I did not find another way. That is indeed ridiculous.

@MysterionRise

@RetusRieben or use pip, but install TF 2.15.1 (I'm not saying it's a great solution, but at least it works somehow).

Agreed with you both that it's just outrageous not to compile the libs properly. I hope it will be fixed soon.

@eabase commented May 15, 2024

I wonder if we could extract the latest TF package binaries from Conda, as it seems to work there? Maybe install Conda in a VM and look up the packages. A sad way to have to fix this, though.

@mhlhr commented May 15, 2024 via email

@MohanKrishnaGR

Unfortunately, like all of you, I wasn't able to access my GPU as a backend in TF. I have also tried all the ways of installing it: pip, .deb, .tar... It's very hard to access the GPU with other versions.

At last, I found a method for accessing the GPU as a backend, but only with Python 3.11 and TensorFlow 2.15.1; nothing other than these versions works.

Try this by creating a virtual environment:

conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1

If your GPU driver and CUDA are installed, you should be able to run this command:

nvidia-smi

To check if the GPU is detected by TF:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

You should get something like:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Let me know if this works for you all!
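
If the device is listed, a short follow-up sketch (assuming the same environment) can confirm that ops are actually placed on the GPU rather than silently falling back to the CPU:

```python
# Sketch: verify that TensorFlow really executes ops on the GPU.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))
tf.debugging.set_log_device_placement(True)  # log the device chosen for each op

a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
c = tf.matmul(a, b)
print(c.device)  # expected to end with /device:GPU:0
```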

@eabase commented May 22, 2024

@MohanKrishnaGR
What OS did you install this on? (Via WSL or native conda on Win?)

That is the annoying fact.

  1. You can install with Conda, which is itself entirely Python-based. So Conda installs some (unknown, but working) CUDA packages.
  2. Python 3.11 can't be that different from 3.12, so what changed, and how do we fix it?

If possible, can you please help with the following (a small sketch for locating those packages follows the list):

  • Can you check whether you can find the Python packages installed by Conda?
  • What is the exact output of your pip install tensorflow[and-cuda]==2.15.1?
    A log would be great! Can you paste it here within triple back quotes (```)?
  • Can you search your Python installation directory for *.whl?
  • Try to install using:
    python -m pip install --only-binary :all: tensorflow[and-cuda]==2.15.1
    and take note of the download URLs and output.
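
A minimal sketch for that search, assuming the packages live under the interpreter's site-packages (paths and patterns are guesses; adjust them for a Conda environment):

```python
# Sketch: look for CUDA shared libraries and any cached wheels under
# this interpreter's site-packages directories.
import site
from pathlib import Path

for sp in site.getsitepackages():
    root = Path(sp)
    if not root.exists():
        continue
    for pattern in ("**/libcudart*.so*", "**/libcudnn*.so*", "**/*.whl"):
        for hit in root.glob(pattern):
            print(hit)
```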

References:
https://packaging.python.org/en/latest/discussions/package-formats/
https://realpython.com/python-wheels/
https://opensource.com/article/23/1/packaging-python-modules-wheels

@mhlhr commented May 22, 2024 via email

@eabase commented May 22, 2024

@mhlhr There's no point in telling us it is cryptic unless you also provide the message... 😉

@Venkat6871 added the stat:awaiting tensorflower label Jun 12, 2024
@Venkat6871

@learning-to-play

@mihaimaruseac (Collaborator) commented Jun 12, 2024

This duplicates #63362 (edit: changed to the right issue).

