TensorFlow does not recognize GPU no matter what #65269

Closed
6CRIPT opened this issue Apr 8, 2024 · 5 comments
Labels: subtype: ubuntu/linux (Ubuntu/Linux Build/Installation Issues), TF 2.16, type:build/install (Build and install issues)

Comments

6CRIPT commented Apr 8, 2024

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.16.1 or 2.12.0 or 2.10.0

Custom code

Yes

OS platform and distribution

Ubuntu 22.04

Mobile device

No response

Python version

3.9.19

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA 11.5

GPU model and memory

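i7 12700k, 32 GB RAM (CPU and system memory; the GPU, per the nvidia-smi output below, is an NVIDIA GeForce RTX 4060 with 8188 MiB)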

Current behavior?

So I have tried every method I could find online to get TensorFlow to detect my GPU. I am using a conda environment on WSL2. I first followed this guide: https://medium.com/@mass.thanapol/tensorflow-with-gpu-on-linux-or-wsl2-10b02fd19924 and then this one: https://www.tensorflow.org/install/pip?hl=es-419#windows-wsl2. I have spent about 15 hours trying to get TensorFlow to detect my GPU, and I really need help: I am using it for my final CS project. Thanks in advance.

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
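The one-liner above only shows the symptom (an empty list). A slightly longer check, sketched below, also prints the CUDA and cuDNN versions the installed wheel was built against, which are the versions the local libraries have to match. It relies on tf.sysconfig.get_build_info() and tf.test.is_built_with_cuda(); the tf_gpu_report() helper and its key names are hypothetical, chosen for illustration, and it assumes a pip-installed TensorFlow on Linux/WSL2.

```python
"""Print which CUDA/cuDNN versions the installed TensorFlow wheel expects,
alongside the GPUs it can see. A sketch; assumes a Linux/WSL2 pip install."""
import importlib.util


def tf_gpu_report():
    """Return a dict with TensorFlow's CUDA build info and visible GPUs.

    Hypothetical helper: if TensorFlow is not importable in the current
    environment, the dict only records that fact instead of raising.
    """
    report = {"tensorflow_installed": importlib.util.find_spec("tensorflow") is not None}
    if not report["tensorflow_installed"]:
        return report

    import tensorflow as tf  # imported lazily so the check itself never crashes

    build = tf.sysconfig.get_build_info()  # available in TF 2.3 and later
    report.update(
        version=tf.__version__,
        built_with_cuda=tf.test.is_built_with_cuda(),
        # These keys exist on CUDA builds; .get() tolerates CPU-only wheels.
        expected_cuda=build.get("cuda_version"),
        expected_cudnn=build.get("cudnn_version"),
        visible_gpus=[d.name for d in tf.config.list_physical_devices("GPU")],
    )
    return report


if __name__ == "__main__":
    for key, value in tf_gpu_report().items():
        print(f"{key}: {value}")
```

If expected_cuda and expected_cudnn do not match what is installed, that mismatch is the first thing to fix.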

Relevant log output

First, the nvidia-smi output:
(tfg_linux) cesar@CesarPC:~$ nvidia-smi
Tue Apr  9 00:05:08 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.73.01              Driver Version: 552.12         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060        On  |   00000000:01:00.0  On |                  N/A |
|  0%   31C    P8             N/A /  115W |    1015MiB /   8188MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    237281      C   /python3.9                                  N/A      |
+-----------------------------------------------------------------------------------------+

And this is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

And this is the output of the reproduction code:
(tfg_linux) cesar@CesarPC:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-04-09 00:06:41.417626: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-09 00:06:41.445626: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-09 00:06:41.819579: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-09 00:06:42.269409: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-09 00:06:42.281813: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

Also, if I run the following line from this guide, https://www.tensorflow.org/install/gpu?hl=es-419:
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
I got:
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'libnvinfer7' instead of './libnvinfer7_7.1.3-1+cuda11.0_amd64.deb'
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libnvinfer7 : Depends: libcublas-11-0 but it is not installable
               Depends: cuda-cudart-11-0 but it is not installable
E: Unable to correct problems, you have held broken packages.
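In the Python log above, the NUMA line is informational, and the TF-TRT warning only means TensorRT was not found. TensorRT is needed for TF-TRT, not for plain GPU detection, so the failed libnvinfer7 install (a package built for CUDA 11.0, per its filename and the cuda-11-0 dependencies apt reports) is most likely not what keeps the GPU from registering. The line that matters is "Cannot dlopen some GPU libraries", followed by "Skipping registering GPU devices...". One rough way to see which library the dynamic loader cannot find is to try loading them yourself with ctypes. The soname list is an assumption for a CUDA 12 / cuDNN 8 build such as TF 2.16.1, and probe_libraries() is a hypothetical helper. Also note that tensorflow[and-cuda] keeps these libraries inside site-packages, where TensorFlow can find them but a plain dlopen may not, so a miss here is a hint, not proof.

```python
"""Try to dlopen the CUDA/cuDNN shared libraries TensorFlow needs.

The sonames are an assumption for a CUDA 12.x + cuDNN 8.x build
(e.g. TF 2.16.1); edit REQUIRED_LIBS for other combinations.
"""
import ctypes

# Assumed sonames for CUDA 12 / cuDNN 8; not read from TensorFlow itself.
REQUIRED_LIBS = [
    "libcudart.so.12",
    "libcublas.so.12",
    "libcublasLt.so.12",
    "libcufft.so.11",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.12",
    "libcudnn.so.8",
]


def probe_libraries(names):
    """Map each library name to None if it loads, or to the loader's error text."""
    results = {}
    for name in names:
        try:
            # A bare soname goes through the normal dynamic-loader search
            # (LD_LIBRARY_PATH, the ldconfig cache, default directories).
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe_libraries(REQUIRED_LIBS).items():
        status = "OK" if error is None else f"MISSING ({error})"
        print(f"{name}: {status}")
```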
google-ml-butler bot added the type:build/install (Build and install issues) label Apr 8, 2024
tilakrayal added the subtype: ubuntu/linux (Ubuntu/Linux Build/Installation Issues) and TF 2.16 labels Apr 10, 2024
tilakrayal (Contributor) commented:

@6CRIPT,
I suspect you are trying to install each of these TensorFlow versions against CUDA 11.5 only, which is not compatible with them. Could you please uninstall the installed libraries, install CUDA 12.3 and cuDNN 8.9 for TensorFlow v2.16.1, and check the following:

- Make sure CUDA and cuDNN are installed correctly.
- Make sure the CUDA_PATH environment variable is set correctly.
- Make sure LD_LIBRARY_PATH is set correctly.
- Make sure TF_CUDA_PATHS is set correctly.
- Use pip install tensorflow[and-cuda] instead of pip install tensorflow, because the nvidia-cuda-* and nvidia-cudnn-* packages from pip (PyPI) are also needed.

#63341
#63948

Thank you!
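For the environment variables in the checklist (CUDA_PATH, LD_LIBRARY_PATH, TF_CUDA_PATHS), a quick sketch that prints each one and flags entries that are not existing directories. Splitting LD_LIBRARY_PATH on ":" is how Linux search paths work; treating TF_CUDA_PATHS as comma-separated follows TensorFlow's source-build configuration as I understand it, and is an assumption (as far as I know, that variable is consulted when building from source rather than by a pip-installed binary). check_env() is a hypothetical helper.

```python
"""Report the CUDA-related environment variables from the checklist
and flag directories that do not exist. A sketch for Linux/WSL2."""
import os

# Separator per variable. Assumptions: TF_CUDA_PATHS is comma-separated,
# LD_LIBRARY_PATH uses os.pathsep (":" on Linux), CUDA_PATH is a single path.
VARIABLES = {
    "CUDA_PATH": None,
    "LD_LIBRARY_PATH": os.pathsep,
    "TF_CUDA_PATHS": ",",
}


def check_env(env=None):
    """Return {name: (raw_value, [missing_dirs])}; raw_value is None if unset."""
    env = os.environ if env is None else env
    report = {}
    for name, sep in VARIABLES.items():
        raw = env.get(name)
        if raw is None:
            report[name] = (None, [])
            continue
        entries = [raw] if sep is None else [p for p in raw.split(sep) if p]
        missing = [p for p in entries if not os.path.isdir(p)]
        report[name] = (raw, missing)
    return report


if __name__ == "__main__":
    for name, (raw, missing) in check_env().items():
        if raw is None:
            print(f"{name}: not set")
        else:
            note = f"  (missing: {', '.join(missing)})" if missing else ""
            print(f"{name}={raw}{note}")
```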

tilakrayal added the stat:awaiting response (Status - Awaiting response from author) label Apr 10, 2024
6CRIPT (Author) commented Apr 10, 2024

Hi, thanks for your reply.
As you can see in my nvidia-smi output, I was using CUDA 12.4. I will retry the entire process anyway and post any updates :p
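One point worth separating here: the "CUDA Version: 12.4" in the nvidia-smi header is the newest CUDA version the installed driver (552.12) supports, not a CUDA toolkit installed inside WSL2. The toolkit actually on the path is the one nvcc -V reports, 11.5 in the original log. The sketch below pulls both numbers out of the two outputs quoted in the issue so they can be compared; the function names are hypothetical, and the regular expressions are written against those exact outputs, so other driver or toolkit versions may format them differently.

```python
"""Extract the driver's supported CUDA version (nvidia-smi) and the
installed toolkit version (nvcc -V) from their text output."""
import re


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_text):
    """Return (major, minor) from the 'release X.Y' field of nvcc -V."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Lines copied from the outputs in this issue.
SMI = "| NVIDIA-SMI 550.73.01              Driver Version: 552.12         CUDA Version: 12.4     |"
NVCC = "Cuda compilation tools, release 11.5, V11.5.119"

if __name__ == "__main__":
    print("driver supports up to CUDA", driver_cuda_version(SMI))   # prints (12, 4)
    print("installed nvcc toolkit is", toolkit_cuda_version(NVCC))  # prints (11, 5)
```

On a live system the same functions can be fed subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout and the equivalent for nvcc -V.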

google-ml-butler bot removed the stat:awaiting response (Status - Awaiting response from author) label Apr 10, 2024
tilakrayal added the stat:awaiting response (Status - Awaiting response from author) label Apr 10, 2024
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions bot added the stale label (marks the issue/PR as stale, to be closed automatically if there is no activity) Apr 18, 2024
6CRIPT (Author) commented Apr 18, 2024

Hi, I finally solved it. It was all about installing exactly the right versions. Thanks for your responses! :D

I have uploaded a video to YouTube explaining the process:
https://www.youtube.com/watch?v=iIYHfCh1rmU&t=5s
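Since the fix came down to matching versions, here is a sketch of that check. The table covers the three TensorFlow versions tried in this issue: 2.16.1 with CUDA 12.3 / cuDNN 8.9 as stated in the maintainer's reply, and 2.12.0 with CUDA 11.8 / cuDNN 8.6 and 2.10.0 with CUDA 11.2 / cuDNN 8.1 as listed in TensorFlow's tested build configurations for Linux GPU; please double-check that page before relying on them. It also assumes that a driver whose nvidia-smi header shows CUDA X.Y can run code built for X.Y or older. check() and TESTED_CONFIGS are hypothetical names.

```python
"""Compare installed CUDA/cuDNN versions with the tested configuration for a
TensorFlow release. Table values come from the sources named above; verify
them against TensorFlow's install docs before relying on them."""

# TF version -> (CUDA (major, minor), cuDNN (major, minor)) for Linux GPU builds.
TESTED_CONFIGS = {
    "2.16.1": ((12, 3), (8, 9)),
    "2.12.0": ((11, 8), (8, 6)),
    "2.10.0": ((11, 2), (8, 1)),
}


def check(tf_version, driver_cuda, toolkit_cuda, cudnn):
    """Return a list of problems; an empty list means the versions line up.

    driver_cuda  -- (major, minor) from the "CUDA Version" field of nvidia-smi
    toolkit_cuda -- (major, minor) of the CUDA libraries actually installed
    cudnn        -- (major, minor) of the installed cuDNN
    """
    if tf_version not in TESTED_CONFIGS:
        return [f"no table entry for TensorFlow {tf_version}"]
    need_cuda, need_cudnn = TESTED_CONFIGS[tf_version]
    problems = []
    # Tuples compare element-wise, so (12, 4) >= (12, 3) means the driver is new enough.
    if driver_cuda < need_cuda:
        problems.append(f"driver only supports CUDA {driver_cuda}, need {need_cuda}")
    if toolkit_cuda != need_cuda:
        problems.append(f"CUDA {toolkit_cuda} installed, tested with {need_cuda}")
    if cudnn != need_cudnn:
        problems.append(f"cuDNN {cudnn} installed, tested with {need_cudnn}")
    return problems


if __name__ == "__main__":
    # Numbers from this issue; cuDNN 8.9 is a placeholder, the issue does not state it.
    for line in check("2.16.1", (12, 4), (11, 5), (8, 9)):
        print(line)
```

With the issue's numbers, the driver check passes (12.4 is newer than 12.3) and only the CUDA 11.5 install is flagged.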

google-ml-butler bot removed the stale label Apr 18, 2024
6CRIPT closed this as completed Apr 18, 2024
google-ml-butler bot removed the stat:awaiting response (Status - Awaiting response from author) label Apr 18, 2024