
Tensorflow WSL GPU CUDA recognition issue RTX3090 #63948

Closed
jeanswiegers opened this issue Mar 19, 2024 · 9 comments
Assignees
Labels
TF 2.16 type:build/install Build and install issues wsl2 Windows Subsystem for Linux

Comments

@jeanswiegers

jeanswiegers commented Mar 19, 2024

tf_env.txt

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

Linux Ubuntu 20.04.6

Mobile device

No response

Python version

3.11.7

Bazel version

No response

GCC/compiler version

11.2.0

CUDA/cuDNN version

8.6

GPU model and memory

NVIDIA GeForce RTX 3090

Current behavior?

I installed TensorFlow following the GPU install guide for WSL, as follows:

wsl --install
sudo apt update
sudo apt upgrade
sudo apt install python3-pip
cd Desktop
# Download the Anaconda Linux installer from
# https://www.anaconda.com/download#downloads
bash Anaconda3-2024.02-1-Linux-x86_64.sh
~/anaconda3/bin/conda init bash
~/anaconda3/bin/conda init zsh
conda create --name tf-gpu python==3.11
conda activate tf-gpu
pip install --upgrade pip
pip install tensorflow[and-cuda]

PyTorch works perfectly in WSL with CUDA enabled, but TensorFlow does not recognize my GPU.

nvidia-smi output:

Tue Mar 19 11:19:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
| 30%   54C    P0            121W /  350W |    1642MiB /  24576MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" output:

2024-03-19 11:29:26.228449: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-19 11:29:26.250017: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-19 11:29:26.585563: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-19 11:29:26.832574: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-19 11:29:26.844141: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
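The "Cannot dlopen some GPU libraries" warning means TensorFlow failed to load one or more CUDA shared libraries at runtime. A minimal sketch to check which libraries are loadable from the current environment (the sonames below are assumptions based on typical CUDA 11.x/12.x installs, not taken from this issue; adjust them to the libraries named in TensorFlow's warning):

```python
import ctypes

def can_dlopen(soname: str) -> bool:
    """Return True if the shared library can be loaded, False otherwise."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

# Sonames below are illustrative assumptions; substitute the ones
# that TensorFlow's warning says it could not load.
for lib in ["libcudart.so.11.0", "libcudart.so.12", "libcudnn.so.8", "libcudnn.so.9"]:
    print(f"{lib}: {'OK' if can_dlopen(lib) else 'NOT FOUND'}")
```

If a library reports NOT FOUND here but exists on disk, the directory holding it is likely missing from LD_LIBRARY_PATH.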

/usr/local/cuda-11.8/extras/demo_suite/deviceQuery output:

/usr/local/cuda-11.8/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3090"
  CUDA Driver Version / Runtime Version          12.4 / 11.8
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 24576 MBytes (25769279488 bytes)
  (82) Multiprocessors, (128) CUDA Cores/MP:     10496 CUDA Cores
  GPU Max Clock rate:                            1695 MHz (1.70 GHz)
  Memory Clock rate:                             9751 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 11.8, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3090
Result = PASS

ENVs set:

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
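Note that LD_LIBRARY_PATH above points at the CUDA 11.8 tree, while the TF 2.16 wheel links against CUDA 12.x libraries. A quick sketch (pure Python, no TensorFlow needed) to spot which CUDA versions LD_LIBRARY_PATH actually exposes:

```python
import os
import re

def cuda_versions_on_ld_path(ld_library_path: str) -> list:
    """Extract CUDA version numbers from path entries like /usr/local/cuda-11.8/lib64."""
    versions = []
    for entry in ld_library_path.split(":"):
        m = re.search(r"cuda-(\d+\.\d+)", entry)
        if m:
            versions.append(m.group(1))
    return versions

print(cuda_versions_on_ld_path(os.environ.get("LD_LIBRARY_PATH", "")))
```

With the export above, this would report only 11.8, which does not match the CUDA 12.x runtime that TensorFlow 2.16 expects.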

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

2024-03-19 11:29:26.228449: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-19 11:29:26.250017: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-19 11:29:26.585563: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-19 11:29:26.832574: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-19 11:29:26.844141: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
@google-ml-butler google-ml-butler bot added type:build/install Build and install issues type:support Support issues labels Mar 19, 2024
@jeanswiegers jeanswiegers changed the title Tensorflow WSL GPU CUDA rcognition issue RTX3090 Tensorflow WSL GPU CUDA recognition issue RTX3090 Mar 19, 2024
@sushreebarsa sushreebarsa added wsl2 Windows Subsystem for Linux TF 2.16 and removed type:support Support issues labels Mar 20, 2024
@sushreebarsa
Contributor

@jeanswiegers If TF is not recognizing the GPU, could you please verify the build compatibility by running the following in your WSL environment:

import tensorflow as tf
print(tf.test.is_built_with_cuda())

This should output True. If it's False, you might need to reinstall TensorFlow with GPU support. For more information on WSL with GPU support please refer to https://www.tensorflow.org/install/pip. The TensorFlow version needs to be compatible with your CUDA version.

Thank you!
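Note that tf.test.is_built_with_cuda() only says the wheel was built with CUDA support; it does not check whether a matching CUDA runtime is actually present. A hedged sketch of the version check this comment suggests, using a sample dict shaped like what tf.sysconfig.get_build_info() returns (the sample values are illustrative assumptions, not verified against the 2.16.1 wheel):

```python
# tf.sysconfig.get_build_info() returns a dict with keys such as
# "cuda_version" and "cudnn_version" for GPU-enabled wheels.
# The values below are illustrative assumptions for this sketch.
build_info = {"cuda_version": "12.3", "cudnn_version": "8"}

installed_cuda = "11.8"  # e.g. what `nvcc --version` reports in this issue

wheel_major = build_info["cuda_version"].split(".")[0]
installed_major = installed_cuda.split(".")[0]

if wheel_major != installed_major:
    print(f"Mismatch: wheel built against CUDA {build_info['cuda_version']}, "
          f"but CUDA {installed_cuda} is on the library path.")
```

A major-version mismatch like this (11 vs. 12) is consistent with the dlopen failures in the log above.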

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Mar 20, 2024
@jeanswiegers
Author


Hi Sushreebarsa, thanks for your help. It does return True in my environment.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 20, 2024
@sushreebarsa
Contributor

@jeanswiegers Thank you for your quick response!
Simply restarting your WSL instance (wsl --shutdown) or your entire computer can resolve environment-variable issues. Could you please try that first? If the issue continues, please run nvidia-smi within your WSL terminal to confirm your RTX 3090 is recognized by the NVIDIA drivers. If it is not, there might be an issue with the driver installation itself.
Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Mar 20, 2024
@jeanswiegers
Author

I have restarted the PC and Ubuntu multiple times.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
| 30%   54C    P0            121W /  350W |    1642MiB /  24576MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 20, 2024
@jeanswiegers
Author

I managed to get it working by installing the latest supported CUDA version (12.3) via the Ubuntu runfile, as stated on TensorFlow's website.

Running only
pip install tensorflow[and-cuda]
does not work on its own.
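The [and-cuda] extra installs the CUDA runtime components as nvidia-* pip wheels inside the environment. A quick sketch to list which of those wheels (if any) actually landed in the active environment, which can help when diagnosing why a system-wide CUDA install was still needed:

```python
from importlib.metadata import distributions

def filter_nvidia(names):
    """Keep only distribution names that look like NVIDIA CUDA wheels."""
    return sorted(n for n in names if n and n.lower().startswith("nvidia-"))

installed = [d.metadata["Name"] for d in distributions()]
print(filter_nvidia(installed))
```

If this prints an empty list inside the tf-gpu environment, the CUDA wheels were never installed and TensorFlow can only fall back on whatever CUDA it finds on the system.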

@jeanswiegers
Author

Working


@sushreebarsa
Contributor

@jeanswiegers Glad it worked fine for you.
Thank you!

@chaudharyachint08

Almost final and automated fix below
