
tensorflow doesn't work with CUDA 12 on WSL2 #59413

Closed
luckeyca opened this issue Jan 22, 2023 · 10 comments
Assignees
Labels
comp:gpu GPU related issues TF 2.11 Issues related to TF 2.11 type:bug Bug wsl2 Windows Subsystem for Linux

Comments


luckeyca commented Jan 22, 2023


Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

v2.11.0-rc2-17-gd5b57ca93e5 2.11.0

Custom Code

No

OS Platform and Distribution

WSL2 Ubuntu 22.04

Mobile device

N/A

Python version

3.10.6

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

12.0

GPU model and memory

No response

Current Behaviour?

When running the TensorFlow installation verification command right after pip install (both the regular release and the nightly), TensorFlow looks for CUDA 11 libraries instead of CUDA 12.

Standalone code to reproduce the issue

1. Follow this tutorial to enable GPU support in WSL2; the sample app tested fine.
https://ubuntu.com/tutorials/enabling-gpu-acceleration-on-ubuntu-on-wsl2-with-the-nvidia-cuda-platform#3-install-nvidia-cuda-on-ubuntu
2. Run the nvidia-smi command; the output is correct.
3. pip install tf-nightly (or tensorflow) runs fine.

Both commands 4 and 5 fail with errors about missing CUDA libraries, even though LD_LIBRARY_PATH is set correctly.

4. python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
5. python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
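A quick diagnostic (an added sketch, not part of the original report): query ldconfig for the versioned sonames that TF 2.11 tries to dlopen. On a machine with only CUDA 12 installed, only `*.so.12` entries (or nothing) show up, which matches the loader errors about `libcudart.so.11.0` in the logs.

```shell
#!/bin/sh
# List which CUDA runtime/cuDNN sonames the dynamic linker can resolve.
# TF 2.11 dlopens versioned names (libcudart.so.11.0, libcudnn.so.8),
# so a CUDA 12-only install will show only *.so.12 entries here.
for lib in libcudart libcublas libcudnn; do
  found=$(ldconfig -p 2>/dev/null | grep -F "$lib" || true)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
  else
    echo "$lib: not visible to the dynamic linker"
  fi
done
```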

Relevant log output

nvidia-smi command output:

# nvidia-smi

Sun Jan 22 17:07:57 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 527.92.01    Driver Version: 528.02       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0B:00.0  On |                  Off |
|  0%   34C    P5    35W / 450W |   1619MiB / 24564MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        23      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

NVCC command output.

./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))" command output:

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2023-01-22 17:41:38.117845: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-22 17:41:38.193983: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.194028: I tensorflow/tsl/cuda/cudart_stub.cc:28] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-01-22 17:41:38.214984: E tensorflow/tsl/lib/monitoring/collection_registry.cc:81] Cannot register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay
2023-01-22 17:41:38.638503: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libnvinfer.so.8'; dlerror: libnvinfer.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.638571: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libnvinfer_plugin.so.8'; dlerror: libnvinfer_plugin.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.638592: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-22 17:41:39.158693: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-01-22 17:41:39.158768: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158808: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158844: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158877: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167698: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167763: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167772: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1955] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-01-22 17:41:39.168021: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tf.Tensor(-1520.5212, shape=(), dtype=float32)
@google-ml-butler google-ml-butler bot added the type:bug Bug label Jan 22, 2023
@sushreebarsa sushreebarsa added wsl2 Windows Subsystem for Linux TF 2.11 Issues related to TF 2.11 labels Jan 23, 2023
@SuryanarayanaY (Collaborator)

Hi @luckeyca,
We recommend following the TensorFlow instructions to install the CUDA toolkit using Conda and to set up the paths as mentioned here. Since you used Ubuntu's instructions, I am not sure how the required environment/paths get configured for TensorFlow with those instructions.

In any case, you can try these first:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create --name tf python=3.9
conda activate tf
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

After that, please try the code below and confirm the output:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Also please share the output of the command whereis cuda.

Thank you!

@SuryanarayanaY SuryanarayanaY added comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author labels Jan 25, 2023
@luckeyca (Author)

Hi @SuryanarayanaY, my question is about CUDA 12, yet your conda example uses CUDA 11.2.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jan 27, 2023
@luckeyca (Author) commented Jan 27, 2023

Did you try your suggested procedure with CUDA 12, @SuryanarayanaY? When I change cudatoolkit=11.2 to cudatoolkit=12.0, it says there is no such package.

@SuryanarayanaY (Collaborator)

@luckeyca ,

Hi @SuryanarayanaY, my question is related to CUDA 12, yet your conda example is using cuda 11.2.

I am following the tested configurations in the official documentation here. For TF 2.11, CUDA 11.2 is the tested configuration; for tf-nightly, it is CUDA 11.8. CUDA 12 is not yet a tested configuration, and there may be compatibility issues (perhaps with some APIs) that have not been addressed yet.

However, the official documentation refers to this source, where the steps to install the CUDA toolkit are listed. It seems you may have missed the CUDA toolkit installation: per the steps you mentioned, the command below is missing.
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-wsl-ubuntu-12-0-local_12.0.0-1_amd64.deb

Attached below is a snapshot of the commands to follow, per the TensorFlow/NVIDIA official source.

Screenshot 2023-01-28 at 2 49 11 PM

Once the above CUDA toolkit installation steps are done, please follow the remaining steps in the TensorFlow documentation here from step 2; in step 4, skip the GPU driver installation and continue with the CUDA and cuDNN installation using conda.

We recommend CUDA toolkit 11.2 for TF 2.11 and 11.8 for tf-nightly.

Please follow the steps thoroughly and let us know if any problem persists. Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jan 28, 2023
@luckeyca (Author) commented Feb 3, 2023

Hi @SuryanarayanaY, I did install CUDA 12.0 already. If you look at the top, in the "Relevant log output" section of the problem description, the nvidia-smi output clearly lists CUDA 12.0.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Feb 3, 2023
@luckeyca (Author) commented Feb 3, 2023

I think the problem is with cuDNN: even the latest 8.7 does not support CUDA 12 yet, only up to 11.8.

@luckeyca luckeyca closed this as completed Feb 3, 2023

@SuryanarayanaY (Collaborator)

Yes. The current tf-nightly was tested on CUDA 11.8 only, hence the chance of compatibility issues with a higher version.
AFAIK, nvidia-smi outputs CUDA driver information and the maximum CUDA version that driver can support; it does not mean that that CUDA version is already installed alongside the driver. Someone can correct me if I am wrong.

After installing the driver, one has to install a CUDA and cuDNN toolkit compatible with the required TF version using the conda command, and set the paths correctly as mentioned in the documentation; the commands are also listed below:

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
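To make the driver-vs-toolkit distinction concrete, here is a rough shell check (assuming the standard `nvidia-smi` and `nvcc` CLIs) contrasting what the driver advertises with what toolkit, if any, is actually installed:

```shell
#!/bin/sh
# nvidia-smi's "CUDA Version" field is the *maximum* CUDA version the
# installed driver supports -- it does not prove that toolkit is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | grep "CUDA Version"
else
  echo "nvidia-smi not found (no NVIDIA driver visible here)"
fi
# The installed toolkit, if any, is what nvcc reports.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep release
else
  echo "nvcc not found (no CUDA toolkit on PATH)"
fi
```

On the reporter's machine both commands report 12.0, so the toolkit really is installed; the mismatch is with what the TF binary dlopens.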

@piwawa commented May 13, 2023

$ nvidia-smi

# output
Sat May 13 20:53:46 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN Xp     Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   32C    P8    10W / 250W |   2596MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12139      C   python3                          2594MiB |
+-----------------------------------------------------------------------------+


$ pip show tensorflow

# output
Name: tensorflow
Version: 2.12.0

Same problem as yours, even the same version of CUDA.

I've been through various forums and been tormented by this problem for a day now. T_T

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')  # why can't this detect the GPU? TF version: 2.12.0
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

# output
2023-05-13 20:45:51.458371: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-13 20:45:51.517207: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-13 20:45:51.517982: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-13 20:45:52.532953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
2023-05-13 20:45:53.474612: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

@eafpres commented May 16, 2023

@piwawa I wasted the entire day on this today. I think part of the problem is that the NVIDIA WSL2 instructions have you install CUDA 12, which may not work properly with TensorFlow right now. Also, the TensorFlow instructions use conda to install some CUDA 11 components, and I wonder if that is a disconnect from the NVIDIA instructions. So tomorrow, with a clear head, I'm going to redo everything with CUDA 11 and see if I can get it to work.

@SuryanarayanaY I was trying to avoid using conda, as I usually manage everything with pip. However, I don't think I fully understand these two instructions:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d

I use Linux but don't do much at the system level if I can avoid it. I had previously modified my .bashrc in WSL Ubuntu 20.04 to update LD_LIBRARY_PATH. With these instructions I'm unclear what CONDA_PREFIX is and what the desired end result is. If I already have a venv (made with plain Python) and want to install all of this there, I'm confused by these new instructions forcing the use of conda.
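A sketch of what those two instructions accomplish, using a throwaway directory in place of a real conda env (CONDA_PREFIX is just the env's install root, e.g. ~/miniconda3/envs/tf): on every `conda activate`, conda sets CONDA_PREFIX and sources each *.sh file in $CONDA_PREFIX/etc/conda/activate.d, so the export re-runs per activation instead of living in ~/.bashrc.

```shell
#!/bin/sh
# Sketch (not the official conda docs) of the activate.d hook mechanism,
# using a temp directory as a stand-in for a real conda env.
PREFIX=$(mktemp -d)                       # stand-in for $CONDA_PREFIX
mkdir -p "$PREFIX/etc/conda/activate.d"
cat > "$PREFIX/etc/conda/activate.d/env_vars.sh" <<'EOF'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
EOF
# "conda activate" sets CONDA_PREFIX and sources every *.sh in
# activate.d -- simulate both steps by hand:
export CONDA_PREFIX="$PREFIX"
. "$PREFIX/etc/conda/activate.d/env_vars.sh"
echo "$LD_LIBRARY_PATH" | grep -c "$PREFIX/lib/"   # prints 1
rm -rf "$PREFIX"
```

The mechanism itself doesn't require conda: a plain venv's bin/activate script can append the same library path if that's where the CUDA libraries live.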
