
tensorflow doesn't work with CUDA 12 on WSL2 #59413

Closed
luckeyca opened this issue Jan 22, 2023 · 10 comments
Assignees
Labels
comp:gpu GPU related issues TF 2.11 Issues related to TF 2.11 type:bug Bug wsl2 Windows Subsystem for Linux

Comments


luckeyca commented Jan 22, 2023


Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

v2.11.0-rc2-17-gd5b57ca93e5 2.11.0

Custom Code

No

OS Platform and Distribution

WSL2 Ubuntu 22.04

Mobile device

N/A

Python version

3.10.6

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

12.0

GPU model and memory

No response

Current Behaviour?

When running the TensorFlow installation verification command right after pip install (both the regular release and the nightly), TensorFlow looks for CUDA 11 libraries instead of CUDA 12.

Standalone code to reproduce the issue

1. Follow this tutorial to enable GPU support in WSL2; the sample app tested fine.
https://ubuntu.com/tutorials/enabling-gpu-acceleration-on-ubuntu-on-wsl2-with-the-nvidia-cuda-platform#3-install-nvidia-cuda-on-ubuntu
2. Run the nvidia-smi command; the output is correct.
3. pip install tf-nightly (or tensorflow) runs fine.

Both commands 4 and 5 fail with errors about missing CUDA libraries, even though LD_LIBRARY_PATH is set correctly.

4. python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
5. python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
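A quick diagnostic (an added sketch, not part of the original report): query ldconfig for the versioned sonames that TF 2.11 tries to dlopen. On a machine with only CUDA 12 installed, only `*.so.12` entries (or nothing) show up, which matches the loader errors about `libcudart.so.11.0` in the logs.

```shell
#!/bin/sh
# List which CUDA runtime/cuDNN sonames the dynamic linker can resolve.
# TF 2.11 dlopens versioned names (libcudart.so.11.0, libcudnn.so.8),
# so a CUDA 12-only install will show only *.so.12 entries here.
for lib in libcudart libcublas libcudnn; do
  found=$(ldconfig -p 2>/dev/null | grep -F "$lib" || true)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
  else
    echo "$lib: not visible to the dynamic linker"
  fi
done
```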

Relevant log output

nvidia-smi command output:

# nvidia-smi

Sun Jan 22 17:07:57 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 527.92.01    Driver Version: 528.02       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0B:00.0  On |                  Off |
|  0%   34C    P5    35W / 450W |   1619MiB / 24564MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        23      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

NVCC command output.

./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))" command output:

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2023-01-22 17:41:38.117845: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-22 17:41:38.193983: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.194028: I tensorflow/tsl/cuda/cudart_stub.cc:28] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-01-22 17:41:38.214984: E tensorflow/tsl/lib/monitoring/collection_registry.cc:81] Cannot register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay
2023-01-22 17:41:38.638503: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libnvinfer.so.8'; dlerror: libnvinfer.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.638571: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libnvinfer_plugin.so.8'; dlerror: libnvinfer_plugin.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:38.638592: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-22 17:41:39.158693: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-01-22 17:41:39.158768: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158808: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158844: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.158877: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167698: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167763: W tensorflow/tsl/platform/default/dso_loader.cc:67] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
2023-01-22 17:41:39.167772: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1955] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-01-22 17:41:39.168021: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tf.Tensor(-1520.5212, shape=(), dtype=float32)
@google-ml-butler google-ml-butler bot added the type:bug Bug label Jan 22, 2023
@sushreebarsa sushreebarsa added wsl2 Windows Subsystem for Linux TF 2.11 Issues related to TF 2.11 labels Jan 23, 2023
@SuryanarayanaY (Collaborator)

Hi @luckeyca,
We recommend following the TensorFlow instructions to install the CUDA toolkit using Conda and to set up the paths as mentioned here. Since you used Ubuntu's instructions, I am not sure how the required environment/paths get configured for TensorFlow with those instructions.

In any case, you can try these first:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create --name tf python=3.9
conda activate tf
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

After that, please try the code below and confirm the output:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Also please share the output of the command whereis cuda.

Thank you!

@SuryanarayanaY SuryanarayanaY added comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author labels Jan 25, 2023
@luckeyca (Author)

Hi @SuryanarayanaY, my question is about CUDA 12, yet your conda example uses CUDA 11.2.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jan 27, 2023
@luckeyca (Author) commented Jan 27, 2023

Did you try your suggested procedure with CUDA 12, @SuryanarayanaY? When I change cudatoolkit=11.2 to cudatoolkit=12.0, it says there is no such package.

@SuryanarayanaY (Collaborator)

@luckeyca ,

Hi @SuryanarayanaY, my question is related to CUDA 12, yet your conda example is using cuda 11.2.

I am following the tested configurations in the official documentation here. For TF 2.11, CUDA 11.2 is the tested configuration; for tf-nightly, it is CUDA 11.8. CUDA 12 is not yet a tested configuration, and there may be compatibility issues (perhaps with some APIs) that have not been addressed yet.

However, the official documentation refers to this source, where the steps to install the CUDA toolkit are listed. It seems you may have missed the CUDA toolkit installation: per the steps you mentioned, the command below is missing.
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-wsl-ubuntu-12-0-local_12.0.0-1_amd64.deb

Attached below is a snapshot of the commands to follow, per the TensorFlow/NVIDIA official source.

Screenshot 2023-01-28 at 2 49 11 PM

Once the above CUDA toolkit installation steps are done, please follow the remaining steps in the TensorFlow documentation here from step 2; in step 4, skip the GPU driver installation and continue with the CUDA and cuDNN installation using conda.

We recommend CUDA toolkit 11.2 for TF 2.11 and 11.8 for tf-nightly.

Please follow the steps thoroughly and let us know if any problem persists. Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jan 28, 2023
@luckeyca (Author) commented Feb 3, 2023

Hi @SuryanarayanaY, I did install CUDA 12.0 already. If you look at the top, in the "Relevant log output" section of the problem description, the nvidia-smi output clearly lists CUDA 12.0.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Feb 3, 2023
@luckeyca (Author) commented Feb 3, 2023

I think the problem is with cuDNN: even the latest 8.7 does not support CUDA 12 yet, only up to 11.8.

@luckeyca luckeyca closed this as completed Feb 3, 2023

@SuryanarayanaY (Collaborator)

Yes. The current tf-nightly was tested on CUDA 11.8 only, hence the chance of compatibility issues with a higher version.
AFAIK, nvidia-smi outputs CUDA driver information and the maximum CUDA version that driver can support; it does not mean that that CUDA version is already installed alongside the driver. Someone can correct me if I am wrong.

After installing the driver, one has to install a CUDA and cuDNN toolkit compatible with the required TF version using the conda command, and set the paths correctly as mentioned in the documentation; the commands are also listed below:

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
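To make the driver-vs-toolkit distinction concrete, here is a rough shell check (assuming the standard `nvidia-smi` and `nvcc` CLIs) contrasting what the driver advertises with what toolkit, if any, is actually installed:

```shell
#!/bin/sh
# nvidia-smi's "CUDA Version" field is the *maximum* CUDA version the
# installed driver supports -- it does not prove that toolkit is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | grep "CUDA Version"
else
  echo "nvidia-smi not found (no NVIDIA driver visible here)"
fi
# The installed toolkit, if any, is what nvcc reports.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep release
else
  echo "nvcc not found (no CUDA toolkit on PATH)"
fi
```

On the reporter's machine both commands report 12.0, so the toolkit really is installed; the mismatch is with what the TF binary dlopens.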

@piwawa commented May 13, 2023

$ nvidia-smi

# output
Sat May 13 20:53:46 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN Xp     Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   32C    P8    10W / 250W |   2596MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12139      C   python3                          2594MiB |
+-----------------------------------------------------------------------------+


$ pip show tensorflow

# output
Name: tensorflow
Version: 2.12.0

Same problem as yours, even the same version of CUDA.

I've been through various forums and been tormented by this problem for a day now. T_T

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')  # why can't this detect the GPU? TF version: 2.12.0
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

# output
2023-05-13 20:45:51.458371: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-13 20:45:51.517207: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-13 20:45:51.517982: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-13 20:45:52.532953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
2023-05-13 20:45:53.474612: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

@eafpres commented May 16, 2023

@piwawa I wasted the entire day on this today. I think part of the problem is that the NVIDIA WSL2 instructions have you install CUDA 12, which may not work properly with TensorFlow right now. Also, the TensorFlow instructions use conda to install some CUDA 11 components, and I wonder if that is a disconnect from the NVIDIA instructions. So tomorrow, with a clear head, I'm going to redo everything with CUDA 11 and see if I can get it to work.

@SuryanarayanaY I was trying to avoid using conda, as I usually manage everything with pip. However, I don't think I fully understand these two instructions:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
mkdir -p $CONDA_PREFIX/etc/conda/activate.d

I use Linux but don't do much at the system level if I can avoid it. I had previously modified my .bashrc in WSL Ubuntu 20.04 to update LD_LIBRARY_PATH. With these instructions I'm unclear what CONDA_PREFIX is and what the desired end result is. If I already have a venv (made with plain Python) and want to install all of this there, I'm confused by these new instructions forcing the use of conda.
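A sketch of what those two instructions accomplish, using a throwaway directory in place of a real conda env (CONDA_PREFIX is just the env's install root, e.g. ~/miniconda3/envs/tf): on every `conda activate`, conda sets CONDA_PREFIX and sources each *.sh file in $CONDA_PREFIX/etc/conda/activate.d, so the export re-runs per activation instead of living in ~/.bashrc.

```shell
#!/bin/sh
# Sketch (not the official conda docs) of the activate.d hook mechanism,
# using a temp directory as a stand-in for a real conda env.
PREFIX=$(mktemp -d)                       # stand-in for $CONDA_PREFIX
mkdir -p "$PREFIX/etc/conda/activate.d"
cat > "$PREFIX/etc/conda/activate.d/env_vars.sh" <<'EOF'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
EOF
# "conda activate" sets CONDA_PREFIX and sources every *.sh in
# activate.d -- simulate both steps by hand:
export CONDA_PREFIX="$PREFIX"
. "$PREFIX/etc/conda/activate.d/env_vars.sh"
echo "$LD_LIBRARY_PATH" | grep -c "$PREFIX/lib/"   # prints 1
rm -rf "$PREFIX"
```

The mechanism itself doesn't require conda: a plain venv's bin/activate script can append the same library path if that's where the CUDA libraries live.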
