
TF 2.16.1 Fails to work with GPUs #63362

Open
JuanVargas opened this issue Mar 10, 2024 · 134 comments
Assignees
Labels
awaiting review, comp:gpu, stat:awaiting tensorflower, TF 2.16, type:bug

Comments

@JuanVargas

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

TF 2.16.1

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04.4 LTS

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4

GPU model and memory

No response

Current behavior?

I created a python venv in which I installed TF 2.16.1 following your instructions: pip install tensorflow
When I run python, import tensorflow as tf, and issue tf.config.list_physical_devices('GPU'),
I get an empty list [ ].

I created another python venv, installed TF 2.16.1, only this time with the instructions:

python3 -m pip install tensorflow[and-cuda]

When I run that version, import tensorflow as tf, and issue

tf.config.list_physical_devices('GPU')

I also get an empty list.

BTW, I have no problems running TF 2.15.1 with GPUs on my box. Julia also works just fine with GPUs, and so does PyTorch.

Standalone code to reproduce the issue

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-09 19:15:45.018171: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 19:15:50.412646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.__version__
'2.16.1'

>>> tf.config.list_physical_devices('GPU')
2024-03-09 19:16:28.923792: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-09 19:16:29.078379: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>
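A quick way to narrow down which libraries the "Cannot dlopen some GPU libraries" warning refers to is to probe the usual SONAMEs directly with ctypes (a sketch; the library names below are assumptions for a CUDA 12 / cuDNN 8 setup, and `probe_libs` is a hypothetical helper, not part of TensorFlow):

```python
# Probe which CUDA-related shared libraries the dynamic loader can find.
# Returns {soname: loadable?}; a False entry points at a library that
# TensorFlow would also fail to dlopen.
import ctypes

def probe_libs(sonames=("libcudart.so.12", "libcudnn.so.8", "libcublas.so.12")):
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # same mechanism TF uses to load the library
            results[name] = True
        except OSError:
            results[name] = False
    return results

print(probe_libs())
```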

Relevant log output

No response

@sh-shahrokhi

sh-shahrokhi commented Mar 10, 2024

It does not work with Python 3.12.2 either; same error. I installed TensorFlow with pip install tensorflow[and-cuda]

@damadorPL

The same error on bare Ubuntu and WSL2. 2.15 works without any problems with Python 3.11.

@DiegoMont

I have the same problem with Ubuntu 22.04.4 with the following environment:

  • tensorflow==2.16.1
  • Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
  • cuDNN 8.6.0.163
  • gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@AlpriElse

I'm not sure if this is the root cause, but I resolved my own issue which also surfaced as a "Cannot dlopen some GPU libraries." error when trying to run python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

To resolve my issue, I followed the tested build versions here:
https://www.tensorflow.org/install/source#gpu

and I needed to downgrade my existing installations from cuDNN 9 to 8.9 and from CUDA 12.4 to 12.3.

When you're on an NVIDIA download page like the one for the CUDA Toolkit, don't just download the latest version. See previous versions by clicking "Archive of Previous CUDA Releases".

@JuanVargas can you try moving to a tested build configuration for TF 2.16 by uninstalling your existing CUDA installation and downgrading to CUDA 12.3?

I followed this post to uninstall my existing cuda installation:
https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one

@DiegoMont can you try upgrading your cuDNN to 8.9 and CUDA to 12.3?
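Since the tested-build table keys off exact versions, it can help to print what pip actually installed before comparing against that page (a sketch; `wheel_versions` is a hypothetical helper, and the package names passed in are the ones the and-cuda extra typically pulls in, which is an assumption):

```python
# Report the versions of CUDA-related wheels installed in the current
# environment, for comparison against the tested configurations at
# https://www.tensorflow.org/install/source#gpu.
import importlib.metadata as md

def wheel_versions(names):
    versions = {}
    for name in names:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(wheel_versions(["tensorflow", "nvidia-cudnn-cu12", "nvidia-cuda-runtime-cu12"]))
```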

@Gwyki

Gwyki commented Mar 11, 2024

I am having the same issue. Brand new Ubuntu 22.04 WSL2 image. A blank Conda environment with either Python 3.12.* or 3.11.* fails to correctly set up TensorFlow for GPU use when following the recommended:
pip install tensorflow[and-cuda]

Trying to list the physical devices results in:

2024-03-11 02:00:00.294704: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 02:00:00.709325: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 02:00:01.180225: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2d:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 02:00:01.180445: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
  • cuDNN 8.9.*
  • CUDA 12.3
  • TensorFlow 2.16.1
  • TensorRT 8.6.1

Is this a new issue caused by the fact that no system CUDA appears to need to be installed separately in WSL2 anymore? I certainly didn't install one manually, and yet nvidia-smi happily reports CUDA version 12.3. It probably comes down to some env paths not being set correctly, but playing around with $CUDA_PATH and guessing the location within the conda environment has not resolved anything. TensorRT doesn't seem to be picked up either, yet it is definitely installed in the conda environment. PyTorch GPU visibility works as expected.

@SuryanarayanaY SuryanarayanaY added comp:gpu GPU related issues TF 2.16 labels Mar 11, 2024
@SuryanarayanaY
Collaborator

Hi @JuanVargas ,

For the GPU package you need to ensure the CUDA driver is installed, which can be verified with the nvidia-smi command. Then install the TF CUDA package with pip install tensorflow[and-cuda], which automatically installs the required CUDA/cuDNN libraries.

I have checked in Colab and was able to detect the GPU. Please refer to the attached gist.
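The two prerequisites named above (a visible driver and the pip-installed CUDA wheels) can be sanity-checked from Python before digging further (a sketch; `check_prereqs` is a hypothetical helper, not part of TensorFlow):

```python
# Check the two GPU prerequisites: nvidia-smi on PATH (driver installed)
# and the nvidia namespace package importable (i.e. the CUDA wheels from
# pip install tensorflow[and-cuda] are present).
import importlib.util
import shutil

def check_prereqs():
    return {
        "driver_visible": shutil.which("nvidia-smi") is not None,
        "cuda_wheels_installed": importlib.util.find_spec("nvidia") is not None,
    }

print(check_prereqs())
```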

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Mar 11, 2024
@damadorPL

damadorPL commented Mar 11, 2024

Double quotes in pip install because of zsh:

pip install "tensorflow[and-cuda]==2.16.1"                                                                       
 

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: tensorflow==2.16.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.16.1)
Requirement already satisfied: absl-py>=1.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.0)
Requirement already satisfied: astunparse>=1.6.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.3.7)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.2.0)
Requirement already satisfied: h5py>=3.10.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.10.0)
Requirement already satisfied: libclang>=13.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (16.0.6)
Requirement already satisfied: ml-dtypes~=0.3.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.3.2)
Requirement already satisfied: opt-einsum>=2.3.2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.0)
Requirement already satisfied: packaging in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.0)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.25.3)
Requirement already satisfied: requests<3,>=2.21.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.31.0)
Requirement already satisfied: setuptools in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (69.1.1)
Requirement already satisfied: six>=1.12.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: termcolor>=1.1.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.4.0)
Requirement already satisfied: typing-extensions>=3.6.6 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.10.0)
Requirement already satisfied: wrapt>=1.11.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.62.1)
Requirement already satisfied: tensorboard<2.17,>=2.16 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.16.2)
Requirement already satisfied: keras>=3.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.5)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.36.0)
Requirement already satisfied: numpy<2.0.0,>=1.23.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.26.4)
Requirement already satisfied: nvidia-cublas-cu12==12.3.4.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.4.1)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cuda-nvcc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.7.29 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (8.9.7.29)
Requirement already satisfied: nvidia-cufft-cu12==11.0.12.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.0.12.1)
Requirement already satisfied: nvidia-curand-cu12==10.3.4.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (10.3.4.107)
Requirement already satisfied: nvidia-cusolver-cu12==11.5.4.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.5.4.101)
Requirement already satisfied: nvidia-cusparse-cu12==12.2.0.103 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.2.0.103)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.19.3)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: wheel<1.0,>=0.23.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from astunparse>=1.6.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.42.0)
Requirement already satisfied: rich in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (13.7.1)
Requirement already satisfied: namex in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.0.7)
Requirement already satisfied: dm-tree in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.8)
Requirement already satisfied: charset-normalizer<4,>=2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2024.2.2)
Requirement already satisfied: markdown>=2.6.8 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.5.2)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from werkzeug>=1.0.1->tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.5)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.17.2)
Requirement already satisfied: mdurl~=0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.2)
nvidia-smi             
                                                                                           
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P5             10W /  285W |    4334MiB /  12282MiB |     13%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        41      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

python3

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-11 09:36:29.601060: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 09:36:29.921637: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 09:36:30.793353: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> print(tf.config.list_physical_devices('GPU'))
2024-03-11 09:36:33.878560: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 09:36:33.980099: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 11, 2024
@damadorPL

nvcc -V 
                                                                                                          
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

@damadorPL

damadorPL commented Mar 11, 2024

Got it to work :) First, go to
https://developer.nvidia.com/rdp/cudnn-archive

then download the Local Installer for Ubuntu 22.04 x86_64 (Deb),

unpack, and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb:

sudo dpkg -i libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb   
                                                           
Selecting previously unselected package libcudnn8.
(Reading database ... 47318 files and directories currently installed.)
Preparing to unpack libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb ...
Unpacking libcudnn8 (8.9.7.29-1+cuda12.2) ...
Setting up libcudnn8 (8.9.7.29-1+cuda12.2) ...

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  

                             
2024-03-11 10:27:47.879686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 10:27:47.909157: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 10:27:48.316717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 10:27:48.664469: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688059: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688111: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

@JuanVargas
Author

JuanVargas commented Mar 11, 2024 via email

@sh-shahrokhi

sh-shahrokhi commented Mar 11, 2024 via email

@JuanVargas
Author

JuanVargas commented Mar 11, 2024 via email

@JuanVargas
Author

JuanVargas commented Mar 11, 2024 via email

@sh-shahrokhi

sh-shahrokhi commented Mar 11, 2024 via email

@damadorPL

damadorPL commented Mar 11, 2024

https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ — you can get the .deb file there directly.

@Gwyki

Gwyki commented Mar 11, 2024

Thanks @sh-shahrokhi. I thought it was path related. Modified slightly to make it python version independent if you put it in your conda environment activation ([environment]/etc/activate.d/env_vars.sh).

NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

This is not a resolution as this post install step should not be necessary.

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

I can't seem to do similar tricks to resolve the TensorRT issues when installed similarly into the conda environment. Any ideas?
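For reference, the library-path discovery done by the shell loop above can also be done from Python, which makes it easy to see exactly which lib directories the wheels ship (a sketch; `nvidia_lib_dirs` is a hypothetical helper name):

```python
# Find the lib/ directories shipped inside the pip-installed nvidia-* wheels,
# i.e. the directories the shell loop above prepends to LD_LIBRARY_PATH.
import glob
import importlib.util
import os

def nvidia_lib_dirs():
    spec = importlib.util.find_spec("nvidia")
    if spec is None or not spec.submodule_search_locations:
        return []  # the nvidia namespace package is not installed
    root = list(spec.submodule_search_locations)[0]
    return sorted(d for d in glob.glob(os.path.join(root, "*", "lib"))
                  if os.path.isdir(d))

print(nvidia_lib_dirs())
```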

@sh-shahrokhi

sh-shahrokhi commented Mar 11, 2024

Thanks @sh-shahrokhi. I thought it was path related. Modified slightly to make it python version independent if you put it in your conda environment activation ([environment]/etc/activate.d/env_vars.sh).

NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

This is not a resolution as this post install step should not be necessary.

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

I can't seem to do similar tricks to resolve the TensorRT issues when installed similarly into the conda environment. Any ideas?

I don't actually use TensorRT, but I would check whether the required .so file for it is visible to TensorFlow. You might need to find the name of the required file in the TensorFlow source code.

This actually doesn't change the fact that the new tensorflow version should be tested by google team before release, or the bugs should be fixed. It seems they only care about having a working docker image, not anything else.

@Gwyki

Gwyki commented Mar 12, 2024

I have given up on TensorRT. I guess I won't be using it either.

This actually doesn't change the fact that the new tensorflow version should be tested by google team before release, or the bugs should be fixed. It seems they only care about having a working docker image, not anything else.

Agreed. Installing TF has always been hit or miss and it seems that in the many years since I last used TF that hasn't changed one bit.

@moozoo64

Well, I wasted 8 hours of my Sunday on this, setting up another PC from scratch, before reverting to the old version. Now looking to move off TensorFlow.

@mihaimaruseac
Collaborator

mihaimaruseac commented Mar 12, 2024

In general, we used to test RC versions before release. For example, we used to have RC0, RC1 and RC2 for TF 2.9. This gave people and downstream teams enough time to test and report issues.

It seems that 2.16.1 only had an RC0 (for 2.16.0).

The release process is (was?) like this:

  • cut the release branch (e.g., r2.17)
  • immediately trigger the release pipeline. This would create a few PRs to update version numbers, release notes, but after this step RC0 should be as close as possible to the version on master branch at the time the release branch has been cut. There should not be any code changes to the release branch at this point (except to maybe cherrypick fixes from master from hard bugs caused by cutting the branch at a wrong commit)
  • have at least a week of testing for downstream teams to test RC0
  • get fixes to discovered bugs landed on master, cherrypick them to release branch, after they are already tested on nightly releases
  • trigger RC1 pipeline. Again, no other code changes should occur now, except to fix bugs discovered during building
  • wait a week for downstream teams to test. If there are bugs, repeat the steps above for another RC, otherwise repeat the steps above for the final version.

Overall, this process would take number_of_RCs + 1 weeks with a possibility of a few more weeks of delay.

However, for the 2.16 release, although the branch was cut on Feb 8th, there has been only one RC. Most likely these issues can be solved by a patch release.


@JuanVargas
Author

I am closing this (unresolved) issue because I am told by the Keras/TF team that the issue is related to TF.

@sgkouzias

I'm generally confused about this setup with WSL 2: what exactly needs to be installed, and where?

When attempting to install using the command pip install tensorflow[and-cuda]

The following error is displayed:

ERROR: Could not find a version that satisfies the requirement nvidia-nccl-cu12==2.19.3; extra == "and-cuda" (from tensorflow[and-cuda]) (from versions: 0.0.1.dev5)
ERROR: No matching distribution found for nvidia-nccl-cu12==2.19.3; extra == "and-cuda"

TensorFlow 2.16.1: to support it, you need this (image).

python version:

Python 3.12.0

I'm running the command nvidia-smi and this is what it gives me:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   50C    P8             N/A /   90W |     863MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

I installed CUDA and cuDNN (image):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:30:42_Pacific_Standard_Time_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

wsl 2 (uname -m && cat /etc/*release):

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

I read in the NVIDIA manual that you need to install the CUDA Toolkit in Linux. I installed version 12.3 and still can't install TensorFlow.

Please explain it to me: what do I need to install, and where?

What do I need to install inside WSL 2, and what inside Windows?

I am confused about the Toolkit and cuDNN: what should be installed where?

@MrOxMasTer the purpose of installing WSL 2 is to install TensorFlow inside it; as a process it is unrelated to installing the CUDA Toolkit and cuDNN on Windows. Generally, installing WSL 2 allows you to run a full Linux environment within Windows, making it easier to develop and run applications that rely on Linux-based tools and libraries, such as TensorFlow 2.16.1. More specifically, in order to run the latest version of TensorFlow and utilize your GPU in WSL 2, according to the TensorFlow official documentation you must follow the TensorFlow installation procedure on WSL 2. Hopefully that makes more sense.

@thephet

thephet commented Jun 3, 2024

@sgkouzias Thank you. The solution from @niko247 worked, and it is what I am using.

@MrOxMasTer

@MrOxMasTer the purpose of installing WSL 2 is to install TensorFlow inside it; as a process it is unrelated to installing the CUDA Toolkit and cuDNN on Windows. Generally, installing WSL 2 allows you to run a full Linux environment within Windows, making it easier to develop and run applications that rely on Linux-based tools and libraries, such as TensorFlow 2.16.1. More specifically, in order to run the latest version of TensorFlow and utilize your GPU in WSL 2, according to the TensorFlow official documentation you must follow the TensorFlow installation procedure on WSL 2. Hopefully that makes more sense.

I also have to guess what the installation problem is:
[screenshot]

The most confusing setup I've ever seen. It is not clear where these commands need to be entered, or why: if all commands need to be entered in WSL 2, then why does your suggestion include a command that is run only on the side where the graphics driver is installed? Because clearly:

[screenshot]

After all, NVIDIA states in plain terms that the graphics driver must not be installed inside WSL, and yet you suggest a command that works only on the Windows side while saying the entire installation happens inside WSL.

How am I supposed to know what to install inside WSL? From the point of view of someone using this functionality for the first time, it sounds like nonsense: I don't understand how to use it, so naturally I will try to run things in Windows, because Windows is where I work. Even if I install it, how would I know that, for example, VS Code has a "WSL" extension that lets you connect to WSL?

@rkuo2000
Copy link

rkuo2000 commented Jun 7, 2024

@MrOxMasTer you can do every installation (mostly Python packages, via pip install or sudo apt install) in WSL Ubuntu as if you were using a PC running Ubuntu, except for the CUDA and cuDNN installation for Windows.

@sgkouzias
Copy link

sgkouzias commented Jun 7, 2024

@MrOxMasTer since you work in Windows you could simply refer to the TensorFlow official documentation to install TensorFlow with pip for Windows WSL2 (aka Windows Subsystem for Linux) and open the provided link for the official CUDA on WSL User Guide.
[screenshot]

Notice that the official CUDA on WSL User Guide clearly states that:

"Once a Windows NVIDIA GPU driver is installed on the system, CUDA becomes available within WSL 2. The CUDA driver installed on Windows host will be stubbed inside the WSL 2 as libcuda.so, therefore users must not install any NVIDIA GPU Linux driver within WSL 2...."

[screenshot]

Also kindly note that the current issue, "TF 2.16.1 Fails to work with GPUs", concerns Linux operating systems and the additional steps that potentially need to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

Until today, the officially documented TensorFlow standard installation procedure for Linux users with GPUs has not included the additional steps required to perform deep learning experiments with TensorFlow version 2.16.1 and utilize a GPU locally. That's why I submitted a pull request (pending review) in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone".

Hope that the next patch version of TensorFlow will fix the bug as soon as possible!

@MrOxMasTer
Copy link

"Once a Windows NVIDIA GPU driver is installed on the system, CUDA becomes available within WSL 2. The CUDA driver installed on Windows host will be stubbed inside the WSL 2 as libcuda.so, therefore users must not install any NVIDIA GPU Linux driver within WSL 2...."

Yes, that is exactly my point: you do not need to install graphics drivers in WSL. But installing TensorFlow involves not only the graphics driver, it also involves cuDNN and the CUDA Toolkit (and possibly TensorRT), and it was not clear to me where each of those needed to be installed. I have since seen that everything except the graphics driver needs to be installed inside WSL.

@MrOxMasTer
Copy link

MrOxMasTer commented Jun 7, 2024

Also kindly note that the current issue opened "TF 2.16.1 Fails to work with GPUs" involves Linux Operating Systems and potentially the additional steps to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

My first acquaintance with TensorFlow, via this version, has not been pleasant. As I understand it, the specific problem is 2.16.1, and it does not work in WSL; nothing worked for me. So the question is: which version can be installed so that it works normally in WSL?

Also, for the record, installing Anaconda does not help either: you can install at most version 2.10 that way.

@sgkouzias
Copy link

sgkouzias commented Jun 7, 2024

Also kindly note that the current issue opened "TF 2.16.1 Fails to work with GPUs" involves Linux Operating Systems and potentially the additional steps to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

I started a not very pleasant acquaintance with tensorflow with this version. As I understand it, the specific reason is 2.16.1 and it does not work in wsl. Because nothing worked for me. And the question is which version can be installed so that it works normally in wsl.

Also, for the future, I will say that installing anaconda does not help either. You can install a maximum of 2.10 version on it

@MrOxMasTer I totally understand your frustration but I reassure you that TensorFlow version 2.16.1 can actually work with your cuda-enabled GPU.

You can try the following:

  1. Create a fresh conda virtual environment in WSL and activate it, like this:
conda create --name tf python=3.11
conda activate tf
  2. Within the fresh conda virtual environment tf created in the previous step run the following commands sequentially:
pip install --upgrade pip
pip install tensorflow[and-cuda]
  3. Set environment variables:

Note: This step is required in order to utilize your GPU but is not yet included in the official TensorFlow documentation. All NVIDIA libs are installed alongside TensorFlow because you ran the command pip install tensorflow[and-cuda] in the previous step!

Locate the directory for the conda environment in your terminal window by running in the terminal:

echo $CONDA_PREFIX

Enter that directory and create these subdirectories and files:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

# Store original LD_LIBRARY_PATH 
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" 

# Get the CUDNN directory 
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))

# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Get the ptxas directory  
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))

# Set PATH to include the directory containing ptxas
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

# Restore original LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"

# Unset environment variables
unset CUDNN_DIR
unset PTXAS_DIR

Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Additionally, as I was informed the next version of TensorFlow will hopefully arrive within the next days!

I hope it helps!
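As a side note, the dirname(dirname(...)) steps in the activation hook can be traced in pure Python. The site-packages path below is a made-up example for illustration, not the location on any particular machine:

```python
import os

# Stand-in for the output of:
#   python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"
# (hypothetical path, for illustration only)
cudnn_init = "/opt/conda/envs/tf/lib/python3.11/site-packages/nvidia/cudnn/__init__.py"

# dirname(dirname(...)) climbs from __init__.py up to the nvidia package dir,
# exactly what CUDNN_DIR=$(dirname $(dirname ...)) does in the hook
cudnn_dir = os.path.dirname(os.path.dirname(cudnn_init))
print(cudnn_dir)  # → /opt/conda/envs/tf/lib/python3.11/site-packages/nvidia

# The find ${CUDNN_DIR}/*/lib/ -printf "%p:" step then collects each lib
# directory and prepends it to LD_LIBRARY_PATH, roughly:
lib_dir = os.path.join(cudnn_dir, "cudnn", "lib")
ld_library_path = lib_dir + ":" + os.environ.get("LD_LIBRARY_PATH", "")
print(ld_library_path)
```

Tracing it this way makes it easier to spot a wrong path if the GPU still fails to appear after activation.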

@GorillaDaddy
Copy link

Thanks, but it doesn't work for me @sgkouzias. It does at least find some files. However, despite the GPU verification saying that 1 GPU is available, there is no GPU activity... it's all CPU.

@sgkouzias
Copy link

sgkouzias commented Jun 8, 2024

Thanks, but it doesn't work for me @sgkouzias. It does at least find some files. However, despite the verify gpu setup saying that 1 gpu is available, no gpu activity... all cpu

@GorillaDaddy well, in order to work, your setup should meet certain technical requirements (please first check the official TensorFlow documentation). As I am not aware of your setup I cannot really guess why your GPU does not seem to be properly utilized (if that is the case). However, here are some hints that I hope will help you:
a) check your OS compatibility,
b) check whether your GPU is compatible, and of course
c) check the Python version compatibility with the desired TensorFlow version,
d) train a deep learning model in Google Colab (you can use a TensorFlow ready-to-use dataset) on a GPU and time it; then train the same model on the same data on your PC and compare the training times.
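Hint (d) can be scripted with the standard library alone. The train_one_epoch function below is a stub standing in for your actual training call, not TensorFlow API:

```python
import time

def train_one_epoch():
    # Stub workload; replace with e.g. model.fit(x_train, y_train, epochs=1)
    return sum(i * i for i in range(100_000))

start = time.perf_counter()
train_one_epoch()
elapsed = time.perf_counter() - start
print(f"one epoch took {elapsed:.3f}s")
```

Run the same harness in Colab on a GPU runtime and on your own machine; if the local run is no faster than a CPU-only run, the GPU is likely not being used.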

@MrOxMasTer
Copy link

Also kindly note that the current issue opened "TF 2.16.1 Fails to work with GPUs" involves Linux Operating Systems and potentially the additional steps to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

[screenshot]

Hooray, it worked, thanks! But I get an error about NUMA. Is this normal? Could it be because I did not install CUDA and cuDNN as administrator?

@sgkouzias
Copy link

sgkouzias commented Jun 8, 2024

Also kindly note that the current issue opened "TF 2.16.1 Fails to work with GPUs" involves Linux Operating Systems and potentially the additional steps to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

[screenshot]

Hooray, it worked, thanks, but I have a mistake with this NUMA. Is this normal? Could it be because I did not install Cuda and Cdn on behalf of the administrator?

@MrOxMasTer congratulations and thanks for the feedback.

The error "Your kernel may have been built without NUMA support" refers to the lack of NUMA (Non-Uniform Memory Access) support in the kernel you are using. NUMA is a memory architecture used in multiprocessor systems where the memory access time depends on the memory location relative to the processor.

NUMA support is important for optimizing memory access on systems with multiple CPUs or GPUs. It allows the operating system to allocate memory and schedule processes in a way that reduces memory access latency.

The Windows Subsystem for Linux (aka WSL) provides a Linux-compatible kernel interface developed by Microsoft and allows you to run Linux binaries on Windows. However, WSL's kernel might lack certain features present in a full-fledged Linux kernel, including NUMA support.

The lack of NUMA support might lead to suboptimal performance on systems with multiple processors or GPUs because the memory allocation might not be as efficient.

Consequently, you can safely ignore the warning (you can read more about it in this discussion on developer.nvidia.com).
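If the repeated NUMA message clutters the logs, TensorFlow's native log threshold can be raised before launching Python. This only hides the message and changes nothing about memory allocation (a sketch; it assumes the NUMA line is emitted at INFO level):

```shell
# TF_CPP_MIN_LOG_LEVEL: 0 = all, 1 = filter INFO, 2 = also filter WARNING, 3 = also filter ERROR
export TF_CPP_MIN_LOG_LEVEL=1
```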

@rednag
Copy link

rednag commented Jun 10, 2024

I'm also facing the same issue as the OP since I upgraded to 2.16.1. After downgrading to 2.15.1 everything runs smoothly.

TensorFlow version
TF 2.16.1

OS platform and distribution
Linux Ubuntu 22.04.4 LTS

Python version
3.10.12

CUDA/cuDNN version
12.4

Actually I want to import the ops package from keras, but it seems it is only available from Keras 3 onwards. If I upgrade Keras I also have to upgrade TensorFlow due to incompatibilities... but after the upgrade I'm no longer able to use the GPU.

@sgkouzias
Copy link

sgkouzias commented Jun 10, 2024

@rednag as I understand it you have two available options:

1) Keep TensorFlow version 2.15 and reinstall Keras 3 afterwards
According to the official Keras documentation you can simply:
pip install --upgrade keras after installing tensorflow version 2.15

2) Upgrade Tensorflow to version 2.16.1
You can upgrade to TensorFlow version 2.16.1 and utilize your GPU locally (Keras 3.0 will be installed as well) through following the steps below:

  1. Create a fresh conda virtual environment and activate it like this:
conda create --name tf python=3.11
conda activate tf
  2. pip install --upgrade pip,
  3. pip install tensorflow[and-cuda],
  4. Set environment variables:

Locate the directory for the conda environment in your terminal window by running in the terminal:

echo $CONDA_PREFIX

Enter that directory and create these subdirectories and files:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

# Store original LD_LIBRARY_PATH 
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" 

# Get the CUDNN directory 
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))

# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Get the ptxas directory  
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))

# Set PATH to include the directory containing ptxas
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

# Restore original LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"

# Unset environment variables
unset CUDNN_DIR
unset PTXAS_DIR
  5. Verify the GPU setup:
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

I have submitted the respective pull request to update the official TensorFlow installation guide and is currently pending review.

Additionally, as I was informed the next version of TensorFlow will hopefully arrive within the next days!

I hope it helps!

@rednag
Copy link

rednag commented Jun 10, 2024

Thank you for the fast reply. At the moment I'm using the old functions from the keras.src.utils and tf packages, but I'm looking forward to the new release.

@sgkouzias
Copy link

sgkouzias commented Jun 10, 2024

Thank you for the fast reply. At the moment I'm using the old functions from the keras.src.utils and tf packages, but I'm looking forward to the new release.

@rednag great. Another option to consider for fast model training with Keras 3 and GPU acceleration is to use JAX as the Keras backend.
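For reference, Keras 3 selects its backend from an environment variable read at import time; a minimal sketch, assuming JAX itself is already installed (e.g. via pip install "jax[cuda12]" for NVIDIA GPUs):

```shell
# Must be set before the Python process imports keras
export KERAS_BACKEND=jax
python3 -c "import keras; print(keras.backend.backend())"  # should print: jax (when JAX is installed)
```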

@eabase
Copy link

eabase commented Jun 14, 2024

Can someone explain why TF >2.10 cannot be run with a GPU on native Windows?
This makes no sense whatsoever: WSL, Conda, and other Python packages such as Torch all work with the GPU. So what is going on?

I.e. What is the problem and why is it not being addressed by the community?

@sh-shahrokhi
Copy link

sh-shahrokhi commented Jun 14, 2024

Can someone care to explain why TF >2.10 cannot be run with GPU in native windows? This totally makes no sense whatsoever, as all other HW, WSL, and Conda works with GPU. Including other python packages, such as Torch. So what is going on?

I.e. What is the problem and why is it not being addressed by the community?

Google removed the native Windows CUDA build starting with TF 2.11.
There is nothing you can do about it; building from source with CUDA will also fail on Windows.

@mihaimaruseac
Copy link
Collaborator

Everyone that cared about full support of TF is no longer in the team. See above comments for more details and differences

@eabase
Copy link

eabase commented Jun 15, 2024

@sh-shahrokhi

Google removed the native windows cuda build starting TF 2.11

Unfortunately that doesn't say anything. I don't see how you can "remove" any of that, apart from breaking the build scripts. Whatever you "remove" must still be present for all the other *nix builds. WSL is not that different from MSYS or MinGW, which these days aren't far from VS C/C++ builds.

@sh-shahrokhi
Copy link

sh-shahrokhi commented Jun 15, 2024

@sh-shahrokhi

Google removed the native windows cuda build starting TF 2.11

Unfortunately that doesn't say anything. I don't see how you can "remove" any of that, apart from breaking the build scripts. Whatever you "remove" must still be present for all other nix builds. WSL is not that different from MSYS, MinGW, which (no longer) is too far from VS C/C++ builds.

#58629
Also:
#59918

@ben-jy
Copy link

ben-jy commented Jun 17, 2024

[quoted exchange and @sgkouzias's conda environment setup instructions, repeated from the comments above]

Doesn't work for me :/ I even completely reinstalled WSL, but I still get an empty list when showing the available devices... Should CUDA be uninstalled on the Windows side? When I run "nvidia-smi", it says I have CUDA Version 12.5, even though I didn't install anything in WSL... Is that normal?

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01 Driver Version: 555.99 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+

@sgkouzias
Copy link

[quoted exchange and @sgkouzias's conda environment setup instructions, repeated from the comments above]

Doesn't work for me :/ I even reinstalled completely WSL, but I still get an empty list when showing the available devices... Should CUDA be unistalled on Windows side ? When I use "nvidia-smi", it is written that I have the 12.5 Cuda Version, even if I didn't install anything on WSL... Is that normal ?

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01 Driver Version: 555.99 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+

@ben-jy frankly I have no clue. Did you check the official documentation? Does your setup meet the technical requirements? What is the Python version in WSL2? Is it compatible with TensorFlow 2.16.1? What is the name of your NVIDIA GPU? The output of the nvidia-smi command in WSL2 looks normal, since your GPU driver is installed on Windows. However, you could try reinstalling everything (compatible GPU driver, afterwards WSL2, and then TensorFlow)...

@sh-shahrokhi
Copy link

sh-shahrokhi commented Jun 17, 2024 via email

@ben-jy
Copy link

ben-jy commented Jun 18, 2024

@sgkouzias I checked the official documentation, but I find it not very clear and a bit contradictory: the software requirements state that CUDA and cuDNN should be installed on the machine, but the pip package should install them automatically with TensorFlow, right? Besides, this Medium tutorial explains that CUDA should be installed neither on the Windows side nor on the WSL side, but via the pip package. Maybe I should try to uninstall everything CUDA-related on Windows...
Concerning your other questions:

  1. I have an RTX 3070 Ti, which is in the list of CUDA-enabled products.
  2. I use conda and I tried the install with Python 3.10 and 3.11, which are in the software requirements of the official documentation. Those versions are listed as compatible with TensorFlow 2.16.1, according to the PyPI package tags.

I will try a clean reinstall of my GPU driver, as well as uninstalling CUDA on the Windows side. If that doesn't work, I think it is better to install CUDA and cuDNN manually, along with an older TensorFlow version. It is still a shame that the official documentation of such a large and important library is so unclear.
