
cuDNN, cuFFT, and cuBLAS Errors #62075

Open
Ke293-x2Ek-Qe-7-aE-B opened this issue Oct 9, 2023 · 124 comments
Assignees
Labels
comp:gpu (GPU related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF2.14 (For issues related to Tensorflow 2.14.x) · type:build/install (Build and install issues)

Comments

@Ke293-x2Ek-Qe-7-aE-B

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

GIT_VERSION:v2.14.0-rc1-21-g4dacf3f368e VERSION:2.14.0

Custom code

No

OS platform and distribution

WSL2 Linux Ubuntu 22

Mobile device

No response

Python version

3.10, but I can try different versions

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA version: 11.8, cuDNN version: 8.7

GPU model and memory

NVIDIA Geforce GTX 1660 Ti, 8GB Memory

Current behavior?

When I run the GPU test from the TensorFlow install instructions, I get several errors and warnings.
I don't care about the NUMA stuff, but the first 3 errors are that TensorFlow was not able to load cuDNN. I would really like to be able to use it to speed up training some RNNs and FFNNs. I do get my GPU in the list of physical devices, so I can still train, but not as fast as with cuDNN.

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

2023-10-09 13:36:23.355516: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-09 13:36:23.355674: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-09 13:36:23.355933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-09 13:36:23.413225: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-09 13:36:25.872586: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-09 13:36:25.916952: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-09 13:36:25.917025: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
@SuryanarayanaY
Collaborator

Hi @Ke293-x2Ek-Qe-7-aE-B ,

Starting from TF 2.14, TensorFlow provides a CUDA package that installs the cuDNN, cuFFT, and cuBLAS libraries for you.

You can use the pip install tensorflow[and-cuda] command for that.

Please try this command and let us know if it helps. Thank you!
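A practical note for anyone whose shell rejects the command above: the brackets in the extras syntax are glob characters in zsh and some other shells, so the package spec usually needs quoting (a later comment in this thread hit exactly this). A minimal sketch:

```shell
# The brackets are glob characters in zsh and some other shells; unquoted,
# zsh reports "no matches found: tensorflow[and-cuda]". Quoting avoids this.
pip install "tensorflow[and-cuda]"

# Then verify the GPU is visible:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```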

@SuryanarayanaY SuryanarayanaY added TF2.14 For issues related to Tensorflow 2.14.x stat:awaiting response Status - Awaiting response from author labels Oct 10, 2023
@Ke293-x2Ek-Qe-7-aE-B
Author

Ke293-x2Ek-Qe-7-aE-B commented Oct 10, 2023

@SuryanarayanaY I did not know that it now came bundled with cuDNN. I did install tensorflow with the [and-cuda] part, but I also installed the CUDA toolkit and cuDNN separately. I will try just installing the CUDA toolkit and then installing tensorflow[and-cuda].
Also, is there a way to install TensorFlow for GPU without it coming with cuDNN? If I just pip install tensorflow, will that install with GPU support, just without cuDNN, so that I can install it manually? I don't really need to, but I am curious whether it can be installed that way too.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Oct 10, 2023
@Ke293-x2Ek-Qe-7-aE-B
Author

@SuryanarayanaY I tried several times, reinstalling Ubuntu, but it still doesn't work.

@AthiemoneZero

I also have the same issue, and it does not seem to be caused by the CUDA environment, as I rebuilt CUDA and cuDNN to match tf-2.14.0.

This is the log output I see:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2023-10-11 18:21:57.387396: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-11 18:21:57.415774: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-11 18:21:57.415847: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-11 18:21:57.415877: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-11 18:21:57.421400: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 18:21:58.155058: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-11 18:21:59.113217: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 18:21:59.152044: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 18:21:59.152153: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

@Ke293-x2Ek-Qe-7-aE-B
Author

@AthiemoneZero Because it still does output a GPU device at the bottom of the log, I am training on GPU, just without cuDNN. It will be slower, but it is better than nothing or training on CPU.

@AthiemoneZero

AthiemoneZero commented Oct 11, 2023

@AthiemoneZero Because it still does output a GPU device at the bottom of the log, I am training on GPU, just without cuDNN. It will be slower, but it is better than nothing or training on CPU.

Yeah. But I just found that when I downgrade to 2.13.0 version, errors in register won't appear again. It looks like this:

(TF) ephys3@ZhouLab-Ephy3:~$ python3 -c "import tensorrt as trt;import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2023-10-11 20:39:12.097457: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-11 20:39:12.130250: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 20:39:13.856721: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870767: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870941: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Although I haven't figured out how to solve the NUMA node error, I found some clues in another issue (I ran all of the above in WSL Ubuntu). This bug does not seem to be significant, according to an explanation on the NVIDIA forums. So I guess the registration errors might have something to do with the latest version, and the NUMA errors might be caused by the OS environment. I hope this information helps.

@Ke293-x2Ek-Qe-7-aE-B
Author

@AthiemoneZero I tried downgrading as well, but it didn't work for me. The NUMA errors are (as stated in the error message) because the kernel provided by Microsoft for WSL2 is not built with NUMA support. I tried cloning the repo (here) and building my own kernel with NUMA support, but that didn't work, so I am just ignoring those errors for now.

@AthiemoneZero

AthiemoneZero commented Oct 11, 2023

@Ke293-x2Ek-Qe-7-aE-B I rebuilt everything in an independent conda environment for TF. My steps were to create a TF env with Python 3.9.8 and run python3 -m pip install tensorflow[and-cuda] --user according to the instructions. After that, I tried python3 -m pip install tensorflow[and-cuda]==2.13.0 --user and found it solved some of the bugs.

@Ke293-x2Ek-Qe-7-aE-B
Author

@AthiemoneZero Thanks for the instructions. I'll try them and see if they work on my system. I have been using Python 3.10, so maybe that's why it didn't work. Did you have to install the CUDA toolkit?

@AthiemoneZero

AthiemoneZero commented Oct 11, 2023

@Ke293-x2Ek-Qe-7-aE-B I didn't run conda install cuda-toolkit here. I guess the [and-cuda] extra installed some of the dependencies for me.

@AthiemoneZero

But I did double-check the versions of CUDA and cuDNN. For that, I even downgraded them again and again.

@Ke293-x2Ek-Qe-7-aE-B
Author

@AthiemoneZero Usually, I would install the CUDA toolkit according to these instructions (here), then install cuDNN according to these instructions (here). I installed CUDA toolkit 11.8 and cuDNN 8.7, because they are the latest versions supported by TensorFlow according to their support table here. I guess using [and-cuda] installs all of that for you.
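Since version mismatches keep coming up in this thread, it can help to confirm which cuDNN is actually installed before blaming TF. A minimal stdlib-only sketch that parses the version defines out of cuDNN's header; the path /usr/include/cudnn_version.h is an assumption and varies by install method (cuDNN 8 moved these defines out of cudnn.h):

```python
import re
from pathlib import Path

def cudnn_version_from_header(text):
    """Extract (major, minor, patchlevel) from cudnn_version.h contents."""
    vals = {}
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{key}\s+(\d+)", text)
        vals[key] = int(m.group(1)) if m else None
    return (vals["CUDNN_MAJOR"], vals["CUDNN_MINOR"], vals["CUDNN_PATCHLEVEL"])

# Example location; adjust for conda or pip-wheel installs.
header = Path("/usr/include/cudnn_version.h")
if header.exists():
    print(cudnn_version_from_header(header.read_text()))
```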

@AthiemoneZero

AthiemoneZero commented Oct 11, 2023

@Ke293-x2Ek-Qe-7-aE-B Apologies for my misunderstanding. I installed the CUDA toolkit the same way you described above before I went on to debug tf_gpu. I made sure my GPU and CUDA could perform well, as I had already run another CUDA task smoothly without TF. My concern is that some of TF's dependencies have to be pre-installed in a conda env, and this might be handled by [and-cuda] (my naive guess).

@Ke293-x2Ek-Qe-7-aE-B
Author

@AthiemoneZero I always install CUDA toolkit and cuDNN globally for the whole system, and then install TensorFlow in a miniconda environment. This doesn't work anymore with the newest versions of TensorFlow, so I'll try your instructions. It does make sense to install everything in a conda env, I just hadn't thought of that since my other method had worked in the past. Thanks for sharing what you did to make it work.

@AthiemoneZero

AthiemoneZero commented Oct 11, 2023

@Ke293-x2Ek-Qe-7-aE-B You're welcome. BTW, I also followed the instructions to configure the development environment, including suitable versions of Bazel and clang-16, before all my digging into the conda env.

@Ke293-x2Ek-Qe-7-aE-B
Author

Ke293-x2Ek-Qe-7-aE-B commented Oct 11, 2023

@AthiemoneZero Thanks, but it didn't work.

@FaisalAlj

Hello,

I'm experiencing the same issue, even though I meticulously followed all the instructions for setting up CUDA 11.8 and CuDNN 8.7. The error messages I'm encountering are as follows:

Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered.
Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered.
Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered.

I've tried this with different versions of Python. Surprisingly, when I used Python 3.11, TensorFlow 2.13 was installed without these errors. However, when I used Python 3.10 or 3.9, I ended up with TensorFlow 2.14 and the aforementioned errors.

I've come across information suggesting that I may not need to manually install CUDA and CuDNN, as [and-cuda] should handle the installation of these components automatically.

Could someone please guide me on the correct approach to resolve this issue? I've tried various methods, but unfortunately, none of them have yielded a working solution.

P.S. I'm using conda in WSL 2 on Windows 11.

@nkinnaird

nkinnaird commented Oct 17, 2023

I am having the same issue as FaisalAlj above, on Windows 10 with the same versions of CUDA and CuDNN. The package tensorflow[and-cuda] is not found by pip. I've tried different versions of python and tensorflow without success. In my case I'm using virtualenv rather than conda.

Edit 1:
I appear to be able to install tensorflow[and-cuda] as long as I use quotes around the package, like: pip install "tensorflow[and-cuda]".

Edit 2:
I still appear to be getting these messages however, so I'm not sure I've installed things correctly.

@SuryanarayanaY
Collaborator

Hi @Ke293-x2Ek-Qe-7-aE-B ,

I have checked the installation on Colab (Linux environment) and observed the same logs, as per the attached gist.

These logs seem to be generated by the XLA compiler, but the GPU is detectable. A similar issue, #62002, has already been brought to the engineering team's attention.

CC: @learning-to-play

@SuryanarayanaY SuryanarayanaY added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 18, 2023
@qnlzgl

qnlzgl commented Feb 15, 2024

Same error here. I tried CUDA 12 and CUDA 11.8 (WSL2 on Ubuntu). Both have this issue.

@NOORLEICESTER

NOORLEICESTER commented Feb 15, 2024

Thank you for your comment @qnlzgl . I have attempted to fix the issue in various ways, but none have proven successful for me.

@qnlzgl

qnlzgl commented Feb 15, 2024

Thank you for your comment @qnlzgl . I have attempted to fix the issue in various ways, but none have proven successful for me.

I feel it's okay to leave the errors as they are. I get this error while importing TensorFlow, but I can still use GPUs as normal.
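For anyone taking this route and treating the messages as cosmetic, TensorFlow's C++ log level can be turned down via the TF_CPP_MIN_LOG_LEVEL environment variable, which must be set before the first import. Whether it hides these particular E-level registration lines may depend on the build, so treat this as a noise-reduction sketch, not a fix:

```python
import os

# TF_CPP_MIN_LOG_LEVEL controls TensorFlow's C++ logging:
# 0 = all messages, 1 = drop INFO, 2 = drop INFO+WARNING, 3 = drop INFO+WARNING+ERROR.
# It must be set before the first `import tensorflow` to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# import tensorflow as tf  # imported afterwards, with C++ log noise suppressed
```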

@NOORLEICESTER

But I am getting an error in Keras because of the mentioned error, @qnlzgl

@SomeUserName1

SomeUserName1 commented Feb 19, 2024

Occurs with tensorflow[and-cuda]==2.15.0.post0
Tried building from source and installing via pip & conda.

$ python
Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-02-19 12:55:16.996299: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 12:55:17.017067: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 12:55:17.017083: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 12:55:17.017652: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 12:55:17.020872: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print(tf.config.list_physical_devices('GPU'))
2024-02-19 12:56:09.845769: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-19 12:56:09.870239: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-19 12:56:09.870422: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> 

On tf-nightly[and-cuda] this doesn't occur anymore.

$ python
Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.platform import build_info as tf_build_info
2024-02-19 13:04:56.451101: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 13:04:56.474818: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print("cudnn_version",tf_build_info.build_info['cudnn_version'])
cudnn_version 8
>>> 
>>> print("cuda_version",tf_build_info.build_info['cuda_version'])
cuda_version 12.3
>>> print(tf.config.list_physical_devices('GPU'))
2024-02-19 13:13:48.002965: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-19 13:13:48.006970: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-19 13:13:48.007068: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

The NUMA message persists, even on Linux with NUMA configured.

System information:
(Arch Linux + Zen kernel patches + KDE Plasma)

$ uname -a
Linux somelegion 6.7.4-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 22:07:37 +0000 x86_64 GNU/Linux
$ nvidia-smi
Mon Feb 19 13:09:28 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   40C    P5              18W /  95W |   1414MiB / 16376MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
$ zgrep CONFIG_NUMA /proc/config.gz                                                                                                                                                                                             
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_NUMA_KEEP_MEMINFO=y
sudo lspci -vvv -s 01:00.0 
01:00.0 VGA compatible controller: NVIDIA Corporation GN21-X11 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Lenovo GN21-X11
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 132
	IOMMU group: 16
	Region 0: Memory at 85000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at 4000000000 (64-bit, prefetchable) [size=16G]
	Region 3: Memory at 4400000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at 6000 [size=128]
	Expansion ROM at 86080000 [virtual] [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee003d8  Data: 0000
	Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <16us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (downgraded), Width x16
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [250 v1] Latency Tolerance Reporting
		Max snoop latency: 34326183936ns
		Max no snoop latency: 34326183936ns
	Capabilities: [258 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=281600ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [420 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [bb0 v1] Physical Resizable BAR
		BAR 0: current size: 16MB, supported: 16MB
		BAR 1: current size: 16GB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB
		BAR 3: current size: 32MB, supported: 32MB
	Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00 v1] Lane Margining at the Receiver <?>
	Capabilities: [e00 v1] Data Link Feature <?>
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, **nvidia**

This might come from the NVIDIA side, if one believes the linked doc:

What: /sys/bus/pci/devices/.../numa_node
Date: Oct 2014
Contact: Prarit Bhargava prarit@redhat.com
Description:
This file contains the NUMA node to which the PCI device is
attached, or -1 if the node is unknown. The initial value
comes from an ACPI _PXM method or a similar firmware
source. If that is missing or incorrect, this file can be
written to override the node. In that case, please report
a firmware bug to the system vendor. Writing to this file
taints the kernel with TAINT_FIRMWARE_WORKAROUND, which
reduces the supportability of your system.
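The sysfs semantics quoted above are easy to probe with stdlib Python. Here read_numa_node is a hypothetical helper: it returns the node number, -1 for "unknown to the kernel/firmware" (which is what makes TensorFlow log "returning NUMA node zero"), or None when the file is missing entirely, as on WSL2 kernels built without NUMA support:

```python
from pathlib import Path

def read_numa_node(sysfs_path):
    """Return the NUMA node for a PCI device, or None if the file is unreadable.

    A value of -1 means the node is unknown (missing/incorrect ACPI _PXM),
    per the sysfs-bus-pci documentation quoted above.
    """
    try:
        return int(Path(sysfs_path).read_text().strip())
    except (OSError, ValueError):
        return None

# Example PCI address taken from the logs in this thread.
node = read_numa_node("/sys/bus/pci/devices/0000:01:00.0/numa_node")
print(node)
```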

@Nephalen

Hi. I'm having the exact same three errors after updating TF 2.11 to 2.14. In my case, tensorflow-gpu is installed from the conda-forge channel, and the CUDA library is installed through cudatoolkit-dev from conda-forge as well. TF 2.11 didn't show such errors.

It does not prevent GPU usage, but I observed a 10%-15% slowdown in model prediction speed in TF 2.14 compared to the exact same code in TF 2.11.

Could this be related to those errors? I'm kind of getting mixed messages from the overall discussion.

@ManzarIMalik

Same error on Ubuntu 22.04 LTS installed in WSL2 / Windows 11. Has anyone found a solution to this?

@SomeUserName1

SomeUserName1 commented Feb 21, 2024

pip uninstall tensorflow && pip install tf-nightly[and-cuda]

@ManzarIMalik

pip uninstall tensorflow && pip install tf-nightly[and-cuda]

This is not working either.

@Radiated-Coder

Hi Guys,

Seems like I'm late to the party. I am running TF on WSL2. Guess what: I have completely messed up my setup, which was running great on a TF 2.10 configuration.
I upgraded everything to a stable configuration as recommended by NVIDIA (see below), only to find out Microsoft killed NUMA.

TensorFlow : 2.15
Cuda kit : 12.3
CuDNN : 8.9
VS : Comm 2022
Python : 3.10
numactl : 2.0.14-3Ubuntu2 (force installed but no use)
UBUNTU : 24
CPU : Ryzen 7 5600
Mem : 32 Gigs
GPU : GeF RTX 4070 16 Gigs

Any help would be appreciated.
TY. I too will update here if I find a fix.

@NOORLEICESTER

@SomeUserName1
Unfortunately, I got the error below when I tried to run pip install tf-nightly[and-cuda]:
ERROR: Could not find a version that satisfies the requirement tf-nightly[and-cuda] (from versions: none)
ERROR: No matching distribution found for tf-nightly[and-cuda]

@SomeUserName1

Try quoting the package name

pip install 'tf-nightly[and-cuda]'

@NOORLEICESTER

Can anyone please comment on how to figure out this error?
ImportError: cannot import name 'cast' from partially initialized module 'keras.src.backend' (most likely due to a circular import)

@SomePersonSomeWhereInTheWorld

Python 3.9 same issue:

Python 3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-02-26 13:24:06.607339: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-26 13:24:06.609051: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 13:24:06.645912: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 13:24:06.646284: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
2024-02-26 13:24:09.084670: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available:  0
pip show tensorflow
Name: tensorflow
Version: 2.13.0
pip show tf-nightly
Name: tf-nightly
Version: 2.16.0
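A quick host-side check (a sketch; `nvidia-smi` ships with the NVIDIA driver, not with any pip package): when the log says "Could not find cuda drivers on your machine", first confirm the driver itself is installed, since no TensorFlow version can supply it:

```shell
# If the NVIDIA kernel driver is installed, nvidia-smi exists and lists the
# GPU; otherwise TensorFlow has nothing to talk to, regardless of version.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver_present=yes
    nvidia-smi --query-gpu=name,driver_version --format=csv
else
    driver_present=no
    echo "nvidia-smi not found: install the NVIDIA driver first"
fi
```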

@SomeUserName1

@NOORLEICESTER without further code it's difficult to say what's going on. I'd guess from the error that you actually have a circular import, so try to check your imports.

@SomePersonSomeWhereInTheWorld
Uninstall all TF versions except nightly to avoid version mismatches and the like.
Uninstall keras as well, since tf-nightly also installs keras-nightly.
You may want to remove the previous env and start with a clean slate.

Could not find cuda drivers on your machine, GPU will not be used.

Did you install the NVIDIA driver (via apt/rpm/pacman) and CUDA on your system?

@NOORLEICESTER

@ManzarIMalik that didn't work either.

@NOORLEICESTER

@SomeUserName1 that didn't work either.

@arvindbs2014

@AthiemoneZero Because it still does output a GPU device at the bottom of the log, I am training on GPU, just without cuDNN. It will be slower, but it is better than nothing or training on CPU.

Yeah. But I just found that when I downgrade to version 2.13.0, the registration errors no longer appear. It looks like this:

(TF) ephys3@ZhouLab-Ephy3:~$ python3 -c "import tensorrt as trt;import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2023-10-11 20:39:12.097457: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-11 20:39:12.130250: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 20:39:13.856721: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870767: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870941: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Although I haven't figured out how to solve the NUMA node error, I found some clues in another issue (I ran all of the above in WSL Ubuntu). According to an explanation on the NVIDIA forums, this warning does not seem to be significant. So I guess the registration errors may be related to the latest version, while the NUMA messages are caused by the OS environment. Hope this information helps.


The NUMA node problem can be solved this way:

1. Check nodes:

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)

The first line shows the address of the VGA-compatible device, the NVIDIA GeForce, as 01:00.0. The address differs per machine, so substitute your own in the steps below.

2. Check and change the NUMA setting value:

If you list /sys/bus/pci/devices/, you can see the following:

ls /sys/bus/pci/devices/

0000:00:00.0 0000:00:06.0 0000:00:15.0 0000:00:1c.0 0000:00:1f.3 0000:00:1f.6 0000:02:00.0
0000:00:01.0 0000:00:14.0 0000:00:16.0 0000:00:1d.0 0000:00:1f.4 0000:01:00.0
0000:00:02.0 0000:00:14.2 0000:00:17.0 0000:00:1f.0 0000:00:1f.5 0000:01:00.1

The 01:00.0 found above is visible here, with the 0000: domain prefix attached in front.

3. Check whether a NUMA node is assigned:

cat /sys/bus/pci/devices/0000:01:00.0/numa_node

-1

-1 means no NUMA node is assigned; 0 means the device is connected to node 0.

4. Fix it with the command below (the value resets on reboot):

echo 0 | sudo tee -a /sys/bus/pci/devices/0000:01:00.0/numa_node

0
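The per-device steps above can be batched (a hedged sketch; vendor ID 0x10de is NVIDIA's PCI vendor ID, and the written value does not survive a reboot, so rerun it from a startup script if you want it to persist):

```shell
#!/bin/sh
# Set numa_node to 0 for every NVIDIA PCI function (vendor 0x10de).
# DRY_RUN=1 (the default here) only prints what would be written;
# run with DRY_RUN=0 as root to actually apply the change.
DRY_RUN="${DRY_RUN:-1}"
for dev in /sys/bus/pci/devices/*; do
    [ -f "$dev/vendor" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
    echo "NVIDIA device: $dev (numa_node=$(cat "$dev/numa_node"))"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would write 0 to $dev/numa_node"
    else
        echo 0 > "$dev/numa_node"
    fi
done
```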

@wangkuiyi
Contributor

It's been five months, yet the problem remains.

@maulberto3

It's been five months, yet the problem remains.

You are right, what a shame, I gave up and went to Rust.

Korred added a commit to Korred/unet-pp that referenced this issue Mar 5, 2024
…nd wsl

- "environment-win" holds a working env with Tensorflow 2.15 but only with CPU support (as GPU on bare Windows is not supported anymore)
- "environment-wsl" holds a working env for WSL Ubuntu with Tensorflow 2.13 with GPU support (as installing 2.15 with "tensorflow[and-cuda]" on WSL has issues with registering cuDNN, cuFFT, cuBLAS and the GPU is sometimes not being found - tensorflow/tensorflow#62075)
- a 2.13 trained model can still be used on a Windows machine with Tensorflow 2.15 (only when you save the model as a .h5 file instead of .keras)
@Hato1

Hato1 commented Mar 21, 2024

When 2.15.x didn't work, TensorFlow 2.16.1 (without CUDA) solved this issue for me. Python 3.10, CUDA driver 12.2, CUDA Toolkit 12.1, cuDNN 8.9.5.

@ManzarIMalik

It finally worked with TensorFlow 2.16.1 (upgrade to latest): pip install --upgrade tensorflow

@bojle

bojle commented Mar 22, 2024

Can confirm it works with 2.16.1. For those who have to resort to the 2.9.0 workaround (some of my packages are limited to 2.15), use Python <= 3.10 to install it.

@jeb2112

jeb2112 commented Apr 6, 2024

Can confirm that these error messages probably don't matter. I have an NVIDIA GeForce 1650 and had working TensorFlow (2.15, CUDA 12.2) and PyTorch envs. But the PyTorch env was for some code stuck at Python 3.6, which I could no longer debug in VS Code, so I created a new PyTorch env with CUDA 12.2 and torch 2.2.0. That wouldn't detect the GPU, so I backed off to CUDA 11.8 and torch 2.0.0, but it still wouldn't detect the GPU. The broken PyTorch attempt also broke the working TensorFlow env: it started giving me the messages above and wouldn't detect the GPU at all (i.e. via the list-physical-devices check). On a whim I rebooted, after which both the new PyTorch and TensorFlow envs are back to detecting the GPU. I still see those messages, but only in TensorFlow.

@mikechen66

Hi @Ke293-x2Ek-Qe-7-aE-B ,

Starting from TF 2.14, TensorFlow provides a CUDA extra which installs all the cuDNN, cuFFT, and cuBLAS libraries.

You can use the pip install tensorflow[and-cuda] command for that.

Please try this command and let us know if it helps. Thank you!

I checked the installation. The prerequisite for successfully installing TensorFlow 2.14 ~ 2.16 is that users need to install the NVIDIA Linux driver, CUDA Toolkit, and cuDNN in the base Linux environment and then set the environment variables in .bashrc, not in the base environment of Anaconda/Miniconda.
