"CUDA_ERROR_NOT_FOUND: named symbol not found" in Docker container #68711

Closed
koryphaee opened this issue May 27, 2024 · 13 comments
@koryphaee

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.16.1-0-g5bc9d26649c

Custom code

No

OS platform and distribution

Windows 11 Pro + WSL2 Ubuntu 22.04

Mobile device

No response

Python version

3.11.0rc1

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

RTX 4080

Current behavior?

I am using Docker Desktop with WSL2 on Windows 11 Pro to run the tensorflow/tensorflow:latest-gpu Docker container. After starting it, no GPU is found and the error "failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found" occurs. The CUDA, cuDNN and Python versions are all dictated by the Docker image, so I cannot change them.

Things I have tried without success:

  • using a different container version (2.16.1, 2.15.0, 2.14.0, 2.13.0)
  • reinstalling WSL2/Ubuntu
  • reinstalling Docker Desktop
  • reinstalling GPU drivers

Reinstalling the drivers is the only thing that had any effect at all. It changed the error from CUDA_ERROR_NO_DEVICE to CUDA_ERROR_NOT_FOUND.

I have found jax-ml/jax#13570 with a similar error, but its fix doesn't apply here. RTX 4000 cards require CUDA >=11.8, which I am using.

Standalone code to reproduce the issue

docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

2024-05-27 18:05:30.149964: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 18:05:31.089452: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found
[]
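
To separate a TensorFlow problem from a driver problem, one option is to call cuInit directly from Python, bypassing TensorFlow entirely. The following is a minimal sketch, not part of the original report. It assumes the driver library is loadable under the name libcuda.so.1 inside the container, and the helper name probe_cuinit is made up for illustration. The numeric codes are taken from the CUDA driver API's CUresult enum; 500 is the 0x1f4 that nvcc's device query prints for CUDA_ERROR_NOT_FOUND.

```python
import ctypes

# Small subset of CUresult codes from the CUDA driver API (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    500: "CUDA_ERROR_NOT_FOUND",  # 0x1f4
}


def probe_cuinit(library_name="libcuda.so.1"):
    """Call cuInit(0) directly and return (code, description).

    code is None when the driver library itself cannot be loaded.
    The default library name is an assumption about the container image.
    """
    try:
        libcuda = ctypes.CDLL(library_name)
    except OSError as exc:
        return None, f"could not load {library_name}: {exc}"

    # CUresult cuInit(unsigned int Flags)
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    code = libcuda.cuInit(0)
    return code, CU_RESULT_NAMES.get(code, f"unlisted CUresult {code}")
```

Running print(probe_cuinit()) inside the container should report the same error as TensorFlow if the failure comes from the driver layer rather than from TensorFlow itself.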
@nikolayDemirev

I have the same issue: #68710

@Sukanya41455

Please make sure that you have installed the NVIDIA Container Toolkit first. Your Docker container is not able to find the relevant CUDA libraries, hence the error. Here's the documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

@koryphaee
Author

I have tried that already and it did not make a difference. AFAIK it's only necessary when running Docker on a native Linux host. In my configuration with WSL2 it should already be covered by Docker Desktop.
I have found this YouTube tutorial where it works without the Container Toolkit too: https://youtu.be/YozfiLI1ogY.

@TitanTomorrow

I'm getting the same issue. Within the Docker image tensorflow/tensorflow:latest-gpu-jupyter (2.16.1) I am getting:

./__nvcc_device_query

./__nvcc_device_query failed to call cudaLoader::cuInit(0) with error 0x1f4 (CUDA_ERROR_NOT_FOUND)

./nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

nvidia-smi

Tue May 28 10:03:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 On | 00000000:02:00.0 On | N/A |
| 0% 48C P8 21W / 220W | 637MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

@TitanTomorrow

I'm getting the same issue with tensorflow/tensorflow:2.13.0-gpu-jupyter:
E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found

@TitanTomorrow

I can install and run the CUDA samples in WSL (Ubuntu 22.04) after installing CUDA Toolkit 12.5:

Device 0: "NVIDIA GeForce RTX 3070" with Compute 8.6 capability
printf() is called. Output:

[2, 0]: Value is:10
[2, 1]: Value is:10
[2, 2]: Value is:10
[2, 3]: Value is:10
[2, 4]: Value is:10
[2, 5]: Value is:10
[2, 6]: Value is:10
[2, 7]: Value is:10
[3, 0]: Value is:10
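
Since CUDA works in the WSL distribution but not in the container, it may help to compare what each environment can actually see. The sketch below is an illustration under assumptions: WSL2 normally exposes the GPU through the /dev/dxg device and driver libraries under /usr/lib/wsl/lib, but whether those same paths appear inside a container depends on the container runtime. The function name check_wsl_gpu_paths and the path list are hypothetical defaults, not something the thread specifies.

```python
import os

# Assumed default locations where WSL2 exposes the Windows GPU driver.
# Whether they are also mounted inside a container depends on the runtime.
WSL_GPU_PATHS = (
    "/dev/dxg",
    "/usr/lib/wsl/lib",
    "/usr/lib/wsl/lib/libcuda.so.1",
)


def check_wsl_gpu_paths(paths=WSL_GPU_PATHS):
    """Return a mapping of each path to whether it exists in this environment."""
    return {path: os.path.exists(path) for path in paths}
```

Running print(check_wsl_gpu_paths()) once in the WSL shell and once inside the container shows whether the container is missing something the host distribution has.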

@nikolayDemirev

nikolayDemirev commented May 28, 2024

I have found a solution and shared it here: #68710

@koryphaee
Author

The workaround is working. Since this issue is pretty much a duplicate of the other one, I will close it now. The other one should be left open until the latest driver is working again.

@koryphaee closed this as not planned May 28, 2024

@pyjsql

pyjsql commented Jun 1, 2024

Nvidia 555.85 not working. Cuda 12.5 buggy

Use 551.86 game ready driver and Cuda 12.4 with docker tensorflow/tensorflow:latest-gpu

@RajKrishna2123

Nvidia 555.85 not working. Cuda 12.5 buggy

Use 551.86 game ready driver and Cuda 12.4 with docker tensorflow/tensorflow:latest-gpu

The solution you provided worked for me. NVIDIA states on its driver download website that this specific issue has been fixed, but it seems it is not solved yet. I was on the latest version, 555.85; I downgraded to 551.86 and everything worked fine. Thank you.

@rpatterson

rpatterson commented Sep 26, 2024

Nvidia 555.85 not working. Cuda 12.5 buggy

Use 551.86 game ready driver and Cuda 12.4 with docker tensorflow/tensorflow:latest-gpu

I've been running v552.22 for some time now and I don't like being so out of date. Can you, @pyjsql or anyone else, give any more details? You say that CUDA v12.5 is buggy, but the latest Nvidia driver v561.09 includes CUDA v12.6 and I still have this issue under that version. In fact, I just tested every Nvidia driver v552.22-v561.09, and v552.44 is the most recent that works. Are there users using Nvidia CUDA under WSL with v561.09 successfully? What's the causal factor here? Is it the age of the GPU hardware? The WSL kernel version? What's the right place to follow this issue and find out when it's safe to upgrade the Nvidia driver? I've done a ton of googling and your "Cuda 12.5 buggy" is the most complete explanation I've found. ;-)
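
For anyone scripting a check before upgrading, the driver version can be read from the nvidia-smi header and compared against the newest version reported as working in this thread. This is only a sketch: the threshold 552.44 comes from a single commenter's tests, the names parse_driver_version and newer_than_reported_working are invented for illustration, and a later comment in this thread reports that updating Docker Desktop resolved the problem, so a newer driver is not necessarily broken.

```python
import re

# Newest driver reported as working in this thread (one user's test, not
# an official compatibility statement).
LAST_REPORTED_WORKING = (552, 44)


def parse_driver_version(smi_header):
    """Extract (major, minor) from an nvidia-smi header line, or None."""
    match = re.search(r"Driver Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def newer_than_reported_working(smi_header, threshold=LAST_REPORTED_WORKING):
    """Return True if the driver is newer than the threshold, None if unparsable."""
    version = parse_driver_version(smi_header)
    if version is None:
        return None
    return version > threshold
```

The header line can be taken from the first lines of nvidia-smi output, for example the "NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5" line quoted earlier in the thread.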

@mauroao

mauroao commented Nov 25, 2024

Thanks to all of you!
I've updated Docker Desktop on Windows 10 and now it is working!
