"CUDA_ERROR_NOT_FOUND: named symbol not found" in Docker container #68711
Comments
I have the same issue: #68710
Please make sure that you have installed the NVIDIA Container Toolkit first. Your Docker container is not able to find the relevant CUDA libraries, hence the error. Here's the documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
I have tried that already and it did not make a difference. AFAIK it is only necessary when running Docker on a native Linux host; in my configuration with WSL2 it should already be covered by Docker Desktop.
I'm getting the same issue within the Docker image tensorflow/tensorflow:latest-gpu-jupyter (2.16.1):

```
$ ./__nvcc_device_query
./__nvcc_device_query failed to call cudaLoader::cuInit(0) with error 0x1f4 (CUDA_ERROR_NOT_FOUND)
$ ./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
$ nvidia-smi
Tue May 28 10:03:54 2024
+-----------------------------------------------------------------------------------------+
```
I'm getting the same issue with tensorflow/tensorflow:2.13.0-gpu-jupyter.
I can install and run the CUDA samples in WSL (Ubuntu 22.04) by installing CUDA Toolkit 12.5:

```
Device 0: "NVIDIA GeForce RTX 3070" with Compute 8.6 capability
[2, 0]: Value is: 10
```
I have found a solution and shared it here: #68710
The workaround is working. Since this issue is pretty much a duplicate of the other one, I will close it now. The other one should be left open until the latest driver works again.
NVIDIA driver 555.85 is not working and CUDA 12.5 is buggy. Use the 551.86 Game Ready driver and CUDA 12.4 with the tensorflow/tensorflow:latest-gpu Docker image.
The solution you provided worked for me. Even though NVIDIA states on their driver download website that this specific issue has been fixed, it does not seem to be solved yet. I was on the latest version, 555.85; I downgraded to 551.86 and everything worked fine. Thank you.
I've been running v552.22 for some time now and I don't like being so out of date. Can you, @pyjsql, or anyone else give any more details? You say that CUDA v12.5 is buggy, but the latest NVIDIA driver, v561.09, includes CUDA v12.6 and I still have this issue under that version. In fact, I just tested every NVIDIA driver from v552.22 to v561.09, and v552.44 is the most recent one that works.

Are there users running NVIDIA CUDA under WSL with v561.09 successfully? What's the causal factor here? Is it the age of the GPU hardware? The WSL kernel version? What's the right place to follow this issue and find out when it's safe to upgrade the NVIDIA driver? I've done a ton of googling and your "Cuda 12.5 buggy" is the most complete explanation I've found. ;-)
Thanks to all of you guys!
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
2.16.1-0-g5bc9d26649c
Custom code
No
OS platform and distribution
Windows 11 Pro + WSL2 Ubuntu 22.04
Mobile device
No response
Python version
3.11.0rc1
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
RTX 4080
Current behavior?
I am using Docker Desktop with WSL2 on Windows 11 Pro to run the `tensorflow-gpu:latest` Docker container. After starting it, no GPU is found and the error `failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found` occurs. CUDA, cuDNN and Python versions are all dictated by the Docker image, so I cannot change them.

Things I have tried without success:

Reinstalling the drivers is the only thing that had any effect at all. It changed the error from CUDA_ERROR_NO_DEVICE to CUDA_ERROR_NOT_FOUND.

I have found jax-ml/jax#13570 with a similar error, but its fix doesn't apply here. RTX 4000 cards require CUDA >= 11.8, which I am using.
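As an aside, the failing driver call can be reproduced outside TensorFlow with a small ctypes probe. This is my own diagnostic sketch, not part of TensorFlow or the CUDA toolkit; it assumes a Linux-style environment where `libcuda.so.1` may or may not be visible (inside a container without GPU passthrough, the load itself fails):

```python
import ctypes

def probe_cuda_driver():
    """Try to load the CUDA driver library and call cuInit(0),
    mirroring the check TensorFlow performs at startup."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found (driver not exposed to this environment)"
    result = libcuda.cuInit(0)  # CUresult; 0 means CUDA_SUCCESS
    if result == 0:
        return "cuInit succeeded"
    # 500 == 0x1f4 == CUDA_ERROR_NOT_FOUND, the code reported in this issue
    return f"cuInit failed with error {result} (0x{result:x})"

print(probe_cuda_driver())
```

If the load succeeds but `cuInit` returns 0x1f4, the container does see the driver library yet cannot resolve a symbol in it, which matches the driver-version mismatch discussed in the comments below.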
Standalone code to reproduce the issue
```shell
docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

Relevant log output