tensorflow gpu #65035

Closed
soheil-asgari opened this issue Apr 4, 2024 · 11 comments
Labels
stale, stat:awaiting response, subtype: ubuntu/linux, TF 2.16, type:build/install

Comments

@soheil-asgari

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16

Custom code

Yes

OS platform and distribution

Ubuntu

Mobile device

No response

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

NVIDIA GTX 1650

Current behavior?

I followed all of the installation steps one by one, but I still can't use the GPU, and this is really bothering me.

Standalone code to reproduce the issue

2024-04-04 10:56:25.510348: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

Relevant log output

2024-04-04 10:56:25.510348: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
@sushreebarsa
Contributor

@soheil-asgari Could you verify that you have a compatible NVIDIA GPU by running nvidia-smi in your terminal? If there is no output, you might not have an NVIDIA GPU, or the drivers might not be installed.
Thank you!
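
For reference, TensorFlow itself also reports what it can see; a minimal Python check (a sketch, run in the same environment that produced the error above):

import tensorflow as tf

# Confirm whether this TensorFlow build was compiled with CUDA support
# and whether any GPU device is visible to it.
print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))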

@sushreebarsa sushreebarsa added the stat:awaiting response, type:build/install, subtype: ubuntu/linux, and TF 2.16 labels and removed the type:bug label on Apr 5, 2024
@Retalak

Retalak commented Apr 5, 2024

It is horrendous how hard it is to get this working on Windows.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response label on Apr 5, 2024
@alexanderbeatson

> It is horrendous how hard it is to get this working on Windows.

Hello @Retalak
Please note that TensorFlow 2.10 was the last version to support GPU natively on Windows. If you want to use TensorFlow with GPU support on Windows, you need to install TensorFlow inside WSL2.
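
For reference, a quick way to confirm from Python that the interpreter really is running inside WSL2 rather than on native Windows (a minimal sketch; it assumes the usual "microsoft" marker in the WSL kernel release string):

import platform

# WSL kernels normally report "microsoft" in the kernel release string,
# e.g. "5.15.133.1-microsoft-standard-WSL2".
release = platform.uname().release.lower()
if "microsoft" in release:
    print("Running inside WSL:", release)
else:
    print("Not running inside WSL (kernel: " + release + ")")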

@alexanderbeatson

I had a similar issue with TensorFlow GPU, but installing from this wheel solved it.

Make sure nvidia-smi runs properly and that your CUDA version is compatible with TensorFlow 2.15 (the default 2.16 version is not yet supported).

You can also search for a compatible wheel file here!
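
One way to see which CUDA/cuDNN versions an installed TensorFlow wheel expects, before hunting for a compatible wheel, is the build-info API (a minimal sketch):

import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow wheel was built against,
# so they can be matched to what nvidia-smi reports.
info = tf.sysconfig.get_build_info()
print("TF version:", tf.__version__)
print("CUDA build:", info.get("cuda_version"))
print("cuDNN build:", info.get("cudnn_version"))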

@zzj0402

zzj0402 commented Apr 8, 2024

Python 3.9.19 (main, Mar 21 2024, 17:11:28) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4
Skipping registering GPU devices...
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 6427527073389244107
xla_global_id: -1
]
TF Version
2.16.1

On WSL2, Windows 11 23H2, RTX 4090 laptop version. I followed pip install tensorflow[and-cuda] with no luck.
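
When pip install tensorflow[and-cuda] still leaves only the CPU device, one thing worth checking is whether the NVIDIA wheels actually landed in site-packages and where their shared libraries live (a minimal sketch; the */lib/*.so* glob is an assumption based on the layout used in the .bashrc workaround below):

import glob
import os

# Locate the CUDA/cuDNN libraries installed by the tensorflow[and-cuda] extras.
try:
    import nvidia.cudnn
except ImportError:
    print("nvidia.cudnn is not importable - the [and-cuda] extras may be missing")
else:
    nvidia_dir = os.path.dirname(os.path.dirname(nvidia.cudnn.__file__))
    print("NVIDIA wheels under:", nvidia_dir)
    for lib in glob.glob(os.path.join(nvidia_dir, "*", "lib", "*.so*")):
        print(" ", lib)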

@Retalak

Retalak commented Apr 8, 2024

I eventually got it working. Part of the problem with the documentation, IMO, is that it does not clearly state what is required on the host versus the guest machine for WSL2. I have the CUDA Toolkit 12.4 installed on my Windows host; I'm not sure whether that is necessary, but I'm not messing with it now that I have it working. I did not install anything on the guest before running the above install command; it installed the needed versions of the toolkit and cuDNN (I also did a fresh install of my WSL2 guest - Ubuntu LTS 22.04.3 - to clear anything I had done previously).

Another issue is that there are a lot of error messages that are considered "normal" and may throw off new users. Here is the test command result on my WSL2 guest with it working (I think):

retalak@**-********:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-04-08 16:07:56.717457: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-08 16:07:57.579692: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-08 16:07:58.688620: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-08 16:07:58.852701: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-08 16:07:58.852827: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I also had to add this to the end of ~/.bashrc on my WSL2 guest (replace USER_NAME, and make sure to exit the WSL2 terminal and start a new one after saving, before running the test command):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/USER_NAME/.local/bin;
export NVIDIA_DIR=$(dirname $(dirname $(python3 -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note that I am not using miniconda, so this may differ if you are trying to do this in a miniconda environment.
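
As a small follow-up check, assuming the exports above have been applied in a fresh shell, you can confirm that the cuDNN libraries are actually reachable through LD_LIBRARY_PATH before importing TensorFlow (a sketch, not an official verification step):

import os

# List LD_LIBRARY_PATH entries that contain libcudnn*, to confirm the
# ~/.bashrc exports above took effect in the current shell.
found = False
for entry in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
    if entry and os.path.isdir(entry):
        cudnn = sorted(f for f in os.listdir(entry) if f.startswith("libcudnn"))
        if cudnn:
            found = True
            print(entry, "->", ", ".join(cudnn))
if not found:
    print("No libcudnn found on LD_LIBRARY_PATH - re-check the exports above")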

@sushreebarsa
Contributor

sushreebarsa commented Apr 10, 2024

@soheil-asgari Could you please let us know whether there is any update on this issue?
Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response label on Apr 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label on Apr 18, 2024
@svkgn4DL

This is still an issue. The documentation does not state that the environment variables required for GPU support need to be set, as @Retalak mentioned. If I remember correctly, TensorFlow used to document this in earlier versions.

@google-ml-butler google-ml-butler bot removed the stale and stat:awaiting response labels on Apr 19, 2024
@sushreebarsa sushreebarsa added the stat:awaiting response and stale labels on Apr 22, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
