Zero volatile GPU-Util but high GPU Memory Usage #543
Comments
@lglhuada, This is not an error, just log. It just means there is no efficient way to transfer data from gpu:0 to gpu:2. You can exclude either one of them through CUDA_VISIBLE_DEVICES
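A minimal sketch of excluding a device via `CUDA_VISIBLE_DEVICES` from Python (the device IDs here are illustrative — pick the ones relevant to your machine):

```python
import os

# Hide device 2 so peer access between gpu:0 and gpu:2 is never attempted.
# This must be set before TensorFlow (or any CUDA library) initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# TensorFlow will renumber the remaining visible devices as /gpu:0 and /gpu:1.
```

The same effect can be had from the shell, e.g. `CUDA_VISIBLE_DEVICES=0,1 python train.py`.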
This is also expected. TensorFlow always reserves most the GPU memory when it initializes, even before the first GPU kernels are received. So you will see a high memory usage at the beginning. But the fact GPU is not actually utilized means none of the kernels are running on GPU. Could you try to run the tutorials and see if you can get any GPU utilized?
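One quick way to check whether kernels actually land on the GPU is TensorFlow's device-placement logging. A sketch using the TF 1.x `Session` API that this thread predates 2.x (the ops are arbitrary placeholders):

```python
import tensorflow as tf

# Ask TensorFlow to log which device each op is assigned to.
config = tf.ConfigProto(log_device_placement=True)
# Optionally allocate GPU memory on demand instead of reserving
# most of it up front.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    print(sess.run(a + b))  # placement lines appear on stderr
```

If the log shows ops placed on `/cpu:0` rather than `/gpu:0`, the GPU build or CUDA setup is the likely culprit.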
If you still have problems after that, please provide more information about your machine setup.
@zheng-xq Thanks for your answers. I have been trying to run the tutorial to check whether the GPU is utilized. However, there are a lot of errors. I will add comments if the errors persist.
@zheng-xq Hi, I have checked that I can use the GPU with the bazel example, and the GPU-Util is 21%. So now I need bazel to build my project, right? Thanks.
In this case, it is okay to use bazel to build your project, although it is not required. Please make sure you installed the GPU-enabled TensorFlow binaries. If you build from source, use: `bazel build -c opt --config=cuda`
Hi @zheng-xq, thanks for your quick feedback. Actually I tried to run my code without a bazel build and GPU-Util is 0 even with `with tf.device('/gpu:0')`, and I am sure I have installed the GPU-enabled TensorFlow binaries, cudatoolkit 7.0 and cudnn 6.5. What might lead to this issue?
Hi, I have solved my problem without a bazel build; I modified my code following the example of cifar10_multi_gpu_train. Thanks.
@lglhuada I encountered the same problem. How did you make the code run on the GPU? What's the key code?
@OswinGuai If your computation is very light (e.g., a simple add operation), the GPU utilization will not be noticeable; try implementing larger models. ;)
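As an illustration of that point, a sketch (TF 1.x API, shapes chosen arbitrarily) where a large matmul keeps the GPU visibly busy while a tiny add would not:

```python
import tensorflow as tf

with tf.device('/gpu:0'):
    # A single small add finishes in microseconds, so nvidia-smi
    # samples it as ~0% utilization. A large matmul runs long
    # enough to register as real GPU-Util.
    a = tf.random_normal([8000, 8000])
    b = tf.random_normal([8000, 8000])
    c = tf.matmul(a, b)

with tf.Session() as sess:
    for _ in range(100):
        sess.run(c)  # watch `nvidia-smi` in another terminal
```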
@lglhuada Hi, recently I have had the same issue. I tried three different types of neural network:
When I train the faster-rcnn-resnet101 model with the object-detection repo, no matter how many GPUs I use, train.py consumes all of them. E.g. when I use 1, 2 or 4 GPUs, train.py occupies all of their memory, but only one GPU's 'Volatile GPU-Util' is 100%; the others are 0%.
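For context, the cifar10_multi_gpu_train pattern mentioned earlier places one model "tower" per GPU explicitly; without such per-device placement, TensorFlow still reserves memory on every visible GPU but runs kernels on only one. A rough sketch (TF 1.x; `build_loss` is a hypothetical function standing in for the model):

```python
import tensorflow as tf

NUM_GPUS = 2  # illustrative
opt = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []

for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        # build_loss() is a hypothetical helper that builds one
        # replica of the model graph on this device.
        loss = build_loss()
        tower_grads.append(opt.compute_gradients(loss))

# The gradients from each tower are then averaged and applied once,
# as in the cifar10_multi_gpu_train example.
```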
So what did you do? Actually I am also facing the same issue. Can you please help?
I had this problem when the
cc @Kirancgi
Thanks buddy, will try it out.
Hi, I am running a model implemented in TensorFlow with only one GPU; the GPU memory usage is 95% while the volatile GPU-Util is 0%.
Specifically, I have a Tesla K40m with CUDA 7.0 and cuDNN 6.5v2 installed on CentOS 7.0. There are three files in my project: data_loader.py, model.py and train.py. In train.py I first declared
`with tf.device('/gpu:0'):` and then called `sess.run([train_op])`. When I run the code, this message is printed:
"tensorflow/core/common_runtime/gpu/gpu_init.cc:45] cannot enable peer access from device ordinal 0 to device ordinal 2"
Also, I installed TensorFlow with pip.
Any help is more than welcome.