
Zero volatile GPU-Util but high GPU Memory Usage #543

Closed
lglhuada opened this issue Dec 18, 2015 · 13 comments

@lglhuada

Hi, I am running a model implemented in TensorFlow on a single GPU; the GPU memory usage is 95% while the volatile GPU-Util is 0.

Specifically, I have a Tesla K40m with CUDA 7.0 and cuDNN 6.5 v2 installed on CentOS 7.0. My project has three files: data_loader.py, model.py, and train.py. In train.py I first declare "with tf.device('/gpu:0'):" and then call sess.run([train_op]). When I run the code, this error is raised:

"tensorflow/core/common_runtime/gpu/gpu_init.cc:45] cannot enable peer access from device ordinal 0 to device ordinal 2"

Also, I installed TensorFlow with pip.

Any help is more than welcome.

@zheng-xq
Contributor

"cannot enable peer access from device ordinal 0 to device ordinal 2"

@lglhuada, this is not an error, just a log message. It means there is no efficient way to transfer data directly from gpu:0 to gpu:2. You can exclude either one of them through CUDA_VISIBLE_DEVICES.
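For reference, a minimal sketch of doing this from Python (not from the original reply; the chosen ordinal and the ops are only illustrative). The environment variable has to be set before TensorFlow initializes CUDA, i.e. before the first GPU work is issued; it can equally be set on the shell command line before launching the script.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"     # keep only device ordinal 0 visible

import tensorflow as tf                      # import after setting the variable

with tf.device('/gpu:0'):                    # '/gpu:0' is now the only visible device
    x = tf.constant([1.0, 2.0])
    y = x * 2.0

with tf.Session() as sess:
    print(sess.run(y))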

the GPU memory usage is 95% while the volatile GPU-Util is 0.

This is also expected. TensorFlow reserves most of the GPU memory when it initializes, even before the first GPU kernels are received, so you will see high memory usage from the beginning. But the fact that the GPU is not actually utilized means none of the kernels are running on the GPU.
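As an aside (not part of the original answer, and assuming the later 1.x-style ConfigProto options), the up-front reservation can be relaxed so that nvidia-smi's memory column better reflects what the process actually uses:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow the allocation on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.4   # or cap it explicitly

with tf.Session(config=config) as sess:
    print(sess.run(tf.constant(42.0)))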

Could you try to run the tutorials and see if you can get any GPU utilized?

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

If you still have problems after that, please provide more information about your machine set up.

@lglhuada
Author

@zheng-xq Thanks for your answers. I have been trying to run the tutorial to check whether the GPU is utilized. However, there are a lot of errors. I will add comments if the errors persist.

@lglhuada
Author

@zheng-xq Hi, I have verified that I can use the GPU with the Bazel example, and the GPU-Util is 21%. So now I need Bazel to build my project, right? Thanks.

@zheng-xq
Contributor

In this case, it is okay to use bazel to build your project, although it shouldn't be necessary.

Please make sure you installed the GPU-enabled TensorFlow binaries. If you
built from source, please use:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.6.0-cp27-none-linux_x86_64.whl


@lglhuada
Author

Hi @zheng-xq, thanks for your quick feedback. Actually, I tried to run my code without a Bazel build and the GPU-Util is 0, even with "with tf.device('/gpu:0')", and I am sure I have installed the GPU-enabled TensorFlow binaries, CUDA toolkit 7.0, and cuDNN 6.5. What might cause this issue?

@lglhuada
Author

Hi, I have solved my problem without a Bazel build; I modified my code following the cifar10_multi_gpu_train example. Thanks.
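For later readers, a rough sketch of the single-GPU version of that pattern (the model function here is a stand-in, not the thread author's actual code, and it assumes TF 1.x-style APIs): the whole forward and backward pass is built inside tf.device('/gpu:0'), so the kernels, not just the memory, land on the GPU.

import tensorflow as tf

def build_model(images):
    # placeholder model; any chain of TF ops works here
    return tf.layers.dense(tf.reshape(images, [-1, 784]), 10)

images = tf.random_uniform([128, 28, 28, 1])
labels = tf.random_uniform([128], maxval=10, dtype=tf.int32)

with tf.device('/gpu:0'):
    logits = build_model(images)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

config = tf.ConfigProto(allow_soft_placement=True)   # lets int32/summary ops fall back
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)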

@OswinGuai

@lglhuada I encountered the same problem. How did you make the code run on the GPU? What's the key code?

@lglhuada
Author

lglhuada commented Mar 8, 2017

@OswinGuai If your computation is light (e.g. just a few add or multiply operations), the GPU utilization will not be noticeable; try implementing larger models. ;)
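As a quick check (a generic snippet, not from this thread), a large matrix multiply run in a loop keeps the GPU busy long enough for nvidia-smi to show non-zero volatile GPU-Util, while a handful of scalar ops will not:

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.random_normal([8000, 8000])
    b = tf.random_normal([8000, 8000])
    c = tf.matmul(a, b)

with tf.Session() as sess:
    for _ in range(100):   # run repeatedly so the utilization is visible in nvidia-smi
        sess.run(c)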

@TillLindemann

TillLindemann commented Nov 6, 2017

@lglhuada Hi, recently I had the same issue. I tried three different types of neural network:
1. a simple feedforward network, 2. PixelCNN, 3. a GAN.
And weird things happened:
1 uses very little of the GPU, and that project has very large training data.
2 uses almost 100% of the GPU.
3's GPU usage is periodic, sometimes up to 50%, sometimes down to 0%.
After experimenting, I finally found the reason: the code is to blame. It was written by someone with no GPU on his laptop, so it is not GPU-friendly at all. Most operations do run on the GPU, but some still run on the CPU, and while those run the GPU has to wait, which causes the gaps. There are partial fixes, such as changing the type of variables and constants to tf.float32, but they do not change much. So if the code is not GPU-friendly, you may be better off with the CPU version of TensorFlow, which might even be faster than the GPU version. In conclusion, if your GPU is busy on some code and idle on other code, it is probably the code's problem.
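One way to find the operations that fall back to the CPU (a minimal sketch, not from the original comment) is to enable log_device_placement, which prints the device each op is assigned to when the session first runs:

import tensorflow as tf

with tf.device('/gpu:0'):
    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(y)   # the console log lists the assigned device (CPU or GPU) per op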

@tuobay

tuobay commented Nov 6, 2017

When I train the faster-rcnn-resnet101 model with the object-detection repo, no matter how many GPUs I use, train.py consumes all of them. E.g., when I use 1, 2, or 4 GPUs, train.py occupies all of their memory, but only one GPU's 'Volatile GPU-Util' is 100%; the others are 0%.

@Kirancgi

So what did you do? ... Actually, I am also facing the same issue. Can you please help?

@turowicz

I had this problem when the .record files were invalid.

cc @Kirancgi
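For anyone hitting this, a minimal sketch of sanity-checking a .record file (the path is a placeholder, and it assumes the file holds tf.train.Example protos): iterate and parse every record; a corrupt file raises an error partway through.

import tensorflow as tf

path = "train.record"   # placeholder path to the TFRecord file under test
count = 0
for serialized in tf.python_io.tf_record_iterator(path):
    tf.train.Example.FromString(serialized)   # raises if the record is corrupt
    count += 1
print("parsed %d records" % count)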

@Kirancgi

Thanks buddy, will try it out.
@turowicz
