
Tensorflow per_process_gpu_memory_fraction used more memory than specified #30039

Closed
zli117 opened this issue Jun 22, 2019 · 2 comments
Assignees
Labels
comp:gpu GPU related issues TF 1.14 for issues seen with TF 1.14 type:support Support issues

Comments


zli117 commented Jun 22, 2019

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux, kernel 5.1.12
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): Arch Linux repository
  • TensorFlow version (use command below): 1.14.0-rc1
  • Python version: 3.7.3
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 10.1.168
  • GPU model and memory: Quadro M2200, 4043 MB

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:

  • TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  • TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior
TensorFlow allocates more memory than specified. When multiple processes share the same GPU, this can cause one of them to hit an out-of-memory exception. For example, I specified that it use no more than 50% of GPU memory, but it actually allocates ~52%, as shown in the screenshot.

[screenshot: nvidia-smi showing the process using ~52% of GPU memory]

Describe the expected behavior
I would expect it to allocate no more than 50% of GPU memory; in my case, <= 2021.5 MB (0.5 × 4043 MB).

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

import tensorflow as tf

# Cap the TF allocator pool at 50% of total GPU memory.
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.5)

with tf.compat.v1.Session(
        config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)) as sess:
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    while True:  # keep the process alive so memory usage can be observed
        sess.run(c)

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

@achandraa achandraa self-assigned this Jun 24, 2019
@achandraa achandraa added comp:gpu GPU related issues type:bug Bug labels Jun 24, 2019
@achandraa achandraa assigned ymodak and unassigned achandraa Jun 27, 2019
@ymodak ymodak assigned caisq and unassigned ymodak Jul 1, 2019
@ymodak ymodak added type:support Support issues and removed type:bug Bug labels Jul 1, 2019
@goldiegadde goldiegadde added the TF 1.14 for issues seen with TF 1.14 label Jul 10, 2019
@nikitos9000

Can confirm this issue. On a system with an NVIDIA 2080 Ti it behaves the following way:
with per_process_gpu_memory_fraction=0.2 it allocates 2457 MiB / 10989 MiB (as shown in nvidia-smi), which is clearly greater than expected (0.2 × 10989 = 2198 MiB).

With per_process_gpu_memory_fraction=0.1 it allocates 1357 MiB / 10989 MiB, which is greater than the expected 0.1 × 10989 = 1099 MiB.
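A quick back-of-the-envelope check on the numbers above (taken from the nvidia-smi readings quoted in this comment) suggests the overage is a roughly constant ~260 MiB rather than proportional to the fraction, which would point to a fixed per-process cost rather than a miscalculated pool:

```python
# nvidia-smi MiB values quoted above for the 2080 Ti.
total = 10989
observed = {0.2: 2457, 0.1: 1357}

for frac, used in observed.items():
    pool = frac * total      # what the fraction should allow
    overhead = used - pool   # extra memory beyond the TF pool
    print(f"fraction={frac}: pool={pool:.0f} MiB, overhead={overhead:.0f} MiB")
# The overhead comes out near-identical (~259 MiB) for both fractions.
```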

aaroey (Member) commented Dec 18, 2019

Hi @zli117, I think this is expected. per_process_gpu_memory_fraction specifies the amount of memory that TF will use to allocate input/output tensors of the graph and temporary buffers for intermediate results. It does not include the memory needed to initialize CUDA/cuDNN and other GPU libraries.
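Given that, one practical workaround is to budget for the context overhead when choosing the fraction. A minimal sketch (fraction_for_budget and the 300 MiB default are my own illustrative assumptions, not a TF API; the real overhead varies by GPU, driver, and CUDA/cuDNN versions, so measure it on your system first):

```python
def fraction_for_budget(budget_mib, total_mib, context_overhead_mib=300):
    """Pick per_process_gpu_memory_fraction so that the TF allocator pool
    plus an assumed fixed CUDA/cuDNN context overhead stays within
    budget_mib. The 300 MiB default is a guess, not a measured value."""
    pool_mib = budget_mib - context_overhead_mib
    if pool_mib <= 0:
        raise ValueError("budget too small for the assumed context overhead")
    return pool_mib / total_mib

# Example: keep a process under half of a 4043 MiB Quadro M2200.
frac = fraction_for_budget(4043 * 0.5, 4043)
print(round(frac, 3))
```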

I'm closing this; feel free to reopen if there are further questions.

@aaroey aaroey closed this as completed Dec 18, 2019