
Tensorflow Still Trying to use CUDA even when Session Created with device_count={'GPU': 0} #9201

Closed
cancan101 opened this issue Apr 13, 2017 · 14 comments
Assignees
Labels
stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author stat:contribution welcome Status - Contributions welcome type:feature Feature requests

Comments

@cancan101
Contributor

System Information

Using the tensorflow/tensorflow:1.0.1-devel-gpu Docker image.
('v1.0.0-65-g4763edf-dirty', '1.0.1')
Host: Driver Version: 367.57, 3.13.0-57-generic

Issue

If I set the compute mode to EXCLUSIVE_PROCESS on the Nvidia device (sudo nvidia-smi -c 1), then even though I tell the Session not to use GPUs (config=tf.ConfigProto(device_count={'GPU': 0})), TensorFlow still attempts to use the GPU, which makes it impossible to create a session:

InternalErrorTraceback (most recent call last)
<ipython-input-1-cabf26c1451a> in <module>()
      1 import tensorflow as tf
      2 from tensorflow.python.framework import ops
----> 3 with tf.Session(config=tf.ConfigProto(device_count={'GPU': 0})) as sess:
      4     pass

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in __init__(self, target, graph, config)
   1174 
   1175     """
-> 1176     super(Session, self).__init__(target, graph, config=config)
   1177     # NOTE(mrry): Create these on first `__enter__` to avoid a reference cycle.
   1178     self._default_graph_context_manager = None

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in __init__(self, target, graph, config)
    550     try:
    551       with errors.raise_exception_on_not_ok_status() as status:
--> 552         self._session = tf_session.TF_NewDeprecatedSession(opts, status)
    553     finally:
    554       tf_session.TF_DeleteSessionOptions(opts)

/usr/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
     22         if type is None:
     23             try:
---> 24                 self.gen.next()
     25             except StopIteration:
     26                 return

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.pyc in raise_exception_on_not_ok_status()
    464           None, None,
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:
    468     pywrap_tensorflow.TF_DeleteStatus(status)

InternalError: Failed to create session.

This can be demonstrated by running:

import tensorflow as tf
from tensorflow.python.framework import ops
with tf.Session(config=tf.ConfigProto(device_count={'GPU': 0})) as sess:
    pass

when another process is using CUDA and the exclusive process mode is set.

If exclusive process mode is not set, then the session is created, but using nvidia-smi I can see that the process is using GPU RAM (and CUDA):

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2237    C   /usr/bin/python                                 61MiB |

The issue seems limited to TF trying to lock the CUDA device (and allocate ~61 MB of memory). Subsequent computations do happen correctly on the CPU.
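
One way to confirm that the ops really land on the CPU is to enable device placement logging (a minimal sketch along the lines of the repro above):

import tensorflow as tf

config = tf.ConfigProto(device_count={'GPU': 0}, log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0])
    print(sess.run(a * 2))  # the placement log shows /device:CPU:0 for these ops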

@jart
Contributor

jart commented Apr 14, 2017

You want either export CUDA_VISIBLE_DEVICES= or alternatively a virtualenv with non-GPU TensorFlow. See also: #2175 (comment)
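
A minimal sketch of that workaround, hiding every CUDA device before TensorFlow initializes its GPU runtime:

import os
# Safest to set this before importing TensorFlow at all.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf

# With no CUDA device visible, the session neither locks the GPU nor allocates memory on it.
with tf.Session() as sess:
    pass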

@jart jart closed this as completed Apr 14, 2017
@jart jart added the type:support Support issues label Apr 14, 2017
@cancan101
Contributor Author

cancan101 commented Apr 14, 2017

@jart: I'm not sure why the config approach I outlined doesn't work, and why the only suggestion is to set an env var. Setting the configuration as I did seems to work partially (i.e. it prevents use of the GPU for the graph) but not totally (i.e. it still locks the device). This seems to violate the "principle of least astonishment". It seems like this is either a documentation issue or an issue with how the config is used.

The environment-variable approach is not ideal because:

  1. It is weird for a process to set this value itself.
  2. Related to 1, it limits the ability to choose GPU vs. CPU on a per-Session basis.

@cancan101
Contributor Author

@jart Any thoughts on the above questions / comments?

@jart
Contributor

jart commented Apr 25, 2017

@zheng-xq Our friend @cancan101 believes it would be less astonishing for our users if tf.ConfigProto(device_count={'GPU': 0}) also implied export CUDA_VISIBLE_DEVICES="". That doesn't sound unreasonable to me. What are your opinions on this feature request?

@jart jart reopened this Apr 25, 2017
@jart jart added the type:feature Feature requests label Apr 25, 2017
@aselle aselle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 25, 2017
@Belval
Contributor

Belval commented May 5, 2017

I am experiencing the same issue with TF and I too believe tf.ConfigProto(device_count={'GPU': 0}) should imply export CUDA_VISIBLE_DEVICES="". I'd like to be able to use my GPU for specific tasks without setting up a second env.

@skye skye added stat:contribution welcome Status - Contributions welcome and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower type:support Support issues labels Jun 16, 2017
@zjuxxd

zjuxxd commented Jun 29, 2017

I also have the same problem. I would be very happy if this could be supported.

@TimZaman
Contributor

Why is this issue closed?

@Belval
Contributor

Belval commented Dec 20, 2017

@TimZaman

It's not, but it really isn't a priority, since you can do this (I know it's ugly):

import os

# Hide the GPUs before the CPU-only TensorFlow code runs.
os.environ["CUDA_VISIBLE_DEVICES"] = ''

# Code that uses TensorFlow without the GPU

# Make GPU 0 visible again afterwards.
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

If you want, you could also wrap the whole thing in a decorator:

import os

def cpu_only():
    def _method_wrapper(function):
        def wrap(*args, **kwargs):
            # Hide the GPUs while the wrapped function runs.
            os.environ["CUDA_VISIBLE_DEVICES"] = ''
            ret = function(*args, **kwargs)
            # Make GPU 0 visible again afterwards.
            os.environ["CUDA_VISIBLE_DEVICES"] = '0'
            return ret
        # Preserve the wrapped function's name and docstring.
        wrap.__doc__ = function.__doc__
        wrap.__name__ = function.__name__
        return wrap
    return _method_wrapper
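
For example, a hypothetical function wrapped with it (the function name is made up for illustration):

@cpu_only()
def train_on_cpu():
    import tensorflow as tf
    # GPUs are hidden while this body runs, so the session stays on the CPU.
    with tf.Session() as sess:
        pass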

Someone might work on it one day, but I wouldn't hold my breath.

@TimZaman
Contributor

@Belval hehe yeah that makes me feel like I want to take a shower.
But I love how you wrapped that turd in a beautiful decorator! 🤣 Fair enough for now, I can see how this is not top prio.

@mohantym
Contributor

Hi @cancan101! 1.x issues are no longer supported. You can use tf.device to switch between CPU and GPU in 2.x versions. Thank you!
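
A minimal TF 2.x sketch of that approach (illustrative only, not from this thread):

import tensorflow as tf

# Optionally hide the GPUs entirely so nothing is locked or allocated on them.
tf.config.set_visible_devices([], "GPU")

# Pin the computation to the CPU explicitly.
with tf.device("/CPU:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

print(b.device)  # .../device:CPU:0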

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label May 24, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label May 31, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
