
GPU Device Selector in TensorFlow 2.0 #26460

Closed · alsrgv opened this issue Mar 7, 2019 · 34 comments

Assignees: jaingaurav
Labels: comp:gpu (GPU related issues), type:feature (Feature requests)

Comments
@alsrgv (Contributor) commented Mar 7, 2019

Please make sure that this is a feature request. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:feature_template

System information

  • TensorFlow version (you are using): 2.0
  • Are you willing to contribute it (Yes/No): Happy to help as much as I can!

Describe the feature and the current behavior/state.
TensorFlow 1.x supports specifying which GPU devices to use:

# Horovod: pin GPU to be used to process local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

There's no comparable API in TensorFlow 2.0. The closest option is to use the CUDA_VISIBLE_DEVICES environment variable. Unfortunately, CUDA_VISIBLE_DEVICES prevents processes from doing cudaMemcpy from/to devices not owned by the process. There's a significant performance degradation when NCCL is used with P2P communication disabled.

The ask is to add an API to TensorFlow 2.0 to enable device selection.

Will this change the current api? How?
Yes, it will introduce an API to select which GPU devices to use.

Who will benefit from this feature?
Users of Horovod.

Any Other info.
cc @azaks2 @alextp @jaingaurav @guptapriya

@jaingaurav jaingaurav self-assigned this Mar 7, 2019
@jaingaurav (Contributor)

Duplicate of #25446

@jaingaurav jaingaurav marked this as a duplicate of #25446 Mar 7, 2019
@guptapriya (Contributor)

@jaingaurav is a replacement for setting visible_device_list on your radar? I ask because I think Alex said there are some technical difficulties in implementing it.

@jvishnuvardhan jvishnuvardhan added comp:gpu GPU related issues type:feature Feature requests labels Mar 7, 2019
@jaingaurav (Contributor)

@guptapriya: It is on my radar, but I still need to sync up with @alextp on the potential issues.

@jaingaurav (Contributor)

A number of new APIs were added in the tf.config namespace to support this use case. Please let me know if there is anything we missed regarding this specific issue.
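
For reference, a minimal sketch of how the new tf.config.experimental APIs can be combined to restrict a process to a single GPU (the choice of gpus[0] is arbitrary; the experimental namespace reflects the 1.14/2.0 API at the time of this thread):

import tensorflow as tf

# List the physical GPUs TensorFlow can see (a CPU-only machine returns an empty list).
gpus = tf.config.experimental.list_physical_devices('GPU')

if gpus:
    # Restrict this process to a single GPU; this must run before any GPU
    # has been initialized (i.e. before the first op touches a GPU).
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

# Only the selected GPU (plus the CPU) should now be reported as visible.
print(tf.config.experimental.get_visible_devices())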

@alsrgv (Contributor, Author) commented Apr 19, 2019

Thanks for the update, @jaingaurav!

I have tried new functionality via tensorflow/tensorflow:nightly-gpu-py3 and have a couple of questions.

First, the API requires one to call tf.config.experimental.list_physical_devices('GPU'), filter that list, and pass the remainder to tf.config.experimental.set_visible_devices(physical_devices[1:], 'GPU').

During the list operation, TensorFlow creates a GPU context on every GPU, including ones that we're not planning to use. You can see how wasteful this is if we run 8 TensorFlow processes on an 8-GPU server, each taking up ~120 MB of GPU memory, totaling almost 1 GB of wasted GPU memory.

Could you add a way to set visible devices w/o binding GPU contexts?

Second, I noticed that our legacy usage of config.gpu_options.visible_device_list = '0' and tf.enable_eager_execution(config=config) has stopped working. Is this intentional for 1.14?

root@fc725ca05627:/# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> config = tf.ConfigProto()
>>> config.gpu_options.visible_device_list = '0'
>>> tf.enable_eager_execution(config=config)
>>> tf.constant(1)
2019-04-19 07:12:06.657522: I tensorflow/stream_executor/platform/default/dso_loader.cc:43] Successfully opened dynamic library libcuda.so.1
2019-04-19 07:12:13.742849: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 07:12:14.238725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1589] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-04-19 07:12:14.461207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 07:12:14.463142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1589] Found device 1 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:05.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-04-19 07:12:14.880827: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 07:12:14.881876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1589] Found device 2 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:06.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-04-19 07:12:15.076517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 07:12:15.077554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1589] Found device 3 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:07.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-04-19 07:12:15.085892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1712] Adding visible gpu devices: 0, 1, 2, 3
2019-04-19 07:12:15.089494: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-19 07:12:15.114457: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x68e92e0 executing computations on platform CUDA. Devices:
2019-04-19 07:12:15.114490: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-04-19 07:12:15.114504: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2019-04-19 07:12:15.114511: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): Tesla T4, Compute Capability 7.5
2019-04-19 07:12:15.114517: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): Tesla T4, Compute Capability 7.5
2019-04-19 07:12:15.117618: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-04-19 07:12:15.121237: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6a80e90 executing computations on platform Host. Devices:
2019-04-19 07:12:15.121270: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-19 07:12:15.121415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1712] Adding visible gpu devices: 0, 1, 2, 3
2019-04-19 07:12:15.121678: I tensorflow/stream_executor/platform/default/dso_loader.cc:43] Successfully opened dynamic library libcudart.so.10.0
2019-04-19 07:12:15.127364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-19 07:12:15.127398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126]      0 1 2 3
2019-04-19 07:12:15.127407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1139] 0:   N Y N N
2019-04-19 07:12:15.127414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1139] 1:   Y N N N
2019-04-19 07:12:15.127420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1139] 2:   N N N Y
2019-04-19 07:12:15.127426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1139] 3:   N N Y N
2019-04-19 07:12:15.128428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1260] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14202 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-04-19 07:12:15.128847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1260] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 14202 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:00:05.0, compute capability: 7.5)
2019-04-19 07:12:15.129209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1260] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14202 MB memory) -> physical GPU (device: 2, name: Tesla T4, pci bus id: 0000:00:06.0, compute capability: 7.5)
2019-04-19 07:12:15.129622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1260] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14202 MB memory) -> physical GPU (device: 3, name: Tesla T4, pci bus id: 0000:00:07.0, compute capability: 7.5)
<tf.Tensor: id=0, shape=(), dtype=int32, numpy=1>
>>> tf.config.experimental.get_visible_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
>>>

@jaingaurav (Contributor)

@alsrgv: Are you sure that the listing operation is causing the GPU memory to be used? The listing API was supposed to be a lightweight operation that would not involve any memory allocation. Did you experience this with the 1.0 or 2.0 nightly?

Regarding the issue you mentioned with tf.enable_eager_execution(config=config): this is a bug in the implementation. I'll look into it tomorrow and get a fix into the 1.14 branch.

@alsrgv (Contributor, Author) commented Apr 19, 2019

@jaingaurav, thanks for the quick response. I did experience it in tensorflow/tensorflow:nightly-gpu-py3, which reports itself as 1.14.1-dev20190417. Do you expect 2.0-nightly behavior to be different? Is there a docker image with 2.0-gpu-nightly?

The way I verify GPU memory usage is via nvidia-smi. After running tf.config.experimental.list_physical_devices('GPU') in one of the terminals, I see memory usage on all GPUs in another:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6287      C   python                                       119MiB |
|    1      6287      C   python                                       119MiB |
|    2      6287      C   python                                       119MiB |
|    3      6287      C   python                                       119MiB |
+-----------------------------------------------------------------------------+

@jaingaurav (Contributor)

Thanks for the details. I have a fix for the regression that I am getting through code review.

Looking into the GPU memory allocation issue now. I was able to reproduce it locally.

@jaingaurav (Contributor)

@alsrgv: The fix for tf.enable_eager_execution(config=config) has been merged. Could you let me know if it works for you, and I'll have it cherry-picked into 1.14.

I am working on the GPU memory allocation issue. However, the fix requires quite a bit of code re-structuring. It seems we end up allocating memory as a function of querying CUDA capabilities.

@alsrgv (Contributor, Author) commented Apr 23, 2019

@jaingaurav, thanks for the fix! I see the following error in CPU-only environments: ValueError: Invalid visible device index: 0. Historically, using visible_device_list = '0' on a CPU machine was a no-op, but now it crashes. Is it possible to avoid the crash in this scenario?

Looking forward to the memory usage fix. The memory usage could be caused by the creation of a CUDA context. If that's the case, the driver API should allow querying device capabilities without creating a CUDA context (and without the associated memory usage).
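
For what it's worth, here is a rough sketch of the kind of context-free query described above, going straight to the CUDA driver API via ctypes. This is only an illustration of the point, not TensorFlow code; the libcuda.so.1 name and the attribute enum values (taken from cuda.h) are assumptions about a typical Linux setup:

import ctypes

# cuInit and cuDeviceGetAttribute only need the driver, not a CUDA context,
# so these calls should not allocate device memory or show up in nvidia-smi.
cuda = ctypes.CDLL("libcuda.so.1")
cuda.cuInit(0)

count = ctypes.c_int()
cuda.cuDeviceGetCount(ctypes.byref(count))

CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75  # values from cuda.h
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76

for ordinal in range(count.value):
    dev = ctypes.c_int()
    cuda.cuDeviceGet(ctypes.byref(dev), ordinal)
    major, minor = ctypes.c_int(), ctypes.c_int()
    cuda.cuDeviceGetAttribute(ctypes.byref(major),
                              CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev)
    cuda.cuDeviceGetAttribute(ctypes.byref(minor),
                              CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev)
    print("GPU %d: compute capability %d.%d" % (ordinal, major.value, minor.value))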

@jaingaurav (Contributor)

@alsrgv: Thanks for pointing out the previous behavior. I'll ensure that I maintain compatibility. I've got the memory issue almost fixed; one last change is needed.

In terms of releases, I will ensure the regression fix is cherry-picked into 1.14. However, for the memory issue, can it wait until 1.15 and 2.0, or would you like it in 1.14 as well? It just depends on what you'd like to support.

@alsrgv (Contributor, Author) commented Apr 24, 2019

@jaingaurav, looking forward to the fixes!

It would be great if the memory issue fix could be picked into 1.14 as well. 1.13 had a memory issue with XLA (it was binding memory on all devices as well), and it was causing out-of-memory issues with cuDNN. So there has been no release without memory issues since 1.12.

@jaingaurav (Contributor)

@alsrgv: The XLA issue has been fixed with a427c13, correct? If so, it'll be in the 1.14 branch.

@alsrgv (Contributor, Author) commented Apr 24, 2019

@jaingaurav, yes, that's it - hence the request to pick the fix for this memory issue into the 1.14 branch as well.

@jaingaurav (Contributor)

@alsrgv: All known issues should be fixed in tonight's nightly. Once you confirm the behavior, I will speak to the release team about trying to get the memory fixes into 1.14. The changes weren't too bad, but they do incur some risk to get into the release.

@alsrgv (Contributor, Author) commented Apr 26, 2019

@jaingaurav, thanks for the fixes!

I just tried https://files.pythonhosted.org/packages/e1/c6/6cde177c97e975d3c0aa36a7df87b353e8cfa26660735f6668d314106d81/tf_nightly_gpu-1.14.1.dev20190426-cp37-cp37m-manylinux1_x86_64.whl.

The memory leak with tf.config.experimental.list_physical_devices('GPU') is fixed 👍

I'm still getting an error with visible_device_list in the absence of GPUs:

(env) root@153afad1fa58:/# CUDA_VISIBLE_DEVICES='' python
Python 3.7.3 (default, Mar 26 2019, 00:55:50)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> config = tf.ConfigProto()
>>> config.gpu_options.visible_device_list = '0'
>>> tf.enable_eager_execution(config=config)
>>> tf.constant(1)
2019-04-26 06:20:39.548228: I tensorflow/stream_executor/platform/default/dso_loader.cc:43] Successfully opened dynamic library libcuda.so.1
2019-04-26 06:20:44.481123: E tensorflow/stream_executor/cuda/cuda_driver.cc:320] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-04-26 06:20:44.481316: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:166] retrieving CUDA diagnostic information for host: 153afad1fa58
2019-04-26 06:20:44.481341: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:173] hostname: 153afad1fa58
2019-04-26 06:20:44.481492: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:197] libcuda reported version is: 410.72.0
2019-04-26 06:20:44.481544: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:201] kernel reported version is: 410.72.0
2019-04-26 06:20:44.481555: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version seems to match DSO: 410.72.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 180, in constant_v1
    allow_broadcast=False)
  File "/env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 254, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "/env/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 405, in ensure_initialized
    config_str = self.config.SerializeToString()
  File "/env/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 672, in config
    self._initialize_physical_devices()
  File "/env/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 917, in _initialize_physical_devices
    self._import_config()
  File "/env/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 971, in _import_config
    raise ValueError("Invalid visible device index: %s" % index)
ValueError: Invalid visible device index: 0
>>>

I tried this on a CPU-only build with the same outcome.

@jaingaurav (Contributor)

Thanks @alsrgv. From the looks of it, that nightly build might not have the latest change yet. We can re-verify with the next build.

@alsrgv (Contributor, Author) commented Apr 26, 2019

@jaingaurav, I just built the master from source and can confirm it works, thanks!

@ppwwyyxx (Contributor)

@jaingaurav thanks a lot for this improvement! I can verify that it also fixes another old issue at #8136 (comment).
This code:

import tensorflow as tf
print(tf.config.experimental.list_physical_devices('GPU'))
cfg = tf.ConfigProto()
cfg.gpu_options.visible_device_list = '1'
sess = tf.Session(config=cfg)   # do not fail

does not fail when running on a 2-GPU machine. But it would fail if tf.test.is_gpu_available() were used instead of list_physical_devices.

Any chance we can get this great improvement into tf.test.is_gpu_available() as well?

@jaingaurav (Contributor)

@ppwwyyxx: This is exactly why the new API was created. Unfortunately, any changes to tf.test.is_gpu_available() could break backwards compatibility. Hence, I'd prefer it if you used the new APIs. Also, unless you are writing test cases, I'd probably avoid using that API.
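
A minimal sketch of an equivalent check using the new API (assuming the goal is simply "is at least one GPU visible to this process"):

import tensorflow as tf

# After the fix discussed above, listing physical devices should not create
# GPU contexts or allocate device memory, unlike tf.test.is_gpu_available().
def gpu_available():
    return len(tf.config.experimental.list_physical_devices('GPU')) > 0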

@alsrgv (Contributor, Author) commented Apr 29, 2019

@jaingaurav, any news on whether these fixes can be picked into r1.14? We'd really like a release since 1.12.x that has correct GPU memory binding behavior.

cc @martinwicke

@alextp (Contributor) commented Apr 29, 2019 via email

@jaingaurav (Contributor)

@alsrgv: We had a chat about it this morning. Given the status of 1.14, we're going to aim to cherry-pick the memory fixes. We'd greatly appreciate any testing that you can do to help ensure that we don't incur any regressions and that we have everything you need in 1.14. Thank you for all that you have done so far!

@alsrgv (Contributor, Author) commented Apr 29, 2019

@jaingaurav, perfect, thanks! I will test 1.14 RCs as they come out.

@llan-ml commented Jun 7, 2019

@jaingaurav Hi, I tried to use the device selector with TF 2.0, but there are still some problems:

In [1]: import tensorflow as tf

In [2]: tf.__version__
Out[2]: '2.0.0-dev20190606'

In [3]: gpus = tf.config.experimental.list_physical_devices("GPU")

In [4]: gpus
Out[4]:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'),
 PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

In [5]: tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

In [6]: tf.config.experimental.set_memory_growth(gpus[0], True)

In [7]: tf.constant(1)
...
ntext.py in _compute_gpu_options(self)
    851       memory_growths = set(self._memory_growth_map.values())
    852       if len(memory_growths) > 1:
--> 853         raise ValueError("Memory growth cannot differ between GPU devices")
    854       allow_growth = memory_growths.pop()
    855     else:

ValueError: Memory growth cannot differ between GPU devices

@jaingaurav (Contributor)

@llan-ml: Please see the updated guide at https://www.tensorflow.org/beta/guide/using_gpu#limiting_gpu_memory_growth. Currently we require the memory growth option to be uniform across all GPUs. This may change in the future if someone implements the necessary changes.
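
In other words, for now the growth flag has to be applied to every physical GPU before any of them is initialized. A minimal sketch following the linked guide:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # The memory growth setting currently must be identical on all GPUs,
        # even if only one of them is later made visible.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before any GPU has been initialized.
        print(e)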

@llan-ml commented Jun 12, 2019

@jaingaurav What I mean is that after I have selected a GPU by calling tf.config.experimental.set_visible_devices(gpus[0], 'GPU'), it still raises ValueError: Memory growth cannot differ between GPU devices.

For now, I still have to select a specific GPU by setting CUDA_VISIBLE_DEVICES on a multi-GPU machine.
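
For reference, that workaround looks roughly like this (the variable must be set before TensorFlow initializes CUDA, and the index "0" is just an example):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first physical GPU

import tensorflow as tf
print(tf.config.experimental.list_physical_devices('GPU'))  # now reports a single GPU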

tensorflow-copybara pushed a commit that referenced this issue Jun 13, 2019
This was discovered in Issue #26460.

PiperOrigin-RevId: 253082055
@jaingaurav (Contributor)

@llan-ml: Thanks, that is indeed a valid bug. The fix has been pushed now and I'll ensure it makes it into the upcoming 1.14 release.

@pidajay commented Jul 14, 2019

@jaingaurav I am seeing a lot of memory issues when using the native depthwise conv2d. It seems like it does not respect the visible device list in the session config and grabs all the GPUs. You mentioned this above - "I am working on the GPU memory allocation issue. However, the fix requires quite a bit of code re-structuring. It seems we end up allocating memory as a function of querying CUDA capabilities." I am wondering if these fixes made their way into TF 1.14. Can you point me to any PRs with these fixes? Thanks!

@jaingaurav (Contributor)

@pidajay: How are you querying the GPUs? Could you share a code snippet? Note that this issue was primarily focused on querying GPUs with eager execution. If you are using sessions and experiencing issues, there might be something else going on.

Here is a tutorial on how to use the new APIs:
https://www.tensorflow.org/beta/guide/using_gpu#limiting_gpu_memory_growth

@pidajay commented Jul 14, 2019

@jaingaurav Appreciate the response. I am using an estimator (a TPU estimator, but running on GPU). A single GPU is fine, but the problem shows up when distributing across multiple GPUs (I use Horovod, but Horovod does not seem to be the issue here). I create the visible device list as follows at the beginning of the program:

sess_config=tf.ConfigProto()
sess_config.gpu_options.allow_growth = True
sess_config.gpu_options.visible_device_list = str(hvd.local_rank())

But I notice that the estimator violates the visible device list above and allocates all the GPUs, and this happens only when using depthwise conv2d.
I guess this thread is related to eager mode and is probably not the right place. If you can't think of anything off the top of your head, I will try to create a new issue with a tiny example reproducing the problem. Thanks!

@llan-ml commented Jul 27, 2019

Hi @jaingaurav, it seems that disabling all GPUs does not work properly. I ran the following code:

In [1]: import tensorflow as tf

In [2]: tf.__version__
Out[2]: '2.0.0-dev20190725'

In [3]: physical_devices = tf.config.experimental.list_physical_devices('GPU')
2019-07-28 03:12:44.161668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-07-28 03:12:44.243318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:3b:00.0
2019-07-28 03:12:44.244091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:af:00.0
2019-07-28 03:12:44.244418: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-07-28 03:12:44.246004: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-07-28 03:12:44.247384: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-07-28 03:12:44.247752: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-07-28 03:12:44.249599: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-07-28 03:12:44.251015: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-07-28 03:12:44.255340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-07-28 03:12:44.258154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1

In [4]: tf.config.experimental.set_visible_devices([], 'GPU')

In [5]: visible_devices = tf.config.experimental.get_visible_devices()

In [6]: visible_devices
Out[6]: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

In [7]: x = tf.Variable(1.0)
2019-07-28 03:13:21.109555: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-28 03:13:22.043938: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e7a5774b90 executing computations on platform CUDA. Devices:
2019-07-28 03:13:22.044006: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2019-07-28 03:13:22.044029: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla V100-PCIE-16GB, Compute Capability 7.0
2019-07-28 03:13:22.067710: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2019-07-28 03:13:22.074734: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e7a994ac40 executing computations on platform Host. Devices:
2019-07-28 03:13:22.074809: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-28 03:13:22.074977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-28 03:13:22.075003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]

In [8]: x.device
Out[8]: '/job:localhost/replica:0/task:0/device:CPU:0'

In [9]: !nvidia-smi
Sun Jul 28 03:13:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   30C    P0    41W / 250W |    418MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   35C    P0    42W / 250W |    418MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     52456      C   ...2.0/envs/tf-nightly-2.0-0725/bin/python   407MiB |
|    1     52456      C   ...2.0/envs/tf-nightly-2.0-0725/bin/python   407MiB |
+-----------------------------------------------------------------------------+

Although the variable x is placed on the CPU, the process still occupies some GPU memory. The expected behavior is that nvidia-smi does not show the process at all.

@lucasjinreal

What if users want a specific 2 GPUs out of 4:

tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

How do they enable 2 of them?

@ppwwyyxx (Contributor)

As the documentation says, you can give it a list of devices.
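
For example, a minimal sketch (picking the first two of the four GPUs; the indices are arbitrary):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
# Make exactly two of the physical GPUs visible to this process.
tf.config.experimental.set_visible_devices(gpus[:2], 'GPU')
print(tf.config.experimental.get_visible_devices('GPU'))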
