
Use gpu_memory_fraction while using distributed tensorflow #3057

Closed
carpedm20 opened this issue Jun 27, 2016 · 2 comments

carpedm20 commented Jun 27, 2016

While implementing an A3C model, I tried to allocate a fraction of a GPU to each process, so that processes sharing fractions of the GPUs update the parameters of a parameter server (on a single machine). For example, I want to create 12 workers across 3 GPUs to update a master model.

I referenced https://www.tensorflow.org/versions/r0.9/how_tos/distributed/index.html and used tf.GPUOptions(per_process_gpu_memory_fraction=0.1), but it doesn't work when passed to sv.managed_session (the process just takes all of the memory of the first visible GPU).

Also, I can't find how to give GPUOptions to a parameter server, which never creates a session (I normally pass GPUOptions to tf.Session, similar to what I do with sv.managed_session). How can I allocate a specific fraction of a GPU to each task, including the parameter server and the workers?

The code can be found at https://github.com/devsisters/DQN-tensorflow/blob/distributed/main.py#L78.
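For reference, the non-working pattern described above looks roughly like this. This is a sketch against the historical r0.9-era API (it will not run against modern TensorFlow); the cluster addresses and log directory are placeholders, not taken from the linked code:

```python
import tensorflow as tf  # TensorFlow 0.9-era API

# Placeholder cluster; real addresses would come from the job configuration.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Attempt to cap this worker at 10% of GPU memory at session-creation time.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.1)
sv = tf.train.Supervisor(is_chief=True, logdir="/tmp/train_logs")

# In the distributed setting this config is effectively ignored: the
# worker's GPU devices were already created by tf.train.Server above.
with sv.managed_session(server.target,
                        config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    pass  # training loop would go here
```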


mrry commented Jun 27, 2016

It's possible to do this today, but the interface isn't very intuitive. TL;DR: The GPUOptions in the session creation will be ignored, and you have to pass them when you create the tf.train.Server objects to which these settings apply. (The reason for this is that the GPU device is created when you create the server, not when you create the session. When you use non-distributed TensorFlow, the device is created when you create the session.)

To set this option, then, you have to set the tf.ServerDef.default_session_config.gpu_options.per_process_gpu_memory_fraction field when you create the server. Currently you can only do that if you build the tf.train.ServerDef yourself. I'll add an interface to let you override the tf.ConfigProto on its own while still using the Pythonic sugar for defining a cluster, and use that to close this issue.
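Building the ServerDef by hand looks something like the following sketch. It targets the r0.9-era API described above, and the cluster addresses are placeholders:

```python
import tensorflow as tf  # TensorFlow 0.9-era API

# Placeholder cluster definition; each task uses its own job_name/task_index.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223", "localhost:2224"]})

server_def = tf.train.ServerDef(cluster=cluster.as_cluster_def(),
                                job_name="worker",
                                task_index=0,
                                protocol="grpc")

# The GPU devices are created when the server starts, so the memory
# fraction must be set here, not in the later tf.Session config.
server_def.default_session_config.gpu_options.per_process_gpu_memory_fraction = 0.1

server = tf.train.Server(server_def)
```

Since each task (including each parameter-server task) constructs its own server this way, this should also cap the parameter server's GPU usage even though it never creates a session itself.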


ghost commented Jun 30, 2017

@mrry What if I didn't define any tf.train.Server object? My code is something like sess = sv.prepare_or_wait_for_session(config=tf.ConfigProto(gpu_options=gpu_options)). Where should I pass the config object? TIA!
