
Use gpu_memory_fraction while using distributed tensorflow #3057

Closed
carpedm20 opened this issue Jun 27, 2016 · 2 comments

carpedm20 commented Jun 27, 2016

While implementing an A3C model, I tried to allocate a fraction of a GPU to each process, so that processes sharing fractions of the GPUs update the parameters of a parameter server (on a single machine). For example, I want to create 12 workers across 3 GPUs to update a master model.

I referenced https://www.tensorflow.org/versions/r0.9/how_tos/distributed/index.html and used tf.GPUOptions(per_process_gpu_memory_fraction=0.1), but it doesn't work when passed to sv.managed_session (the process just takes all of the memory of the first visible GPU).

Also, I can't find how to give GPUOptions to a parameter server, which never creates a session (I normally pass GPUOptions to tf.Session, similar to what I do with sv.managed_session). How can I allocate a specific fraction of a GPU to each task, including the parameter server and the workers?

The code can be found at https://github.com/devsisters/DQN-tensorflow/blob/distributed/main.py#L78.
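For reference, the non-working pattern described above looks roughly like this. This is a sketch against the historical r0.9-era API (it will not run against modern TensorFlow); the cluster addresses and log directory are placeholders, not taken from the linked code:

```python
import tensorflow as tf  # TensorFlow 0.9-era API

# Placeholder cluster; real addresses would come from the job configuration.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Attempt to cap this worker at 10% of GPU memory at session-creation time.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.1)
sv = tf.train.Supervisor(is_chief=True, logdir="/tmp/train_logs")

# In the distributed setting this config is effectively ignored: the
# worker's GPU devices were already created by tf.train.Server above.
with sv.managed_session(server.target,
                        config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    pass  # training loop would go here
```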


mrry commented Jun 27, 2016

It's possible to do this today, but the interface isn't very intuitive. TL;DR: The GPUOptions in the session creation will be ignored, and you have to pass them when you create the tf.train.Server objects to which these settings apply. (The reason for this is that the GPU device is created when you create the server, not when you create the session. When you use non-distributed TensorFlow, the device is created when you create the session.)

To set this option, then, you have to set the tf.ServerDef.default_session_config.gpu_options.per_process_gpu_memory_fraction field when you create the server. Currently you can only do that if you build the tf.train.ServerDef yourself. I'll add an interface to let you override the tf.ConfigProto on its own while still using the Pythonic sugar for defining a cluster, and use that to close this issue.
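Building the ServerDef by hand looks something like the following sketch. It targets the r0.9-era API described above, and the cluster addresses are placeholders:

```python
import tensorflow as tf  # TensorFlow 0.9-era API

# Placeholder cluster definition; each task uses its own job_name/task_index.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223", "localhost:2224"]})

server_def = tf.train.ServerDef(cluster=cluster.as_cluster_def(),
                                job_name="worker",
                                task_index=0,
                                protocol="grpc")

# The GPU devices are created when the server starts, so the memory
# fraction must be set here, not in the later tf.Session config.
server_def.default_session_config.gpu_options.per_process_gpu_memory_fraction = 0.1

server = tf.train.Server(server_def)
```

Since each task (including each parameter-server task) constructs its own server this way, this should also cap the parameter server's GPU usage even though it never creates a session itself.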


ghost commented Jun 30, 2017

@mrry What if I didn't define any tf.train.Server object? My code is something like sess = sv.prepare_or_wait_for_session(config=tf.ConfigProto(gpu_options=gpu_options)). Where should I pass the config object? TIA!
