While implementing models in A3C, I am trying to allocate a fraction of a GPU to each process, with the processes sharing multiple GPUs and updating the parameters of a parameter server (on a single machine). For example, I want to create 12 workers on 3 GPUs to update a master model.
Also, I can't find how to give GPUOptions to the parameter server, which never creates a session (I always pass GPUOptions to tf.Session, and similarly to sv.managed_session). How can I allocate a specific fraction of a GPU to each task, including the parameter server and the workers?
It's possible to do this today, but the interface isn't very intuitive. TL;DR: the GPUOptions passed at session creation will be ignored; you have to pass them when you create the tf.train.Server objects to which these settings apply. (The reason is that the GPU device is created when you create the server, not when you create the session. In non-distributed TensorFlow, the device is created when you create the session.)
To set this option, then, you have to set the tf.ServerDef.default_session_config.gpu_options.per_process_gpu_memory_fraction field when you create the server. Currently you can only do that if you build the tf.train.ServerDef yourself. I'll add an interface that lets you override the tf.ConfigProto on its own while still using the Pythonic sugar for defining a cluster, and use that to close this issue.
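Concretely, once tf.train.Server accepts a config keyword argument (it does in the TensorFlow 1.x API, which this sketch assumes; fraction_for and make_worker_server are illustrative helper names, not part of TensorFlow), the options go to the server rather than the session:

```python
def fraction_for(num_workers, num_gpus):
    # With 12 workers sharing 3 GPUs, 4 processes land on each GPU,
    # so each should claim at most 1/4 of that GPU's memory.
    workers_per_gpu = num_workers // num_gpus
    return 1.0 / workers_per_gpu


def make_worker_server(cluster_dict, task_index, mem_fraction):
    # TensorFlow 1.x API assumed; imported inside the helper so the
    # arithmetic above stays checkable without TensorFlow installed.
    import tensorflow as tf

    # The GPU devices are created when the server starts, so the
    # memory fraction must be given to tf.train.Server, not tf.Session.
    config = tf.ConfigProto(
        gpu_options=tf.GPUOptions(
            per_process_gpu_memory_fraction=mem_fraction))
    return tf.train.Server(tf.train.ClusterSpec(cluster_dict),
                           job_name="worker", task_index=task_index,
                           config=config)
```

A session opened against the resulting server.target then runs on devices that were created with the requested fraction, regardless of the ConfigProto passed to the session itself.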
@mrry What if I didn't define any tf.train.Server object, and my code is something like sess = sv.prepare_or_wait_for_session(config=tf.ConfigProto(gpu_options=gpu_options))? Where should I pass the config object? TIA!
I referenced https://www.tensorflow.org/versions/r0.9/how_tos/distributed/index.html and used tf.GPUOptions(per_process_gpu_memory_fraction=0.1), but it doesn't work when passed to sv.managed_session (each process just takes all of the memory of the first visible GPU). The code can be found at https://github.com/devsisters/DQN-tensorflow/blob/distributed/main.py#L78.
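Putting the pieces of the thread together, a per-task launcher along these lines would give every task, parameter server included, its own memory fraction at server-creation time (TensorFlow 1.x API assumed; the ports, fractions, and the launch helper are illustrative, not taken from the linked code):

```python
# Hypothetical single-machine cluster: 1 ps task and 12 worker tasks.
CLUSTER = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:%d" % (2223 + i) for i in range(12)],
}

WORKER_FRACTION = 1.0 / 4   # 12 workers over 3 GPUs -> 4 per GPU
PS_FRACTION = 0.05          # illustrative: the ps needs little GPU memory


def launch(job_name, task_index):
    # TensorFlow 1.x API assumed.
    import tensorflow as tf

    fraction = PS_FRACTION if job_name == "ps" else WORKER_FRACTION
    # Passing the ConfigProto to the server (not the session) is what
    # makes the fraction take effect, since the GPU devices are
    # created along with the server.
    config = tf.ConfigProto(
        gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=fraction))
    server = tf.train.Server(tf.train.ClusterSpec(CLUSTER),
                             job_name=job_name, task_index=task_index,
                             config=config)
    if job_name == "ps":
        server.join()  # ps tasks block and serve variables
    return server
```

In practice each worker process is also typically pinned to one of the 3 GPUs via the CUDA_VISIBLE_DEVICES environment variable, so that the fractions apply to the intended device rather than all of them piling onto the first visible GPU.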