Unified mechanism for setting process-level settings #8136
Comments
Thanks for filing this feature-request issue @yaroslavvb, and for the great analysis! Assigning to @mrry since he knows Session subtleties much better than I do.
It turns out there are some even subtler subtleties related to the ownership of GPU devices, allocators, and streams, which ought to be solved before we change anything else about configuration. Assigning this to @zheng-xq and @poxvoculi, who are looking into the GPU issues.
It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.
I agree with Derek that these are indeed subtle issues. We are systematically changing devices, allocators, and streams to global resources, but the API is still at the session level for backward compatibility. Closing this one for now. Feel free to reopen if someone wants to contribute a new design.
Another kind of failure:

```python
# running on a machine with >1 GPUs
print(tf.test.is_gpu_available())
# or call list_devices()
cfg = tf.ConfigProto()
cfg.gpu_options.visible_device_list = '1'
sess = tf.Session(config=cfg)  # fails
```

This is quite annoying: a line of code executed earlier leads to an error later, with a strange error message.
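A commonly used workaround (assuming the goal is to restrict the process to one GPU) is to decide device visibility via the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow touches its process-level GPU state, instead of relying on `visible_device_list` after an earlier device query. A minimal sketch; the TensorFlow lines are shown as comments since the point is purely the ordering:

```python
import os

# Device visibility must be decided before the first TensorFlow call that
# touches devices (is_gpu_available, list_devices, Session): once the
# process-level GPU state exists, later visibility settings are ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf   # import only after the variable is set
# sess = tf.Session()       # the process now sees just GPU 1; no
#                           # visible_device_list conflict can arise
print(os.environ["CUDA_VISIBLE_DEVICES"])
```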
same question for me, thanks |
It's so tricky in the tensorflow session. |
WRT this problem, IMO one of two corrections should be used.
Some settings in TensorFlow apply to all sessions in the process. Examples: the size of the Eigen thread-pool, the allocator growth strategy, and logging verbosity.

There are currently two places where such process properties are set:

1. environment variables
2. the `ConfigProto` passed when the first session or server is created in the process
Place 1 lacks discoverability. For instance, the required SM count to make a GPU visible to TensorFlow is set through `TF_MIN_GPU_MULTIPROCESSOR_COUNT`, which is not documented outside of `gpu_device.cc`. Additionally, it has unclear semantics: when does changing the `TF_CPP_MIN_VLOG_LEVEL` environment variable have an effect on logging? Empirically, changing it after `import tf` has an effect, but changing it after the first `tf.Session` call has no effect.

Place 2 leads to confusion when you specify conflicting settings. For instance, in #4455 the user was confused that `config=tf.ConfigProto(intra_op_parallelism_threads=1)` had no effect. The reason is that `intra_op_parallelism_threads` specifies the size of the process-global ThreadPool, and this setting was already fixed when the user called `tf.Server` earlier. (We also ran into this issue on our deployment.)

cc @mrry
Assigning to @tatatodd for triage since he asked me to file this issue.