Thread limits are not respected in latest tf-nightly build. #18300

Closed
robieta opened this issue Apr 6, 2018 · 5 comments
@robieta

robieta commented Apr 6, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes. My minimal example is as follows:
import tensorflow as tf

config = tf.ConfigProto(
    inter_op_parallelism_threads=1,
    intra_op_parallelism_threads=1,
)

with tf.Session(config=config) as sess:
  a = tf.random_uniform((25000, 25000))

  a_sq = tf.matmul(a, a)
  a_sq.eval()
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Goobuntu desktop.
  • TensorFlow installed from (source or binary):
    tf-nightly (see below)
  • Python version:
    3.5.3

Describe the problem


When I run the minimal example above with tf-nightly==1.8.0.dev20180404, everything works correctly: it saturates one CPU core and leaves the rest unused. With tf-nightly==1.8.0.dev20180405, however, TensorFlow consumes all 8 cores.
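
For anyone who wants a number rather than a CPU monitor reading, here is a minimal sketch that counts the OS-level threads of the current process. It assumes a Linux-style /proc filesystem, and `count_os_threads` is a hypothetical helper, not part of TensorFlow. A thread count is only a proxy for how many cores end up busy, so treat it as a rough check.

```python
import os
import threading


def count_os_threads():
    """Return the number of OS-level threads in this process, or None.

    Assumes a Linux-style /proc filesystem; elsewhere it returns None.
    """
    try:
        return len(os.listdir("/proc/self/task"))
    except OSError:
        return None


baseline = count_os_threads()

# Start one extra thread and keep it alive while counting again.
stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
with_worker = count_os_threads()
stop.set()
worker.join()

print("threads before:", baseline, "with one extra worker:", with_worker)
```

In the repro, the same helper could be called before and after building the graph and again after creating the session, to see at which step extra native threads appear.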

@mrry
Contributor

mrry commented Apr 6, 2018

While driving by, I noticed that the threads are being created when the C API's TF_Graph is being constructed, with a stack like this:

#0  __pthread_create_2_1 (newthread=0x5555566085d8, attr=0x0, start_routine=0x7fffd0ac0b10, arg=0x5555561f70f8) at pthread_create.c:511
#1  0x00007fffd0ac0c4c in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fffd0ac0d51 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffd1512720 in tensorflow::(anonymous namespace)::PosixEnv::StartThread(tensorflow::ThreadOptions const&, std::string const&, std::function<void ()>) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#4  0x00007fffd14e4467 in tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::string const&, int, bool) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#5  0x00007fffd14e4710 in tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, std::string const&, int) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#6  0x00007fffd188dfdd in tensorflow::LocalDevice::EigenThreadPoolInfo::EigenThreadPoolInfo(tensorflow::SessionOptions const&) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#7  0x00007fffd188e158 in tensorflow::LocalDevice::LocalDevice(tensorflow::SessionOptions const&, tensorflow::DeviceAttributes const&) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#8  0x00007fffd18b475b in tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::string const&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality const&, tensorflow::Allocator*) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#9  0x00007fffd18b4e06 in tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::string const&, std::vector<tensorflow::Device*, std::allocator<tensorflow::Device*> >*) () from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#10 0x00007fffd188bd3a in tensorflow::(anonymous namespace)::GetCPUDevice(tensorflow::Env*) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#11 0x00007fffd188bf11 in tensorflow::GraphRunner::GraphRunner(tensorflow::Env*) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so
#12 0x00007fffd5aa13f7 in tensorflow::ShapeRefiner::ShapeRefiner(int, tensorflow::OpRegistryInterface const*) ()
   from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#13 0x00007fffd33d197d in TF_Graph::TF_Graph() () from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#14 0x00007fffd33d1a8e in TF_NewGraph () from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#15 0x00007fffd3036169 in _wrap_TF_NewGraph () from /usr/local/google/home/mrry/tf-nightly/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
[Python guff]

It looks like the call to GetCPUDevice() (frame 10) default-initializes a SessionOptions and uses it to create the CPU device's threadpool. Since the threadpools are static by default, by the time you create the session it's too late for the thread limits in your ConfigProto to take effect. Strangely, adding use_per_session_threads=True to the ConfigProto didn't fix it for me. Setting the environment variable TF_C_API_GRAPH_CONSTRUCTION=0, however, did reduce the CPU usage to the intended single core.
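
The "static threadpool, first caller wins" behaviour described above can be illustrated with a toy model in plain Python. This is only a sketch of the pattern, not TensorFlow's implementation: `SharedPool` and `DEFAULT_THREADS` are hypothetical names, and 8 is just a stand-in for "one thread per core".

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREADS = 8  # hypothetical stand-in for "one thread per core"


class SharedPool:
    """Toy process-wide pool: sized by whoever asks for it first."""

    _instance = None

    def __init__(self, size):
        self.size = size
        self.executor = ThreadPoolExecutor(max_workers=size)

    @classmethod
    def get(cls, num_threads=None):
        # The requested size is honored only when the pool does not exist yet.
        if cls._instance is None:
            cls._instance = cls(num_threads or DEFAULT_THREADS)
        return cls._instance


# An early caller (standing in for graph construction) uses the defaults.
early = SharedPool.get()
# A later caller (standing in for session creation) asks for one thread,
# but receives the pool that already exists.
late = SharedPool.get(num_threads=1)

print(late is early, late.size)  # True 8
late.executor.shutdown()
```

In this model the only ways to get a one-thread pool are to make the limited request first or to stop the early caller from creating the pool at all.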

/cc @skye

@skye
Member

skye commented Apr 10, 2018

This should be fixed now.

@skye skye closed this as completed Apr 10, 2018
@reedwm reedwm reopened this Apr 10, 2018
@skye
Member

skye commented Apr 18, 2018

9b18bd7

@skye skye closed this as completed Apr 18, 2018
@skye
Member

skye commented Apr 18, 2018

@robieta thanks for the clear repro for this problem, it was very helpful!

@robieta
Author

robieta commented Apr 19, 2018

My pleasure. Thanks for getting everything sorted out.
