Setting default OpenMP settings for MKL kernels #19136
Changes from all commits: f57d78c, b3e956b, 2ef8e85, ec8db8f, bca25ac, ebfb36f, deb7aa1, a67ef80, a7f096a
@@ -0,0 +1,53 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

#ifdef INTEL_MKL

#include "tensorflow/core/common_runtime/threadpool_device.h"

#include "tensorflow/core/lib/core/status_test_util.h"
#include "tensorflow/core/platform/cpu_info.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/test.h"
#include "tensorflow/core/public/session_options.h"

namespace tensorflow {

#ifdef _OPENMP
TEST(MKLThreadPoolDeviceTest, TestOmpDefaults) {
  SessionOptions options;
  unsetenv("OMP_NUM_THREADS");

  ThreadPoolDevice* tp = new ThreadPoolDevice(
      options, "/device:CPU:0", Bytes(256), DeviceLocality(), cpu_allocator());

  const int ht = port::NumHyperthreadsPerCore();
  EXPECT_EQ(omp_get_max_threads(), (port::NumSchedulableCPUs() + ht - 1) / ht);
}

TEST(MKLThreadPoolDeviceTest, TestOmpPreSets) {
  SessionOptions options;
  setenv("OMP_NUM_THREADS", "314", 1);

  ThreadPoolDevice* tp = new ThreadPoolDevice(
      options, "/device:CPU:0", Bytes(256), DeviceLocality(), cpu_allocator());

  EXPECT_EQ(omp_get_max_threads(), 314);
}
#endif  // _OPENMP

}  // namespace tensorflow

#endif  // INTEL_MKL
@@ -31,7 +31,11 @@ limitations under the License.
#include "tensorflow/core/public/session_options.h"

#ifdef INTEL_MKL
#ifdef _OPENMP
#include <omp.h>
#endif
#include "tensorflow/core/common_runtime/mkl_cpu_allocator.h"
#include "tensorflow/core/platform/cpu_info.h"
#endif

namespace tensorflow {

@@ -43,7 +47,26 @@ ThreadPoolDevice::ThreadPoolDevice(const SessionOptions& options,
    : LocalDevice(options, Device::BuildDeviceAttributes(
                      name, DEVICE_CPU, memory_limit, locality)),
      allocator_(allocator),
      scoped_allocator_mgr_(new ScopedAllocatorMgr(name)) {
#ifdef INTEL_MKL
#ifdef _OPENMP
  const char* user_omp_threads = getenv("OMP_NUM_THREADS");
  if (user_omp_threads == nullptr) {
    // OMP_NUM_THREADS controls MKL's intra-op parallelization.
    // Default to the available physical cores.
    const int mkl_intra_op = port::NumSchedulableCPUs();
    const int ht = port::NumHyperthreadsPerCore();
    omp_set_num_threads((mkl_intra_op + ht - 1) / ht);
  } else {
    uint64 user_val = 0;
    if (strings::safe_strtou64(user_omp_threads, &user_val)) {
      // Superfluous, but triggers OpenMP loading.
      omp_set_num_threads(user_val);
    }
  }
#endif  // _OPENMP
#endif  // INTEL_MKL
}

ThreadPoolDevice::~ThreadPoolDevice() {}

Review thread on the getenv("OMP_NUM_THREADS") call:

- std::getenv is only thread-safe with respect to other getenv calls, not with setenv and unsetenv. Some other parts of TensorFlow use setenv and unsetenv, as Jayaram said. Just a note that we might want to look into this in the future.
- As long as {set,unset,put,clear}env only happen in tests, I think this is fine: even if the tests are not single-threaded, the only damage that can be done is to the tests themselves.
- Oops, sorry. For some reason I thought core/platform/posix/subprocess.cc was using setenv, but it actually just mentions setenv. Just confirmed that there aren't any {set,unset,put,clear}env calls outside of tests. That's great! :)

Review thread on the default thread-count computation:

- From @jktomer: can num_logical_cores not be divisible by num_hyper_threads? In other words, can each physical core have a different number of hyper-threads?
- In hardware, no, but the user can choose to explicitly disable some hyperthreads and not others using OS controls.
- Thank you!
@@ -344,5 +344,28 @@ int CPUModelNum() {
#endif
}

int CPUIDNumSMT() {
#ifdef PLATFORM_IS_X86
  // https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration
  // https://software.intel.com/en-us/articles/intel-sdm (Vol 3A)
  // Section: Detecting Hardware Multi-Threading Support and Topology
  // Uses CPUID Leaf 11 to enumerate system topology on Intel x86
  // architectures. Other cases are not supported.
  uint32 eax, ebx, ecx, edx;
  // Check if the system supports Leaf 11.
  GETCPUID(eax, ebx, ecx, edx, 0, 0);
  if (eax >= 11) {
    // 1) Leaf 11 available? CPUID.(EAX=11, ECX=0):EBX != 0
    // 2) SMT_Mask_Width = CPUID.(EAX=11, ECX=0):EAX[4:0] if CPUID.(EAX=11,
    //    ECX=0):ECX[15:8] is 1
    GETCPUID(eax, ebx, ecx, edx, 11, 0);
    if (ebx != 0 && ((ecx & 0xff00) >> 8) == 1) {
      return 1 << (eax & 0x1f);  // 2 ^ SMT_Mask_Width
    }
  }
#endif  // PLATFORM_IS_X86
  return 0;
}

}  // namespace port
}  // namespace tensorflow

Review comment on the SDM reference: adding "section Detecting Hardware Multi-Threading Support and Topology" after Vol 3A would be helpful.
@@ -74,6 +74,11 @@ int NumSchedulableCPUs() {
  return kDefaultCores;
}

int NumHyperthreadsPerCore() {
  static const int ht_per_core = tensorflow::port::CPUIDNumSMT();
  return (ht_per_core > 0) ? ht_per_core : 1;
}

void* AlignedMalloc(size_t size, int minimum_alignment) {
#if defined(__ANDROID__)
  return memalign(minimum_alignment, size);

Review thread on NumHyperthreadsPerCore():

- I think we should implement this properly in tensorflow/core/platform/cpu_info.{h,cc}, i.e. query the relevant bit fields in the CPU info. Here's a Stack Overflow answer with example code: https://stackoverflow.com/questions/2901694/programmatically-detect-number-of-physical-processors-cores-or-if-hyper-threadin/2921632#2921632. Hmmm, wait a second, you're the Intel engineer, you probably know how to do this even better :-)
- I was going to go down the CPUID route but wanted to check whether hwloc (https://www.open-mpi.org/projects/hwloc/, BSD license) was acceptable instead; with that we don't have to maintain the code inside TensorFlow. If we decide to enumerate CPUID instead, this is the best source so far for Intel CPUs: https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration. Need to look into what AMD CPUs use.
- @rmlarsen: added code to extract SMT contexts per core on modern Intel CPUs. Can you take a look now?
- This change appears to have broken my build on Windows. I get LNK2019 on tensorflow::Port::NumHyperthreadsPerCore. Verified that the commit preceding this (2058a24) builds fine.
- Me too. This change appears to have broken my build on Windows: I get LNK2019 on tensorflow::Port::NumHyperthreadsPerCore.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get rid of anything manipulating the environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other places in Tensorflow unit tests where unsetenv is being used to ensure we are checking the default paths (
tensorflow/tensorflow/core/platform/cloud/google_auth_provider_test.cc
Line 71 in a3d90a9
Do you have an alternative that I could look into?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any thread-safe versions of these APIs. Maybe we can put locks around the calls since these are just tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK in unit tests.