
[INTEL MKL] Enabling TF + OneDNN in stock TF build #47745

Closed
wants to merge 7 commits into from

Conversation

gzmkl
Contributor

@gzmkl gzmkl commented Mar 12, 2021

Refactor code to do the following:

  1. Code included in the stock TF build is guarded by the macro INTEL_MKL.

  2. Code included only when --config=mkl is used is now guarded by the macro ENABLE_MKL.

  3. When building stock TF for x86 Linux/Windows, the macro INTEL_MKL will be defined; it will not be defined for other architectures.

  4. Added a runtime environment variable, ENABLE_ONEDNN_OPTS, that enables oneDNN optimizations in stock TF.

  5. For --config=mkl, ENABLE_MKL will be defined.

This PR assumes that:

a) oneDNN is upgraded to v2.1 (PR #47743), and

b) stock TF and TF+oneDNN use the same oneDNN build (#47679).
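The guarding scheme described above can be sketched as follows. The macro names and the ENABLE_ONEDNN_OPTS variable come from this PR; the helper function name and its exact parsing of the variable are illustrative assumptions, not the actual TensorFlow implementation:

```cpp
#include <cstdlib>
#include <cstring>

// Compile-time layering described in the PR:
#ifdef INTEL_MKL
// Defined for stock TF builds on x86 Linux/Windows: oneDNN-capable code
// lives inside this guard.

#ifdef ENABLE_MKL
// Defined only when building with --config=mkl: MKL-specific code goes here.
#endif  // ENABLE_MKL

#endif  // INTEL_MKL

// Hypothetical runtime check (name and value parsing are assumptions):
// oneDNN optimizations in stock TF are switched on at runtime by the
// ENABLE_ONEDNN_OPTS environment variable.
inline bool OneDnnOptsEnabled() {
  const char* val = std::getenv("ENABLE_ONEDNN_OPTS");
  return val != nullptr && std::strcmp(val, "1") == 0;
}
```

The key design point is the two-level guard: INTEL_MKL decides what is compiled into stock TF at all, while ENABLE_MKL (and the runtime variable) decide what is actually active.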

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Mar 12, 2021
@google-cla google-cla bot added the cla: yes label Mar 12, 2021
@gbaned gbaned self-assigned this Mar 12, 2021
@gbaned gbaned added the comp:mkl MKL related issues label Mar 12, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Mar 12, 2021
@gbaned
Contributor

gbaned commented Mar 17, 2021

@gzmkl Can you please resolve conflicts? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Mar 17, 2021
@gzmkl
Contributor Author

gzmkl commented Mar 18, 2021

@gbaned The merge conflicts were related to the oneDNN upgrade PR. They are gone now that the upgrade PR has been merged.

@penpornk penpornk added the kokoro:force-run Tests on submitted change label Mar 18, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 18, 2021
Member

@penpornk penpornk left a comment


Thank you very much for the PR!

tensorflow/core/common_runtime/threadpool_device.cc (Outdated)
#ifdef _OPENMP
#include <omp.h>
#endif
#endif
Member


Nit:

Suggested change
#endif
#endif // defined(ENABLE_ONEDNN_OPENMP) && defined(INTEL_MKL)

Contributor Author


Yes, we are making the change. We will also replace INTEL_MKL with ENABLE_MKL due to the line 35 change.

Member


The comment should be added to line 39, though. The #endif on line 38 is for #ifdef _OPENMP. I will change this internally.

Contributor Author


Fixed!

tensorflow/core/kernels/mkl/mkl_fused_ops_test.cc (Outdated)
tensorflow/core/util/util.cc
tensorflow/tensorflow.bzl (Outdated)
third_party/mkl/build_defs.bzl (Outdated)
third_party/mkl/build_defs.bzl
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Mar 19, 2021
@penpornk
Member

@gzmkl By the way, if any change requires a long testing time and you would prefer to do it later, please feel free to just add a TODO instead.

@gzmkl
Contributor Author

gzmkl commented Mar 19, 2021

@penpornk Thank you for the code review. Most of the code change suggestions are very good, and I have made the changes.
Local testing will take a while, but I will try to push the changes to the PR branch today.

@penpornk
Member

@gzmkl Thank you very much for the quick responses! Please take your time. :)
The oneDNN v2.1 upgrade is still not in the clear. It made one test time out (due to long compilation time), so I submitted a fix this morning. I'll have to wait for tonight's nightly test results to make sure v2.1 will stick. The soonest this PR can be merged is tomorrow (if nothing else goes wrong).

@gzmkl
Contributor Author

gzmkl commented Mar 19, 2021

@penpornk I have committed changes per your suggestions. Please let me know if there is anything missing.
Thanks a lot.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 19, 2021
Member

@penpornk penpornk left a comment


Thank you for the quick response! Several of my comments are still not addressed, but I can make the rest of the minor modifications (and add TODOs for refactors) myself. I'm going to pull the PR in to test now.

There are two comments that I still need answers to, though. Could you please reply to these questions? Q1, Q2

#ifdef _OPENMP
#include <omp.h>
#endif
#endif
Member


The comment should be added to line 39, though. The #endif on line 38 is for #ifdef _OPENMP. I will change this internally.

tensorflow/core/util/util.cc
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 19, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 19, 2021
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Mar 21, 2021
@gbaned gbaned removed the ready to pull PR ready for merge process label Mar 22, 2021
@gbaned
Contributor

gbaned commented Mar 22, 2021

@gzmkl Can you please check @penpornk's comments and keep us posted? Thanks!

@gzmkl
Contributor Author

gzmkl commented Mar 22, 2021

@gzmkl Can you please check @penpornk's comments and keep us posted ? Thanks!

Yes, I just did that. Thanks!

copybara-service bot pushed a commit that referenced this pull request Mar 23, 2021
PiperOrigin-RevId: 364470371
Change-Id: I6ff0e6bec13cb9dd4bf7396c22495c505f801cb1
@penpornk
Member

penpornk commented Mar 23, 2021

@gzmkl The changes from this PR have been merged in 4a24610. Somehow GitHub doesn't mark it as merged. I'm closing this PR now. Thank you again for the PR! :)

@penpornk penpornk closed this Mar 23, 2021
PR Queue automation moved this from Approved by Reviewer to Closed/Rejected Mar 23, 2021
@ScottTodd
Contributor

Hey, I'm noticing some build failures in a downstream project (IREE) after this was merged. Full logs are here. Maybe we're using different flags/compilers to build.

Relevant logs snippet:

ERROR: /home/kbuilder/.cache/bazel/_bazel_kbuilder/1900d0fac5123d725c8d2e08b3f8c209/external/org_tensorflow/tensorflow/core/kernels/mkl/BUILD:299:22: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels/mkl:mkl_softmax_op' failed (Exit 1): clang failed: error executing command /usr/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 262 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox clang failed: error executing command /usr/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 262 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from external/org_tensorflow/tensorflow/core/kernels/mkl/mkl_softmax_op.cc:22:
In file included from external/org_tensorflow/tensorflow/core/framework/numeric_op.h:19:
In file included from external/org_tensorflow/tensorflow/core/framework/op_kernel.h:24:
In file included from external/org_tensorflow/tensorflow/core/framework/allocator.h:28:
In file included from external/org_tensorflow/tensorflow/core/platform/logging.h:27:
external/org_tensorflow/tensorflow/core/platform/default/logging.h:280:9: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup
  (*os) << v;
        ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:339:3: note: in instantiation of function template specialization 'tensorflow::internal::MakeCheckOpValueString<dnnl::memory::format_tag>' requested here
  MakeCheckOpValueString(comb.ForVar1(), v1);
  ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:384:1: note: in instantiation of function template specialization 'tensorflow::internal::MakeCheckOpString<dnnl::memory::format_tag, dnnl::memory::format_tag>' requested here
TF_DEFINE_CHECK_OP_IMPL(Check_NE, !=)  // Use CHECK(x == NULL) instead.
^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:357:38: note: expanded from macro 'TF_DEFINE_CHECK_OP_IMPL'
      return ::tensorflow::internal::MakeCheckOpString(v1, v2, exprtext); \
                                     ^
external/org_tensorflow/tensorflow/core/util/mkl_util.h:472:7: note: in instantiation of function template specialization 'tensorflow::internal::Check_NEImpl<dnnl::memory::format_tag, dnnl::memory::format_tag>' requested here
      DCHECK_NE(format_tag, memory::format_tag::undef);
      ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:418:31: note: expanded from macro 'DCHECK_NE'
#define DCHECK_NE(val1, val2) CHECK_NE(val1, val2)
                              ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:405:30: note: expanded from macro 'CHECK_NE'
#define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2)
                             ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:401:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
                                       ^
external/org_tensorflow/tensorflow/core/platform/default/logging.h:395:31: note: expanded from macro 'CHECK_OP_LOG'
      ::tensorflow::internal::name##Impl(                      \
                              ^
<scratch space>:124:1: note: expanded from here
Check_NEImpl
^
external/org_tensorflow/tensorflow/core/util/mkl_util.h:184:22: note: 'operator<<' should be declared prior to the call site or in namespace 'dnnl'
inline std::ostream& operator<<(std::ostream& os,
                     ^
1 error generated.

This is the problematic line:

DCHECK_NE(format_tag, memory::format_tag::undef);

This can be fixed by either removing that line or moving this operator<< from namespace tensorflow to namespace dnnl:

inline std::ostream& operator<<(std::ostream& os,
                                const memory::format_tag& tag) {
  if (tag == memory::format_tag::undef) {
    os << "undef";
  } else if (tag == memory::format_tag::any) {
    os << "any";
  } else {
    os << "invalid";
  }
  return os;
}

How do you want to proceed?

@penpornk
Member

@ScottTodd Sorry about the break and thank you for proposing the fix!
I'll leave this to @gzmkl and @agramesh1 to decide. I think making changes in TensorFlow is probably easier than in oneDNN (namespace dnnl is in an external library.)

@agramesh1
Contributor

@penpornk thanks, we are looking at it now.

@ScottTodd
Contributor

@ScottTodd Sorry about the break and thank you for proposing the fix!
I'll leave this to @gzmkl and @agramesh1 to decide. I think making changes in TensorFlow is probably easier than in oneDNN (namespace dnnl is in an external library.)

Changes don't need to be made in the external library; simply putting

namespace dnnl {
inline std::ostream& operator<<(std::ostream& os,
                                const memory::format_tag& tag) {
  if (tag == memory::format_tag::undef) {
    os << "undef";
  } else if (tag == memory::format_tag::any) {
    os << "any";
  } else {
    os << "invalid";
  }
  return os;
}
}  // namespace dnnl

above namespace tensorflow:

#ifdef _WIN32
typedef unsigned int uint;
#endif
namespace tensorflow {

works when I build locally. (I think because memory::format_tag is used in the macros without being fully qualified as dnnl::memory::format_tag, the macros aren't compiling as expected.)

That's just one solution though. Not sure what works best for your projects.

@gzmkl
Contributor Author

gzmkl commented Mar 23, 2021

After discussing with the team and reviewing the related code in mkl_util.h, we decided that removing the line in question is the right choice.
We will submit a PR.

@gzmkl
Contributor Author

gzmkl commented Mar 23, 2021

@ScottTodd @penpornk A PR to fix the issue has been submitted here #48030.
Thanks!

Labels
cla: yes comp:mkl MKL related issues size:L CL Change Size: Large
Projects
PR Queue
  
Closed/Rejected
Development

Successfully merging this pull request may close these issues.

None yet

7 participants