[INTEL MKL] Execute small gemm's single threaded. #47577
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.

@penpornk I tried to fix the merge conflict online and it cancelled the CLA check. Can you please help resolve this?

@googlebot I fixed it.

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.
Manually changing CLA to yes because the two commits in this PR are from the same GitHub account, which has a signed CLA.
penpornk left a comment
Thank you for the PR and I'm sorry for the delay!
// the kernel single threaded. Here we are coming up with a cost model based
// on based on L1 sizes. If we find that matrices are small enough, we will
Nit: "based on" appears twice.
Suggested change:
- // on based on L1 sizes. If we find that matrices are small enough, we will
+ // on L1 sizes. If we find that matrices are small enough, we will
// the kernel single threaded. Here we are coming up with a cost model based
// on based on L1 sizes. If we find that matrices are small enough, we will
// execute single threaded. This may need tuning.
bool single_threaded = ExecuteSingleThreadedGemm(m, n, k);
Nit: The call is short enough and the result is only used once. I don't think we need a variable for it.
if (!single_threaded) {
  dnnl::threadpool_interop::sgemm(char_transa, char_transb, m, n, k, alpha,
                                  a, lda, b, ldb, beta, c, ldc, &eigen_tp);
} else {
  // for now call single threaded gemm.
  dnnl::threadpool_interop::sgemm(char_transa, char_transb, m, n, k, alpha,
                                  a, lda, b, ldb, beta, c, ldc, nullptr);
}
Nit: Can we swap the logic? E.g., start with single-threaded first so we don't have to negate the condition?
if (ExecuteSingleThreadedGemm(m, n, k)) {
... // nullptr for thread pool
} else {
... // &eigen_tp
}
namespace tensorflow {

#define L1_SIZE 32 * 1024
Just FYI, TensorFlow can read cache sizes through Eigen, although that may come with some latency costs.
const auto cache_sizes = Eigen::internal::CacheSizes();
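A minimal sketch of how that could look. Note that Eigen::internal::CacheSizes and its m_l1/m_l2/m_l3 fields are Eigen internals rather than a stable public API and may change between versions, and the helper name L1CacheSizeBytes is illustrative:

#include <cstddef>
#include <Eigen/Core>

// Sketch only: returns the L1 data cache size (in bytes) that Eigen
// detected for this CPU, falling back to Eigen's built-in default when
// detection fails.
inline std::ptrdiff_t L1CacheSizeBytes() {
  // The constructor probes the CPU once; a static caches the result.
  static const Eigen::internal::CacheSizes cache_sizes;
  return cache_sizes.m_l1;
}

ExecuteSingleThreadedGemm could then compare the matrix footprint against this value instead of the hard-coded 32 KiB L1_SIZE.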
// a heuristic but what we are targeting are very small models whose
// total size is < a few L1's. So we will do this simple calculation
// to determine if the MM should be run on a single thread.
return ((sizeof(float) * (m * n + k * (m + n))) < L1_SIZE * 8);
Just curious, why is L1_SIZE multiplied by 8? If it's some magic constant, please say so in a comment in the code.
penpornk left a comment
@Srini511 Since the branch cut is in two days, I'll just pull the PR in and make the modifications myself to save time. I'd appreciate it if you could help answer the question about L1_SIZE * 8 later when you have time, though. :)
@penpornk Yes, 8 is just a magic number that worked with internal workloads. I had made the changes internally and was testing them on specific models. Thanks!
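Putting the pieces together, a self-contained sketch of the heuristic as discussed above (the constant names kL1Size and kL1Multiplier are illustrative; the PR itself uses the L1_SIZE macro and a literal 8):

#include <cstddef>

constexpr std::size_t kL1Size = 32 * 1024;  // assumed 32 KiB L1 data cache
constexpr std::size_t kL1Multiplier = 8;    // empirically tuned magic constant

// Returns true if the GEMM working set (m*n output floats plus k*(m+n)
// input floats) is small enough that threading overhead likely outweighs
// any parallel speedup.
inline bool ExecuteSingleThreadedGemm(int m, int n, int k) {
  const std::size_t bytes =
      sizeof(float) * (static_cast<std::size_t>(m) * n +
                       static_cast<std::size_t>(k) * (m + n));
  return bytes < kL1Size * kL1Multiplier;
}

For example, m = n = k = 64 gives 4 * (4096 + 8192) = 49152 bytes, well under 8 * 32768 = 262144 bytes, so such a GEMM would run single threaded.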
This PR introduces a simple cost metric to determine if dnnl_sgemm needs to be run with 1 thread or the whole threadpool.
Please note: do not merge this PR until #47543 is merged.