[INTEL MKL] Execute small gemm's single threaded. #47577
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.

@penpornk I tried to fix the merge conflict online and it cancelled the CLA check. Can you please help resolve this?

@googlebot I fixed it.

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.
Manually changing CLA to yes because the two commits in this PR are from the same GitHub account, which has a signed CLA.
penpornk left a comment
Thank you for the PR and I'm sorry for the delay!
// the kernel single threaded. Here we are coming up with a cost model based
// on based on L1 sizes. If we find that matrices are small enough, we will
Nit: "based on" appears twice.
Suggested change:
- // on based on L1 sizes. If we find that matrices are small enough, we will
+ // on L1 sizes. If we find that matrices are small enough, we will
// the kernel single threaded. Here we are coming up with a cost model based
// on based on L1 sizes. If we find that matrices are small enough, we will
// execute single threaded. This may need tuning.
bool single_threaded = ExecuteSingleThreadedGemm(m, n, k);
Nit: The call is short enough and the result is only used once. I don't think we need a variable for it.
if (!single_threaded) {
  dnnl::threadpool_interop::sgemm(char_transa, char_transb, m, n, k, alpha,
                                  a, lda, b, ldb, beta, c, ldc, &eigen_tp);
} else {
  // for now call single threaded gemm.
  dnnl::threadpool_interop::sgemm(char_transa, char_transb, m, n, k, alpha,
                                  a, lda, b, ldb, beta, c, ldc, nullptr);
}
Nit: Can we swap the logic? E.g., start with single-threaded first so we don't have to negate the condition?
if (ExecuteSingleThreadedGemm(m, n, k)) {
... // nullptr for thread pool
} else {
... // &eigen_tp
}
namespace tensorflow {

#define L1_SIZE 32 * 1024
Just FYI, TensorFlow can read cache sizes through Eigen, although that may come with some latency costs.
const auto cache_sizes = Eigen::internal::CacheSizes();
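A minimal sketch of how that could look. Note that Eigen::internal::CacheSizes and its m_l1/m_l2/m_l3 fields are Eigen internals rather than a stable public API and may change between versions, and the helper name L1CacheSizeBytes is illustrative:

#include <cstddef>
#include <Eigen/Core>

// Sketch only: returns the L1 data cache size (in bytes) that Eigen
// detected for this CPU, falling back to Eigen's built-in default when
// detection fails.
inline std::ptrdiff_t L1CacheSizeBytes() {
  // The constructor probes the CPU once; a static caches the result.
  static const Eigen::internal::CacheSizes cache_sizes;
  return cache_sizes.m_l1;
}

ExecuteSingleThreadedGemm could then compare the matrix footprint against this value instead of the hard-coded 32 KiB L1_SIZE.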
// a heuristic but what we are targeting are very small models whose
// total size is < a few L1's. So we will do this simple calculation
// to determine if the MM should be run on a single thread.
return ((sizeof(float) * (m * n + k * (m + n))) < L1_SIZE * 8);
Just curious, why is L1_SIZE multiplied by 8? If it's some magic constant, please say so in a comment in the code.
penpornk left a comment
@Srini511 Since the branch cut is in two days, I'll just pull the PR in and make the modifications myself to save time. I'd appreciate it if you could help answer the question about L1_SIZE * 8 later when you have time, though. :)
@penpornk Yes, 8 is just a magic number that worked with internal workloads. I had made the changes internally and was testing them on specific models. Thanks!
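Putting the pieces together, a self-contained sketch of the heuristic as discussed above (the constant names kL1Size and kL1Multiplier are illustrative; the PR itself uses the L1_SIZE macro and a literal 8):

#include <cstddef>

constexpr std::size_t kL1Size = 32 * 1024;  // assumed 32 KiB L1 data cache
constexpr std::size_t kL1Multiplier = 8;    // empirically tuned magic constant

// Returns true if the GEMM working set (m*n output floats plus k*(m+n)
// input floats) is small enough that threading overhead likely outweighs
// any parallel speedup.
inline bool ExecuteSingleThreadedGemm(int m, int n, int k) {
  const std::size_t bytes =
      sizeof(float) * (static_cast<std::size_t>(m) * n +
                       static_cast<std::size_t>(k) * (m + n));
  return bytes < kL1Size * kL1Multiplier;
}

For example, m = n = k = 64 gives 4 * (4096 + 8192) = 49152 bytes, well under 8 * 32768 = 262144 bytes, so such a GEMM would run single threaded.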
This PR introduces a simple cost metric to determine if dnnl_sgemm needs to be run with 1 thread or the whole threadpool.
Please note: do not merge this PR until #47543 is merged.