[Intel MKL] CNMS performance optimization by using priority queue #45934

Merged


yimeisun123 (Contributor)

Replace the vector with a priority queue to improve the performance of the CNMS (CombinedNonMaxSuppression) kernel.
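As a rough illustration of the technique named in the title, the sketch below runs greedy NMS for a single class with the candidates held in a `std::priority_queue` ordered by score instead of a fully sorted vector. It is not the code changed in this PR: the `Box` struct, the `IoU` and `GreedyNMS` helpers, and their threshold parameters are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical box type for illustration: corners (y1, x1, y2, x2) and a score.
struct Box {
  float y1, x1, y2, x2;
  float score;
};

// Intersection-over-union of two axis-aligned boxes; 0 for degenerate boxes.
float IoU(const Box& a, const Box& b) {
  const float area_a = (a.y2 - a.y1) * (a.x2 - a.x1);
  const float area_b = (b.y2 - b.y1) * (b.x2 - b.x1);
  if (area_a <= 0.f || area_b <= 0.f) return 0.f;
  const float ih = std::max(0.f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const float iw = std::max(0.f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const float inter = ih * iw;
  return inter / (area_a + area_b - inter);
}

// Hypothetical greedy single-class NMS. Candidate indices live in a max-heap
// keyed by score: building it from a vector is O(n) and each pop is O(log n),
// so candidates left in the heap once max_output is reached are never fully
// ordered, unlike with an up-front sort of every candidate.
std::vector<int> GreedyNMS(const std::vector<Box>& boxes, int max_output,
                           float iou_threshold, float score_threshold) {
  assert(iou_threshold >= 0.f && iou_threshold <= 1.f);
  std::vector<int> candidates;
  for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
    if (boxes[i].score > score_threshold) candidates.push_back(i);
  }
  // A "less by score" comparator makes top() the highest-scoring candidate.
  auto by_score = [&boxes](int i, int j) {
    return boxes[i].score < boxes[j].score;
  };
  std::priority_queue<int, std::vector<int>, decltype(by_score)> heap(
      by_score, std::move(candidates));

  std::vector<int> selected;
  while (!heap.empty() && static_cast<int>(selected.size()) < max_output) {
    const int idx = heap.top();
    heap.pop();
    // Keep the candidate only if it does not overlap a selected box too much.
    bool keep = true;
    for (int s : selected) {
      if (IoU(boxes[idx], boxes[s]) > iou_threshold) {
        keep = false;
        break;
      }
    }
    if (keep) selected.push_back(idx);
  }
  return selected;
}
```

With an IoU threshold of 0.5, two heavily overlapping boxes keep only the higher-scoring one, a disjoint box is also kept, and a box below the score threshold is never queued. Whether a heap beats sorting in practice depends on how many candidates are popped relative to how many are pushed, which is why measuring, as done in this PR, matters.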

google-ml-butler bot added the size:S label Dec 23, 2020
google-cla bot added the cla: yes label Dec 23, 2020
gbaned self-assigned this Dec 23, 2020
gbaned added the comp:mkl label Dec 23, 2020
gbaned added this to Assigned Reviewer in PR Queue via automation Dec 23, 2020
gbaned requested a review from penpornk December 23, 2020 13:33

yimeisun123 (Contributor, Author)

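Below is the benchmark comparison between the baseline and the baseline with this PR applied. The ratio is baseline time divided by PR time, so a value above 1 means the PR is faster. The `BM_CombinedNMS_cpu_*_500_*` runs show ratios between about 0.94 and 1.00, while the `*_1917_*` and `*_2500_*` runs show ratios between about 1.06 and 1.19.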

| Benchmark | Baseline time per iteration (ns) | PR time per iteration (ns) | Baseline/PR ratio |
|---|---|---|---|
| BM_CombinedNMS_cpu_1_500_25_1 | 3454 | 3661 | 0.943458072 |
| BM_CombinedNMS_cpu_28_500_25_1 | 51500 | 52549 | 0.980037679 |
| BM_CombinedNMS_cpu_32_500_25_1 | 55597 | 57057 | 0.974411553 |
| BM_CombinedNMS_cpu_64_500_25_1 | 82095 | 84250 | 0.974421365 |
| BM_CombinedNMS_cpu_1_1000_25_1 | 3059 | 3029 | 1.009904259 |
| BM_CombinedNMS_cpu_28_1000_25_1 | 54182 | 54159 | 1.000424675 |
| BM_CombinedNMS_cpu_32_1000_25_1 | 60447 | 59889 | 1.009317237 |
| BM_CombinedNMS_cpu_64_1000_25_1 | 90236 | 89500 | 1.008223464 |
| BM_CombinedNMS_cpu_1_1917_25_1 | 3593 | 3377 | 1.063962097 |
| BM_CombinedNMS_cpu_28_1917_25_1 | 62043 | 57952 | 1.070592904 |
| BM_CombinedNMS_cpu_32_1917_25_1 | 69649 | 64234 | 1.084301149 |
| BM_CombinedNMS_cpu_64_1917_25_1 | 110165 | 100424 | 1.096998725 |
| BM_CombinedNMS_cpu_1_2500_25_1 | 3908 | 3404 | 1.148061105 |
| BM_CombinedNMS_cpu_28_2500_25_1 | 69272 | 62534 | 1.107749384 |
| BM_CombinedNMS_cpu_32_2500_25_1 | 75578 | 66765 | 1.1320003 |
| BM_CombinedNMS_cpu_64_2500_25_1 | 122426 | 107127 | 1.142811803 |
| BM_CombinedNMS_cpu_1_500_25_25 | 2965 | 3001 | 0.988003999 |
| BM_CombinedNMS_cpu_28_500_25_25 | 50016 | 51001 | 0.980686653 |
| BM_CombinedNMS_cpu_32_500_25_25 | 54472 | 55636 | 0.979078295 |
| BM_CombinedNMS_cpu_64_500_25_25 | 82680 | 84588 | 0.977443609 |
| BM_CombinedNMS_cpu_1_1000_25_25 | 3219 | 3122 | 1.031069827 |
| BM_CombinedNMS_cpu_28_1000_25_25 | 55428 | 55251 | 1.003203562 |
| BM_CombinedNMS_cpu_32_1000_25_25 | 61423 | 60425 | 1.016516343 |
| BM_CombinedNMS_cpu_64_1000_25_25 | 98121 | 96076 | 1.021285233 |
| BM_CombinedNMS_cpu_1_1917_25_25 | 3716 | 3239 | 1.147267675 |
| BM_CombinedNMS_cpu_28_1917_25_25 | 65656 | 60912 | 1.077882847 |
| BM_CombinedNMS_cpu_32_1917_25_25 | 74634 | 70001 | 1.066184769 |
| BM_CombinedNMS_cpu_64_1917_25_25 | 125768 | 115796 | 1.086116964 |
| BM_CombinedNMS_cpu_1_2500_25_25 | 3908 | 3575 | 1.093146853 |
| BM_CombinedNMS_cpu_28_2500_25_25 | 74623 | 67591 | 1.10403752 |
| BM_CombinedNMS_cpu_32_2500_25_25 | 83959 | 75973 | 1.105116291 |
| BM_CombinedNMS_cpu_64_2500_25_25 | 144104 | 128128 | 1.124687812 |
| BM_CombinedNMS_cpu_1_500_90_1 | 6511 | 6538 | 0.995870297 |
| BM_CombinedNMS_cpu_28_500_90_1 | 129866 | 132216 | 0.982226054 |
| BM_CombinedNMS_cpu_32_500_90_1 | 141676 | 146484 | 0.967177303 |
| BM_CombinedNMS_cpu_64_500_90_1 | 238119 | 248641 | 0.957681959 |
| BM_CombinedNMS_cpu_1_1000_90_1 | 7347 | 7065 | 1.039915074 |
| BM_CombinedNMS_cpu_28_1000_90_1 | 145292 | 142354 | 1.020638689 |
| BM_CombinedNMS_cpu_32_1000_90_1 | 158022 | 156830 | 1.007600587 |
| BM_CombinedNMS_cpu_64_1000_90_1 | 270193 | 266259 | 1.014775087 |
| BM_CombinedNMS_cpu_1_1917_90_1 | 8450 | 7920 | 1.066919192 |
| BM_CombinedNMS_cpu_28_1917_90_1 | 173900 | 160502 | 1.083475595 |
| BM_CombinedNMS_cpu_32_1917_90_1 | 196951 | 173393 | 1.13586477 |
| BM_CombinedNMS_cpu_64_1917_90_1 | 340503 | 307652 | 1.106779738 |
| BM_CombinedNMS_cpu_1_2500_90_1 | 9129 | 8477 | 1.076914003 |
| BM_CombinedNMS_cpu_28_2500_90_1 | 191715 | 169376 | 1.131889996 |
| BM_CombinedNMS_cpu_32_2500_90_1 | 214517 | 187388 | 1.144774479 |
| BM_CombinedNMS_cpu_64_2500_90_1 | 382273 | 331639 | 1.152678063 |
| BM_CombinedNMS_cpu_1_500_90_90 | 6483 | 6478 | 1.000771843 |
| BM_CombinedNMS_cpu_28_500_90_90 | 133296 | 135703 | 0.982262736 |
| BM_CombinedNMS_cpu_32_500_90_90 | 144556 | 151654 | 0.953196091 |
| BM_CombinedNMS_cpu_64_500_90_90 | 252979 | 255974 | 0.988299593 |
| BM_CombinedNMS_cpu_1_1000_90_90 | 7341 | 7220 | 1.016759003 |
| BM_CombinedNMS_cpu_28_1000_90_90 | 154367 | 151577 | 1.018406486 |
| BM_CombinedNMS_cpu_32_1000_90_90 | 173259 | 167141 | 1.036603826 |
| BM_CombinedNMS_cpu_64_1000_90_90 | 300782 | 293236 | 1.025733539 |
| BM_CombinedNMS_cpu_1_1917_90_90 | 8563 | 8060 | 1.062406948 |
| BM_CombinedNMS_cpu_28_1917_90_90 | 196507 | 179871 | 1.092488506 |
| BM_CombinedNMS_cpu_32_1917_90_90 | 222948 | 203074 | 1.097865803 |
| BM_CombinedNMS_cpu_64_1917_90_90 | 397955 | 364657 | 1.091313207 |
| BM_CombinedNMS_cpu_1_2500_90_90 | 9368 | 8621 | 1.086648881 |
| BM_CombinedNMS_cpu_28_2500_90_90 | 229845 | 202881 | 1.132905496 |
| BM_CombinedNMS_cpu_32_2500_90_90 | 251444 | 226392 | 1.11065762 |
| BM_CombinedNMS_cpu_64_2500_90_90 | 456374 | 403514 | 1.130999172 |
| BM_CombinedNMS_cpu_1_500_200_1 | 12767 | 13203 | 0.966977202 |
| BM_CombinedNMS_cpu_28_500_200_1 | 225725 | 234416 | 0.962924886 |
| BM_CombinedNMS_cpu_32_500_200_1 | 252504 | 264165 | 0.955857135 |
| BM_CombinedNMS_cpu_64_500_200_1 | 451719 | 469485 | 0.962158535 |
| BM_CombinedNMS_cpu_1_1000_200_1 | 14195 | 13936 | 1.01858496 |
| BM_CombinedNMS_cpu_28_1000_200_1 | 258482 | 253823 | 1.018355311 |
| BM_CombinedNMS_cpu_32_1000_200_1 | 291218 | 285598 | 1.019678009 |
| BM_CombinedNMS_cpu_64_1000_200_1 | 523317 | 517501 | 1.011238626 |
| BM_CombinedNMS_cpu_1_1917_200_1 | 16427 | 14862 | 1.105302113 |
| BM_CombinedNMS_cpu_28_1917_200_1 | 326365 | 292076 | 1.117397527 |
| BM_CombinedNMS_cpu_32_1917_200_1 | 366054 | 330116 | 1.108864763 |
| BM_CombinedNMS_cpu_64_1917_200_1 | 677261 | 602361 | 1.12434404 |
| BM_CombinedNMS_cpu_1_2500_200_1 | 18088 | 15840 | 1.141919192 |
| BM_CombinedNMS_cpu_28_2500_200_1 | 364184 | 315420 | 1.154600216 |
| BM_CombinedNMS_cpu_32_2500_200_1 | 415267 | 354909 | 1.17006613 |
| BM_CombinedNMS_cpu_64_2500_200_1 | 777997 | 656129 | 1.185737866 |
| BM_CombinedNMS_cpu_1_500_200_200 | 13066 | 13079 | 0.99900604 |
| BM_CombinedNMS_cpu_28_500_200_200 | 237618 | 241347 | 0.984549218 |
| BM_CombinedNMS_cpu_32_500_200_200 | 264774 | 268368 | 0.986607941 |
| BM_CombinedNMS_cpu_64_500_200_200 | 472916 | 486986 | 0.971107999 |
| BM_CombinedNMS_cpu_1_1000_200_200 | 14559 | 14003 | 1.039705777 |
| BM_CombinedNMS_cpu_28_1000_200_200 | 276220 | 274795 | 1.005185684 |
| BM_CombinedNMS_cpu_32_1000_200_200 | 316452 | 309632 | 1.022026147 |
| BM_CombinedNMS_cpu_64_1000_200_200 | 579009 | 568403 | 1.018659296 |
| BM_CombinedNMS_cpu_1_1917_200_200 | 17198 | 15559 | 1.10534096 |
| BM_CombinedNMS_cpu_28_1917_200_200 | 371166 | 335064 | 1.10774658 |
| BM_CombinedNMS_cpu_32_1917_200_200 | 417406 | 373736 | 1.116847186 |
| BM_CombinedNMS_cpu_64_1917_200_200 | 791732 | 714316 | 1.108377805 |
| BM_CombinedNMS_cpu_1_2500_200_200 | 19240 | 16614 | 1.158059468 |
| BM_CombinedNMS_cpu_28_2500_200_200 | 437781 | 384334 | 1.139063939 |
| BM_CombinedNMS_cpu_32_2500_200_200 | 488780 | 424662 | 1.150985961 |
| BM_CombinedNMS_cpu_64_2500_200_200 | 933122 | 807729 | 1.155241424 |
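The ratio column is the baseline time divided by the PR time. As a small sketch (the `BenchmarkRow` struct and the helper names are hypothetical and not part of the TensorFlow benchmark harness), the ratio and the equivalent percent change in time per iteration can be recomputed from any row of the table:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one benchmark row (times in ns per iteration).
struct BenchmarkRow {
  std::string name;
  double baseline_ns;  // without the PR
  double pr_ns;        // with the PR applied
};

// Baseline/PR ratio: above 1 means the PR is faster, below 1 means slower.
double SpeedupRatio(const BenchmarkRow& row) {
  assert(row.pr_ns > 0.0);
  return row.baseline_ns / row.pr_ns;
}

// Percent change in time per iteration; negative means the PR is faster.
double PercentTimeChange(const BenchmarkRow& row) {
  assert(row.baseline_ns > 0.0);
  return 100.0 * (row.pr_ns - row.baseline_ns) / row.baseline_ns;
}

// Prints one line per row with the ratio and the percent time change.
void PrintSummary(const std::vector<BenchmarkRow>& rows) {
  for (const BenchmarkRow& row : rows) {
    std::printf("%-36s ratio=%.3f time=%+.1f%%\n", row.name.c_str(),
                SpeedupRatio(row), PercentTimeChange(row));
  }
}
```

For example, BM_CombinedNMS_cpu_64_2500_200_200 (933122 ns vs. 807729 ns) gives a ratio of about 1.155, which corresponds to roughly 13.4% less time per iteration with the PR.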

gbaned added the awaiting review label Dec 28, 2020

penpornk (Member) left a comment:

Thank you for the PR and benchmark results!

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Dec 30, 2020
google-ml-butler bot added the kokoro:force-run and ready to pull labels Dec 30, 2020
penpornk removed the awaiting review label Dec 30, 2020
kokoro-team removed the kokoro:force-run label Dec 30, 2020
gbaned added and removed the ready to pull label Jan 4, 2021
copybara-service bot merged commit 0c68f6b into tensorflow:master Jan 12, 2021
PR Queue automation moved this from Approved by Reviewer to Merged Jan 12, 2021

Labels: cla: yes, comp:mkl, ready to pull, size:S
Projects: PR Queue (Merged)
Linked issues: none yet
4 participants