Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use xa_nnlib for depthwise_conv for Fusion F1 #47380

Merged
merged 4 commits into from Feb 25, 2021

Conversation

njeffrie
Copy link
Contributor

The code in this change is the subset of functionality needed for int8 svdf for Hifi4 copied from pnikam-cad/tensorflow@a737c1e/tensorflow/lite/micro/kernels/xtensa_hifi/depthwise_conv.cc

Note that the current change has not pulled in the floating point, uint8 implementation or the Hifi5 implementation.

Profiled the person_detection_benchmark with the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade run_person_detection_benchmark -j8
gives a latency of 9.661M ticks with this change vs 73.761M ticks without this change.

Per OP latency with this change:

WithPersonDataIterations(1) took 9661310 ticks (9661 ms)
DEPTHWISE_CONV_2D took 1156157 ticks (1156 ms).
DEPTHWISE_CONV_2D took 628525 ticks (628 ms).
CONV_2D took 987374 ticks (987 ms).
DEPTHWISE_CONV_2D took 395420 ticks (395 ms).
CONV_2D took 554630 ticks (554 ms).
DEPTHWISE_CONV_2D took 545252 ticks (545 ms).
CONV_2D took 665222 ticks (665 ms).
DEPTHWISE_CONV_2D took 172412 ticks (172 ms).
CONV_2D took 334262 ticks (334 ms).
DEPTHWISE_CONV_2D took 283280 ticks (283 ms).
CONV_2D took 444854 ticks (444 ms).
DEPTHWISE_CONV_2D took 88394 ticks (88 ms).
CONV_2D took 225302 ticks (225 ms).
DEPTHWISE_CONV_2D took 158090 ticks (158 ms).
CONV_2D took 335894 ticks (335 ms).
DEPTHWISE_CONV_2D took 158090 ticks (158 ms).
CONV_2D took 335894 ticks (335 ms).
DEPTHWISE_CONV_2D took 158090 ticks (158 ms).
CONV_2D took 335894 ticks (335 ms).
DEPTHWISE_CONV_2D took 158090 ticks (158 ms).
CONV_2D took 335894 ticks (335 ms).
DEPTHWISE_CONV_2D took 158090 ticks (158 ms).
CONV_2D took 335894 ticks (335 ms).
DEPTHWISE_CONV_2D took 59525 ticks (59 ms).
CONV_2D took 173270 ticks (173 ms).
DEPTHWISE_CONV_2D took 112424 ticks (112 ms).
CONV_2D took 283862 ticks (283 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 3398 ticks (3 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Without this change:

KeywordRunNIerations(1) took 38516 ticks (38 ms)
DEPTHWISE_CONV_2D took 11961939 ticks (11961 ms).
DEPTHWISE_CONV_2D took 12296923 ticks (12296 ms).
CONV_2D took 987358 ticks (987 ms).
DEPTHWISE_CONV_2D took 6138259 ticks (6138 ms).
CONV_2D took 554614 ticks (554 ms).
DEPTHWISE_CONV_2D took 12063331 ticks (12063 ms).
CONV_2D took 665206 ticks (665 ms).
DEPTHWISE_CONV_2D took 3018615 ticks (3018 ms).
CONV_2D took 334246 ticks (334 ms).
DEPTHWISE_CONV_2D took 5837463 ticks (5837 ms).
CONV_2D took 444838 ticks (444 ms).
DEPTHWISE_CONV_2D took 1462009 ticks (1462 ms).
CONV_2D took 225286 ticks (225 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 685980 ticks (685 ms).
CONV_2D took 173254 ticks (173 ms).
DEPTHWISE_CONV_2D took 1197084 ticks (1197 ms).
CONV_2D took 283846 ticks (283 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 3382 ticks (3 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Confirmed that the kernel_conv_test passes with:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade test_kernel_depthwise_conv_test -j8

Progress towards http://b/177457688

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Feb 24, 2021
@google-ml-butler
Copy link

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

We would like to have a discussion on the Github issue first to determine the best path forward, and then proceed to the PR review.

@google-cla google-cla bot added the cla: yes label Feb 24, 2021
@gbaned gbaned self-assigned this Feb 25, 2021
@gbaned gbaned added the comp:micro Related to TensorFlow Lite Microcontrollers label Feb 25, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Feb 25, 2021
return kTfLiteOk;
}

#if defined(Fusion_F1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uppercase FUSION_F1

Copy link
Contributor Author

@njeffrie njeffrie Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed Ran both Fusion F1 build + tests to verify.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Feb 25, 2021
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Feb 25, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Feb 25, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 25, 2021
@copybara-service copybara-service bot merged commit dd854c7 into tensorflow:master Feb 25, 2021
PR Queue automation moved this from Approved by Reviewer to Merged Feb 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:micro Related to TensorFlow Lite Microcontrollers ready to pull PR ready for merge process size:L CL Change Size: Large
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

4 participants