Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tflite] add xnnpack delegate to label_image #36749

Merged

Conversation

freedomtan
Copy link
Contributor

@freedomtan freedomtan commented Feb 14, 2020

The XNNPACK Delegate uses XNNPACK, a fairly optimized floating point library, to run some inference operators (see the delegate's README for currently supported ops). With XNNPACK, I was able to get performance numbers similar to what @Maratyszcza described at XNNPACK's readme. I also got good numbers on Pixel 4 and Oppo Reno 3. Numbers on x86 machines are also good.

  • Single-threaded, label_image -m MODEL_NAME -x 1 -c 50 -t 1
Model Pixel 3a, ms Pixel 4, ms Oppo Reno 3, ms
MobileNet v1 1.0X 88.02 29.95 36.23
MobileNet v2 1.0X 55.13 19.13 21.69
MobileNet v3 Large 44.84 15.69 17.65
MobileNet v3 Small 14.23 5.58 5.66
  • multi-threaded, the number of threads is set to be the number of big cores. label_image -m MODEL_NAME -x 1 -c 50 -t NUMBER_OF_BIG_CORES
Model Pixel 3a, ms Pixel 4, ms Oppo Reno 3, ms
MobileNet v1 1.0X 46.00 10.60 13.23
MobileNet v2 1.0X 28.57 6.89 7.56
MobileNet v3 Large 24.62 6.52 6.88
MobileNet v3 Small 8.24 2.47 2.37

@tensorflow-bot tensorflow-bot bot added the size:S CL Change Size: Small label Feb 14, 2020
@gbaned gbaned self-assigned this Feb 14, 2020
@gbaned gbaned added the comp:lite TF Lite related issues label Feb 14, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Feb 14, 2020
@gbaned gbaned requested a review from abattery February 14, 2020 12:53
@gbaned gbaned added the awaiting review Pull request awaiting review label Feb 19, 2020
Copy link
Contributor

@srjoglekar246 srjoglekar246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some results you observe with XNNPack on label_image, to the PR description?

tensorflow/lite/examples/label_image/label_image.cc Outdated Show resolved Hide resolved
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Feb 28, 2020
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Mar 2, 2020
@gbaned gbaned added the awaiting review Pull request awaiting review label Mar 3, 2020
Copy link
Contributor

@srjoglekar246 srjoglekar246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. Thanks!

As one last thing, could you update the README in examples/label_image, for other users to use the delegate?

Will approve once this change is in.

@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Mar 5, 2020
@gbaned
Copy link
Contributor

gbaned commented Mar 5, 2020

@freedomtan Can you please check srjoglekar246's comments and keep us posted? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Mar 5, 2020
@freedomtan
Copy link
Contributor Author

@srjoglekar246 and @gbaned: thanks. I updated it.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 5, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 5, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 5, 2020
@gbaned gbaned removed the stat:awaiting response Status - Awaiting response from author label Mar 6, 2020
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Mar 10, 2020
@tensorflow-bot tensorflow-bot bot removed the ready to pull PR ready for merge process label Mar 11, 2020
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 11, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 11, 2020
@srjoglekar246
Copy link
Contributor

@gbaned Can you help get this in with the new change?

@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 11, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Mar 12, 2020
@tensorflow-bot tensorflow-bot bot removed the ready to pull PR ready for merge process label Mar 13, 2020
@gbaned gbaned added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 16, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 16, 2020
@tensorflow-copybara tensorflow-copybara merged commit b417a83 into tensorflow:master Mar 16, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Mar 16, 2020
@gunan
Copy link
Contributor

gunan commented Mar 17, 2020

I am not sure how it passed the presubmits, but it looks like the build on macos is now failing after this cl:

ERROR: /Volumes/BuildData/tmpfs/tensor_flow/tensorflow/lite/tools/evaluation/stages/BUILD:171:1: Couldn't build file tensorflow/lite/tools/evaluation/stages/inference_profiler_stage_test: Linking of rule '//tensorflow/lite/tools/evaluation/stages:inference_profiler_stage_test' failed (Exit 1)
ld: warning: option -s is obsolete and being ignored
Undefined symbols for architecture x86_64:
  "__cvtu32_mask16", referenced from:
      _xnn_f32_clamp_ukernel__avx512f in libavx512f_ukernels.a(avx512f_f5094ccab65aa097758ad595fc013aa6.o)
      _xnn_f32_dwconv_ukernel_up16x25__avx512f in libavx512f_ukernels.a(up16x25-avx512f_858bc72ba4ab0c22a96b33f8c986da4d.o)
      _xnn_f32_dwconv_ukernel_up16x4__avx512f in libavx512f_ukernels.a(up16x4-avx512f_bda6330d2e7849b699d64546d5554778.o)
      _xnn_f32_dwconv_ukernel_up16x9__avx512f in libavx512f_ukernels.a(up16x9-avx512f_b7bac09f3bd6e9da99c3bd67f089f13b.o)
      _xnn_f32_gemm_ukernel_1x16__avx512f_broadcast in libavx512f_ukernels.a(1x16-avx512f-broadcast_9b0d6ef6fa981981cc969d9d4c7172fe.o)
      _xnn_f32_gemm_ukernel_7x16__avx512f_broadcast in libavx512f_ukernels.a(7x16-avx512f-broadcast_453776902dcf985bc6b1450a4b846b2e.o)
      _xnn_f32_hswish_ukernel__avx512f_x32 in libavx512f_ukernels.a(avx512f-x32_f8deb97e68fad5860a602672bfd8cdf3.o)
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I will revert this for now. we can then look into why the presubmits here did not catch the problem.

@freedomtan
Copy link
Contributor Author

@gunan How do I reproduce the error? I guess some recent XNNPACK related changes broke it. With my freedomtan:add_xnnpack_to_label_image branch, I can build tensorflow/lite/tools/evaluation/stages/inference_profiler_stage_test, tensorflow/lite/examples/label_image:label_image, and others. It also works after rebasing my branch to recent 690c33e.

@srjoglekar246
Copy link
Contributor

@Maratyszcza FYI

tensorflow-copybara pushed a commit that referenced this pull request Mar 17, 2020
PiperOrigin-RevId: 301479850
Change-Id: I434b3dfd432ede55ea10b820f38f9af3173e9a41
@Maratyszcza
Copy link
Contributor

I suspect the issue is due to Apple Clang lacking support for _cvtu32_mask16 AVX512 intrinsic. XNNPACK has a polyfill for this case, but it is enabled only for Apple Clang pre-10, while it seems that Apple Clang 10 also lacks _cvtu32_mask16. Presumed fix landed in google/XNNPACK@b187835

freedomtan added a commit to freedomtan/tensorflow that referenced this pull request Mar 25, 2020
rebase and resubmit tensorflow#36749 to see if it works.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:lite TF Lite related issues ready to pull PR ready for merge process size:S CL Change Size: Small
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

9 participants