Optimize model for iOS Neural Engine #2526

Closed
Leon0402 opened this issue Mar 19, 2021 · 18 comments
Labels
enhancement New feature or request Stale

Comments

@Leon0402

Leon0402 commented Mar 19, 2021

🚀 Feature

Provide an optimized yolov5 version for the iOS Neural Engine so that real-time object detection is possible.

Motivation

Disclaimer: I have only used yolov5 version 2, so I can't be sure whether the following is the only problem in the newest versions (but after looking at the source code, it is very likely still a problem).

When running yolov5 version 2 on iOS devices, it is quite slow and real-time detection is impossible. This is because yolov5 does not run fully on the Neural Engine but switches between the CPU/GPU and the Neural Engine, which increases detection time by at least a factor of 4.

Pitch

With some small adjustments to the model configuration, it can run a lot faster on iOS devices. In particular, I found that in version 2 the SPP layer is the bottleneck. Kernel sizes above 7 are not supported by the Neural Engine and force the device to switch back to the CPU/GPU.

Disclaimer: The Neural Engine is proprietary, so this is not documented anywhere.

This is the only problem I had with yolov5 version 2 (there are potentially more in newer yolov5 versions). I'm not an expert here, but I suspect that this change is bad in general (i.e. on non-iOS devices) and will decrease accuracy. It probably also reduces accuracy on iOS devices, although the performance gain outweighs that by far.

So the real question is: are you willing to accept a PR in some form? Or would you perhaps integrate it yourself in some way (as you probably know best how to do it)?
I understand that you probably don't want to change the model configuration in a way that could make it worse on all devices except iOS devices. What is your preferred solution?

@Leon0402 Leon0402 added the enhancement New feature or request label Mar 19, 2021
@glenn-jocher
Member

@Leon0402 thanks for the feature request! It sounds like you've done some good testing in iOS, and your discovery of SPP layer execution being sent to the CPU is interesting. I can't say for sure if I've observed similar effects.

Are your results inconsistent with the YOLOv5 iOS benchmarks?

Can you also test this YOLOv5s CoreML model and see if the issue persists?
https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.mlmodel

Are you using the latest version of MacOS, XCode, and iOS?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Apr 19, 2021
@Leon0402
Author

Leon0402 commented Apr 20, 2021

Hi @glenn-jocher,
sorry for the late response. I currently don't have a suitable test device, so I needed a little longer to run some experiments.

So I tested your model and also exported the model myself to see if there are any issues. In fact, the results with your model are consistent with the YOLOv5 iOS benchmarks. My own model (yolov5 v4, exported), on the other hand, needs > 100 ms, so it is slower by a factor of 5 or more.
I believe this is mainly because you use a much smaller image size. You also use FP8, but I believe this only affects model size and accuracy, not speed.

Could you explain why you chose this image size instead of the regular 640x640? Have you tested how it affects quality?

The original statement still holds, though: your model also doesn't run completely on the ANE. It has problems with the pooling layer as well. Here is a modified version of your model:

yolov5s-original-ane.mlmodel.zip

It's not pretrained, so expect worse quality. I changed the kernel sizes to lower values, and now the model runs completely on the ANE. Unfortunately this has no (noticeable) effect on inference speed in my tests, but CPU usage dropped by about 20%, so it's still an improvement.

As I said earlier, the original model with 640x640 image size is quite fast as well when this fix is applied (I think it's a few ms slower, but this would need more testing). As I expect it to have better quality, it might be worth it as well. But perhaps you could explain to me what impact the image size has.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label May 22, 2021
@Leon0402
Author

still relevant

@github-actions github-actions bot removed the Stale label May 26, 2021
@glenn-jocher
Member

@Leon0402 YOLOv5 iOS models are exported at 320x192 heightxwidth for portrait 4k video inference. YOLOv5s inference speed is about 17-18ms on iPhone 11/12. See #1276 for a full profile of various devices. Image pixels directly correlate to FLOPS and inference time.
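The pixel-count argument can be made concrete with a little arithmetic. The sketch below is only a rough estimate: it assumes that inference cost scales linearly with the number of input pixels and ignores any fixed per-inference overhead. The helper name `pixel_count` is made up for illustration.

```python
def pixel_count(height, width):
    """Number of input pixels for a given model input shape."""
    return height * width

# Input shapes mentioned in this thread (height x width).
full = pixel_count(640, 640)    # regular square export
export = pixel_count(320, 192)  # rectangular iOS export

# Assumption: cost grows linearly with pixel count.
ratio = full / export

print(full, export)     # 409600 61440
print(f"{ratio:.2f}x")  # 6.67x
```

Under that assumption, a 640x640 export does roughly 6.7 times the work of a 320x192 export, which is in the same range as the "factor 5 or more" slowdown reported earlier in the thread.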

Could you submit a PR or explain the updates you made for your improvements to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE? Thanks!! Maybe we can tweak the architecture a bit to improve the exportability.

@Leon0402
Author

Leon0402 commented Jun 1, 2021

YOLOv5 iOS models are exported at 320x192 heightxwidth for portrait 4k video inference

Does this improve quality as it's closer to the actual image size you get (compared to square images)? -> Sorry, no expert here.

Just asking, as we use the same model regardless of the phone orientation, I believe.

Could you submit a PR or explain the updates you made for your improvements to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE?

Just change the kernel params at https://github.com/ultralytics/yolov5/blob/develop/models/yolov5s.yaml#L23 from [5, 9, 13] to something like [3, 5, 7]. Use whatever makes sense here and is <= 7; then it should run on the ANE.
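The change described above can be sketched in plain Python. This is a minimal illustration, not the project's own tooling: it assumes each layer entry has the form `[from, number, module, args]` and that the SPP args are `[out_channels, [k1, k2, k3]]`. The `backbone` excerpt, the helper names, and the constant `ANE_MAX_KERNEL` are hypothetical, and the limit of 7 is only the empirical observation from this thread, not a documented ANE constraint.

```python
# Largest pooling kernel that stayed on the ANE in the experiments reported
# in this thread (an empirical observation, not a documented limit).
ANE_MAX_KERNEL = 7

def oversized_kernels(kernels, limit=ANE_MAX_KERNEL):
    """Return the kernel sizes that exceed the assumed ANE limit."""
    return [k for k in kernels if k > limit]

def patch_spp_kernels(layers, new_kernels=(3, 5, 7)):
    """Return a copy of `layers` with every SPP kernel list replaced.

    Each layer is assumed to look like [from, number, module_name, args],
    where args for SPP is [out_channels, [k1, k2, k3]].
    """
    patched = []
    for frm, number, module, args in layers:
        if module == "SPP":
            args = [args[0], list(new_kernels)]
        patched.append([frm, number, module, args])
    return patched

# Hypothetical excerpt of a model definition, for illustration only.
backbone = [
    [-1, 1, "Conv", [1024, 3, 2]],
    [-1, 1, "SPP", [1024, [5, 9, 13]]],
]

print(oversized_kernels(backbone[1][3][1]))  # [9, 13]
patched = patch_spp_kernels(backbone)
print(patched[1][3][1])                      # [3, 5, 7]
```

Note that changing the kernel sizes changes the architecture, so previously trained weights may no longer give the same accuracy; as mentioned earlier in the thread, the modified model should be expected to need retraining.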

@glenn-jocher
Member

@Leon0402 oh that's super interesting, so 2d convolutions > 7x7 pixels are not supported on ANE?

The rectangular export helps more with reducing FLOPS than with improving inference, as iOS provides a few options for image resize which can pad or stretch your input image into your model shape.
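To illustrate the difference between padding and stretching an input into the model shape, here is a small sketch of the geometry only. It does not call any iOS API; the function names and the 3840x2160 portrait frame size are assumptions chosen for illustration.

```python
def stretch_scales(src_h, src_w, dst_h, dst_w):
    """Per-axis scale factors when the image is stretched to fill the target."""
    return dst_h / src_h, dst_w / src_w

def letterbox(src_h, src_w, dst_h, dst_w):
    """Uniform scale that fits the image inside the target, plus total padding."""
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    return scale, (new_h, new_w), (pad_h, pad_w)

# Assumed portrait 4K frame (height x width) into the 320x192 model input.
scale, new_size, padding = letterbox(3840, 2160, 320, 192)
print(new_size, padding)  # (320, 180) (0, 12)

sy, sx = stretch_scales(3840, 2160, 320, 192)
print(round(sy, 4), round(sx, 4))  # 0.0833 0.0889
```

With padding, the aspect ratio is preserved and 12 columns of the 192-pixel width are filler; with stretching, the two axes are scaled by slightly different factors, so objects are mildly distorted.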

@Leon0402
Author

Leon0402 commented Jun 1, 2021

@glenn-jocher I cannot explain the technical details of why 2D convolutions > 7x7 pixels are not supported on the ANE (yet), nor whether that is exactly the case. Based only on my experiments, the calls to the ANE were split up whenever I had a value greater than 7 there. I obviously haven't tested it for every number greater than 7 :-) So it could be that my conclusion is not entirely true.
Furthermore, it could always change in the future.

Unfortunately the ANE is completely proprietary, so basically everything we know about it is based on experiments.

If you run some tests, let me know if you see any performance improvements (CPU usage or inference speed).

@glenn-jocher
Member

@Leon0402 interesting! Yes I agree ANE documentation is severely lacking unfortunately. Thanks for the info, I'll post any new results I find here.

@github-actions
Contributor

github-actions bot commented Jul 2, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale label Jul 2, 2021
@github-actions github-actions bot closed this as completed Jul 7, 2021
@usmanqureshi94

Any updates on the results?

@abdullahabid10

Hi @glenn-jocher, any update on whether the current implementation fully supports ANE?

@glenn-jocher
Member

@abdullahabid10 yes of course

@abdullahabid10

@abdullahabid10 yes of course

@glenn-jocher thanks! Can you please let me know in which release this issue was fixed?

@glenn-jocher
Member

@abdullahabid10 oh sorry I'm not sure what issue you are referring to. What was the problem before?

@abdullahabid10

Could you submit a PR or explain the updates you made for your improvements to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE? Thanks!! Maybe we can tweak the architecture a bit to improve the exportability.

@glenn-jocher I'm referring to this. It seems like the model doesn't run completely on the ANE, so I just wanted an update on the current status.

@glenn-jocher
Member

@abdullahabid10 oh I think that's a very old conversation. It's a bit difficult to determine ANE utilization but it should be near 100% as iOS times for iDetection are lightning fast, i.e. see #1276

iDetection v7.8 Inference Speeds

| | Year | ASIC-process | ANE (TOPS) | YOLOv5s (ms) | YOLOv5m (ms) | YOLOv5l (ms) | YOLOv5x (ms) |
|---|---|---|---|---|---|---|---|
| iPhone 6 | 2014 | A8-20nm | - | 90 | 180 | 350 | 500 |
| iPhone 6s | 2015 | A9-16nm | - | 148 | 216 | 304 | 475 |
| iPhone 7 | 2016 | A10-16nm | - | 94.7 | 140.7 | 216.7 | 289.8 |
| iPhone 8/X | 2017 | A11-10nm | 0.6 | - | - | - | - |
| iPhone XR/XS | 2018 | A12-7nm | 5.0 | 22.3 | 25.8 | 43.2 | 57.7 |
| iPhone 11 | 2019 | A13-7nm | 6.0 | 17.4 | 21.3 | 27.8 | 41.2 |
| iPhone 12 | 2020 | A14-5nm | 11.0 | 14.3 | 16.5 | 21.0 | 28.8 |
| iPhone 13 | 2021 | A15-5nm | 15.8 | | | | |

*CoreML models exported as FP8 320x192 with release v3.1
*Measured with iDetection v7.8 at 100% battery at 25°C. Average speed after 10 seconds recorded.
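To relate these latencies to the real-time goal of the original request, the sketch below converts a few of the table's values to frames per second and checks them against a frame budget. The 30 FPS target and the helper names are assumptions for illustration, and the figures cover model inference only, without any pre- or post-processing time.

```python
# YOLOv5 latencies in ms copied from the table above (s, m, l, x).
LATENCY_MS = {
    "iPhone XR/XS": (22.3, 25.8, 43.2, 57.7),
    "iPhone 11": (17.4, 21.3, 27.8, 41.2),
    "iPhone 12": (14.3, 16.5, 21.0, 28.8),
}
MODELS = ("YOLOv5s", "YOLOv5m", "YOLOv5l", "YOLOv5x")

def fps(latency_ms):
    """Frames per second achievable at the given per-frame latency."""
    return 1000.0 / latency_ms

def models_within_budget(device, target_fps=30.0):
    """Models whose latency fits the per-frame budget (assumed target FPS)."""
    budget_ms = 1000.0 / target_fps
    return [m for m, ms in zip(MODELS, LATENCY_MS[device]) if ms <= budget_ms]

for device, row in LATENCY_MS.items():
    print(device, [round(fps(ms), 1) for ms in row])

print(models_within_budget("iPhone 11"))  # ['YOLOv5s', 'YOLOv5m', 'YOLOv5l']
```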
