Optimize model for iOS Neural Engine #2526

Leon0402 · 2021-03-19T08:32:07Z

🚀 Feature

Provide an optimized YOLOv5 version for the iOS Neural Engine so that real-time object detection is possible.

Motivation

Disclaimer: I have only used YOLOv5 version 2, so I can't be sure the following is the only problem with the newest versions (but after looking at the source code, it is very likely still a problem).

When running YOLOv5 version 2 on iOS devices, it is quite slow and real-time detection is impossible. This is because YOLOv5 does not run fully on the Neural Engine but switches between the CPU/GPU and the Neural Engine, which increases detection time by at least a factor of 4.

Pitch

With some small adjustments to the model configuration, it can run a lot faster on iOS devices. In particular, I found that in version 2 the SPP layer is the bottleneck. Kernel sizes above 7 are not supported by the Neural Engine and force the device to switch back to the CPU/GPU.

Disclaimer: The Neural Engine is proprietary, so this is not documented anywhere.

This is the only problem I had with YOLOv5 version 2 (there are potentially more in newer YOLOv5 versions). I'm not an expert here, but I suspect that this change is bad in general (that is, on non-iOS devices) and will decrease accuracy. It probably also reduces accuracy on iOS devices, although the performance gain outweighs it by far.

So the real question is: are you willing to accept a PR in any form? Or would you perhaps integrate it yourself in some way (as you probably know best how to do it)?
I understand that you probably don't want to change the model configuration in a way that could make it worse on all devices except iOS devices. What is your preferred solution?

glenn-jocher · 2021-03-19T19:30:36Z

@Leon0402 thanks for the feature request! It sounds like you've done some good testing in iOS, and your discovery of SPP layer execution being sent to the CPU is interesting. I can't say for sure if I've observed similar effects.

Are your results inconsistent with the YOLOv5 iOS benchmarks?

Can you also test this YOLOv5s CoreML model and see if the issue persists?
https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.mlmodel

Are you using the latest versions of macOS, Xcode, and iOS?

github-actions · 2021-04-19T00:16:08Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Leon0402 · 2021-04-20T07:49:32Z

Hi @glenn-jocher,
sorry for the late response. I currently don't have a suitable test device, so I needed a little longer to run some experiments.

I tested your model and also exported the model myself to see if there are any issues. The results with your model are consistent with the YOLOv5 iOS benchmarks. My own export of YOLOv5 v4, on the other hand, needs > 100 ms, so it is slower by a factor of 5 or more.
I believe this is mainly because you use a much smaller image size. You also use FP8, but I believe this only affects model size and accuracy, not speed.

Could you explain why you chose this image size instead of the regular 640x640? Have you tested how it affects quality?

The original statement still holds, though: your model doesn't run completely on the ANE either. It has problems with the pooling layer as well. Here is a modified version of your model:

yolov5s-original-ane.mlmodel.zip

It's not pretrained, so expect worse quality. I changed the kernel size to something lower, and now the model runs completely on the ANE. Unfortunately it has no (noticeable) effect on performance in my tests, but CPU usage dropped by about 20%, so it's still an improvement.

As I said earlier, the original model with a 640x640 image size is quite fast as well when this fix is applied (I think it's a few ms slower, but this would need more testing). As I expect it to have better quality, it might be worth it as well. But perhaps you could explain to me what impact the image size has.

github-actions · 2021-05-22T00:09:10Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Leon0402 · 2021-05-25T10:00:43Z

still relevant

glenn-jocher · 2021-05-26T13:16:57Z

@Leon0402 YOLOv5 iOS models are exported at 320x192 (height x width) for portrait 4k video inference. YOLOv5s inference speed is about 17-18 ms on iPhone 11/12. See #1276 for a full profile of various devices. Image pixels directly correlate to FLOPS and inference time.
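To put rough numbers on the pixel-count point, here is a small sketch comparing the 640x640 export with the 320x192 export. It assumes cost scales about linearly with the number of input pixels, which is only an approximation; the function name is made up for this example.

```python
def pixel_count(height: int, width: int) -> int:
    """Number of input pixels the model has to process per frame."""
    return height * width


square = pixel_count(640, 640)     # regular square export
portrait = pixel_count(320, 192)   # iOS export (height x width)

# Assumption: inference cost grows roughly linearly with pixel count.
ratio = square / portrait

print(square, portrait)            # 409600 61440
print(f"{ratio:.2f}x more pixels")  # 6.67x more pixels
```

A ratio of about 6.7 is in the same range as the "factor 5 or more" slowdown reported above for the 640x640 export.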

Could you submit a PR or explain the updates you made to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE? Thanks!! Maybe we can tweak the architecture a bit to improve the exportability.

Leon0402 · 2021-06-01T17:40:56Z

> YOLOv5 iOS models are exported at 320x192 heightxwidth for portrait 4k video inference

Does this improve quality because it's closer to the actual image shape you get (compared to square images)? Sorry, I'm no expert here.

Just asking because, I believe, we use the same model regardless of the phone orientation.

> Could you submit a PR or explain the updates you made for your improvements to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE?

Just change the kernel params at https://github.com/ultralytics/yolov5/blob/develop/models/yolov5s.yaml#L23 from [5, 9, 13] to something like [3, 5, 7]. Use whatever makes sense here and is <= 7; then it should run on the ANE.
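The change described above could be sketched like this on an already-parsed model config. The `backbone` list is a hypothetical excerpt in the `[from, number, module, args]` layout used by the YOLOv5 YAML files, the helper names are made up for this example, and the limit of 7 is the empirical value from this thread, not a documented one.

```python
import copy

# Empirical limit reported in this thread; Apple does not document it.
ANE_MAX_KERNEL = 7

# Hypothetical excerpt of a parsed model YAML: [from, number, module, args].
backbone = [
    [-1, 1, "Conv", [1024, 3, 2]],
    [-1, 1, "SPP", [1024, [5, 9, 13]]],
    [-1, 3, "C3", [1024, False]],
]


def oversized_spp_kernels(layers, limit=ANE_MAX_KERNEL):
    """Return (layer_index, kernels) for each SPP layer with a kernel above limit."""
    hits = []
    for index, (_, _, module, args) in enumerate(layers):
        if module == "SPP" and any(k > limit for k in args[1]):
            hits.append((index, list(args[1])))
    return hits


def with_spp_kernels(layers, kernels=(3, 5, 7)):
    """Return a copy of layers where every SPP kernel list is replaced by kernels."""
    patched = copy.deepcopy(layers)
    for layer in patched:
        if layer[2] == "SPP":
            layer[3][1] = list(kernels)
    return patched


print(oversized_spp_kernels(backbone))   # [(1, [5, 9, 13])]
patched = with_spp_kernels(backbone)
print(patched[1])                        # [-1, 1, 'SPP', [1024, [3, 5, 7]]]
print(oversized_spp_kernels(patched))    # []
```

Smaller kernels change what the layers after SPP receive, so existing weights will probably not carry over cleanly and retraining is likely needed.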

glenn-jocher · 2021-06-01T17:55:09Z

@Leon0402 oh that's super interesting, so 2d convolutions > 7x7 pixels are not supported on ANE?

The rectangular export helps more with reducing FLOPS than with improving inference, as iOS provides a few options for image resize which can pad or stretch your input image into your model shape.
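As a rough illustration of the pad and stretch options, the sketch below works out what happens to a portrait 4k frame when it is mapped to the 192x320 (width x height) model input. It assumes "portrait 4k" means 2160x3840 pixels; the function names are hypothetical and this is plain arithmetic, not the iOS resize API.

```python
def fit_with_padding(src_w, src_h, dst_w, dst_h):
    """Scale to fit inside the target while keeping the aspect ratio.

    Returns (new_w, new_h, pad_w, pad_h), where the pads are total pixels
    of padding needed along each axis.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return new_w, new_h, dst_w - new_w, dst_h - new_h


def stretch_factors(src_w, src_h, dst_w, dst_h):
    """Per-axis scale factors when the image is stretched to fill the target."""
    return dst_w / src_w, dst_h / src_h


# Assumption: a portrait 4k frame is 2160 wide and 3840 high.
padded = fit_with_padding(2160, 3840, 192, 320)
sx, sy = stretch_factors(2160, 3840, 192, 320)

print(padded)                                   # (180, 320, 12, 0)
print(f"stretch distortion: {sx / sy:.3f}")     # stretch distortion: 1.067
```

Under these assumptions, padding wastes only 12 of 192 columns and stretching distorts the aspect ratio by under 7%, so a 320x192 input matches a portrait frame far better than a square one would.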

Leon0402 · 2021-06-01T18:31:51Z

@glenn-jocher I cannot explain the technical details of why 2D convolutions > 7x7 pixels are not supported on the ANE (yet), nor whether that's exactly the case. Based on my experiments, the calls to the ANE were split up whenever I had a value greater than 7 there. I obviously haven't tested every number greater than 7 :-) So it could be that my conclusion is not entirely true.
Furthermore, it could always change in the future.

Unfortunately the ANE is completely proprietary, so basically everything we know about it is based on experiments.

If you run some tests, let me know if you see any performance improvements (CPU usage or inference speed).
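A side note on the <= 7 limit: if the larger pooling windows are still wanted, chaining stride-1 max pools with a small kernel covers the same window as one large kernel. The 1D sketch below checks that two and three chained 5-wide pools match a 9-wide and a 13-wide pool. This is plain Python for illustration only, and nobody in this thread has tested whether the ANE handles chained pools any better.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pool with 'same' output length.

    Positions outside the input are ignored, which has the same effect
    as padding with negative infinity.
    """
    half = kernel // 2
    n = len(values)
    return [max(values[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

once = max_pool_1d(signal, 5)
twice = max_pool_1d(once, 5)     # covers a window of 9
thrice = max_pool_1d(twice, 5)   # covers a window of 13

print(twice == max_pool_1d(signal, 9))     # True
print(thrice == max_pool_1d(signal, 13))   # True
```

Unlike switching to [3, 5, 7], this keeps the pooled outputs identical to the [5, 9, 13] version, so it would not by itself require retraining.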

glenn-jocher · 2021-06-01T20:43:00Z

@Leon0402 interesting! Yes I agree ANE documentation is severely lacking unfortunately. Thanks for the info, I'll post any new results I find here.

github-actions · 2021-07-02T00:08:40Z

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

usmanqureshi94 · 2021-08-06T10:13:36Z

Any updates on the results?

abdullahabid10 · 2022-05-12T04:10:09Z

Hi @glenn-jocher, any update on whether the current implementation fully supports ANE?

glenn-jocher · 2022-05-12T10:03:52Z

@abdullahabid10 yes of course

abdullahabid10 · 2022-05-12T15:32:57Z

> @abdullahabid10 yes of course

@glenn-jocher thanks! Can you please let me know in which release this issue was fixed?

glenn-jocher · 2022-05-12T16:12:32Z

@abdullahabid10 oh sorry I'm not sure what issue you are referring to. What was the problem before?

abdullahabid10 · 2022-05-12T16:57:54Z

> Could you submit a PR or explain the updates you made for your improvements to allow the 'pooling layer' (you mean the SPP() module?) to run on the ANE? Thanks!! Maybe we can tweak the architecture a bit to improve the exportability.

@glenn-jocher I'm referring to this. It seems like the model doesn't run completely on the ANE, so I just wanted an update on the current status.

glenn-jocher · 2022-05-12T18:26:53Z

@abdullahabid10 oh I think that's a very old conversation. It's a bit difficult to determine ANE utilization but it should be near 100% as iOS times for iDetection are lightning fast, i.e. see #1276

iDetection v7.8 Inference Speeds

Device	Year	ASIC-process	ANE (TOPS)	YOLOv5s (ms)	YOLOv5m (ms)	YOLOv5l (ms)	YOLOv5x (ms)
iPhone 6	2014	A8-20nm	-	90	180	350	500
iPhone 6s	2015	A9-16nm	-	148	216	304	475
iPhone 7	2016	A10-16nm	-	94.7	140.7	216.7	289.8
iPhone 8/X	2017	A11-10nm	0.6	-	-	-	-
iPhone XR/XS	2018	A12-7nm	5.0	22.3	25.8	43.2	57.7
iPhone 11	2019	A13-7nm	6.0	17.4	21.3	27.8	41.2
iPhone 12	2020	A14-5nm	11.0	14.3	16.5	21.0	28.8
iPhone 13	2021	A15-5nm	15.8	-	-	-	-

*CoreML models exported as FP8 320x192 with release v3.1
*Measured with iDetection v7.8 at 100% battery at 25°C. Average speed after 10 seconds recorded.
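For the real-time question that started this issue, the YOLOv5s latencies in the table can be converted to frames per second. The sketch below is simple arithmetic on the table values; it counts model inference only and ignores camera capture, pre-processing and post-processing, so real app frame rates would be lower.

```python
# YOLOv5s latencies in milliseconds, copied from the table above.
yolov5s_ms = {
    "iPhone XR/XS": 22.3,
    "iPhone 11": 17.4,
    "iPhone 12": 14.3,
}


def fps(latency_ms: float) -> float:
    """Upper bound on frames per second for a given per-frame latency."""
    return 1000.0 / latency_ms


for device, ms in yolov5s_ms.items():
    print(f"{device}: {fps(ms):.1f} FPS")
# iPhone XR/XS: 44.8 FPS
# iPhone 11: 57.5 FPS
# iPhone 12: 69.9 FPS

# The > 100 ms reported earlier for the 640x640 export is at most 10 FPS.
print(f"{fps(100.0):.0f} FPS")   # 10 FPS
```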

Leon0402 added the enhancement New feature or request label Mar 19, 2021

github-actions bot added the Stale label Apr 19, 2021

github-actions bot removed the Stale label Apr 21, 2021

Leon0402 mentioned this issue May 4, 2021

Export a different resolutions dbsystel/yolov5-coreml-tools#1

Open

github-actions bot added the Stale label May 22, 2021

github-actions bot removed the Stale label May 26, 2021

github-actions bot added the Stale label Jul 2, 2021

github-actions bot closed this as completed Jul 7, 2021
