New PR for native pose detection on iOS using ONNX YOLOv8 pose model #8907

Open. Wants to merge 3 commits into main.

Conversation

@pbanavara pbanavara commented Mar 13, 2024

This is a native approach for running YOLO pose detection models on iOS. The other option is to convert the model to CoreML. This is based on the ONNX models provided by Microsoft. Please refer to the README for details.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

WARNING ⚠️ this PR is very large, summary may not cover all changes.

This large code block defines the OrtApi structure, which is a comprehensive collection of function pointers that make up the ONNX Runtime (ORT) C API. This API provides mechanisms to create and manage resources such as tensors, models, sessions, and execution environments. It supports a wide range of functionalities including loading models, running inference sessions, managing custom operators, allocating memory, and much more.

🌟 Summary

The OrtApi structure serves as the primary interface for interacting with ONNX Runtime via C, offering a rich set of functionalities for model execution and management.

📊 Key Changes

  • Provides functions for session management, including creating and running sessions.
  • Offers tensor manipulation capabilities, such as creation and data getters and setters.
  • Supports memory management, including custom allocators.
  • Allows customization and extension via custom operators.
  • Enables configuration of session options and execution providers (such as CUDA, TensorRT).
  • Contains utilities for asynchronous execution and shape inference.

🎯 Purpose & Impact

  • Ease of Use: Simplifies interaction with ONNX Runtime by providing C API access, making it accessible for applications written in C or other languages capable of interfacing with C.
  • Extensibility: Facilitates the integration of custom operations and execution providers, enhancing flexibility and supporting specialized use cases.
  • Performance Optimization: Offers detailed configuration options for sessions and execution providers, allowing users to finely tune performance to their specific needs.
  • Cross-Platform and Cross-Device: Supports a variety of execution providers, enabling efficient model execution across different hardware platforms.
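
Whatever language binds to this API, a fixed-shape model input means each camera frame must be letterboxed (scaled with aspect ratio preserved, then padded) before it is handed to a session. A minimal sketch of that arithmetic, in pure Python for illustration; the 640×640 input size and function name are assumptions based on typical YOLOv8 exports, not part of this PR:

```python
def letterbox_params(src_w, src_h, dst=640):
    """Compute the scale and symmetric padding needed to fit a frame
    into a square dst x dst model input while preserving aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) / 2  # left/right padding in pixels
    pad_y = (dst - new_h) / 2  # top/bottom padding in pixels
    return scale, pad_x, pad_y

# Example: a 1920x1080 camera frame scaled into a 640x640 tensor.
scale, pad_x, pad_y = letterbox_params(1920, 1080)
```

The same numbers are reused after inference: a detection at model coordinate x maps back to the frame as (x - pad_x) / scale.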

github-actions bot commented Mar 13, 2024

CLA Assistant Lite bot: All Contributors have signed the CLA. ✅

github-actions bot left a comment

👋 Hello @pbanavara, thank you for submitting an Ultralytics YOLOv8 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with the ultralytics/ultralytics main branch. If your PR is behind, you can update your code by clicking the 'Update branch' button or by running git pull and git merge main locally.
  • ✅ Verify all YOLOv8 Continuous Integration (CI) checks are passing.
  • ✅ Update YOLOv8 Docs for any new or updated features.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

See our Contributing Guide for details and let us know if you have any questions!

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.00%. Comparing base (d608565) to head (05a9187).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8907   +/-   ##
=======================================
  Coverage   75.99%   76.00%           
=======================================
  Files         121      121           
  Lines       15332    15332           
=======================================
+ Hits        11652    11653    +1     
+ Misses       3680     3679    -1     
Flag         Coverage Δ
Benchmarks   36.02% <ø> (ø)
GPU          37.92% <ø> (-0.02%) ⬇️
Tests        71.30% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@pbanavara (Author)

I have read the CLA Document and I sign the CLA

@glenn-jocher (Member)

@pbanavara thanks for the PR! I think this might be a bit much to incorporate though.

Do you have any ONNX benchmarks on iOS? We have an open-source iOS framework for deploying YOLOv8 in https://github.com/ultralytics/yolo-ios-app that I think is much, much simpler than this PR. Perhaps you could take a look and update that repo with additional functionality for pose?
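
To collect the on-device benchmark numbers being asked about, one option is to timestamp each processed frame and report a rolling average. A minimal sketch, in pure Python for illustration (on iOS the same arithmetic would live in Swift); the class name and 30-frame window are illustrative choices, not anything from this PR:

```python
import time


class FPSMeter:
    """Reports average frames-per-second over the most recent frames."""

    def __init__(self, window=30):
        self.window = window
        self.stamps = []

    def tick(self, now=None):
        """Record one processed frame; return the current average FPS."""
        self.stamps.append(time.monotonic() if now is None else now)
        self.stamps = self.stamps[-self.window:]  # keep a rolling window
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed


meter = FPSMeter()
```

Calling meter.tick() once per inference and drawing the result on an overlay gives a live FPS readout without touching the capture pipeline.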

[screenshot attached]

@Burhan-Q Burhan-Q added the pose Related to pose Task label Mar 15, 2024
@pbanavara (Author)

Hi @glenn-jocher sorry I was out for a few days. Was in surgery. Will update this PR shortly by incorporating this with the YOLO iOS app.

@glenn-jocher (Member)

@pbanavara hi there! No worries at all, I hope everything went well with your surgery and you're on the path to a quick recovery 🙏.

Looking forward to your updates on integrating the PR with the YOLO iOS app. If you have any questions or need any assistance along the way, feel free to reach out. Take care!

@pbanavara (Author)

@glenn-jocher quick update

I am able to get the model working in the yolo-ios-app; I am just figuring out how to draw the rect and the keypoints. I am reusing your bounding boxes, but the rect is not painting. The processing is also very slow, around 3.5 to 4 fps. I don't think this ONNX model is practical at such speeds.

The object detection CoreML model, for comparison, runs at almost 30 fps, so the only scalable option is to convert the pose model into CoreML. I will start working on that and see if I can complete the body pose painting in the previewLayer.
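
For reference when wiring up the rect and keypoint drawing: a YOLOv8-pose head typically emits, per candidate, 4 box values (cx, cy, w, h), 1 object confidence, and 17 COCO keypoints as (x, y, visibility) triples, 56 values in total. That layout is an assumption about the exported model, not something stated in this PR; a pure-Python decoding sketch under that assumption:

```python
def decode_pose_candidate(row, conf_thresh=0.25):
    """Decode one 56-value candidate from a YOLOv8-pose output row.

    row: [cx, cy, w, h, conf, kpt0_x, kpt0_y, kpt0_v, ...] (56 floats).
    Returns (box, conf, keypoints) or None if below the threshold.
    """
    assert len(row) == 56, "expected 4 box + 1 conf + 17*3 keypoint values"
    cx, cy, w, h, conf = row[:5]
    if conf < conf_thresh:
        return None
    # Convert the center-size box to corner coordinates for drawing.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    kpts = [tuple(row[5 + 3 * i: 8 + 3 * i]) for i in range(17)]
    return box, conf, kpts
```

Keypoints with a low visibility value are usually skipped when painting, and all coordinates still need the letterbox scale/padding undone before they land on the preview layer.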

@glenn-jocher (Member)

@pbanavara hi there! 🌟 Great to hear you've made progress with integrating the model into the yolo-ios-app. Drawing keypoints and bounding boxes certainly requires some tweaks, especially in rendering them efficiently.

The discrepancy in FPS between the ONNX model and the CoreML model is indeed significant. CoreML is highly optimized for iOS devices, providing a more seamless integration and faster processing times, so your plan to convert the pose model into CoreML sounds like the right approach to enhance performance.

For the drawing issue, ensure your drawing code executes on the main thread, something like this for keypoints:

DispatchQueue.main.async {
    // UIKit and CALayer updates must run on the main thread;
    // draw the keypoints and bounding boxes here once inference completes.
}

This might also solve the slow rendering issues if they're not related to model inference speed.

Keep us updated on your progress with the CoreML conversion, and feel free to reach out if you hit any snags! Looking forward to seeing the body pose detection running smoothly on the iOS app.

@pbanavara (Author)

@glenn-jocher I got the pose detection model to work.

The key part of the code is here:

https://github.com/pbanavara/yolo-ios-app/blob/main/YOLO/ViewController.swift

lines 223 to 249

and

https://github.com/pbanavara/yolo-ios-app/blob/main/YOLO/Utils.swift

The FPS is low but definitely usable. Attaching a video output to show the slow FPS (unfortunately the FPS label isn't showing, as the preview layer is removed to save memory).

This includes a Podfile and an xcworkspace. I'm not sure how to integrate this into the main app; if you have some ideas let me know. The code is mostly all yours except for the pose model changes. I have removed the CoreML-related code as much as I could.

https://drive.google.com/file/d/13R9U0VXTIIgkwFy8ujqULbPZ14AnIJhe/view?usp=sharing

@glenn-jocher (Member)

@pbanavara Hey, fantastic work on getting the pose detection model up and running! 🎉 It's really exciting to see such progress. I took a look at your code snippets; you've done a great job integrating the pose model. The video demo also helps to visualize the performance; even if the FPS is on the lower side, it's a great starting point.

Regarding the fps label issue, it might be worth looking into overlay techniques that don't interfere with memory optimization strategies you've implemented. Sometimes, drawing directly on a separate overlay view can help.

For integrating this into the main yolo-ios-app, we could consider modularizing the pose detection feature as an optional model that can be toggled depending on the use case. This way, we can maintain the core functionality while also offering extended capabilities like pose detection to users who need it.

As for removing CoreML related code, that makes perfect sense given the shift towards ONNX for pose detection. However, as we discussed, exploring CoreML conversion for enhanced performance could be a valuable next step.

If you're open to it, perhaps you could create a pull request with your changes? That way, we can review the integration process together and find the best approach to merge it seamlessly. Also, don't hesitate to share any further thoughts or needs for assistance. Your contribution is greatly appreciated!

@pbanavara (Author)

Thank you @glenn-jocher I have submitted a PR. Fixed the FPS label issue as well. It was a dumb mistake on my part.

ultralytics/yolo-ios-app#13

@glenn-jocher (Member)

@pbanavara Awesome, thanks for submitting the PR and for fixing the FPS label issue! 🎉 I'll take a look at it shortly. Great work!

Labels: pose (Related to pose Task)
4 participants