CPU bottleneck when running the pose estimation demo #61

Closed
pramod-wick opened this issue Jun 13, 2021 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@pramod-wick

Hi,

I am trying to track pose estimates using the "Tracking pedestrians with AlphaPose" demo as a reference. However, I am using NVIDIA trt_pose (https://github.com/NVIDIA-AI-IOT/trt_pose) instead of AlphaPose as used in the demo.

Pose estimation alone runs well at around 25 fps (with about 50% CPU usage). However, when I add pose tracking, it drops to about 10-12 fps, and it is definitely a CPU bottleneck: CPU usage is around 98% when tracking is running.
I would like to know whether this is considered "normal" for pose tracking or whether I am doing something wrong on my end.

PC specs
GTX 1060 6GB
intel i7 8500 H
6GB ram

Thanks for the great work.

@lweicker

Hi,

Do you track every skeleton point of each individual? Are you dealing with a lot of people in your videos?

If you have not done so yet, could you try a sample video containing only one person? Are the FPS results the same?

@pramod-wick
Author

Hi,

I tested on a video with 1 person, tracking 18 keypoints, which gives about 10 fps. I also tested on a video with 4 people and also got around 10 fps.
I also experimented with tracking only 6 or 8 keypoints per person, which increases the frame rate to about 20 fps.
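
A minimal sketch of the keypoint reduction described above, using only the standard library. The function name `select_keypoints` and the indices in `KEEP` are hypothetical; which indices make sense depends on the skeleton topology of the pose model, and a real pipeline would most likely hold the keypoints in NumPy arrays rather than lists.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def select_keypoints(keypoints: Sequence[Point], keep: Sequence[int]) -> List[Point]:
    """Keep only the keypoints whose indices appear in `keep`, in that order."""
    return [keypoints[i] for i in keep]


# Fake 18-keypoint skeleton standing in for one detected person.
person = [(10.0 * i, 5.0 * i) for i in range(18)]

# Hypothetical choice of 6 indices; not taken from any particular model.
KEEP = [0, 1, 2, 5, 8, 11]

reduced = select_keypoints(person, KEEP)
print(len(reduced))  # 6
```

Only the reduced list would then be passed on to the tracker, so the per-frame tracking work grows with 6 points per person rather than 18.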

@lweicker

Can you please share your code and the sample video you are using?

I'll look into it.

@pramod-wick
Author

Hi,

Since I have already integrated this code into another project, I cannot share the entire thing. However, I made a demo Python script of what I'm trying to do. I've simplified a lot of things and generated some fake keypoints. If I disable the tracker (enable_tracker=False), the code runs with < 10% CPU utilization; with enable_tracker=True I get 100% CPU utilization.

Although my actual code also showed an fps drop, I cannot replicate that drop here, most likely because this demo code is too simple to cause one. However, the high CPU utilization is very strange indeed.

https://gist.github.com/pramod-wick/7338033c8ce03285cc6e2662f746da56
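
One way to put numbers on "high CPU utilization without an fps drop" is to record wall-clock time and process CPU time over the same loop. The sketch below is not taken from the gist; `fake_frame_work` is a hypothetical stand-in for one frame of processing, and `measure` is an illustrative helper. `time.process_time()` sums user and system CPU time over the whole process, so a CPU-to-wall ratio above 1 suggests that more than one core was busy on average.

```python
import time
from typing import Callable, Dict


def fake_frame_work(n: int = 20_000) -> int:
    """Hypothetical stand-in for processing one frame (pure CPU work)."""
    return sum(i * i for i in range(n))


def measure(func: Callable[[], object], repeat: int = 50) -> Dict[str, float]:
    """Run `func` `repeat` times and report throughput and CPU usage."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeat):
        func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "fps": repeat / wall,         # iterations per wall-clock second
        "cpu_per_wall": cpu / wall,   # roughly: average number of busy cores
    }


result = measure(fake_frame_work)
print(f"fps={result['fps']:.1f}, cpu/wall={result['cpu_per_wall']:.2f}")
```

Running this once with the tracker enabled and once with it disabled would show whether the extra CPU time actually lengthens each frame or is spent in parallel on otherwise idle cores.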

@lweicker

Hi,

I ran your code and got the same CPU utilization as you. I also did not notice any drop in fps.

Are you sure that this drop is not due to another part of your code?

I suggest you profile your application so that you can see exactly which functions take the most time. To do that, you can run the following command:
python -m cProfile --sort cumulative YOUR_APPLICATION.py &> out.log

A file called "out.log" will be created. Inside it, you'll find a table with the cumulative time that each function took. Feel free to share this table if you need another pair of eyes.
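
As a complement to the command above, the same profiler can be driven from Python to measure only part of a program, for example the tracking step. This is a sketch under assumptions: `track_step` is a hypothetical stand-in for one tracker update and `profile_calls` is an illustrative helper; only the standard `cProfile` and `pstats` modules are used.

```python
import cProfile
import io
import pstats
from typing import Callable


def track_step(n: int) -> int:
    """Hypothetical stand-in for one tracker update."""
    return sum(i * i for i in range(n))


def profile_calls(func: Callable, *args, repeat: int = 100, top: int = 10) -> str:
    """Profile `repeat` calls of `func` and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering as `--sort cumulative` on the command line.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_calls(track_step, 10_000)
print(report)
```

Wrapping only the suspect section keeps unrelated startup and I/O costs out of the table, which makes two runs (for example with two different trackers) easier to compare.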

@pramod-wick
Author

Hi,

Thanks for the quick response and support :) Could you confirm that with "enable_tracker=True" the CPU utilization does not increase? I also ran the cProfile command, but I do not see any problems in the log file (attached):
out.log

@lweicker

Hi,

I confirm that the CPU utilization is also at 100% with the tracker enabled, but there is no drop in fps.

Regarding the profiling, I was referring to your initial application, i.e. the one in which you notice the large FPS drops. Can you please profile that application and analyze the log? Sorry for the misunderstanding.

@pramod-wick
Author

pramod-wick commented Jun 17, 2021

Hi,

I profiled 100 frames of my original application with the norfair pose tracker (posesort.log) versus the SORT tracker (abewley/sort) with bounding-box input (sort.log).

There is definitely an increase in inference time. From analyzing the log files, my guess is that the 100% CPU utilization with the norfair pose tracker causes my CNN (densenet) to take extra processing time.

I would like to know whether the CPU utilization can be reduced in any way, or whether this is expected behavior when tracking 18 points.

Thanks again for the support

@joaqo
Collaborator

joaqo commented Jun 17, 2021

Hi @pramod-wick, you should be getting much more than 10fps with that machine. There are a ton of CPU speed optimization opportunities that we plan to tackle soon, but even without those you should be getting larger numbers than those.

Also, thank you @lweicker for the help with answering!

@lweicker

Hi,

I checked your logs; the differences in processing time for trtkeypoints.py:98(find_key_points) and densenet.py:XX(forward) between SORT and norfair are odd. I don't understand how the tracking could influence the densenet processing at all. Was the input video the same for both logs? Is your processing done sequentially? Did you change only the tracking algorithm between the two runs, or is anything else different?

For information, I also run trt_pose and norfair (among other processes) in one of my applications, on an Nvidia Jetson Xavier NX. The application captures two live RTSP streams at 1080p. The average processing rate is about 18 fps (over 4500 iterations), with a minimum of 11 fps.

The only difference I can imagine is the number of tracked points. In my case, I track only 1 point per person (a combination of keypoints) instead of the 18 you mentioned. Each processed frame contains between 0 and 15 detected people.
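
A minimal sketch of collapsing a skeleton into a single tracked point. The exact combination of keypoints used above is not stated, so this assumes a plain average of the keypoints whose confidence clears a threshold; `person_center` and `min_conf` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def person_center(
    keypoints: Sequence[Keypoint], min_conf: float = 0.3
) -> Optional[Tuple[float, float]]:
    """Average the (x, y) of sufficiently confident keypoints.

    Returns None when no keypoint clears `min_conf`, so the caller can
    skip that detection instead of tracking a meaningless point.
    """
    visible = [(x, y) for x, y, conf in keypoints if conf >= min_conf]
    if not visible:
        return None
    count = len(visible)
    return (
        sum(x for x, _ in visible) / count,
        sum(y for _, y in visible) / count,
    )


# Two confident keypoints and one undetected keypoint (confidence 0).
skeleton = [(100.0, 200.0, 0.9), (120.0, 220.0, 0.8), (0.0, 0.0, 0.0)]
center = person_center(skeleton)
print(center)  # (110.0, 210.0)
```

With this approach the tracker receives one point per person regardless of how many keypoints the pose model outputs.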

In another case, I use norfair combined with an SSD MobileNet v2 model (among other processes) on another Xavier NX. Processing averages 15.5 fps with a minimum of about 9 fps, but this time for 4 RTSP input streams (3x 1080p, 1x 4K).

@pramod-wick
Author

@joaqo Yes, definitely looking forward to those optimizations :)

@lweicker I agree the results are strange indeed: the trt_pose densenet is completely decoupled from the tracking, so the tracking should not influence it. It was the same program on the same video (100 frames); the only difference was swapping the tracking algorithm, and yes, the program is sequential.

Good to hear it runs well on the Jetson devices. In my case it may very well be the number of keypoints: if I reduce the number of keypoints to around 6, the CPU utilization drops below 20% with no significant change in fps.

@dekked added the "help wanted" label on Jun 30, 2021
@dekked
Member

dekked commented Aug 31, 2022

The optimized Kalman filter has been the default since #145. We also have a profiling demo that uses TRT pose! Therefore, I am closing this issue.

Please open another issue should you encounter more performance issues in the future 💪

@dekked closed this as completed on Aug 31, 2022

4 participants