CPU bottleneck when running the pose estimation demo #61

pramod-wick · 2021-06-13T07:53:30Z

Hi,

I am trying to track pose estimates using the "Tracking pedestrians with AlphaPose" demo as a reference. However, I am using NVIDIA trt_pose (https://github.com/NVIDIA-AI-IOT/trt_pose) instead of AlphaPose, which the demo uses.

The pose estimation alone runs well at around 25 fps (with about 50% CPU usage). However, when I include the pose tracking, the frame rate drops to about 10-12 fps, and it's definitely a CPU bottleneck, as my CPU usage is around 98% when running tracking.
I would like to know if this is considered "normal" for pose estimation tracking or if I am doing something wrong on my end.

PC specs
GTX 1060 6GB
intel i7 8500 H
6GB ram

Thanks for the great work.

lweicker · 2021-06-15T08:38:22Z

Hi,

Do you track every skeleton point of each individual? Are you dealing with a lot of people in your videos?

If you haven't done so yet, could you try a sample video containing only one person? Is the FPS the same?

pramod-wick · 2021-06-17T07:26:28Z

Hi,

I tested on a video with 1 person, tracking 18 keypoints, which gives about 10 fps. I also tested on a video with 4 people and got around 10 fps too.
I also experimented with tracking only 6 or 8 keypoints per person, which increases the frame rate to about 20 fps.
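For readers who want to try the same experiment, reducing the number of tracked keypoints can be sketched as below. This is only an illustration in plain Python: `TRACKED_INDICES` and `select_keypoints` are hypothetical names, and the index values are placeholders, since the real keypoint order depends on the pose model's topology.

```python
# Hypothetical indices of the keypoints to keep; the real order
# depends on the pose model's topology, so adjust these as needed.
TRACKED_INDICES = [0, 5, 6, 11, 12, 17]


def select_keypoints(keypoints, indices=TRACKED_INDICES):
    """Return only the keypoints at the given indices, preserving order."""
    return [keypoints[i] for i in indices]


# Fake 18-keypoint skeleton made of (x, y) pairs.
skeleton = [(float(i), float(i) * 2.0) for i in range(18)]

# Only this subset would be handed to the tracker.
subset = select_keypoints(skeleton)
print(len(subset))
```

The subset is chosen once, before the tracker sees the data, so the tracker's per-object state covers 6 points instead of 18.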

lweicker · 2021-06-17T07:44:20Z

Can you please share your code and the sample video you are using?

I'll look into it.

pramod-wick · 2021-06-17T10:02:39Z

Hi,

Since I have already integrated this code into another project, I cannot share the entire thing. However, I made a demo Python script of what I'm trying to do. I've simplified a lot of things and generated some fake keypoints. If I disable the tracker (enable_tracker=False), the code runs with < 10% CPU utilization; with enable_tracker=True, I get 100% CPU utilization.

Also, although my actual code shows an fps drop, I cannot replicate that drop here, most likely because this demo code is too simple to cause one. However, the high CPU utilization is very strange indeed.
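One way to put numbers on "high CPU utilization without an fps drop" is to compare wall-clock time with process CPU time around the call of interest. The sketch below uses only the standard library; `measure` and `busy_work` are hypothetical names, and `busy_work` merely stands in for the per-frame tracker update. Because `time.process_time()` sums CPU time over all threads of the process, a CPU/wall ratio well above 1 would suggest that several cores are busy at once.

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy_work(n):
    # Placeholder workload; replace with the real tracker update call.
    return sum(i * i for i in range(n))


result, wall, cpu = measure(busy_work, 200_000)
print(f"wall={wall:.4f}s cpu={cpu:.4f}s ratio={cpu / wall:.2f}")
```

Running this once with the tracker enabled and once with it disabled gives two ratios that can be compared directly.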

https://gist.github.com/pramod-wick/7338033c8ce03285cc6e2662f746da56

lweicker · 2021-06-17T10:45:39Z

Hi,

I ran your code and got the same CPU utilization as you. I also did not notice any drop in fps.

Are you sure that this drop is not due to another part of your code?

I suggest profiling your application so that you can see exactly which functions take the most time. To do that, you can run the following command:
python -m cProfile --sort cumulative YOUR_APPLICATION.py &> out.log

A file called "out.log" will be created. Inside it, you'll find a table with the cumulative times that each function took. Feel free to share this table if you need another eye.
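The same kind of table can also be produced from inside a script, which helps when only part of the application should be profiled. The following is a minimal sketch using the standard `cProfile` and `pstats` modules; `process_frame` and `main` are hypothetical placeholders for the real per-frame pipeline.

```python
import cProfile
import io
import pstats


def process_frame(n):
    # Placeholder for the per-frame work (pose estimation plus tracking).
    return sum(i * i for i in range(n))


def main():
    # Simulate processing 100 frames.
    return [process_frame(10_000) for _ in range(100)]


# Profile only the region between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
results = main()
profiler.disable()

# Write the ten most expensive entries, by cumulative time, to a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time matches the `--sort cumulative` option of the command above, so both reports can be read the same way.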

pramod-wick · 2021-06-17T11:21:29Z

Hi,

Thanks for the quick response and support :). Could you confirm that with "enable_tracker=True" the CPU utilization is not increased? I also ran the cProfile command, but I do not see any problems in the log file (attached).
out.log

lweicker · 2021-06-17T11:35:34Z

Hi,

I confirm that the CPU utilization is also at 100% with the tracker enabled, but there is no drop in fps.

Regarding the profiling, I was referring to your initial application, i.e. the one in which you notice the large FPS drop. Can you please profile that application and analyze the log? Sorry for the misunderstanding.

pramod-wick · 2021-06-17T12:08:16Z

Hi,

I profiled 100 frames of my original application with the norfair pose tracker (posesort.log) and with the SORT tracker (abewley/sort) using bounding box input (sort.log).

There is definitely an increase in inference time. From the log files, my guess is that the 100% CPU utilization with the norfair pose tracker causes my CNN (densenet) to take extra processing time.

I would like to know if the CPU utilization could be reduced in any way, or if this is expected behavior when tracking 18 points.

Thanks again for the support

joaqo · 2021-06-17T19:03:52Z

Hi @pramod-wick, you should be getting much more than 10 fps on that machine. There are a ton of CPU speed optimization opportunities that we plan to tackle soon, but even without those you should be seeing higher numbers.

Also, thank you @lweicker for the help with answering!

lweicker · 2021-06-18T07:18:20Z

Hi,

I checked your logs; the differences in processing time for trtkeypoints.py:98(find_key_points) and densenet.py:XX(forward) between SORT and norfair are odd. I don't understand how the tracking could influence the densenet step at all. Was the input video the same for both logs? Is your processing done sequentially? Did you change only the tracking algorithm between the two runs, or is anything else different?

For information, I also run trt_pose and norfair (among other processes) in one of my applications, on an NVIDIA Jetson Xavier NX. The application captures two live RTSP streams at 1080p. The average processing rate is about 18 fps (over 4500 iterations), with a minimum of 11 fps.

The only difference I can imagine is the number of tracked points. In my case, I track only 1 point per person (a combination of keypoints) instead of the 18 you mentioned. Each processed image contains between 0 and 15 detected people.

In another case, I use norfair combined with an SSD MobileNet v2 model (among other processes) on another Xavier NX. Processing averages 15.5 fps with a minimum of about 9 fps, but this time for 4 RTSP input streams (3x 1080p, 1x 4K).
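Reducing each person to a single tracked point could look roughly like the sketch below. The thread does not say how the keypoints are combined, so the confidence-filtered average used here is an assumption, and `combine_keypoints` and `min_score` are hypothetical names.

```python
def combine_keypoints(keypoints, min_score=0.3):
    """Average the (x, y) positions of keypoints scoring at least min_score.

    Each keypoint is an (x, y, score) tuple. Returns None when no keypoint
    is confident enough. Averaging is an assumed combination rule.
    """
    confident = [(x, y) for x, y, score in keypoints if score >= min_score]
    if not confident:
        return None
    n = len(confident)
    mean_x = sum(x for x, _ in confident) / n
    mean_y = sum(y for _, y in confident) / n
    return (mean_x, mean_y)


# Fake person with two confident keypoints and one low-confidence keypoint.
person = [(100.0, 50.0, 0.9), (110.0, 60.0, 0.8), (0.0, 0.0, 0.1)]
point = combine_keypoints(person)
print(point)  # (105.0, 55.0)
```

The tracker then receives one point per person, which keeps the per-object state as small as possible at the cost of losing per-joint tracking.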

pramod-wick · 2021-06-18T14:25:52Z

@joaqo Yes, definitely looking forward to those optimizations :)

@lweicker I agree the results are strange indeed, because the trt_pose densenet is completely decoupled from the tracking, so the tracking should not influence it. It was the same program on the same video (100 frames); the only difference was swapping the tracking algorithm, and yes, the program is sequential.

Good to hear it is running well on the Jetson devices. In my case it may very well be the number of keypoints, since if I reduce the number of keypoints to around 6, the CPU utilization drops below 20% with no significant change in fps.

dekked · 2022-08-31T22:49:20Z

The optimized Kalman filter is default since #145. We also have a profiling demo that uses TRT pose! Therefore, I am closing this issue.

Please open another issue should you encounter more performance issues in the future 💪

dekked added the "help wanted" label · Jun 30, 2021

dekked closed this as completed · Aug 31, 2022
