CPU bottleneck when running the pose estimation demo #61
Hi, do you track every skeleton point of each individual? Are you dealing with a lot of people in your videos? If you haven't already, could you try a sample video containing only one person? Are the FPS results the same?
Hi, I tested on a video with 1 person, tracking 18 keypoints, which gives about 10 fps. I also tested on a video with 4 people and got around 10 fps too.
Can you please share your code and the sample video you are using? I'll look into it.
Hi, since I have already integrated this code into another project, I cannot share the entire thing. However, I made a demo Python script of what I'm trying to do. I've simplified a lot of things and generated some fake keypoints. If I disable the tracker (enable_tracker=False), the code runs at < 10% CPU utilization; with enable_tracker=True I get 100% CPU utilization. Also, although my actual code showed an FPS drop too, I cannot replicate that drop here, most likely because this demo is too simple to cause one. However, the high CPU utilization is very strange indeed. https://gist.github.com/pramod-wick/7338033c8ce03285cc6e2662f746da56
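To put numbers on the "< 10% vs 100% CPU" comparison, one option is to time the per-frame loop with both a wall clock and a CPU clock. The sketch below is an assumption-laden stand-in, not the gist's code: `measure`, `make_fake_step`, and the placeholder "tracker" work (a dense matrix inverse) are all hypothetical. A CPU ratio near 1.0 means one core was kept busy; values above 1.0 mean several threads ran in parallel.

```python
import time
from typing import Callable, Tuple

import numpy as np


def measure(step: Callable[[], None], num_frames: int = 100) -> Tuple[float, float]:
    """Run `step` num_frames times and return (fps, cpu_ratio).

    cpu_ratio = process CPU time / wall-clock time for the whole run.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(num_frames):
        step()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return num_frames / wall, cpu / wall


def make_fake_step(enable_tracker: bool) -> Callable[[], None]:
    # Hypothetical per-frame work: fake keypoints for 4 people x 18 points,
    # plus placeholder matrix work standing in for a tracker when enabled.
    rng = np.random.default_rng(0)

    def step() -> None:
        keypoints = rng.uniform(0, 640, size=(4, 18, 2))
        if enable_tracker:
            state = keypoints.reshape(4, -1)  # (4, 36)
            cov = np.eye(state.shape[1]) + np.outer(state[0], state[0]) * 1e-6
            np.linalg.inv(cov)  # placeholder cost, not a real tracker

    return step


if __name__ == "__main__":
    for enabled in (False, True):
        fps, cpu_ratio = measure(make_fake_step(enabled))
        print(f"enable_tracker={enabled}: {fps:.0f} fps, CPU ratio {cpu_ratio:.2f}")
```

In a real pipeline, `step` would be your own per-frame function with the tracker toggled on and off.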
Hi, I ran your code and got the same CPU utilization as you. I also did not notice any drop in FPS. Are you sure that this drop is not due to another part of your code? I suggest profiling your application so that you can see exactly which functions take the most time. To do that, you can run the following command: A file called "out.log" will be created. Inside it, you'll find a table with the cumulative time each function took. Feel free to share this table if you need another pair of eyes.
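For reference, a profile table like the one described can also be produced from inside Python with the standard `cProfile` and `pstats` modules. This is a minimal sketch, not necessarily the exact command suggested in the thread; `process_frame` is a hypothetical stand-in for one iteration of your pipeline. From the shell, `python -m cProfile -s cumulative your_script.py > out.log` writes a similar cumulative-time table.

```python
import cProfile
import pstats


def process_frame(frame_id: int) -> int:
    # Hypothetical stand-in for one iteration of the real pipeline
    # (pose inference + tracking); replace it with your own per-frame code.
    return sum(i * i for i in range(10_000 + frame_id))


def profile_pipeline(log_path: str = "out.log", num_frames: int = 100) -> None:
    """Profile num_frames iterations and write a cumulative-time table."""
    profiler = cProfile.Profile()
    profiler.enable()
    for frame_id in range(num_frames):
        process_frame(frame_id)
    profiler.disable()

    with open(log_path, "w") as log:
        stats = pstats.Stats(profiler, stream=log)
        # Sort by cumulative time so the most expensive call chains come first.
        stats.sort_stats("cumulative").print_stats(25)


if __name__ == "__main__":
    profile_pipeline()
```

Profiling the full application rather than a reduced demo is what makes the table useful, since the slowdown may come from an interaction the demo does not reproduce.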
Hi, thanks for the quick response and support :) Could you confirm that with "enable_tracker=True" the CPU utilization does not increase? I also ran the cProfile command, but I do not see any problems in the log file (attached).
Hi, I confirm that the CPU utilization is also at 100% with the tracker enabled, but there is no drop in FPS. Regarding the profiling, I was referring to your original application, i.e. the one in which you notice large FPS drops. Can you please profile that application and analyze the log? Sorry for the misunderstanding.
Hi, I profiled 100 frames of my original application with the Norfair pose tracker (posesort.log) versus the SORT tracker (abewley/sort) with bounding-box input (sort.log). There is definitely an increase in inference time. From analyzing the log files, I guess that the 100% CPU utilization of the Norfair pose tracker causes my CNN (DenseNet) to take extra processing time. I would like to know whether the CPU utilization could be reduced in any way, or whether this is expected behavior when tracking 18 points. Thanks again for the support.
Hi @pramod-wick, you should be getting much more than 10 fps with that machine. There are a ton of CPU optimization opportunities that we plan to tackle soon, but even without those you should be getting higher numbers than that. Also, thank you @lweicker for helping answer!
Hi, I checked your logs; the differences in process time for

For information, I also run trt_pose and Norfair (among other processes) in one of my applications. I run it on an NVIDIA Jetson Xavier NX. My application captures two live RTSP streams at 1080p. The average processing rate is about 18 fps (over 4500 iterations), with a minimum of 11 fps. The only difference I can imagine is the number of tracked points. In my case, I only track 1 point per person (a combination of keypoints) instead of the 18 you mentioned. In each processed frame I track between 0 and 15 detected people. In another case, I use Norfair combined with an SSD MobileNet v2 model (among other processes) on another Xavier NX. Processing runs at 15.5 fps on average, with a minimum of about 9 fps, but this time for 4 RTSP input streams (3x 1080p, 1x 4K).
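The comment does not say how the keypoints are combined into one point. As a minimal sketch, assuming a simple average of the keypoints whose confidence passes a threshold, each person's 18 keypoints can be collapsed into a single (1, 2) point. `collapse_keypoints` and `min_score` are hypothetical names; the resulting array could then be passed as the `points` of a Norfair `Detection`, so the tracker follows 1 point per person instead of 18.

```python
from typing import Optional

import numpy as np


def collapse_keypoints(keypoints: np.ndarray, scores: np.ndarray,
                       min_score: float = 0.3) -> Optional[np.ndarray]:
    """Reduce one person's (K, 2) keypoints to a single (1, 2) point.

    Hypothetical helper: averages the keypoints whose confidence is at
    least min_score, and returns None if no keypoint is reliable enough.
    """
    valid = scores >= min_score
    if not np.any(valid):
        return None
    centroid = keypoints[valid].mean(axis=0)
    return centroid.reshape(1, 2)


if __name__ == "__main__":
    kps = np.array([[10.0, 20.0], [30.0, 40.0], [0.0, 0.0]])
    sc = np.array([0.9, 0.8, 0.1])
    print(collapse_keypoints(kps, sc))  # [[20. 30.]]
```

Ignoring low-confidence keypoints keeps missing joints (often reported at (0, 0)) from dragging the point toward the image corner.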
@joaqo Yes, definitely looking forward to those optimizations :) @lweicker I agree the results are strange indeed, because the trt_pose DenseNet is completely decoupled from the tracking, so the tracking should not influence it. It was the same program on the same video (100 frames); the only difference was swapping the tracking algorithm, and yes, the program is sequential. Good to hear it's running well on the Jetson devices. In my case it may very well be the number of keypoints: if I reduce the number of keypoints to around 6, the CPU utilization drops below 20%, with no significant change in FPS.
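Why the keypoint count could matter so much: if the tracker runs one dense constant-velocity Kalman filter per person over all keypoints (an assumption here, not verified against Norfair's code at the time), the state holds position and velocity for every coordinate. That gives a 72-dimensional state for 18 keypoints versus 24 for 6. Dense matrix products and inversion scale roughly cubically with that size, although at these small sizes NumPy call overhead also contributes. The sketch below times one dense update; `kalman_dims` and `dense_update_time` are hypothetical helpers.

```python
import time
from typing import Tuple

import numpy as np


def kalman_dims(num_keypoints: int) -> Tuple[int, int]:
    """Measurement and state sizes for one person, assuming a constant-velocity
    model: 2D positions are measured, the state holds position + velocity."""
    dim_z = 2 * num_keypoints  # (x, y) per keypoint
    dim_x = 2 * dim_z          # (x, y, vx, vy) per keypoint
    return dim_z, dim_x


def dense_update_time(num_keypoints: int, repeats: int = 200) -> float:
    """Average seconds for one dense Kalman measurement update."""
    dim_z, dim_x = kalman_dims(num_keypoints)
    rng = np.random.default_rng(0)
    P = np.eye(dim_x)  # state covariance; state = [all positions, all velocities]
    H = np.hstack([np.eye(dim_z), np.zeros((dim_z, dim_z))])  # observe positions only
    R = np.eye(dim_z)  # measurement noise
    x = np.zeros(dim_x)
    z = rng.normal(size=dim_z)
    start = time.perf_counter()
    for _ in range(repeats):
        S = H @ P @ H.T + R             # innovation covariance, dim_z x dim_z
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain, dim_x x dim_z
        x = x + K @ (z - H @ x)
        P = (np.eye(dim_x) - K @ H) @ P
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    for n in (1, 6, 18):
        dim_z, dim_x = kalman_dims(n)
        print(f"{n:2d} keypoints: state size {dim_x}, "
              f"{dense_update_time(n) * 1e6:.1f} us per update")
```

Multiply the per-update time by the number of tracked people and by the frame rate to estimate how much of a core the filter alone consumes.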
The optimized Kalman filter has been the default since #145. We also have a profiling demo that uses TRT pose! Therefore, I am closing this issue. Please open another issue should you encounter more performance issues in the future 💪
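The thread does not describe how the optimized filter from #145 works internally. As a hedged illustration of one way to make keypoint tracking cheap: if every coordinate follows its own constant-velocity model with independent noise, the covariance matrix is block-diagonal with one 2x2 block per coordinate. It can then be stored as three arrays and updated with elementwise NumPy operations, O(n) per step instead of dense matrix algebra. This is a sketch, not Norfair's implementation; `DecoupledKalman` and the `q`, `r`, `p0` values are hypothetical, and process noise is simplified to act on the velocity variance only.

```python
import numpy as np


class DecoupledKalman:
    """Constant-velocity Kalman filter applied independently per coordinate.

    Coordinate i has state (pos_i, vel_i) and a 2x2 covariance stored as
    three arrays (p_pp, p_pv, p_vv), so each step is elementwise work.
    """

    def __init__(self, initial_pos, q=1.0, r=4.0, p0=10.0):
        self.pos = np.asarray(initial_pos, dtype=float).ravel().copy()
        n = self.pos.size
        self.vel = np.zeros(n)
        self.p_pp = np.full(n, p0)   # position variance
        self.p_pv = np.zeros(n)      # position-velocity covariance
        self.p_vv = np.full(n, p0)   # velocity variance
        self.q = q  # process noise added to the velocity variance (simplification)
        self.r = r  # measurement noise variance per coordinate

    def predict(self, dt=1.0):
        # x' = F x with F = [[1, dt], [0, 1]]
        self.pos = self.pos + dt * self.vel
        # P' = F P F^T + Q, expanded for each 2x2 block
        p_pp = self.p_pp + 2 * dt * self.p_pv + dt * dt * self.p_vv
        p_pv = self.p_pv + dt * self.p_vv
        self.p_pp, self.p_pv = p_pp, p_pv
        self.p_vv = self.p_vv + self.q

    def update(self, measured_pos):
        z = np.asarray(measured_pos, dtype=float).ravel()
        # H = [1, 0]: only the position is observed.
        s = self.p_pp + self.r   # innovation variance
        k_pos = self.p_pp / s    # Kalman gain, position component
        k_vel = self.p_pv / s    # Kalman gain, velocity component
        residual = z - self.pos
        self.pos = self.pos + k_pos * residual
        self.vel = self.vel + k_vel * residual
        # P = (I - K H) P, written out per 2x2 block
        p_pp, p_pv, p_vv = self.p_pp, self.p_pv, self.p_vv
        self.p_pp = (1 - k_pos) * p_pp
        self.p_pv = (1 - k_pos) * p_pv
        self.p_vv = p_vv - k_vel * p_pv
```

For 18 keypoints, pass a (18, 2) array to the constructor and to `update`, and read the estimate back with `kf.pos.reshape(-1, 2)`. When the dense filter's matrices are block-diagonal in the same way, both versions produce the same estimates; only the cost differs.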
Hi,
I am trying to track pose estimates using the "Tracking pedestrians with AlphaPose" demo as a reference. However, I am using NVIDIA trt_pose (https://github.com/NVIDIA-AI-IOT/trt_pose) instead of AlphaPose as in the demo.
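Swapping AlphaPose for trt_pose means converting trt_pose's parser output into per-person keypoint arrays before tracking. The sketch below assumes the `counts, objects, peaks` layout used by trt_pose's `DrawObjects` helper: `objects[0][i][j]` is the peak index of keypoint `j` for person `i` (or -1 if missing), and `peaks[0][j][k]` holds normalized (y, x). `trt_pose_to_keypoints` is a hypothetical helper, and the 1.0/0.0 scores are a visibility stand-in rather than real confidences. Each `(points, scores)` pair could likely be used to build a Norfair `Detection`.

```python
import numpy as np


def trt_pose_to_keypoints(counts, objects, peaks, width, height):
    """Convert trt_pose parser output to a list of (points, scores) per person.

    points: (K, 2) pixel coordinates as (x, y); scores: (K,) with 1.0 for a
    detected keypoint and 0.0 for a missing one (left at (0, 0)).
    Only indexing plus int()/float() are used, so torch tensors should work too.
    """
    people = []
    for i in range(int(counts[0])):
        obj = objects[0][i]
        num_parts = obj.shape[0]
        points = np.zeros((num_parts, 2))
        scores = np.zeros(num_parts)
        for j in range(num_parts):
            k = int(obj[j])
            if k >= 0:
                peak = peaks[0][j][k]
                # trt_pose peaks are normalized (y, x); scale to pixels as (x, y).
                points[j] = (float(peak[1]) * width, float(peak[0]) * height)
                scores[j] = 1.0
        people.append((points, scores))
    return people
```

Missing keypoints keep a score of 0.0 so that a distance function can skip them instead of matching against (0, 0).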
The pose estimation alone runs well at around 25 fps (at about 50% CPU usage); however, when I add pose tracking, the frame rate drops to about 10-12 fps, and it's definitely a CPU bottleneck, as my CPU usage is around 98% while tracking.
I would like to know whether this is considered "normal" for pose-estimation tracking or whether I am doing something wrong on my end.
PC specs:
GTX 1060 6GB
Intel i7 8500H
6 GB RAM
Thanks for the great work.