
High CPU utilization during inference #352

Closed

sfchen94 opened this issue Sep 9, 2022 · 12 comments

Comments

@sfchen94

sfchen94 commented Sep 9, 2022

Hi,
during inference, I notice that the program occupies a lot of CPU.

Before running the program:
(screenshot: baseline CPU usage)

After running the program with the thread count set to 1:
(screenshot: CPU usage with 1 thread)

With the thread count set to 30:
(screenshot: CPU usage with 30 threads)

It seems that there is no improvement.

Is there any way to reduce CPU utilization?

@2006pmach
Collaborator

Hm, I hadn't noticed that before. I currently don't know what is causing this. Let me know if you find the cause.

@srama2512

@sfchen94 - Which tracker are you running?

@sfchen94
Author

sfchen94 commented Sep 15, 2022

@srama2512
DiMP50,
but I suspect other trackers have a similar problem.

@sfchen94
Author

sfchen94 commented Sep 15, 2022

BTW, this problem only occurs during inference;
the training stage does not have this issue.

@srama2512

@sfchen94 - Got it. I'm noticing high CPU usage during inference with KYS tracker as well. The GPU usage is quite low.

@2006pmach
Collaborator

Hm, I am still not sure why this is happening; maybe it is related to OpenCV. What helps to reduce the load on the CPUs is limiting the number of CPUs the Python script can use with `taskset --cpu-list 0-1`, which limits usage to two cores. For example, running `taskset --cpu-list 0-1 python run_tracker.py tomp tomp50 lasot` reduces the CPU workload without decreasing the FPS of the tracker. However, since the FPS does not include data-loading time, the overall throughput might be lower. Maybe @goutamgmb has an idea?

@sfchen94
Author

@2006pmach
Cool. It works!
But why does it reduce CPU usage while keeping the same FPS? 😆

@2006pmach
Collaborator

2006pmach commented Sep 15, 2022

To compute the FPS, we only measure the time the tracker itself takes, namely the call `out = tracker.track(image, info)`; everything else is excluded from the FPS computation. So the overall runtime of the script could be higher now, since, for example, the data-loading time could have increased (which is not reflected in the FPS). I didn't check this, though. It is still not clear to me what is causing the high CPU load and what those cores are doing exactly...
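The measurement described above can be sketched as follows. Note this is an illustration, not pytracking code: `DummyTracker` and `run_sequence` are hypothetical stand-ins showing why only `track()` time enters the FPS.

```python
import time

class DummyTracker:
    """Illustrative stand-in for a tracker; only track() is timed."""
    def track(self, image, info=None):
        time.sleep(0.001)  # pretend per-frame inference work
        return {"target_bbox": [0, 0, 10, 10]}

def run_sequence(tracker, frames):
    tracked_time = 0.0
    for image in frames:
        # Only the tracker.track(...) call contributes to the FPS;
        # data loading, preprocessing, and result writing do not.
        t0 = time.perf_counter()
        out = tracker.track(image)
        tracked_time += time.perf_counter() - t0
    return len(frames) / tracked_time  # reported FPS

fps = run_sequence(DummyTracker(), frames=[None] * 20)
print(round(fps, 1))
```

With this scheme, slowing down everything outside `track()` (e.g. by restricting cores) leaves the reported FPS unchanged even though wall-clock throughput drops.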

@srama2512

@2006pmach - Thanks for the taskset solution. It appears to be working for now. I restricted CPU usage to cores 0-39 on my 80-core cluster machine. Interestingly, I'm observing that kernel threads (red) occupy more of the CPU load than normal threads (green). Does this suggest anything specific to you?
(screenshot: htop showing kernel vs. normal thread load)

@sfchen94
Author

sfchen94 commented Sep 15, 2022

Yes, the real problem is not solved.
For example, I have 40 CPU cores in total.
Initially, the program needs 50% of the total CPU capacity (20 cores' worth of work).
When I force it onto cores #1-10, it can occupy at most 25% of the total capacity.

But in that case, the same 50% worth of work has simply been squeezed onto 25% of the cores; the program still demands the same amount of CPU after we use taskset.
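The arithmetic behind this point, using the numbers above:

```python
total_cores = 40
demand = 0.50 * total_cores        # the program wants 20 cores' worth of work
allowed_cores = 10                 # restricted to cores #1-10 with taskset
cap = allowed_cores / total_cores  # system-wide utilization cannot exceed this

# taskset lowers the visible utilization to 25%, but the 20 cores'
# worth of demand is unchanged -- it is just packed onto fewer cores.
print(demand, cap)  # 20.0 0.25
```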

@sfchen94 sfchen94 reopened this Sep 15, 2022
@Little-Podi

Hi. I think the following code may help you solve this issue. In my case, the CPU occupation was reduced by inserting this code, and the inference speed also improved a little.

import os
import torch

cpu_num = 8  # number of CPUs you want to use
os.environ['OMP_NUM_THREADS'] = str(cpu_num)
os.environ['OPENBLAS_NUM_THREADS'] = str(cpu_num)
os.environ['MKL_NUM_THREADS'] = str(cpu_num)
os.environ['VECLIB_MAXIMUM_THREADS'] = str(cpu_num)
os.environ['NUMEXPR_NUM_THREADS'] = str(cpu_num)
torch.set_num_threads(cpu_num)
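One caveat worth noting (an assumption based on how OpenMP/BLAS runtimes typically behave, worth verifying in your setup): these environment variables are read when the math libraries initialize, so they should be set before numpy/torch are first imported, or they may have no effect. A minimal sketch:

```python
import os

# Set the thread-count variables BEFORE importing numpy or torch;
# torch.set_num_threads(cpu_num) can still be called afterwards.
cpu_num = 8  # number of CPUs you want to use
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS",
            "VECLIB_MAXIMUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = str(cpu_num)

print(os.environ["OMP_NUM_THREADS"])  # 8
```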

@sfchen94
Author

> Hi. I think the following code may help you solve this issue. [...]

This method does ease the CPU utilization,
so I'll close this issue for now.
