nthreads error when using GPUs (specific functions in tomopy that get error: remove_stripe and minus_log) #576

Closed
lipigupta opened this issue Feb 17, 2022 · 9 comments
Labels
question Troubleshooting requests

Comments

@lipigupta

lipigupta commented Feb 17, 2022

Describe the problem

I am running TomoPy on GPUs, and remove_stripe_fw and minus_log raise the following errors:
remove_stripe_fw(): Error. nthreads must be a positive integer
minus_log(): Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (1)

When I run in a notebook, the errors are shown but execution continues. When I submit my script as a batch job, it does not complete.

I noticed that nthreads is set based on the number of CPUs available in the multiprocessing utils, but I am not sure why these errors still arise (or why it thinks nthreads is not a positive number). I also set pool_size and ncores according to the number of GPUs I am using, and I still get these errors.

I have tried setting:

os.environ.setdefault("NUMEXPR_MAX_THREADS", "999")
os.environ.setdefault("OMP_NUM_THREADS", "999")

Error:

Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (1)
(took 0.27 seconds)
remove_stripe_fw 
Error.  nthreads must be a positive integer
(took 0.43 seconds)
proj chunk 1 of 27
phase_retrieval (took 0.42 seconds)
...

Platform Information:

  • NERSC JupyterLab
  • Python Version 3.9.7
  • TomoPy Version 1.10.1

Additional context
I noticed some older issues from 2018 but they may have gone stale - maybe I missed the solution to this. This is a collaborator's code, I am just trying to provide GPU support at NERSC.

@lipigupta lipigupta added the question Troubleshooting requests label Feb 17, 2022
@carterbox
Member

Could you please reference those issues here if you think they are relevant?

@lipigupta
Author

#307

@carterbox
Member

These errors are not from TomoPy. They are coming from numexpr. I don't know the solution to this problem, but from what you have described, it looks like your environment variable setting is not successful. You have tried to set NUMEXPR_MAX_THREADS=999, but numexpr is still reporting that this variable is set to 1.
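
One quick way to check what the process actually sees is to print the thread-related variables at the top of the script, before anything imports numexpr. This is only a minimal sketch: `thread_env_report` is a made-up helper name, not part of TomoPy or numexpr, and the list of variables is my assumption about which ones matter here.

```python
import os

# Variables that, to my knowledge, numexpr and OpenMP-based libraries
# consult for thread counts (assumed list, adjust as needed).
THREAD_VARS = ("NUMEXPR_MAX_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def thread_env_report(environ=os.environ):
    """Hypothetical helper: map each variable to its value, or None if unset."""
    return {name: environ.get(name) for name in THREAD_VARS}


if __name__ == "__main__":
    for name, value in thread_env_report().items():
        print(f"{name} = {value!r}")
```

If NUMEXPR_MAX_THREADS already shows a value here in the batch job but not in the notebook, that would point at the job environment rather than at the Python code.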

@lipigupta
Author

But I think the issue is that it is getting overwritten in remove_stripe_fw and various other functions (in tomopy).

@lipigupta
Author

For example in:

with mproc.set_numexpr_threads(ncore):

@carterbox
Member

carterbox commented Feb 17, 2022

The mproc.set_numexpr_threads(ncore) context manager is used around minus_log(), but not around remove_stripe_fw().

Also, set_numexpr_threads() changes a separate parameter from NUMEXPR_MAX_THREADS. One is the current number of threads to use for a task; the other is a hard limit.

https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html?highlight=NUMEXPR_MAX_THREADS#threadpool-configuration

@lipigupta
Author

lipigupta commented Feb 17, 2022

In any case it seems to be changing nthreads, which is causing the error. It seems to be changing it to some number greater than the default max - how can I prevent this?

Also, any clue why nthreads is getting set to a negative number or non-integer in remove_stripe_fw?
The error I get is:
remove_stripe_fw(): Error. nthreads must be a positive integer

@carterbox
Member

There are two problems, and I'm still not sure that they have the same cause. Let's focus on one for now. The default max number of threads for numexpr is 64 according to the docs, but the error you reported says the max number of threads has been set to 1.

You said you used this expression to set the NUMEXPR_MAX_THREADS environment variable:

os.environ.setdefault("NUMEXPR_MAX_THREADS", "999")

but this expression doesn't do anything if NUMEXPR_MAX_THREADS is already set. For example, if NUMEXPR_MAX_THREADS was already set to 1, then it would remain 1. Instead you should do something like this:

os.environ["NUMEXPR_MAX_THREADS"] = "999"

because this overrides any previously set value of the environment variable.
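
To illustrate the difference, here is a standalone sketch using only the standard library. The initial "1" stands in for a value that the shell or job scheduler exported before Python started, which is an assumption about your setup:

```python
import os

# Simulate a value that was already exported before the script started
# (assumed for illustration; check your own environment).
os.environ["NUMEXPR_MAX_THREADS"] = "1"

# setdefault() only writes when the key is absent, so the existing "1" survives.
os.environ.setdefault("NUMEXPR_MAX_THREADS", "999")
after_setdefault = os.environ["NUMEXPR_MAX_THREADS"]

# Plain assignment always overwrites the existing value.
os.environ["NUMEXPR_MAX_THREADS"] = "999"
after_assignment = os.environ["NUMEXPR_MAX_THREADS"]

print(after_setdefault, after_assignment)
```

As far as I know, numexpr reads NUMEXPR_MAX_THREADS when it is first imported, so the assignment should run before tomopy or numexpr is imported anywhere in the script.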

NUMEXPR_MAX_THREADS is not mentioned anywhere in the tomopy source code, so this variable is not being set to 1 by TomoPy. If you want NUMEXPR_MAX_THREADS to be 1 and don't want TomoPy to try to spawn more than 1 thread, then you should ensure that ncore=1 is explicitly passed to all TomoPy function calls that accept ncore. Otherwise, it looks like the default value of ncore is multiprocessing.cpu_count(), which on any modern computer will be more than 1.

Try each of those approaches and let's see if that reduces your errors. Then we can try to figure out why there is a non-positive number of threads.
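
If you would rather keep the existing limit and make ncore fit under it, something along these lines might work. It is a rough sketch: `clamp_ncore` is a hypothetical helper that does not exist in TomoPy, and the fallback of 64 is taken from the numexpr default mentioned above rather than verified against your installed version.

```python
import multiprocessing
import os


def clamp_ncore(requested=None, default_limit=64):
    """Hypothetical helper: cap a core count at NUMEXPR_MAX_THREADS.

    requested: desired number of cores; None means multiprocessing.cpu_count().
    default_limit: assumed numexpr limit when the variable is unset.
    """
    if requested is None:
        requested = multiprocessing.cpu_count()
    limit = int(os.environ.get("NUMEXPR_MAX_THREADS", default_limit))
    # Never return less than 1, since nthreads must be a positive integer.
    return max(1, min(requested, limit))


if __name__ == "__main__":
    ncore = clamp_ncore()
    print(f"using ncore={ncore}")
    # Then pass it explicitly, e.g. tomopy.minus_log(proj, ncore=ncore),
    # where proj is your projection array.
```

Passing the clamped value explicitly avoids relying on the default of multiprocessing.cpu_count(), which can exceed the limit on a large node.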

@lipigupta
Author

Thank you! I wasn't aware that the "setdefault" in os.environ.setdefault("NUMEXPR_MAX_THREADS", "999") won't in fact reset the value. I no longer see the thread errors.
