Segmentation fault after OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable using blas 1.1 through LSF #1668
Comments
Everything runs fine when I set
|
Is there any information about which OpenBLAS versions the |
Twice the stack size exceeds the gigabyte you ordered |
It does appear that thread creation fails, for when I use

The conda blas version numbers are metapackages, but I don't actually understand the difference between

@brada4 Do you mean tune down the stack size, or tune down the gigabyte I ordered? I don't understand ulimit terribly well; might imposing a lower limit reduce the risk of running out of resources? |
You may need to set a lower stack limit, like |
@martin-frbg blas-1.1 and blas-1.0 are dlopen() configuration wrappers for openblas 0.2.20 |
@gerritholl did you try with a smaller stack size (or higher memory limit in LSF, depending on what your LSF admin allows) ? |
When I import the Python package `numpy` with `blas` being `1.1-openblas` in a script running through LSF, Python raises a `SystemError` and segmentation fault after repeated instances of `OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable` and `OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max`. When I run the same script outside LSF, on the same machine with the same environment, it succeeds. It also succeeds (inside or outside LSF) when I use blas 1.0, either `1.0-openblas` or `1.0-mkl`.

I run `bsub` as follows to submit a job to LSF:

`test.sh` is a wrapper to ensure I run `test2.sh` with a clear environment, in order to ensure identical circumstances whether I run inside or outside LSF:

In `test2.sh`, I write out and set up some environmental information and run Python attempting to `import numpy`:

Running this through LSF results in the following stdout:

(omitted output of `ldconfig -v` for brevity)

And to stderr:
I also studied the output of `ldconfig -v`, but I don't know what to look for and it's too long to put here. However, I did compare the sorted outputs when running through LSF or not:

When I run outside LSF, the stdout is

(output of `ldconfig -v` omitted for brevity)

and the output to stderr is limited to the same ldconfig errors as before:
I'm running Python 3.6.3 with a conda environment sourced primarily from `anaconda` and `conda-forge`. I've noticed previously that when I set a tight `ulimit -v`, then `import numpy` fails with the same `OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable`. But presently, there is no `ulimit -v` set. The only `ulimit` differences between the LSF and non-LSF case are that for several properties, the limits within LSF are much more generous than outside LSF, so I can't see how ulimit limitations are causing the failures within LSF in this case. And in my previous case, I managed to reproduce the problem outside LSF as well.

As stated, everything appears to work fine when using a `numpy` built on `blas 1.0` (either openblas or mkl) rather than `blas 1.1` (I can find only openblas, no mkl). There must be some difference in environment between running outside or inside LSF in `blas 1.1` (openblas), but I can't pin it down. What else may I look at?

I do not have LSF administrator access.
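One way to narrow down an environment difference like this is to record the limits from inside the Python process itself, once under LSF and once outside, and compare the two. The following is a minimal sketch using only the standard library; the function names `collect_limits` and `diff_snapshots` are made up for illustration, and the set of limits inspected is a guess at what might matter for thread creation, not a definitive list.

```python
import os
import resource

# Limits that plausibly affect pthread_create; names missing on a platform are skipped.
LIMIT_NAMES = ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS", "RLIMIT_DATA")


def collect_limits():
    """Return a dict of (soft, hard) limit pairs plus the detected core count."""
    snapshot = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is not None:
            snapshot[name] = resource.getrlimit(const)
    snapshot["cpu_count"] = os.cpu_count()
    return snapshot


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Run this before importing numpy, inside and outside the scheduler,
    # then compare the printed values (or feed both dicts to diff_snapshots).
    for name, value in sorted(collect_limits().items()):
        print(name, value)
```

A value of `-1` in the output corresponds to `resource.RLIM_INFINITY`, that is, no limit. Limits imposed by the scheduler through other mechanisms (for example memory enforcement outside of `setrlimit`) would not show up here.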