Segmentation fault after OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable using blas 1.1 through LSF #1668

gerritholl · 2018-07-05T11:59:32Z

When I import the Python package numpy with blas being 1.1-openblas in a script running through LSF, Python raises a SystemError and segmentation fault after repeated instances of OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable and OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max. When I run the same script outside LSF, on the same machine with the same environment, it succeeds. It also succeeds (inside or outside LSF) when I use blas 1.0, either 1.0-openblas or 1.0-mkl.

I run bsub as follows to submit a job to LSF:

bsub -q short-serial -W 00:01 -R "rusage[mem=1000]" -M 1000 -cwd $HOME -oo ~/test.lsf.out -eo ~/test.lsf.err -J test $HOME/test.sh

test.sh is a wrapper to ensure I run test2.sh with a clear environment, in order to ensure identical circumstances whether I run inside or outside LSF:

$ cat test.sh
#!/bin/sh
env -i ~/test2.sh --noprofile --norc

In test2.sh, I write out and set up some environmental information and run Python attempting to import numpy:

$ cat test2.sh
export
ulimit -a
ldconfig -v
export PATH= # needed to avoid https://github.com/conda/conda/issues/7486
.  /group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/etc/profile.d/conda.sh
conda activate
conda activate FCDR
python -c "import numpy; print('Success 1')"
python ~/mwe.py

Running this through LSF results in the following stdout:

$ cat test.lsf.out
Sender: LSF System <lsfadmin@host334.jc.rl.ac.uk>
Subject: Job 2475445: <test> in cluster <lotus> Exited

Job <test> was submitted from host <host293.jc.rl.ac.uk> by user <gholl> in cluster <lotus> at Wed Jul  4 18:44:42 2018.
Job was executed on host(s) <host334.jc.rl.ac.uk>, in queue <short-serial>, as user <gholl> in cluster <lotus> at Wed Jul  4 18:44:42 2018.
</home/users/gholl> was used as the home directory.
</home/users/gholl> was used as the working directory.
Started at Wed Jul  4 18:44:42 2018.
Terminated at Wed Jul  4 18:44:44 2018.
Results reported at Wed Jul  4 18:44:44 2018.

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
/home/users/gholl/test.sh
------------------------------------------------------------

Exited with exit code 139.

Resource usage summary:

    CPU time :                                   0.60 sec.
    Max Memory :                                 -
    Average Memory :                             -
    Total Requested Memory :                     1000.00 MB
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              -
    Max Threads :                                -
    Run time :                                   2 sec.
    Turnaround time :                            2 sec.

The output (if any) follows:

export OLDPWD
export PWD="/home/users/gholl"
export SHLVL="1"
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032189
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8589930496
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1032189
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

(omitted output of ldconfig -v for brevity)

PS:

Read file </home/users/gholl/test.lsf.err> for stderr output of this job.

And to stderr:

$ cat test.lsf.err
/sbin/ldconfig: /etc/ld.so.conf.d/kernel-2.6.32-696.23.1.el6.x86_64.conf:6: duplicate hwcap 1 nosegneg
/sbin/ldconfig: /etc/ld.so.conf.d/kernel-2.6.32-754.el6.x86_64.conf:6: duplicate hwcap 1 nosegneg
/sbin/ldconfig: /opt/platform_mpi/lib/linux_amd64/libhpmpi.so is not an ELF file - it has the wrong magic bytes at the start.

/sbin/ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/home/users/gholl/test2.sh: line 8: 52786 Segmentation fault      (core dumped) python -c "import numpy; print('Success 1')"
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1032189 current, 1032189 max
Traceback (most recent call last):
  File "/home/users/gholl/mwe.py", line 2, in <module>
    import numpy
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/group_workspaces/cems2/fiduceo/Users/gholl/anaconda3/envs/FCDR/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/home/users/gholl/test2.sh: line 9: 52789 Segmentation fault      (core dumped) python ~/mwe.py

I also studied the output of ldconfig -v, but I don't know what to look for and it's too long to put here. However, I did compare the sorted outputs when running through LSF or not:

$ diff -u <(sort test.lsf.out_core) <(sort test.nolsf.out_core)
--- /dev/fd/63  2018-07-04 18:56:05.405440986 +0100
+++ /dev/fd/62  2018-07-04 18:56:05.405440986 +0100
@@ -1,4 +1,4 @@
-core file size          (blocks, -c) unlimited
+core file size          (blocks, -c) 0
 cpu time               (seconds, -t) unlimited
 data seg size           (kbytes, -d) unlimited
 export OLDPWD
@@ -102,9 +102,12 @@
        libBrokenLocale.so.1 -> libBrokenLocale-2.12.so
        libBrokenLocale.so.1 -> libBrokenLocale-2.12.so
        libbtf.so.1 -> libbtf.so.1.1.0
+       libbtparser.so.2 -> libbtparser.so.2.2.2
        libbz2.so.1 -> libbz2.so.1.0.4
        libcairo.so.2 -> libcairo.so.2.10800.8
        libcamd.so.2 -> libcamd.so.2.2.0
+       libcanberra-gtk.so.0 -> libcanberra-gtk.so.0.1.5
+       libcanberra.so.0 -> libcanberra.so.0.2.1
        libcanna16.so.1 -> libcanna16.so.1.2.0
        libcanna.so.1 -> libcanna.so.1.2.0
        libcap-ng.so.0 -> libcap-ng.so.0.0.0
@@ -116,7 +119,7 @@
        libcdt.so.5 -> libcdt.so.5.0.0
        libcfitsio.so.0 -> libcfitsio.so.0
        libcgraph.so.6 -> libcgraph.so.6.0.0
-       libcgroup.so.1 -> libcgroup.so.1.0.40
+       libCharLS.so.1 -> libCharLS.so.1.0
        libcholmod.so.1 -> libcholmod.so.1.7.1
        libcidn.so.1 -> libcidn-2.12.so
        libcidn.so.1 -> libcidn-2.12.so
@@ -188,6 +191,7 @@
        libeggdbus-1.so.0 -> libeggdbus-1.so.0.0.0
        libEGL.so.1 -> libEGL.so.1.0.0
        libelf.so.1 -> libelf-0.164.so
+       libenchant.so.1 -> libenchant.so.1.5.0
        libepoxy.so.0 -> libepoxy.so.0.0.0
        libesoobS.so.2 -> libesoobS.so.2.0.0
        libevent-1.4.so.2 -> libevent-1.4.so.2.1.3
@@ -263,6 +267,7 @@
        libgmodule-2.0.so.0 -> libgmodule-2.0.so.0.2800.8
        libgmp.so.3 -> libgmp.so.3.5.0
        libgmpxx.so.4 -> libgmpxx.so.4.1.0
+       libgnomecanvas-2.so.0 -> libgnomecanvas-2.so.0.2600.0
        libgnutls-extra.so.26 -> libgnutls-extra.so.26.22.6
        libgnutls.so.26 -> libgnutls.so.26.22.6
        libgnutlsxx.so.26 -> libgnutlsxx.so.26.14.12
@@ -357,6 +362,7 @@
        libgstvideo-0.10.so.0 -> libgstvideo-0.10.so.0.20.0
        libgta.so.0 -> libgta.so.0.0.1
        libgthread-2.0.so.0 -> libgthread-2.0.so.0.2800.8
+       libgtksourceview-2.0.so.0 -> libgtksourceview-2.0.so.0.0.0
        libgtk-x11-2.0.so.0 -> libgtk-x11-2.0.so.0.2400.23
        libgtrtst.so.2 -> libgtrtst.so.2.0.0
        libgudev-1.0.so.0 -> libgudev-1.0.so.0.0.1
@@ -378,9 +384,9 @@
        libhwloc.so.4 -> libhwloc.so
 /lib/i686: (hwcap: 0x0008000000000000)
 /lib/i686/nosegneg: (hwcap: 0x0028000000000000)
-       libibmad.so.5 -> libibmad.so.5.4.0
+       libibmad.so.5 -> libibmad.so.5.5.0
        libibnetdisc.so.5 -> libibnetdisc.so.5.3.0
-       libibumad.so.3 -> libibumad.so.3.0.4
+       libibumad.so.3 -> libibumad.so.3.1.0
        libibverbs.so.1 -> libibverbs.so.1.0.0
        libICE.so.6 -> libICE.so.6.3.0
        libicudata.so.42 -> libicudata.so.42.1
@@ -499,6 +505,7 @@
        libnih.so.1 -> libnih.so.1.0.0
        libnl.so.1 -> libnl.so.1.1.4
        libnn.so.2 -> libnn.so.2.0.0
+       libnotify.so.1 -> libnotify.so.1.2.3
        libnsl.so.1 -> libnsl-2.12.so
        libnsl.so.1 -> libnsl-2.12.so
        libnspr4.so -> libnspr4.so
@@ -542,17 +549,17 @@
        libopcodes-2.20.51.0.2-5.48.el6.so -> libopcodes-2.20.51.0.2-5.48.el6.so
        libopenjp2.so.7 -> libopenjp2.so.2.3.0
        libopenjpeg.so.2 -> libopenjpeg.so.2.1.3.0
-       libopensm.so.5 -> libopensm.so.5.2.0
+       libopensm.so.12 -> libopensm.so.12.0.0
        liboplodbcS.so.2 -> liboplodbcS.so.2.0.0
        liboraodbcS.so.2 -> liboraodbcS.so.2.0.0
        libORBit-2.so.0 -> libORBit-2.so.0.1.0
        libORBitCosNaming-2.so.0 -> libORBitCosNaming-2.so.0.1.0
        libORBit-imodule-2.so.0 -> libORBit-imodule-2.so.0.0.0
-       libosmcomp.so.3 -> libosmcomp.so.3.0.8
+       libosmcomp.so.3 -> libosmcomp.so.3.0.6
        libOSMesa16.so.6 -> libOSMesa16.so.6.5.3
        libOSMesa32.so.6 -> libOSMesa32.so.6.5.3
        libOSMesa.so.6 -> libOSMesa.so.6.5.3
-       libosmvendor.so.3 -> libosmvendor.so.3.0.9
+       libosmvendor.so.3 -> libosmvendor.so.3.0.8
        libossp-uuid.so.16 -> libossp-uuid.so.16.0.21
        libotf.so.0 -> libotf.so.0.0.0
        libp11-kit.so.0 -> libp11-kit.so.0.0.0
@@ -676,7 +683,6 @@
        libsensors.so.4 -> libsensors.so.4.2.0
        libsepol.so.1 -> libsepol.so.1
        libserf-1.so.1 -> libserf-1.so.1.3.0
-       libsgutils2.so.2 -> libsgutils2.so.2.0.0
        libshiboken-python2.7.so.1.2 -> libshiboken-python2.7.so.1.2.1
        libshp.so.1 -> libshp.so.1.0.1
        libslang.so.2 -> libslang.so.2.2.1
@@ -686,6 +692,7 @@
        libsndfile.so.1 -> libsndfile.so.1.0.20
        libsnmp.so.20 -> libsnmp.so.20.0.0
        libsoftokn3.so -> libsoftokn3.so
+       libspatialite.so.2 -> libspatialite.so.2.0.4
        libspqr.so.1 -> libspqr.so.1.1.2
        libsqlite3.so.0 -> libsqlite3.so.0.8.6
        libssh2.so.1 -> libssh2.so.1.0.1
@@ -754,6 +761,7 @@
        libvorbisenc.so.2 -> libvorbisenc.so.2.0.6
        libvorbisfile.so.3 -> libvorbisfile.so.3.3.2
        libvorbis.so.0 -> libvorbis.so.0.4.3
+       libvpx.so.1 -> libvpx.so.1.3.0
        libvte.so.9 -> libvte.so.9.2501.0
        libwbclient.so.0 -> libwbclient.so.0
        libwebpdecoder.so.1 -> libwebpdecoder.so.1.0.3
@@ -764,6 +772,7 @@
        libwlm-nosched.so -> libwlm-nosched.so
        libwmf-0.2.so.7 -> libwmf-0.2.so.7.1.0
        libwmflite-0.2.so.7 -> libwmflite-0.2.so.7.0.1
+       libwnck-1.so.22 -> libwnck-1.so.22.3.23
        libwrap.so.0 -> libwrap.so.0.7.6
        libwx_baseu-2.8.so.0 -> libwx_baseu-2.8.so.0.8.0
        libwx_baseu-3.0.so.0 -> libwx_baseu-3.0.so.0.2.0
@@ -800,6 +809,7 @@
        libX11.so.6 -> libX11.so.6.3.0
        libX11-xcb.so.1 -> libX11-xcb.so.1.0.0
        libX11-xcb.so.1 -> libX11-xcb.so.1.0.0
+       libx86.so.1 -> libx86.so.1
        libXau.so.6 -> libXau.so.6.0.0
        libXau.so.6 -> libXau.so.6.0.0
        libXaw3d.so.7 -> libXaw3d.so.7.0
@@ -842,6 +852,7 @@
        libxcb.so.1 -> libxcb.so.1.1.0
        libxcb-sync.so.1 -> libxcb-sync.so.1.0.0
        libxcb-sync.so.1 -> libxcb-sync.so.1.0.0
+       libxcb-util.so.1 -> libxcb-util.so.1.0.0
        libxcb-xevie.so.0 -> libxcb-xevie.so.0.0.0
        libxcb-xevie.so.0 -> libxcb-xevie.so.0.0.0
        libxcb-xf86dri.so.0 -> libxcb-xf86dri.so.0.0.0
@@ -889,6 +900,7 @@
        libXp.so.6 -> libXp.so.6.2.0
        libXrandr.so.2 -> libXrandr.so.2.2.0
        libXrender.so.1 -> libXrender.so.1.3.0
+       libXRes.so.1 -> libXRes.so.1.0.0
        libxslt.so.1 -> libxslt.so.1.1.26
        libxtables.so.4 -> libxtables.so.4.0.0-1.4.7
        libXt.so.6 -> libXt.so.6.0.0
@@ -897,20 +909,21 @@
        libXxf86dga.so.1 -> libXxf86dga.so.1.0.0
        libXxf86misc.so.1 -> libXxf86misc.so.1.1.0
        libXxf86vm.so.1 -> libXxf86vm.so.1.0.0
-       libyaml-0.so.2 -> libyaml-0.so.2.0.4
        libz.so.1 -> libz.so.1.2.3
 max locked memory       (kbytes, -l) unlimited
 max memory size         (kbytes, -m) unlimited
-max user processes              (-u) 1032189
-open files                      (-n) 4096
+max user processes              (-u) 1024
+open files                      (-n) 48000
 /opt/platform_mpi/lib/linux_amd64:
        p11-kit-trust.so -> libnssckbi.so
-pending signals                 (-i) 1032189
+pending signals                 (-i) 515955
 pipe size            (512 bytes, -p) 8
 POSIX message queues     (bytes, -q) 819200
 real-time priority              (-r) 0
 scheduling priority             (-e) 0
-stack size              (kbytes, -s) 8589930496
+stack size              (kbytes, -s) 2097151
+Success
+Success 1
 /usr/lib:
 /usr/lib64:
 /usr/lib64/atlas:

When I run outside LSF, the stdout is

$ env -i ~/test2.sh --noprofile --norc
export OLDPWD
export PWD="/home/users/gholl"
export SHLVL="1"
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515955
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 48000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 2097151
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

(output of ldconfig -v omitted for brevity)

Success 1
Success

and the output to stderr is limited to the same ldconfig errors as before:

/sbin/ldconfig: /etc/ld.so.conf.d/kernel-2.6.32-696.23.1.el6.x86_64.conf:6: duplicate hwcap 1 nosegneg
/sbin/ldconfig: /etc/ld.so.conf.d/kernel-2.6.32-754.el6.x86_64.conf:6: duplicate hwcap 1 nosegneg
/sbin/ldconfig: /opt/platform_mpi/lib/linux_amd64/libhpmpi.so is not an ELF file - it has the wrong magic bytes at the start.

/sbin/ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied

I'm running Python 3.6.3 with a conda environment sourced primarily from anaconda and conda-forge. I've noticed previously that when I set a tight ulimit -v, then import numpy fails with the same OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable. But presently, there is no ulimit -v set. The only ulimit differences between the LSF and non-LSF case are that for several properties, the limits within LSF are much more generous than outside LSF, so I can't see how ulimit limitations are causing the failures within LSF in this case. And in my previous case, I managed to reproduce the problem outside LSF as well.

As stated, everything appears to work fine when using a numpy built on blas 1.0 (either openblas or mkl) rather than blas 1.1 (I can find only openblas, no mkl). There must be some difference in environment between running outside or inside LSF in blas 1.1 (openblas), but I can't pin it down. What else may I look at?

I do not have LSF administrator access.


gerritholl · 2018-07-05T13:49:34Z

Everything runs fine when I set

export OMP_NUM_THREADS=1
export USE_SIMPLE_THREADED_LEVEL3=1
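
The thread-count part of this workaround can also be applied from inside Python, as long as it happens before numpy is imported for the first time, because OpenBLAS consults these variables when the library is loaded. A minimal sketch: limit_blas_threads is a hypothetical helper name, and OPENBLAS_NUM_THREADS is OpenBLAS's own variable (it takes priority over OMP_NUM_THREADS) rather than one used in the comment above.

```python
import os

BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n=1):
    """Set the thread-count variables OpenBLAS reads at load time (hypothetical helper)."""
    value = str(n)
    for name in BLAS_THREAD_VARS:
        os.environ[name] = value
    # Return what was set so the caller can log it.
    return {name: os.environ[name] for name in BLAS_THREAD_VARS}


# This must run before the first "import numpy" anywhere in the process;
# setting the variables after the library is loaded is too late.
settings = limit_blas_threads(1)
print(settings)
```

With a single thread, OpenBLAS should have no worker threads to create at import time, which would be consistent with the failure disappearing.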

martin-frbg · 2018-07-05T14:43:43Z

Is there any information about which OpenBLAS versions 1.1-openblas and 1.0-openblas correspond to? The error suggests that creation of new threads actually failed due to a temporary system limitation (which may or may not have been due to the ulimit that is printed for information; perhaps you are actually hitting some memory limit imposed by LSF).
This message (and checking for success of pthread_create in general) was added more than two years ago (version 0.2.15 or thereabouts) in the context of issue #668.
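
pthread_create reports EAGAIN ("Resource temporarily unavailable") both when a limit on the number of threads is hit and when resources such as memory for the new thread's stack cannot be obtained, so it helps to look at more than RLIMIT_NPROC. A minimal sketch for printing the relevant per-process limits from inside Python, assuming Linux; describe_limit is a hypothetical helper. The resource module reports sizes in bytes, whereas ulimit prints kbytes, and limits that a scheduler enforces by other means will not show up here.

```python
import resource


def describe_limit(name):
    """Return (soft, hard) for a resource limit, spelling out 'unlimited'."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return fmt(soft), fmt(hard)


# Limits that can plausibly make thread creation fail:
# process/thread count, address space, stack size and data segment.
for name in ("RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_STACK", "RLIMIT_DATA"):
    soft, hard = describe_limit(name)
    print(f"{name:13s} soft={soft} hard={hard}")
```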

brada4 · 2018-07-05T15:07:10Z

Twice the stack size exceeds the gigabyte you requested.
Probably tune it down to 10 MB or so, so that it does not exceed the LSF quota (and tell your LSF admin about it).

gerritholl · 2018-07-05T15:38:25Z

It does appear that thread creation fails, for when I use 1.0-openblas I get a failure with matplotlib failing to initiate a thread.

The conda blas version numbers are metapackages, but I don't actually understand the difference between 1.1-openblas and 1.0-openblas. In either case, it's libopenblasp-r0.2.20.so that stays installed, so there must be something else going on that I don't understand.
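
One way to check which shared library a process has really loaded is to read its memory map. A minimal sketch, assuming Linux with /proc available; loaded_libraries is a hypothetical helper, meant to be called after import numpy in the failing and the working environment so that the two results can be compared.

```python
def loaded_libraries(keyword, maps_path="/proc/self/maps"):
    """Return sorted paths of mapped files whose path contains keyword."""
    paths = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Fields: address, perms, offset, dev, inode, [pathname]
                fields = line.split(None, 5)
                if len(fields) == 6 and keyword in fields[5]:
                    paths.add(fields[5].strip())
    except OSError:
        pass  # no readable maps file (e.g. not Linux): report nothing
    return sorted(paths)


# After "import numpy", this lists the OpenBLAS build that is in use.
print(loaded_libraries("openblas"))
```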

@brada4 Do you mean I should tune down the stack size, or the gigabyte I ordered? I don't understand ulimit terribly well; could imposing a lower limit reduce the risk of running out of resources?

brada4 · 2018-07-07T21:35:22Z

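You may need to set a lower stack limit, e.g. ulimit -s 8192; test.sh
A 2 GB stack (the limit outside LSF) is enormous, enough for recursion millions of levels deep, and the 8589930496 kbytes reported inside LSF is about 8 TiB.
The 1 GB memory limit in LSF may, for example, apply to the address space of the submitted process.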
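
The arithmetic behind this suggestion can be sketched as follows. It rests on two assumptions that the thread does not confirm: that each new thread reserves a stack as large as the soft stack limit (the glibc default on Linux when the limit is not unlimited), and that the LSF limit counts reserved address space, with 1000 MB taken as 1000 * 10^6 bytes. stacks_that_fit and lower_stack_limit are hypothetical helper names.

```python
import resource

KIB = 1024


def stacks_that_fit(stack_kbytes, memory_limit_mb):
    """How many thread stacks of this size fit within the memory limit."""
    stack_bytes = stack_kbytes * KIB
    limit_bytes = memory_limit_mb * 1000 * 1000  # assumption: MB = 10^6 bytes
    return limit_bytes // stack_bytes


# Stack sizes (kbytes) from the ulimit output in this thread;
# 1000 MB is the value passed to bsub -M.
for label, stack_kbytes in (("inside LSF", 8589930496),
                            ("outside LSF", 2097151),
                            ("ulimit -s 8192", 8192)):
    print(label, stacks_that_fit(stack_kbytes, 1000))


def lower_stack_limit(kbytes=8192):
    """Lower the soft RLIMIT_STACK, like ulimit -s, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = kbytes * KIB
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return target
```

Because the default thread stack size is fixed when a program starts, lower_stack_limit is only useful when called before the Python interpreter that imports numpy is launched, for example as the preexec_fn of subprocess.Popen; it is deliberately not called here.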

brada4 · 2018-07-07T21:42:44Z

@martin-frbg blas-1.1 and blas-1.0 are dlopen() configuration wrappers for OpenBLAS 0.2.20.
In conda-forge one may find 0.3.1 wrapped by the same wrappers.
Too bad the failing setup cannot be easily debugged.

martin-frbg · 2018-07-10T06:18:23Z

@gerritholl did you try with a smaller stack size (or a higher memory limit in LSF, depending on what your LSF admin allows)?

brada4 mentioned this issue Sep 19, 2018

thread problem #1767

Closed

martin-frbg closed this as completed Dec 30, 2018

aineniamh mentioned this issue Feb 12, 2021

pangolin uses too many threads cov-lineages/pangolin#139

Closed

tcompa mentioned this issue May 10, 2022

Set OPENBLAS_NUM_THREADS=1, to avoid RLIMIT_NPROC when importing numpy within a parsl python_app fractal-analytics-platform/fractal-client#38

Closed
