BLAS memory allocation error in Scikit-learn KMeans & kNN & DBSCAN #3321

Closed
OnlyDeniko opened this issue Jul 20, 2021 · 49 comments · Fixed by #3352

@OnlyDeniko

OnlyDeniko commented Jul 20, 2021

scikit-learn/scikit-learn#20539

Do you have any ideas?

@martin-frbg
Collaborator

martin-frbg commented Jul 20, 2021

The recommendation given there was on the right track; I wonder why it did not work. Multithreaded OpenBLAS requires a memory buffer per thread, and the maximum number of buffers is set at compile time. So there is an (ideally/normally) invisible limitation caused by whatever the OpenBLAS that came with either numpy or your operating system was configured for. Does it work when you set OPENBLAS_NUM_THREADS to a smaller value, like 16 or 32? (The OpenBLAS that comes with numpy 1.21 is built for 64 threads, as recently established in #3318 (comment), but maybe you have some other version imported elsewhere in your combination of programs.)
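
For reference, one way to try this from Python is to set the variable before numpy (and therefore OpenBLAS) is loaded — a minimal sketch, with the reproducer itself left out:

import os
os.environ["OPENBLAS_NUM_THREADS"] = "16"  # must be set before numpy loads OpenBLAS

import numpy as np  # OpenBLAS now starts with at most 16 threads
# ... run the scikit-learn reproducer here ...

Setting OPENBLAS_NUM_THREADS=16 in the shell before launching Python has the same effect.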

@OnlyDeniko
Author

I installed numpy: pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
Reproducer works only with OPENBLAS_NUM_THREADS <= 17

@martin-frbg
Collaborator

Sounds like that package was built with NUM_THREADS=16 although the conda-forge "recipe" for building openblas packages sets this number to 128. Not sure about the who/how/where for the packages though.

@OnlyDeniko
Author

OnlyDeniko commented Jul 20, 2021

Just to clarify, I install scikit-learn from the pip channel. NumPy is installed as a dependency and comes bundled with its own OpenBLAS.
I found this .so file in the site-packages/numpy.libs folder, maybe it can help you: libopenblas64_p-r0-6d9684d7.3.17.so

@brada4
Contributor

brada4 commented Jul 20, 2021

These nightly builds are unstable and are only available as pip packages on PyPI.

Please install the official stable versions with conda and re-check. There is not even a record of the release tag 6d9684d7 in the search.

@OnlyDeniko
Author

I have no errors when I install numpy from the conda channels (main, conda-forge, intel), because there OpenBLAS is installed separately.
The error appears when I install numpy from pip. So I installed the latest stable version of numpy, 1.21.1, from the pip channel, and in site-packages/numpy.libs I have libopenblasp-r0-2d23e62b.3.17.so

@brada4
Contributor

brada4 commented Jul 21, 2021

So, is the problem gone?
We cannot match those hashes to anything on the real PyPI, the Anaconda PyPI index, or conda-forge. What is certain is that we did not make the nightly build you downloaded from the Anaconda PyPI index.

@OnlyDeniko
Author

No, the problem remains. I do not download any nightly builds, I just type this: pip install numpy

@brada4
Contributor

brada4 commented Jul 21, 2021

Please check the configuration of the .so files you got:

ctypes.CDLL("/path/to/your/lib/openblas.so").openblas_get_config()

Also, what does nproc report on your machine? It seems that was not answered in the other thread.
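
(Note that ctypes assumes an int return type by default, so to see the full configuration string the return type has to be declared first — a small sketch, with the library path as a placeholder:)

import ctypes

lib = ctypes.CDLL("/path/to/your/lib/openblas.so")   # adjust to the actual .so path
lib.openblas_get_config.restype = ctypes.c_char_p    # the function returns a C string
print(lib.openblas_get_config().decode())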

@martin-frbg
Collaborator

If you do not want to limit the number of OpenBLAS threads, the only solution would seem to be to build OpenBLAS from source with a high enough NUM_THREADS. I cannot match the hash in the library names you mentioned to a specific build (and consequently its build options), but I would expect the maximum supported thread count to be at least 64. Maybe what is happening is that parts of Scikit-learn are themselves making parallel calls into OpenBLAS; limiting the thread count may even provide a performance benefit in that situation.
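
If you want to experiment with that without rebuilding anything, threadpoolctl (already a scikit-learn dependency) can cap the OpenBLAS thread count at runtime — a rough sketch, with the matrix size chosen only for illustration:

import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.rand(2000, 2000)
with threadpool_limits(limits=16, user_api="blas"):  # cap OpenBLAS at 16 threads for this block
    a @ a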

@OnlyDeniko
Author

Please check the configuration of the .so files you got:

ctypes.CDLL("/path/to/your/lib/openblas.so").openblas_get_config()

Also, what does nproc report on your machine? It seems that was not answered in the other thread.

nproc=96

>>> import ctypes
>>> ctypes.CDLL("/home/ubuntu/miniconda3/lib/python3.8/site-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so").openblas_get_config()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda3/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libgfortran-2e0d59d6.so.5.0.0: cannot open shared object file: No such file or directory
$ ldd libopenblasp-r0-2d23e62b.3.17.so
        linux-vdso.so.1 (0x00007ffcdf182000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f42a1b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1f427fc000)
        libgfortran-2e0d59d6.so.5.0.0 => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f4240b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1f44d16000)

@isuruf
Contributor

isuruf commented Jul 22, 2021

There are multiple installations you are talking about here.

libopenblas64_p-r0-6d9684d7.3.17.so

This is a nightly build with INTERFACE64=1

libopenblasp-r0-2d23e62b.3.17.so

I'm not sure where you got this from. It doesn't seem to be from PyPI

Can you remove numpy and make sure that there's nothing in /home/ubuntu/miniconda3/lib/python3.8/site-packages/numpy.libs/
and then install using pip install numpy? Send the output of pip install numpy.

@OnlyDeniko
Author

As you recommended, I installed numpy using: pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
And I got libopenblas64_p-r0-6d9684d7.3.17.so in site-packages/numpy.libs. But it does not help, so let's forget about it and talk only about pip install numpy.
You can try the download yourself and you will get the following result:

$ pip install numpy
Collecting numpy
  Using cached numpy-1.21.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.8 MB)
Installing collected packages: numpy
Successfully installed numpy-1.21.1

@brada4
Contributor

brada4 commented Jul 22, 2021

The build command for OpenBLAS is here, i.e. DYNAMIC_ARCH and 128 threads:
https://github.com/conda-forge/openblas-feedstock/blob/25dab765a98489e0a3e2ca8c3e7094e21e471425/recipe/build.sh#L68

@OnlyDeniko
Author

The build command for OpenBLAS is here, i.e. DYNAMIC_ARCH and 128 threads:
https://github.com/conda-forge/openblas-feedstock/blob/25dab765a98489e0a3e2ca8c3e7094e21e471425/recipe/build.sh#L68

I'm not going to build OpenBLAS manually. As a user, I want to be sure that a perfectly ordinary install of numpy will start and work on any machine. When installing from the conda channels this is the case, but the problem is with the pip channel. Have you been able to reproduce the problem yourself?

@isuruf
Contributor

isuruf commented Jul 22, 2021

Can you run the script with OPENBLAS_VERBOSE=2?

@OnlyDeniko
Author

Can you run the script with OPENBLAS_VERBOSE=2?

Core: SkylakeX
OpenBLAS : Your OS does not support AVX512VL instructions. OpenBLAS is using Haswell kernels as a fallback, which may give poorer performance.
Core: Haswell
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault (core dumped)

@martin-frbg
Collaborator

Is there any control over the parallelism in scikit? Assuming you had 8 threads running in parallel, each calling an OpenBLAS function that uses 16 threads, there would be no free slots in the buffer list for a 9th thread doing the same.

@OnlyDeniko
Author

Is there any control over the parallelism in scikit? Assuming you had 8 threads running in parallel, each calling an OpenBLAS function that uses 16 threads, there would be no free slots in the buffer list for a 9th thread doing the same.

Actually, I do not know. Gotta ask the guys from the scikit-learn team

@brada4
Contributor

brada4 commented Jul 22, 2021

This parallelism is used.
https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-resources

Please set OPENBLAS_NUM_THREADS=1

@jeremiedbb

In KMeans we call OpenBLAS gemm inside a parallel (openmp) loop, but we set openblas num threads to one to avoid nesting parallelism.

In KNN and DBSCAN we call OpenBLAS in a multi-process setup and as mentioned above we set the number of openblas threads such that the total number of threads does not exceed the number of cpus.

Setting OPENBLAS_NUM_THREADS=1 means all OpenBLAS calls will be sequential even in non-nested regions, which is unfortunately not optimal.

@OnlyDeniko do you set the n_jobs parameter for these estimators ?

@OnlyDeniko
Author

@jeremiedbb I set n_jobs=-1

@jeremiedbb

Could you try setting it to a lower value like 16, 32 or 64 and see when it breaks?

@brada4
Contributor

brada4 commented Jul 22, 2021

@jeremiedbb it is MAX(50, NUM_CPU*2) memory regions.
That is how many temporary allocations can be in use, at most one per call, possibly one more if calls are nested.
Namely, it breaks when your_threads × openblas_threads exceeds 256, and it gets slow once you exceed the real core count anyway.
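Plugging in the numbers from this thread, and assuming that formula is the one in effect: a conda-forge build with NUM_THREADS=128 gives MAX(50, 128*2) = 256 regions, which is where the 256 above comes from, while a wheel built for 64 threads would correspondingly top out at MAX(50, 64*2) = 128.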

@OnlyDeniko
Author

Could you try setting it to a lower value like 16, 32 or 64 and see when it breaks?

KMeans works with n_jobs <= 65.
KNN does not work with n_jobs=-1, but works with n_jobs=None.

@brada4
Contributor

brada4 commented Jul 22, 2021

It is a timing race until you reach 256 allocations. Oversubscription damages performance worse than linearly. For the fastest result you somehow need to arrange for one OpenBLAS thread per CPU, say n_jobs=48 with OPENBLAS_NUM_THREADS=2, or some other combination that multiplies out to the 96 cores and returns the result in the shortest time.
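
A combination like that could be set up as follows (a sketch only; the right split depends on the workload):

import os
os.environ["OPENBLAS_NUM_THREADS"] = "2"  # set before numpy loads OpenBLAS

# then run the estimator with n_jobs=48, so that 48 jobs x 2 BLAS threads = 96 CPUs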

@jeremiedbb

For KMeans, we deal with the number of OpenBLAS threads internally, so setting OMP_NUM_THREADS=64 or n_jobs=64 should be enough.

For KNN, I'd suggest setting n_jobs=1 and maybe OPENBLAS_NUM_THREADS=64, since I don't think multiprocessing brings anything for this estimator. We are currently reworking it to have much better scalability in multicore settings, but it's still WIP.
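
Concretely, the settings suggested above might look something like this (a sketch; the data and estimator parameters are placeholders, not a benchmark):

import os
os.environ["OMP_NUM_THREADS"] = "64"       # caps the OpenMP threads used by KMeans
os.environ["OPENBLAS_NUM_THREADS"] = "64"  # lets OpenBLAS itself parallelize for KNN

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(10_000, 50)
y = np.random.randint(2, size=10_000)

KMeans(n_clusters=10).fit(X)  # internal OpenMP parallelism; scikit-learn keeps BLAS sequential inside it
KNeighborsClassifier(n_jobs=1).fit(X, y).predict(X[:100])  # single job, threaded BLAS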

@isuruf
Contributor

isuruf commented Jul 22, 2021

Since it looks like this is just a matter of increasing the parameter at build time from 64 to 128, can you open an issue in https://github.com/MacPython/openblas-libs/ ?

@brada4
Contributor

brada4 commented Jul 23, 2021

This one was the official release with the OpenBLAS binary pulled from conda: 128 threads and wild, well-documented CPU oversubscription.
I think this issue can go back to scikit-learn. The ctypes trick can set the thread count at runtime.

@OnlyDeniko
Author

python -m threadpoolctl --import sklearn
[
  {
    "filepath": "/home/ubuntu/miniconda3/envs/dkulandi_bench/lib/python3.8/site-packages/scikit_learn.libs/libgomp-f7e03b3e.so.1.0.0",
    "prefix": "libgomp",
    "user_api": "openmp",
    "internal_api": "openmp",
    "version": null,
    "num_threads": 96
  },
  {
    "filepath": "/home/ubuntu/miniconda3/envs/dkulandi_bench/lib/python3.8/site-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so",
    "prefix": "libopenblas",
    "user_api": "blas",
    "internal_api": "openblas",
    "version": "0.3.17",
    "num_threads": 64,
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  },
  {
    "filepath": "/home/ubuntu/miniconda3/envs/dkulandi_bench/lib/python3.8/site-packages/scipy.libs/libopenblasp-r0-085ca80a.3.9.so",
    "prefix": "libopenblas",
    "user_api": "blas",
    "internal_api": "openblas",
    "version": "0.3.9",
    "num_threads": 64,
    "threading_layer": "pthreads",
    "architecture": "Haswell"
  }
]
python -c "import joblib; print(joblib.cpu_count(only_physical_cores=True))"
48
python -c "import joblib; print(joblib.cpu_count())"
96
lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:            7
CPU MHz:             1201.212
BogoMIPS:            5999.97
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni

@ogrisel
Contributor

ogrisel commented Jul 28, 2021

Thanks for the details.

So we get the confirmation that your code relies on the OpenBLAS shipped in the numpy and scipy wheels and each wheel brings a different version. Usually this is not a problem.

But I am still not sure why this crashes in scikit-learn:

  • when calling KMeans with n_jobs=-1, this is equivalent to calling KMeans with n_jobs=96 on your machine and creates 96 OpenMP threads, each of which should call OpenBLAS in sequential mode thanks to this line. joblib is not used at all in this case, so I don't understand why we have the problem (see the edit below).

  • for k-NN and DBSCAN this is another story: in this case n_jobs=-1 is resolved to n_jobs=96 and creates 96 independent Python processes via joblib. Each of them is initialized with the OPENBLAS_NUM_THREADS=max(1, CPU_COUNT/n_jobs)=1 environment variable. So we should not have a problem either.

In either case, we should not get oversubscription-related performance problems either: OpenBLAS should always run in sequential mode in the end.

According to:

https://github.com/xianyi/OpenBLAS/blob/develop/USAGE.md#program-is-terminated-because-you-tried-to-allocate-too-many-memory-regions

The error you observe could still be resolved by increasing the NUM_THREADS make variable at build time, as conda-forge does.

@OnlyDeniko do you confirm that you do not reproduce the problem if you install everything from conda-forge which sets NUM_THREADS=128 at build time? You can create a dedicated env with:

conda create -n sklearn-cf -c conda-forge scikit-learn
conda activate sklearn-cf
python -m threadpoolctl --import sklearn  # just to check
python your_reproducer_script.py

Python, numpy, scipy, openblas, joblib and threadpoolctl are all dependencies of scikit-learn so conda will install them all from conda-forge automatically.

Edit: from the reference linked above:

Despite its name, and due to the use of memory buffers in functions like SGEMM, the setting of NUM_THREADS can be relevant even for a single-threaded build of OpenBLAS, if such functions get called by multiple threads of a program that uses OpenBLAS. In some cases, the affected code may simply crash or throw a segmentation fault without displaying the above warning first.

So indeed for KMeans, even though OpenBLAS is called with 1 thread at runtime, it is called by 96 OpenMP threads, so this might be the problem.

@jeremiedbb

But I am still not sure why this crashes in scikit-learn:
when calling KMeans with n_jobs=-1, this is equivalent to calling KMeans with n_jobs=96 on your machine and creates 96 OpenMP threads, each of which should call OpenBLAS in sequential mode thanks to this line. joblib is not used at all in this case, so I don't understand why we have the problem.
for k-NN and DBSCAN this is another story: in this case n_jobs=-1 is resolved to n_jobs=96 and creates 96 independent Python processes via joblib. Each of them is initialized with the OPENBLAS_NUM_THREADS=max(1, CPU_COUNT/n_jobs)=1 environment variable. So we should not have a problem either.

In both cases we try to create 96 memory regions, which is more than 64. For KMeans, setting OMP_NUM_THREADS=64 or n_jobs=64 should be ok. For KNN, setting n_jobs=64 and OPENBLAS_NUM_THREADS=1 should be ok (alternatively n_jobs=1 and OPENBLAS_NUM_THREADS=64).

@ogrisel
Contributor

ogrisel commented Jul 28, 2021

I don't understand why this breaks when we use joblib sub-processes for KNN: each worker process manages its memory independently of the others. There should be no shared buffers.

@jeremiedbb

jeremiedbb commented Jul 28, 2021

In KNN, parallelism (assuming brute force) comes from the pairwise distance computations, which use joblib with the threading backend.
Edit: we also use the threading backend for the tree-based solvers.
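
So the failing pattern is essentially many Python threads entering OpenBLAS concurrently. A stripped-down illustration of that pattern (not scikit-learn's actual code, and the sizes are arbitrary):

import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(256, 256)

def chunk(_):
    return X @ X.T  # each concurrent call enters OpenBLAS gemm and needs buffer slots

# with more concurrent callers than the compile-time buffer pool allows,
# this is the situation that produces "too many memory regions"
Parallel(n_jobs=96, backend="threading")(delayed(chunk)(i) for i in range(96))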

@ogrisel
Contributor

ogrisel commented Jul 28, 2021

Alright that makes sense then. And DBSCAN does the same to precompute the neighborhood graph.

@OnlyDeniko
Author

@OnlyDeniko do you confirm that you do not reproduce the problem if you install everything from conda-forge which sets NUM_THREADS=128 at build time?

Yes, I confirm

[
  {
    "filepath": "/home/ubuntu/miniconda3/envs/sklearn-cf/lib/libopenblasp-r0.3.17.so",
    "prefix": "libopenblas",
    "user_api": "blas",
    "internal_api": "openblas",
    "version": "0.3.17",
    "num_threads": 96,
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  },
  {
    "filepath": "/home/ubuntu/miniconda3/envs/sklearn-cf/lib/libgomp.so.1.0.0",
    "prefix": "libgomp",
    "user_api": "openmp",
    "internal_api": "openmp",
    "version": null,
    "num_threads": 96
  }
]

@ogrisel
Contributor

ogrisel commented Jul 28, 2021

I think we understand the root cause and the solution of the problem now (and workarounds). I think we can close the issue on this repo in favor of MacPython/openblas-libs#64 which I just created.

@martin-frbg
Collaborator

@ogrisel thank you very much for looking into this. This buffer scheme remains the major design flaw in OpenBLAS, but I suspect the only thing I can do short-term to mitigate its effect is to add more information to the error message, in particular the number of threads the library was built for.

@ogrisel
Contributor

ogrisel commented Jul 28, 2021

I suspect the only thing I can do short-term to mitigate its effect is to add more information to the error message, in particular the number of threads the library was built for.

That would be great. You could also link to a dedicated markdown document on GitHub that gives users more details on how to introspect how many CPUs they have on their machine and where their OpenBLAS was installed from (I am pretty sure that most OpenBLAS users do not know that they use OpenBLAS, because they use it via numpy, scipy, pytorch, R or something similar).

@ogrisel
Contributor

ogrisel commented Aug 19, 2021

Actually, this is not the only problem: using more than 64 threads seems to degrade the performance of a 4096x4096 DGEMM, see MacPython/openblas-libs#64 (comment).

@martin-frbg
Collaborator

It is certainly possible to throw so many threads at a "small" problem that performance degrades again, but I believe it would need an unmanageable (and itself costly) set of rules to tailor the number of threads to each problem size, where OpenBLAS currently switches between 1 and all threads only. Hardware layout (cache locality, multi-die cpu interconnects etc) will also play a role.

@ogrisel
Contributor

ogrisel commented Aug 20, 2021

I am not sure how to move forward with this. Increasing NUM_THREADS in the default builds of OpenBLAS used by the majority of the Python ecosystem is not necessarily a good idea because it can cause performance degradations of typical numpy/scipy workloads when running on machines with hundreds of cores.

Implementing an ad-hoc mitigation in scikit-learn for estimators that call OpenBLAS routines in sequential mode from a large number of externally managed threads is possible but complex, hard to maintain and brittle. See MacPython/openblas-libs#64 (comment) for a minimal reproducer and some details on how to technically implement this. But such an ad-hoc mitigation would not solve the problem for other libraries (apparently it might impact PyTorch users as well).

Ideally the problem should be solved in OpenBLAS by making it possible to allocate extra buffers when needed when OpenBLAS is called by a large number of externally managed threads.
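
For the record, the general shape of such a mitigation, using threadpoolctl to keep the number of concurrent OpenBLAS callers below the buffer limit, could look like the following sketch (purely illustrative, not scikit-learn's implementation; the limit of 64 is an assumption based on the wheel discussed in this issue):

import numpy as np
from threadpoolctl import threadpool_limits
from sklearn.cluster import KMeans

X = np.random.rand(10_000, 50)

# cap the OpenMP threads that end up calling into OpenBLAS so that the number of
# concurrent callers stays below the buffer pool the wheel was built with
with threadpool_limits(limits=64, user_api="openmp"):
    KMeans(n_clusters=8, n_init=1).fit(X)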

@mattip
Contributor

mattip commented Aug 20, 2021

It would be nice if there were a mechanism to report the error and return a sentinel or set some errno without crashing the process.

@martin-frbg
Collaborator

You are not the first to come up with that suggestion. Unfortunately, when we reach this situation there is nowhere left to go, and there is no universally agreed error code or mechanism to return "BLAS just died on you" anyway.

@martin-frbg
Collaborator

martin-frbg commented Aug 26, 2021

@ogrisel @mattip can you give #3350 #3352 a spin please? (This should malloc space for another 512 threads in an emergency - anything but elegant, but probably better than giving up.) Unfortunately our drone CI has stopped running and our travis is not yet up again after the migration, leaving me without serious multicore hardware. (The PR was tested on a 12C system with OpenBLAS intentionally crippled to support only 4 threads though.)

@linuxl7

linuxl7 commented Apr 28, 2023

Changing \site-packages\joblib\externals\loky\backend\context.py can do it:

os_cpu_count = min(os.cpu_count() or 1, 12)  # cap the CPU count loky detects at 12

cpu_count_user = min(_cpu_count_user(os_cpu_count), 12)  # cap the user-requested count as well
