Still got multithread error "BLAS : Program is Terminated..." after set NUM_THREADS=32 #889

Closed
jiesutd opened this Issue May 21, 2016 · 11 comments

@jiesutd

jiesutd commented May 21, 2016

Hi all,

I am trying to implement a multithreaded program (fewer than 12 threads) using OpenBLAS on a MacBook Pro 13, but it terminates with the error
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.

I tried two solutions, but neither of them works.

  1. Solution One:
    I checked the FAQ here: https://github.com/xianyi/OpenBLAS/wiki/faq#usage-1, which says I need to set NUM_THREADS=32. I tried this and it seems to work at build time; the following is the output of make:

OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)
OS ... Darwin
Architecture ... x86_64
BINARY ... 64bit
C compiler ... CLANG (command line : clang)
Fortran compiler ... GFORTRAN (command line : gfortran)
-n Library Name ... libopenblas_haswellp-r0.2.19.dev.a
(Multi threaded; Max num-threads is 32)

  2. Solution Two:
    Based on the link https://github.com/xianyi/OpenBLAS/wiki/faq#general-questions-1, I built OpenBLAS as single-threaded (make USE_THREAD=0). The build output is nearly identical to the first solution's, except that the last line is:
    (Single threaded)

Then I ran make install to reinstall it into the directory /opt/OpenBLAS. Everything works fine up to this point.

However, when I run my code with multiple threads (using pthread_t, pthread_create and pthread_join, fewer than 12 threads), I still get the error BLAS : Program is Terminated. Because you tried to allocate too many memory regions. (with both solutions)

I also updated OpenBLAS to the newest version, but I still get this error.

I am quite confused about why this happens. Is there any other cause for this error, or am I using OpenBLAS the wrong way?

Thanks!

@jeromerobert

Contributor

jeromerobert commented May 21, 2016

As your program manages the threading itself, solution 2 is the way to go. Did you keep the NUM_THREADS=32 parameter for the second solution?

@jiesutd

jiesutd commented May 21, 2016

@jeromerobert
Yes, for the second solution I also set NUM_THREADS = 32 in the file Makefile.system, replacing NUM_THREADS = $(NUM_CORES), and then ran make USE_THREAD=0 followed by sudo make install.

But it still generates the same error. That's quite confusing.

I use CMake to build my code; here is the OpenBLAS part of CMakeLists.txt. Do you think this error is caused by CMakeLists.txt? Thank you very much!

####for openblas
add_definitions(-DMSHADOW_USE_CUDA=0)
add_definitions(-DMSHADOW_USE_CBLAS=1)
add_definitions(-DMSHADOW_USE_MKL=0)
SET( CMAKE_SHARED_LINKER_FLAGS "-lm -lopenblas")
####endfor openblas
add_executable(LSTMLabeler LSTMLabeler.cpp)
target_link_libraries(LSTMLabeler openblas)

@jeromerobert

Contributor

jeromerobert commented May 21, 2016

That's the reason why you still get this error. As you don't want to use OpenBLAS threading, you do not need NUM_THREADS at all.

By the way, 32 is a very high value. When using OpenBLAS threading, NUM_THREADS is supposed to be the number of physical cores of your machine (2 for you?). It should be detected automatically by the build system. The only reason to set NUM_THREADS manually is to build OpenBLAS for another machine that has more physical cores than the current one.
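
The sizing rule above (no more workers than cores) can be sketched in C for the application side. This is a minimal sketch, assuming a POSIX-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is on Linux and macOS, but it is not required by POSIX). `clamp_worker_count` is a hypothetical helper name, not part of OpenBLAS, and the value reported is the number of logical processors, which may be higher than the number of physical cores.

```c
#include <unistd.h>

/* Hypothetical helper (not an OpenBLAS function): limit a requested
 * worker count to the number of processors currently online.
 * Note: sysconf reports logical processors, which can exceed the
 * number of physical cores when hyper-threading is enabled. */
int clamp_worker_count(int requested)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);

    if (online < 1) {
        online = 1;            /* query failed: fall back to one worker */
    }
    if (requested < 1) {
        return 1;              /* always run at least one worker */
    }
    return requested > online ? (int)online : requested;
}
```

An application would call `clamp_worker_count(12)` before creating its threads and start only as many as the function returns.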

@jiesutd

jiesutd commented May 21, 2016

Thanks jeromerobert!

But I also tried leaving NUM_THREADS at its default, i.e. NUM_THREADS = $(NUM_CORES), with make USE_THREAD=0, and I still got the same error.

My computer has a dual-core CPU with 4 virtual cores. When I monitor CPU usage while my program runs, it works well at %cpu = 100%, and it crashes when %cpu gets close to 200-250%.

I think the two cores of my machine are not the upper bound on my thread count; it should support many more threads (>12) than just two. But I can't make that work with OpenBLAS.

@jeromerobert

Contributor

jeromerobert commented May 21, 2016

> But I also tried to let NUM_THREADS as default, i.e. NUM_THREADS = $(NUM_CORES) and make USE_THREAD=0, it still got the same error.

Then could you run your program under a debugger and show us a stack trace when it segfaults? Did you check that your program is not buggy and does not overwrite the OpenBLAS memory regions (e.g. with Valgrind, ASan, ...)?

> I think the two cores of my machine is not the upper bound of my thread number, it should support much more threads (>12) than only two.

That sounds very odd. Could you explain why and how you expect to get better performance with 12 threads on a 2-core machine?

@brada4

Contributor

brada4 commented May 21, 2016

??? https://github.com/xianyi/OpenBLAS/wiki/faq#allocmorebuffers ???
Can you reproduce the crash with plain 'make' followed by 'make install' on an unmodified OpenBLAS source tree?

@jiesutd

jiesutd commented May 21, 2016

@brada4
I saw the FAQ entry about this error and followed its instructions, but it still failed.

Plain make and make install on the unmodified OpenBLAS source were my original build steps, and they cause the crash.

Based on what I read in the OpenBLAS instructions, I think plain make configures OpenBLAS to run in multithreaded mode (limited by NUM_THREADS). This internal multithreading conflicts with the user's own multithreading, so I need to build OpenBLAS in single-threaded mode (my second solution, make USE_THREAD=0). Besides, the number of threads may be limited by the system default (NUM_THREADS = $(NUM_CORES)), so I need to raise NUM_THREADS manually (my first solution: set NUM_THREADS=32 in Makefile.system).

I tried both solutions and their combination (solution 1, solution 2, solution 1 + solution 2), but they all failed with the same error...

@brada4

Contributor

brada4 commented May 21, 2016

Normally you edit Makefile.rule.
For example, NUM_THREADS=1 will disable all threading APIs.
The same effect can be achieved by setting OPENBLAS_NUM_THREADS=1 at runtime (after that, you can call OpenBLAS functions from your own threads as much as you want).
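
The pattern described here (OpenBLAS kept single-threaded while the application's own pthreads each make BLAS calls) can be sketched in C as follows. This is only an illustration under stated assumptions: `USE_OPENBLAS` is a hypothetical macro the build would define when linking against OpenBLAS, and `DIM`, `MAX_JOBS`, `job_t`, `multiply`, `worker` and `run_jobs` are made-up names. Only `cblas_dgemm` and `openblas_set_num_threads` come from OpenBLAS's `cblas.h`. Without the macro, a plain triple loop stands in for the BLAS call so the sketch still compiles. It is not claimed that this alone avoids the "too many memory regions" error.

```c
#include <pthread.h>

#ifdef USE_OPENBLAS        /* assumption: defined when building with -lopenblas */
#include <cblas.h>
#endif

#define DIM 8              /* illustrative matrix size */
#define MAX_JOBS 16        /* illustrative upper bound on application threads */

typedef struct {
    double a[DIM * DIM], b[DIM * DIM], c[DIM * DIM];
} job_t;

/* Compute c = a * b for row-major DIM x DIM matrices. */
static void multiply(job_t *job)
{
#ifdef USE_OPENBLAS
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                DIM, DIM, DIM, 1.0, job->a, DIM, job->b, DIM,
                0.0, job->c, DIM);
#else
    /* Stand-in for the BLAS call when OpenBLAS is not available. */
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            double sum = 0.0;
            for (int k = 0; k < DIM; k++) {
                sum += job->a[i * DIM + k] * job->b[k * DIM + j];
            }
            job->c[i * DIM + j] = sum;
        }
    }
#endif
}

static void *worker(void *arg)
{
    multiply((job_t *)arg);
    return NULL;
}

/* Run one application thread per job; returns 0 on success, -1 on error. */
int run_jobs(job_t *jobs, int count)
{
    pthread_t tids[MAX_JOBS];
    int started, rc = 0;

    if (count < 1 || count > MAX_JOBS) {
        return -1;
    }
#ifdef USE_OPENBLAS
    openblas_set_num_threads(1);   /* keep OpenBLAS itself single-threaded */
#endif
    for (started = 0; started < count; started++) {
        if (pthread_create(&tids[started], NULL, worker, &jobs[started]) != 0) {
            rc = -1;               /* stop creating, but join what was started */
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return rc;
}
```

Each thread works on its own `job_t`, so no locking is needed around the matrices themselves.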

@jiesutd

jiesutd commented May 21, 2016

@jeromerobert. Thanks for your comment.

  1. Yes, I will review my code to check whether the error comes from my code structure rather than from my OpenBLAS configuration. I am not sure how to detect whether my program touches the OpenBLAS memory regions, but I will try to find out.
  2. As for many threads on few cores: I assumed the CPUs are mostly idle, so if I create more threads than cores, maybe the CPU scheduler would allocate more CPU time to my program? (I am not sure about that.) But I think you are right; it would be proper to use no more threads than cores. I will try that now.

@theoractice

Contributor

theoractice commented May 21, 2016

Hi jiesutd,
Try adding openblas_set_num_threads(openblas_get_num_procs()); to your code before any OpenBLAS call, as this resets the OpenBLAS thread manager.
I also have to say that using too many threads (more than the number of CPU cores) is not a good idea, especially for numeric computation. Something like a thread pool would be better.
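
As a rough illustration of the thread-pool idea, here is a minimal C sketch in which a fixed number of pthreads pull task indices from a shared counter, so the number of threads stays bounded no matter how many tasks there are. All names (`task_fn`, `pool_t`, `pool_worker`, `run_pool`, `square_task`, `POOL_MAX_WORKERS`) are hypothetical and not part of OpenBLAS; a real program would make its BLAS calls inside the task function.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX_WORKERS 64    /* illustrative upper bound */

typedef void (*task_fn)(int task_index, void *ctx);

typedef struct {
    pthread_mutex_t lock;
    int next;                  /* next unclaimed task index */
    int total;                 /* number of tasks */
    task_fn fn;
    void *ctx;
} pool_t;

/* Each worker claims task indices one at a time until none are left. */
static void *pool_worker(void *arg)
{
    pool_t *pool = arg;

    for (;;) {
        pthread_mutex_lock(&pool->lock);
        int idx = (pool->next < pool->total) ? pool->next++ : -1;
        pthread_mutex_unlock(&pool->lock);
        if (idx < 0) {
            return NULL;
        }
        pool->fn(idx, pool->ctx);
    }
}

/* Run `total` tasks on at most `workers` threads.
 * Returns 0 on success, -1 on bad arguments or if no thread could start. */
int run_pool(int workers, int total, task_fn fn, void *ctx)
{
    pthread_t tids[POOL_MAX_WORKERS];
    pool_t pool;
    int started;

    if (workers < 1 || workers > POOL_MAX_WORKERS || total < 0 || fn == NULL) {
        return -1;
    }
    pthread_mutex_init(&pool.lock, NULL);
    pool.next = 0;
    pool.total = total;
    pool.fn = fn;
    pool.ctx = ctx;

    for (started = 0; started < workers; started++) {
        if (pthread_create(&tids[started], NULL, pool_worker, &pool) != 0) {
            break;             /* the workers already started finish the tasks */
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    pthread_mutex_destroy(&pool.lock);
    return started > 0 ? 0 : -1;
}

/* Example task: store the square of the task index in a caller-owned array. */
void square_task(int task_index, void *ctx)
{
    long *out = ctx;
    out[task_index] = (long)task_index * task_index;
}
```

For example, `run_pool(2, 100, square_task, out)` processes 100 tasks with only two threads, which matches a dual-core machine better than starting one thread per task.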

@jiesutd

jiesutd commented May 21, 2016

@theoractice, Thanks for your suggestion!

Yes, I think your and @jeromerobert's suggestions are right: more threads than cores does not bring a significant benefit to the program. When I kept my thread count within the number of cores, the error disappeared. So I have decided to use fewer threads for now.

Anyway, thanks for all your helpful input! I have learned a lot from this issue. @jeromerobert @brada4 @theoractice
