Martin Kroeker edited this page Aug 10, 2018 · 45 revisions

[Home] [Document] [FAQ] [Publications] [Download] [Mailing List] [Donation]

General questions

OS and Compiler


General questions

  • What is BLAS? Why is it important?

BLAS stands for Basic Linear Algebra Subprograms. BLAS provides standard interfaces for linear algebra, including BLAS1 (vector-vector operations), BLAS2 (matrix-vector operations), and BLAS3 (matrix-matrix operations). In general, BLAS is the computational kernel ("the bottom of the food chain") in linear algebra or scientific applications. Thus, if BLAS implementation is highly optimized, the whole application can get substantial benefit.

  • What is OpenBLAS? Why did you create this project?

OpenBLAS is an open source BLAS library forked from the GotoBLAS2-1.13 BSD version. Since Mr. Kazushige Goto left TACC, GotoBLAS is no longer being maintained. Thus, we created this project to continue developing OpenBLAS/GotoBLAS.

  • What's the difference between OpenBLAS and GotoBLAS?

In OpenBLAS 0.2.0, we optimized level 3 BLAS on the Intel Sandy Bridge 64-bit OS. We obtained a performance comparable with that Intel MKL.

We optimized level 3 BLAS performance on the ICT Loongson-3A CPU. It outperformed GotoBLAS by 135% in a single thread and 120% in 4 threads.

We fixed some GotoBLAS bugs including a SEGFAULT bug on the new Linux kernel, MingW32/64 bugs, and a ztrmm computing error bug on Intel Nehalem.

We also added some minor features, e.g. supporting "make install", compiling without LAPACK and upgrading the LAPACK version to 3.4.2.

You can find the full list of modifications in Changelog.txt.

  • Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?

The detailed explanation is probably in the original publication authored by Kazushige Goto - Goto, Kazushige; van de Geijn, Robert A; Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS). Volume 34 Issue 3, May 2008 While this article is paywalled and too old for preprints to be available on, more recent publications like contain at least a brief description of the algorithm. In practice, the values are derived by experimentation to yield the block sizes that give the highest performance. A general rule of thumb for selecting a starting point seems to be that PxQ is about half the size of L2 cache.

  • How can I report a bug?

Please file an issue at this issue page or send mail to the OpenBLAS mailing list.

Please provide the following information: CPU, OS, compiler, and OpenBLAS compiling flags (Makefile.rule). In addition, please describe how to reproduce this bug.

  • How to reference OpenBLAS.

You can reference our papers in this page. Alternatively, you can cite the OpenBLAS homepage

  • How can I use OpenBLAS in multi-threaded applications?

If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following.

  • export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
  • Call openblas_set_num_threads(1) in the application on runtime. Or
  • Build OpenBLAS single thread version, e.g. make USE_THREAD=0

If the application is parallelized by OpenMP, please build OpenBLAS with USE_OPENMP=1

  • What support is there for recent PC hardware ? What about GPU ?
  • As OpenBLAS is a volunteer project, it can take some time for the combination of a capable developer, free time, and particular hardware to come along, even for relatively common processors. Starting from 0.3.1, support is being added for AVX 512 (TARGET=SKYLAKEX), requiring a compiler that is capable of handling avx512 intrinsics. While AMD Zen processors should be autodetected by the build system, as of 0.3.2 they are still handled exactly like Intel Haswell. There once was an effort to build an OpenCL implementation that one can still find at , but work on this stopped in 2015.
  • How about the level 3 BLAS performance on Intel Sandy Bridge?

We obtained a performance comparable with Intel MKL that actually outperformed Intel MKL in some cases. Here is the result of the DGEMM subroutine's performance on Intel Core i5-2500K Windows 7 SP1 64-bit: Single Thread DGEMM Performance on Intel Desktop Sandy Bridge

OS and Compiler

  • How can I call an OpenBLAS function in Microsoft Visual Studio?

Please read this page.

  • How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?

Zaheer has fixed this bug. You can now use the structure instead of C99 complex numbers. Please read this issue page for details.

This issue is for using LAPACKE in Visual Studio.

  • I get a SEGFAULT with multi-threading on Linux. What's wrong?

This may be related to a bug in the Linux kernel 2.6.32 (?). Try applying the patch segaults.patch to disable mbind using

 patch < segfaults.patch

and see if the crashes persist. Note that this patch will lead to many compiler warnings.

  • When I make the library, there is no such instruction: `xgetbv' error. What's wrong?

Please use GCC 4.4 and later version. This version supports xgetbv instruction. If you use the library for Sandy Bridge with AVX instructions, you should use GCC 4.6 and later version.

On Mac OS X, please use Clang 3.1 and later version. For example, make CC=clang

For the compatibility with old compilers (GCC < 4.4), you can enable NO_AVX flag. For example, make NO_AVX=1

  • My build fails due to the linker error "multiple definition of `dlamc3_'". What is the problem?

This linker error occurs if GNU patch is missing or if our patch for LAPACK fails to apply.

Background: OpenBLAS implements optimized versions of some LAPACK functions, so we need to disable the reference versions. If this process fails we end with duplicated implementations of the same function.

  • My build worked fine and passed all tests, but running make lapack-test ends with segfaults

Some of the LAPACK tests, notably in xeigtstz, try to allocate around 10MB on the stack. You may need to use ulimit -s to change the default limits on your system to allow this.

  • How could I disable OpenBLAS threading affinity on runtime?

You can define the OPENBLAS_MAIN_FREE or GOTOBLAS_MAIN_FREE environment variable to disable threading affinity on runtime. For example, before the running,


Alternatively, you can disable affinity feature with enabling NO_AFFINITY=1 in Makefile.rule.

  • How to solve undefined reference errors when statically linking against libopenblas.a

On Linux, if OpenBLAS was compiled with threading support (USE_THREAD=1 by default), custom programs statically linked against libopenblas.a should also link to the pthread library e.g.:

gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread

Failing to add the -lpthread flag will cause errors such as:

/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `openblas_fork_handler':
memory.c:(.text+0x440): undefined reference to `pthread_atfork'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_memory_alloc':
memory.c:(.text+0x7a5): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x825): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_shutdown':
memory.c:(.text+0x9e1): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0xa6e): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_server':
blas_server.c:(.text+0x273): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x287): undefined reference to `pthread_mutex_unlock'
blas_server.c:(.text+0x33f): undefined reference to `pthread_cond_wait'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_init':
blas_server.c:(.text+0x416): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x4be): undefined reference to `pthread_mutex_init'
blas_server.c:(.text+0x4ca): undefined reference to `pthread_cond_init'
blas_server.c:(.text+0x4e0): undefined reference to `pthread_create'
blas_server.c:(.text+0x50f): undefined reference to `pthread_mutex_unlock'

The -lpthread is not required when linking dynamically against

  • Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1,Scientific Linux 6

Minimum requirement to actually run AVX2-enabled software like OpenBLAS is kernel-2.6.32-358, shipped with EL6U4 in 2013

The binutils package from RHEL6 does not know the instruction vpermpd or any other AVX2 instruction. You can download a newer binutils package from Enterprise Linux software collections, following instructions here:
After configuring repository you need to install devtoolset-?-binutils to get later usable binutils package

$ yum search devtoolset-\?-binutils
$ sudo yum install devtoolset-4-binutils

once packages are installed check the correct name for SCL redirection set to enable new version

$ scl --list

Now just prefix your build commands with respective redirection:

$ scl enable devtoolset-4 -- make DYNAMIC_ARCH=1
  • Building OpenBLAS in QEMU/KVM

By default, QEMU reports the CPU as "QEMU Virtual CPU version 2.2.0", which OpenBLAS recognizes as PENTIUM2. Depending on the exact combination of CPU features the hypervisor choses to expose, this may not correspond to any CPU that exists, and OpenBLAS will error when trying to build. To fix this, pass -cpu host to QEMU, or another CPU model.

  • Building OpenBLAS for MIPS

For mips targets you will need latest toolchains P5600 - MTI GNU/Linux Toolchain I6400, P6600 - IMG GNU/Linux Toolchain

The download link is below (

You can use following commandlines for builds

IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}

I6400 Build (n32):

I6400 Build (n64):

P6600 Build (n32):

P6600 Build (n64):

MTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}

P5600 Build:

  • Building OpenBLAS on POWER fails with IBM XL

    Trying to compile OpenBLAS with IBM XL ends with error messages about unknown register names like "vs32". Working around these by using known alternate names for the vector registers only leads to another assembler error about unsupported constraints. This is a known deficiency in the IBM compiler at least up to and including 16.1.0 (and in the POWER version of clang, from which it is derived) - use gcc instead. (See issues #1078 and #1699 for related discussions)

  • Replacing system BLAS/updating APT OpenBLAS in Ubuntu/Debian

Debian and Ubuntu LTS versions provide OpenBLAS package which is not updated after initial release, and under circumstances one might want to use more recent version of OpenBLAS e.g. to get support for newer CPUs

Ubuntu and Debian provides 'alternatives' mechanism to comfortably replace BLAS and LAPACK libraries systemwide.

After successful build of OpenBLAS (with DYNAMIC_ARCH set to 1)

$ make clean
$ sudo make DYNAMIC_ARCH=1 install

One can redirect BLAS and LAPACK alternatives to point to source-built OpenBLAS First you have to install NetLib LAPACK reference implementation (to have alternatives to replace):

$ sudo apt install libblas-dev liblapack-dev

Then we can set alternative to our freshly-built library:

$ sudo update-alternatives --add /usr/lib/ /opt/OpenBLAS/lib/ 41 \
   --slave /usr/lib/ /opt/OpenBLAS/lib/

Or remove redirection and switch back to APT-provided BLAS implementation order:

$ sudo update-alternatives --remove /opt/OpenBLAS/lib/


  • Program is Terminated. Because you tried to allocate too many memory regions

In OpenBLAS, we mange a pool of memory buffers and allocate the number of buffers as the following.


This error indicates that the program exceeded the number of buffers.

Please build OpenBLAS with larger NUM_THREADS. For example, make NUM_THREADS=32 or make NUM_THREADS=64. In Makefile.system, we will set MAX_CPU_NUMBER=NUM_THREADS.

  • How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH

The environment variable which control the kernel selection is OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export OPENBLAS_CORETYPE=Haswell. And the function char* openblas_get_corename() returns the used target.

  • After updating the installed OpenBLAS, a program complains about "undefined symbol gotoblas"

This symbol gets defined only when OpenBLAS is built with "make DYNAMIC_ARCH=1" (which is what distributors will choose to ensure support for more than just one CPU type).

  • How can I find out at runtime what options the library was built with ?

OpenBLAS has two utility functions that may come in here:

openblas_get_parallel() will return 0 for a single-threaded library, 1 if multithreading without OpenMP, 2 if built with USE_OPENMP=1

openblas_get_config() will return a string containing settings such as USE64BITINT or DYNAMIC_ARCH that were active at build time, as well as the target cpu (or in case of a dynamic_arch build, the currently detected one).

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.