FAQ

Martin Kroeker edited this page Mar 5, 2017 · 36 revisions

General questions

OS and Compiler

Usage


General questions

  • What is BLAS? Why is it important?

BLAS stands for Basic Linear Algebra Subprograms. BLAS provides standard interfaces for linear algebra, including BLAS1 (vector-vector operations), BLAS2 (matrix-vector operations), and BLAS3 (matrix-matrix operations). In general, BLAS is the computational kernel ("the bottom of the food chain") in linear algebra or scientific applications. Thus, if the BLAS implementation is highly optimized, the whole application can benefit substantially.

  • What is OpenBLAS? Why did you create this project?

OpenBLAS is an open source BLAS library forked from the GotoBLAS2-1.13 BSD version. Since Mr. Kazushige Goto left TACC, GotoBLAS is no longer being maintained. Thus, we created this project to continue developing OpenBLAS/GotoBLAS.

  • What's the difference between OpenBLAS and GotoBLAS?

In OpenBLAS 0.2.0, we optimized level 3 BLAS for Intel Sandy Bridge on 64-bit operating systems and obtained performance comparable to that of Intel MKL.

We optimized level 3 BLAS performance on the ICT Loongson-3A CPU. It outperformed GotoBLAS by 135% in a single thread and 120% in 4 threads.

We fixed several GotoBLAS bugs, including a SEGFAULT bug on newer Linux kernels, MinGW32/64 bugs, and a ztrmm computing error on Intel Nehalem.

We also added some minor features, e.g. supporting "make install", compiling without LAPACK and upgrading the LAPACK version to 3.4.2.

You can find the full list of modifications in Changelog.txt.

  • How can I report a bug?

Please file an issue at this issue page or send mail to the OpenBLAS mailing list.

Please provide the following information: CPU, OS, compiler, and OpenBLAS compiling flags (Makefile.rule). In addition, please describe how to reproduce this bug.

  • How do I reference OpenBLAS?

You can reference our papers listed on this page. Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net.

  • How can I use OpenBLAS in multi-threaded applications?

If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. You must therefore restrict OpenBLAS to a single thread in one of the following ways:

  • Set export OPENBLAS_NUM_THREADS=1 in the environment, or
  • call openblas_set_num_threads(1) in the application at runtime, or
  • build the single-threaded version of OpenBLAS, e.g. make USE_THREAD=0

If the application is parallelized by OpenMP, please build OpenBLAS with USE_OPENMP=1
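The environment-variable approach can be sketched as a small shell wrapper. This is only an illustration: the function name run_single_threaded and the program name ./my_app are hypothetical placeholders, and only OPENBLAS_NUM_THREADS comes from the text above.

```shell
# Hypothetical helper: run one command with OpenBLAS limited to a single
# thread, without changing the environment of the calling shell.
run_single_threaded() {
    OPENBLAS_NUM_THREADS=1 "$@"
}

# Usage, with ./my_app standing in for your own multi-threaded program:
#   run_single_threaded ./my_app
```

Because the variable is set only for the wrapped command, other programs started from the same shell keep their default OpenBLAS threading.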

  • What's the plan for Intel Sandy Bridge, Haswell, AMD Bulldozer, Piledriver?

We have already completed the BLAS3 tuning for these architectures on x86-64 operating systems.

  • How about the level 3 BLAS performance on Intel Sandy Bridge?

We obtained performance comparable to Intel MKL, and in some cases OpenBLAS actually outperformed it. The figure "Single Thread DGEMM Performance on Intel Desktop Sandy Bridge" shows the DGEMM results on an Intel Core i5-2500K running 64-bit Windows 7 SP1.


OS and Compiler

  • How can I call an OpenBLAS function in Microsoft Visual Studio?

Please read this page.

  • How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?

Zaheer has fixed this bug. You can now use a structure instead of C99 complex numbers. Please read this issue page for details.

This issue is for using LAPACKE in Visual Studio.

  • I get a SEGFAULT with multi-threading on Linux. What's wrong?

This may be related to a bug in the Linux kernel (possibly 2.6.32). Try applying the patch segfaults.patch, which disables mbind, using

 patch < segfaults.patch

and see if the crashes persist. Note that this patch will lead to many compiler warnings.

  • When I build the library, I get a "no such instruction: `xgetbv'" error. What's wrong?

Please use GCC 4.4 or later, which supports the xgetbv instruction. If you build the library for Sandy Bridge with AVX instructions, use GCC 4.6 or later.

On Mac OS X, please use Clang 3.1 or later. For example, make CC=clang

For compatibility with older compilers (GCC < 4.4), you can enable the NO_AVX flag. For example, make NO_AVX=1
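The compiler rule above can be sketched as a small shell helper. The function name suggest_no_avx is hypothetical; it encodes only the "GCC older than 4.4" rule stated above and prints the make variable to add, if any.

```shell
# Hypothetical helper: given a GCC version string such as "4.3.2", print
# "NO_AVX=1" when the compiler is older than 4.4 (no xgetbv support),
# and print nothing otherwise.
suggest_no_avx() {
    ver=$1
    major=${ver%%.*}
    case $ver in
        *.*) rest=${ver#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 4 ]; }; then
        echo "NO_AVX=1"
    fi
}

# Usage sketch (gcc -dumpversion prints the compiler's version number):
#   make $(suggest_no_avx "$(gcc -dumpversion)")
```

The helper does not cover the separate GCC 4.6 requirement for AVX builds on Sandy Bridge; that case still needs a manual check.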

  • My build fails due to the linker error "multiple definition of `dlamc3_'". What is the problem?

This linker error occurs if GNU patch is missing or if our patch for LAPACK fails to apply.

Background: OpenBLAS implements optimized versions of some LAPACK functions, so we need to disable the reference versions. If this process fails, we end up with duplicate implementations of the same function.

  • How can I disable OpenBLAS threading affinity at runtime?

You can define the OPENBLAS_MAIN_FREE or GOTOBLAS_MAIN_FREE environment variable to disable threading affinity at runtime. For example, before running your program:

export OPENBLAS_MAIN_FREE=1

Alternatively, you can disable the affinity feature by setting NO_AFFINITY=1 in Makefile.rule.

  • How do I solve undefined reference errors when statically linking against libopenblas.a?

On Linux, if OpenBLAS was compiled with threading support (USE_THREAD=1 by default), programs statically linked against libopenblas.a must also link against the pthread library, e.g.:

gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread

Failing to add the -lpthread flag will cause errors such as:

/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `openblas_fork_handler':
memory.c:(.text+0x440): undefined reference to `pthread_atfork'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_memory_alloc':
memory.c:(.text+0x7a5): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x825): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_shutdown':
memory.c:(.text+0x9e1): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0xa6e): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_server':
blas_server.c:(.text+0x273): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x287): undefined reference to `pthread_mutex_unlock'
blas_server.c:(.text+0x33f): undefined reference to `pthread_cond_wait'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_init':
blas_server.c:(.text+0x416): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x4be): undefined reference to `pthread_mutex_init'
blas_server.c:(.text+0x4ca): undefined reference to `pthread_cond_init'
blas_server.c:(.text+0x4e0): undefined reference to `pthread_create'
blas_server.c:(.text+0x50f): undefined reference to `pthread_mutex_unlock'
...

The -lpthread flag is not required when linking dynamically against libopenblas.so.0.
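The linking rule above can be captured in a small shell helper. This is a sketch only: the function name openblas_link_flags is hypothetical, and it prints just the library flags for each link mode as described above.

```shell
# Hypothetical helper: print the library flags for the chosen link mode.
# Static linking against libopenblas.a needs pthread named explicitly;
# dynamic linking does not.
openblas_link_flags() {
    case $1 in
        static)  echo "-lopenblas -lpthread" ;;
        dynamic) echo "-lopenblas" ;;
        *)       echo "usage: openblas_link_flags static|dynamic" >&2; return 1 ;;
    esac
}

# Usage sketch, reusing the paths from the example above:
#   gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c $(openblas_link_flags static)
```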

  • Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1, Scientific Linux 6

The minimum requirement for actually running AVX2-enabled software such as OpenBLAS is kernel-2.6.32-358, shipped with EL6U4 in 2013.

The binutils package from RHEL6 does not know the vpermpd instruction or any other AVX2 instruction. You can get a newer binutils package from the Enterprise Linux Software Collections by following the instructions here:
https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/
After configuring the repository, install devtoolset-?-binutils to get a newer, usable binutils package:

$ yum search devtoolset-\?-binutils
$ sudo yum install devtoolset-4-binutils

Once the packages are installed, check the name of the software collection that must be enabled to use the new version:

$ scl --list
devtoolset-4
rh-python35

Now prefix your build commands with the corresponding scl invocation:

$ scl enable devtoolset-4 -- make DYNAMIC_ARCH=1

  • Building OpenBLAS in QEMU/KVM

By default, QEMU reports the CPU as "QEMU Virtual CPU version 2.2.0", which OpenBLAS recognizes as PENTIUM2. Depending on the exact combination of CPU features the hypervisor chooses to expose, this may not correspond to any CPU that exists, and the OpenBLAS build will fail. To fix this, pass -cpu host or another CPU model to QEMU.
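One way to check whether a guest is affected is to look at the CPU model name the guest sees. The sketch below assumes a Linux guest where /proc/cpuinfo is available. The function name is hypothetical, and the matched string is the one quoted above.

```shell
# Hypothetical helper: succeed if the given cpuinfo file (default:
# /proc/cpuinfo) reports the generic "QEMU Virtual CPU" model.
is_generic_qemu_cpu() {
    grep -q 'QEMU Virtual CPU' "${1:-/proc/cpuinfo}"
}

# Print a hint only when the generic model is found; stay silent otherwise.
if is_generic_qemu_cpu 2>/dev/null; then
    echo "Generic QEMU CPU model detected; consider starting the guest with -cpu host"
fi
```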

  • Building OpenBLAS for MIPS

For MIPS targets you will need the latest toolchains:

  • P5600: MTI GNU/Linux Toolchain
  • I6400, P6600: IMG GNU/Linux Toolchain

They can be downloaded from http://codescape-mips-sdk.imgtec.com/components/toolchain/2016.05-03/downloads.html

You can use the following command lines for the builds:

IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
IMG_GCC_PREFIX=mips-img-linux-gnu
IMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}

I6400 Build (n32):
make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400

I6400 Build (n64):
make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400

P6600 Build (n32):
make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600

P6600 Build (n64):
make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS="$CFLAGS" LDFLAGS="$CFLAGS" TARGET=P6600

MTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
MTI_GCC_PREFIX=mips-mti-linux-gnu
MTI_TOOLCHAIN=${MTI_TOOLCHAIN_DIR}/${MTI_GCC_PREFIX}

P5600 Build:

make BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC="$MTI_TOOLCHAIN-gfortran -EL" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600

Usage

  • Program is Terminated. Because you tried to allocate too many memory regions

OpenBLAS manages a pool of memory buffers, and the number of buffers is defined as follows:

#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)

This error indicates that the program exceeded the number of buffers.

Please build OpenBLAS with a larger NUM_THREADS, for example make NUM_THREADS=32 or make NUM_THREADS=64. Makefile.system sets MAX_CPU_NUMBER=NUM_THREADS.
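The relationship between NUM_THREADS and the size of the buffer pool can be worked through with shell arithmetic. This sketch only mirrors the NUM_BUFFERS definition quoted above; the function name num_buffers is hypothetical.

```shell
# Hypothetical helper: number of memory buffers for a given NUM_THREADS,
# following NUM_BUFFERS = MAX_CPU_NUMBER * 2 with MAX_CPU_NUMBER = NUM_THREADS.
num_buffers() {
    echo $(( $1 * 2 ))
}

num_buffers 32   # prints 64
num_buffers 64   # prints 128
```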

  • How do I choose the TARGET manually at runtime when OpenBLAS was compiled with DYNAMIC_ARCH?

The environment variable that controls kernel selection is OPENBLAS_CORETYPE (see driver/others/dynamic.c), e.g. export OPENBLAS_CORETYPE=Haswell. The function char* openblas_get_corename() returns the target in use.
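To select a core type for a single run rather than for the whole shell session, the variable can be set for one command only. A minimal sketch: the function name run_with_coretype and the program name ./my_app are hypothetical, and only OPENBLAS_CORETYPE and the value Haswell come from the text above.

```shell
# Hypothetical helper: run one command with a specific OpenBLAS core type.
#   $1 = core type name (e.g. Haswell); remaining arguments = the command.
run_with_coretype() {
    coretype=$1
    shift
    OPENBLAS_CORETYPE=$coretype "$@"
}

# Usage, with ./my_app standing in for a program that uses a DYNAMIC_ARCH build:
#   run_with_coretype Haswell ./my_app
```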

  • After updating the installed OpenBLAS, a program complains about "undefined symbol gotoblas"

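This symbol is defined only when OpenBLAS is built with "make DYNAMIC_ARCH=1" (which is what distributors choose to ensure support for more than one CPU type). The error therefore suggests that the program was linked against a DYNAMIC_ARCH build and is now loading a library built without DYNAMIC_ARCH.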
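To see which kind of build an installed library is, one can look for the gotoblas symbol among its dynamic symbols. This sketch assumes the nm tool from binutils is available. The function name and the library path are hypothetical placeholders.

```shell
# Hypothetical helper: read `nm -D` output on standard input and succeed
# if a symbol named exactly "gotoblas" is listed.
exports_gotoblas() {
    grep -qw 'gotoblas'
}

# Usage sketch (the library path is a placeholder for your installation):
#   nm -D /opt/OpenBLAS/lib/libopenblas.so | exports_gotoblas && echo "DYNAMIC_ARCH build"
```

The -w option restricts the match to the whole word, so longer names that merely start with gotoblas_ are not counted.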