OpenBLAS

Join the chat at https://gitter.im/xianyi/OpenBLAS

Introduction

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki pages: http://github.com/xianyi/OpenBLAS/wiki.

Binary Packages

We provide official binary packages for the following platform:

  • Windows x86/x86_64

You can download them from file hosting on sourceforge.net.

Installation from Source

Download from the project homepage, http://xianyi.github.com/OpenBLAS/, or check out the code using Git from https://github.com/xianyi/OpenBLAS.git.

Dependencies

Building OpenBLAS requires the following to be installed:

  • GNU Make
  • A C compiler, e.g. GCC or Clang
  • A Fortran compiler (optional, for LAPACK)
  • IBM MASS (optional, see below)

Normal compile

Simply invoking make (or gmake on BSD) will detect the CPU automatically. To set a specific target CPU, use make TARGET=xxx, e.g. make TARGET=NEHALEM. The full target list is in the file TargetList.txt.

Cross compile

Set CC and FC to point to the cross toolchains, and set HOSTCC to your host C compiler. The target must be specified explicitly when cross compiling.

Examples:

  • On an x86 box, compile this library for a loongson3a CPU:

    make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
  • On an x86 box, compile this library for a loongson3a CPU with loongcc (based on Open64) compiler:

    make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32

Debug version

A debug version can be built using make DEBUG=1.

Compile with MASS support on Power CPU (optional)

The IBM MASS library consists of a set of mathematical functions for C, C++, and Fortran applications that are tuned for optimum performance on POWER architectures. OpenBLAS with MASS requires a 64-bit, little-endian OS on POWER. The library can be installed as shown:

  • On Ubuntu:

    wget -q http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O- | sudo apt-key add -
    echo "deb http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/ trusty main" | sudo tee /etc/apt/sources.list.d/ibm-xl-compiler-eval.list
    sudo apt-get update
    sudo apt-get install libxlmass-devel.8.1.5
  • On RHEL/CentOS:

    wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/rhel7/repodata/repomd.xml.key
    sudo rpm --import repomd.xml.key
    wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/rhel7/ibm-xl-compiler-eval.repo
    sudo cp ibm-xl-compiler-eval.repo /etc/yum.repos.d/
    sudo yum install libxlmass-devel.8.1.5

After installing the MASS library, compile OpenBLAS with USE_MASS=1. For example, to compile on Power8 with MASS support: make USE_MASS=1 TARGET=POWER8.

Install to a specific directory (optional)

Use PREFIX= when invoking make, for example

make install PREFIX=your_installation_directory

The default installation directory is /opt/OpenBLAS.

Supported CPUs and Operating Systems

Please read GotoBLAS_01Readme.txt.

Additional supported CPUs

x86/x86-64

  • Intel Xeon 56xx (Westmere): Uses GotoBLAS2 Nehalem codes.
  • Intel Sandy Bridge: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
  • Intel Skylake: Optimized Level-3 and Level-2 BLAS with AVX512 and FMA on x86-64.
  • AMD Bobcat: Uses GotoBLAS2 Barcelona codes.
  • AMD Bulldozer: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
  • AMD PILEDRIVER: Uses Bulldozer codes with some optimizations.
  • AMD STEAMROLLER: Uses Bulldozer codes with some optimizations.

MIPS64

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1 and Level-2.
  • ICT Loongson 3B: Experimental

ARM

  • ARMv6: Optimized BLAS for vfpv2 and vfpv3-d16 (e.g. BCM2835, Cortex M0+)
  • ARMv7: Optimized BLAS for vfpv3-d32 (e.g. Cortex A8, A9 and A15)

ARM64

  • ARMv8: Experimental
  • ARM Cortex-A57: Experimental

PPC/PPC64

  • POWER8: Optimized Level-3 BLAS and some Level-1, only with USE_OPENMP=1

IBM zEnterprise System

  • Z13: Optimized Level-3 BLAS and Level-1,2 (double precision)

Supported OS

Usage

Statically link with libopenblas.a or dynamically link with -lopenblas if OpenBLAS was compiled as a shared library.

Setting the number of threads using environment variables

Environment variables are used to specify a maximum number of threads. For example,

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile this library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable; OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.
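The precedence rule can be illustrated with a short C sketch. This is not OpenBLAS's internal code, and the function names are hypothetical. The sketch makes several assumptions:

  • It models a build without USE_OPENMP=1.
  • It treats unset or non-positive values as "not set".
  • It ignores any clamping to the maximum thread count the library was compiled with.

```c
#include <stdlib.h>

/* Hypothetical helper: choose a thread count from three optional settings,
   checked in the documented priority order. NULL means "not set"; values
   that do not parse to a positive integer are skipped (an assumption). */
static int pick_num_threads(const char *openblas, const char *goto_,
                            const char *omp, int fallback)
{
    const char *candidates[3] = { openblas, goto_, omp };
    for (int i = 0; i < 3; i++) {
        if (candidates[i] != NULL) {
            int n = atoi(candidates[i]);
            if (n > 0)
                return n; /* first positive setting in priority order wins */
        }
    }
    return fallback; /* e.g. the number of detected cores */
}

/* Hypothetical wrapper that reads the real environment variables. */
static int resolve_num_threads_from_env(int fallback)
{
    return pick_num_threads(getenv("OPENBLAS_NUM_THREADS"),
                            getenv("GOTO_NUM_THREADS"),
                            getenv("OMP_NUM_THREADS"),
                            fallback);
}
```

Keeping the selection logic in a pure function makes the priority order easy to check without modifying the process environment.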

Setting the number of threads at runtime

We provide the following functions to control the number of threads at runtime:

void goto_set_num_threads(int num_threads);
void openblas_set_num_threads(int num_threads);

If you compile this library with USE_OPENMP=1, you should use the above functions too.

Reporting bugs

Please submit an issue at https://github.com/xianyi/OpenBLAS/issues.

Contact

Change log

Please see Changelog.txt to view the differences between OpenBLAS and GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use GCC version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
  • Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
  • Please use GCC 6 or later, or LLVM 6 or later, to compile Skylake AVX512 kernels.
  • The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with BIGNUMA=1.
  • OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line NO_AFFINITY=1 in Makefile.rule. However, note that this may cause a conflict with R parallel.
  • On Loongson 3A, make test may fail with a pthread_create error (EAGAIN). However, the same test case passes when you run it directly from the shell.

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.