

OpenBLAS

Join the chat at https://gitter.im/xianyi/OpenBLAS

Introduction

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki: http://github.com/xianyi/OpenBLAS/wiki.

Binary Packages

We provide binary packages for the following platform:

  • Windows x86/x86_64

You can download them from file hosting on sourceforge.net.

Installation from Source

Download the source from the project homepage: http://xianyi.github.com/OpenBLAS/

Or check out the code from git://github.com/xianyi/OpenBLAS.git

Normal compile

  • Type "make" to detect the CPU automatically, or
  • type "make TARGET=xxx" to set the target CPU explicitly, e.g. "make TARGET=NEHALEM". The full list of targets is in the file TargetList.txt.

Cross compile

Set CC and FC to the compilers of the cross toolchain, set HOSTCC to your host C compiler, and finally set TARGET explicitly.

Examples:

On an x86 machine, compile the library for the Loongson 3A CPU:

make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A

On an x86 machine, compile the library for the Loongson 3A CPU with the loongcc compiler (based on Open64):

make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32

Debug version

make DEBUG=1

Install to a directory (optional)

Example:

make install PREFIX=your_installation_directory

The default directory is /opt/OpenBLAS.

Supported CPUs and OSes

Please read GotoBLAS_01Readme.txt.

Additional supported CPUs:

x86/x86-64:

  • Intel Xeon 56xx (Westmere): Uses the GotoBLAS2 Nehalem code.
  • Intel Sandy Bridge: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
  • AMD Bobcat: Uses the GotoBLAS2 Barcelona code.
  • AMD Bulldozer: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
  • AMD Piledriver: Uses the Bulldozer code with some optimizations.
  • AMD Steamroller: Uses the Bulldozer code with some optimizations.

MIPS64:

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1 and Level-2.
  • ICT Loongson 3B: Experimental

ARM:

  • ARMV6: Optimized BLAS for vfpv2 and vfpv3-d16 (e.g. BCM2835, Cortex M0+)
  • ARMV7: Optimized BLAS for vfpv3-d32 (e.g. Cortex A8, A9 and A15)

ARM64:

  • ARMV8: Experimental

Supported OS:

Usage

Link with libopenblas.a for the static library, or with -lopenblas for the shared library.

Set the number of threads with environment variables.

Examples:

export OPENBLAS_NUM_THREADS=4

or

export GOTO_NUM_THREADS=4

or

export OMP_NUM_THREADS=4

The priority order is OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile the library with USE_OPENMP=1, set the OMP_NUM_THREADS environment variable instead. With USE_OPENMP=1, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.
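The precedence rule above can be sketched in C. This is only an illustration of the documented behavior, not the library's actual code: the function names parse_thread_count, pick_num_threads and num_threads_from_env are hypothetical, and treating an empty or non-numeric value as "not set", as well as the caller-supplied fallback, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a thread count. Returns 0 when the value is unset, empty,
 * non-numeric, or not positive. Treating all of these as "not set"
 * is an assumption of this sketch. */
int parse_thread_count(const char *value)
{
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return 0;
    n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > 65536)
        return 0;
    return (int)n;
}

/* Hypothetical helper: apply the documented priority
 * OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
 * When use_openmp is nonzero (a USE_OPENMP=1 build), only the
 * OMP_NUM_THREADS value is considered. */
int pick_num_threads(const char *openblas, const char *gotoblas,
                     const char *omp, int use_openmp, int fallback)
{
    int n;

    assert(fallback > 0);
    if (!use_openmp) {
        if ((n = parse_thread_count(openblas)) > 0)
            return n;
        if ((n = parse_thread_count(gotoblas)) > 0)
            return n;
    }
    if ((n = parse_thread_count(omp)) > 0)
        return n;
    return fallback; /* nothing usable was set */
}

/* Convenience wrapper that reads the values from the environment. */
int num_threads_from_env(int use_openmp, int fallback)
{
    return pick_num_threads(getenv("OPENBLAS_NUM_THREADS"),
                            getenv("GOTO_NUM_THREADS"),
                            getenv("OMP_NUM_THREADS"),
                            use_openmp, fallback);
}
```

For example, with OPENBLAS_NUM_THREADS=4 and OMP_NUM_THREADS=8 both exported, num_threads_from_env(0, 1) selects 4, while num_threads_from_env(1, 1) models a USE_OPENMP=1 build and selects 8.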

Set the number of threads at runtime.

We provide the following functions to control the number of threads at runtime.

void goto_set_num_threads(int num_threads);

void openblas_set_num_threads(int num_threads);

If you compile the library with USE_OPENMP=1, you should use these functions as well.

Report Bugs

Please open an issue at https://github.com/xianyi/OpenBLAS/issues

Contact

ChangeLog

Please see Changelog.txt for the differences from the GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use gcc version 4.6 or later to compile the Sandy Bridge AVX kernels on Linux/MingW/BSD.
  • Please use Clang version 3.1 or later to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
  • The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with BIGNUMA=1.
  • OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line NO_AFFINITY=1 in Makefile.rule. However, this may conflict with R parallel.
  • On Loongson 3A, make test may fail because of a pthread_create error with error code EAGAIN. However, the same test case runs fine when you run it from the shell.
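To check a machine against the CPU/core limits quoted above, a small C sketch like the following may help. The names cpu_count_supported and report_cpu_count are hypothetical, the constants simply restate the limits from the list, and the sketch assumes a system where sysconf accepts _SC_NPROCESSORS_ONLN (a widely available extension on Linux and the BSDs).

```c
#include <stdio.h>
#include <unistd.h>

/* Limits as stated in the troubleshooting notes: 256 by default,
 * 1024 with the experimental BIGNUMA=1 build on Linux x86_64. */
enum { DEFAULT_MAX_CPUS = 256, BIGNUMA_MAX_CPUS = 1024 };

/* Returns 1 if ncpu is within the documented limit, 0 otherwise. */
int cpu_count_supported(long ncpu, int bignuma)
{
    long limit = bignuma ? BIGNUMA_MAX_CPUS : DEFAULT_MAX_CPUS;
    return ncpu >= 1 && ncpu <= limit;
}

/* Print whether the online CPU count fits the default build or
 * would need BIGNUMA=1. */
void report_cpu_count(void)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpu < 1) {
        printf("could not determine the number of online CPUs\n");
        return;
    }
    if (cpu_count_supported(ncpu, 0))
        printf("%ld CPUs: within the default limit\n", ncpu);
    else if (cpu_count_supported(ncpu, 1))
        printf("%ld CPUs: consider building with BIGNUMA=1\n", ncpu);
    else
        printf("%ld CPUs: above the documented limits\n", ncpu);
}
```

Calling report_cpu_count() from a small main function before building gives a quick indication of whether the default build is sufficient.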

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.
