- loop unrolling in TRMV has been enabled again.
- A domain error in the thread workload distribution for SYRK
has been fixed.
- gmake builds will now automatically add -fPIC to the build
options if the platform requires it.
- a pthreads key leakage (and associated crash on dlclose) in
the USE_TLS codepath was fixed.
- building of the utest cases on systems that do not provide
an implementation of complex.h was fixed.
- the SkylakeX code was changed to compile on OSX.
- unwanted application of the -march=skylake-avx512 option
to the common code parts of a DYNAMIC_ARCH build was fixed.
- improved performance of SGEMM for small workloads on Skylake X.
- performance of SGEMM and DGEMM was improved on Haswell.
- a configuration error that broke the CNRM2 kernel was corrected.
- compilation of the GEMM kernels with CMAKE was fixed.
- DYNAMIC_ARCH builds are now available with CMAKE as well.
- using CMAKE for cross-compilation to the new cpu TARGETs
introduced in 0.3.4 now works.
- a problem in cpu autodetection for AIX has been corrected.
- the new, experimental thread-local memory allocation had
inadvertently been left enabled for gmake builds in 0.3.3
despite the announcement. It is now disabled by default, and
single-threaded builds will keep using the old allocator even
if the USE_TLS option is turned on.
- OpenBLAS will now provide enough buffer space for at least 50
threads by default.
- The output of openblas_get_config() now contains the version
- A serious thread safety bug in GEMV operation with small M and
large N size has been fixed.
- The code will now automatically call blas_thread_init after a
fork if needed before handling a call to openblas_set_num_threads
- Accesses to parallelized level3 functions from multiple callers
are now serialized to avoid thread races (unless using OpenMP).
This should provide better performance than the known-threadsafe
(but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
- When building LAPACK with gfortran, -frecursive is now (again)
enabled by default to ensure correct behaviour.
- The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
CBLAS_LAYOUT as the name of the matrix row/column order option.
- Externally set LDFLAGS are now passed through to the final compile/link
steps to facilitate setting platform-specific linker flags.
- A potential race condition during the build of LAPACK (that would
usually manifest itself as a failure to build TESTING/MATGEN) has been
- xHEMV has been changed to stay single-threaded for small input sizes
where the overhead of multithreading exceeds any possible gains
- CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
ThunderX hardware with sizable input.
- Linker flags for the PGI compiler have been updated
- Behaviour of AXPY with zero increments is now handled in the C interface,
correcting the result on at least Intel Atom.
- The result matrix from calling SGELSS with an all-zero input matrix is
now zeroed completely.
- Autodetection of AMD Ryzen2 has been fixed (again).
- CMAKE builds now support labeling of an INTERFACE64=1 build of
the library with the _64 suffix.
- AVX512 version of DGEMM has been added and the AVX512 SGEMM kernel
has been sped up by rewriting with C intrinsics
- Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS)
- added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
- CPU type detection has been implemented for AIX.
- CPU type detection has been fixed for NETBSD.
- AXPY on LOONGSON3A has been corrected to pass "zero increment" utest.
- DSDOT on LOONGSON3A has been fixed.
- the SGEMM microkernel has been hardened against potential data loss.
- DYNAMIC_ARCH support is now available for 64bit ARM
- cross-compiling for ARMV8 under iOS now works.
- cpu-specific code has been rearranged to make better use of both
hardware commonalities and model-specific compiler optimizations.
- XGENE1 has been removed as a TARGET, superseded by the improved generic
- Older assembly mnemonics have been converted to UAL form to allow
building with clang 7.0
- Cross compiling LAPACKE for Android has been fixed again (broken by
update to LAPACK 3.7.0 some while ago).
- thread memory allocation has been switched back to the method
used before version 0.3.1 due to unexpected problems caused by
the new code under some circumstances. A new compile-time option
USE_TLS has been added to allow enabling the new code instead,
and it is hoped that this can become the default again in the next version.
- LAPACK PR272 has been integrated, which fixes spurious errors
in DSYEVR and related functions caused by missing conversion
from ILAENV to ILAENV_2STAGE in several _2stage routines.
- the cmake-generated OpenBLASConfig.cmake now uses correct case
for the name of the library
- added support for Haiku OS
- added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY,
DSCAL, DGEMVN and DSYMVL
- added a workaround for a cygwin issue that prevented compilation
of AVX512 code
- added autodetection of Z14
- fixed TRMM errors in the generic target
- fixes for regressions caused by the rewrite of the thread initialization code in 0.3.1
- added autodetection of AMD Ryzen 2
- fixed build with older versions of MSVC
- fixed cpu autodetection for the BSDs
- fixed utest errors in AXPY, DSDOT, ROT and SWAP
- rewritten thread initialization code with significantly reduced overhead
- added CBLAS interfaces to the IxAMIN BLAS extension functions
- fixed the lapack-test target
- CMAKE builds now create an OpenBLASConfig.cmake file
- ZAXPY now uses a single thread for small input sizes
- the LAPACK code was updated from Reference-LAPACK/lapack#253
- corrected CROT and ZROT behaviour with zero INC_X
- corrected xDOT behaviour with zero INC_X or INC_Y
- retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER.
This affects PENRYN, DUNNINGTON, OPTERON, OPTERON_SSE3, BOBCAT, ATOM and NANO
(which will still be supported via the slower PRESCOTT kernels when this option is not set)
- added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows
specifying the list of x86_64 targets to include. Any target not on the list will be supported by
the Sandybridge or Nehalem kernels if available, or by Prescott.
- improved SWITCH_RATIO on Haswell for increased GEMM throughput
- added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
- added autodetection of Intel Cannon Lake series as Skylake X
- added a default L2 cache size for hypervisors that return zero here (Chromebook)
- fixed a name clash with recent Windows10 headers that broke the build with (at least)
recent mingw from MSYS2
- fixed a link error in mixed clang/gfortran builds with OpenMP
- updated the OSX deployment target to 10.8
- switched on parallel make for builds on MS Windows by default
- fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y
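
Several entries above (AXPY, xDOT, CROT/ZROT, SSWAP/DSWAP) concern calls where INC_X or INC_Y is zero. The changelog does not spell out the expected results. The sketch below assumes the fixes aim to match the plain strided loop of the reference (netlib-style) BLAS, where a zero increment means the same element is revisited on every iteration. The function names `ref_daxpy`, `ref_ddot` and `ref_dswap` are hypothetical and are not part of OpenBLAS; this is not the library's kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Start offset for a strided vector, following the reference BLAS
 * convention: a negative increment walks the vector backwards. */
static ptrdiff_t start_index(int n, int inc) {
    return inc < 0 ? (ptrdiff_t)(1 - n) * inc : 0;
}

/* y := alpha*x + y. With incx == 0 every update reads x[0];
 * with incy == 0 every update accumulates into y[0]. */
void ref_daxpy(int n, double alpha, const double *x, int incx,
               double *y, int incy) {
    if (n <= 0 || alpha == 0.0) return;
    assert(x != NULL && y != NULL);
    ptrdiff_t ix = start_index(n, incx);
    ptrdiff_t iy = start_index(n, incy);
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}

/* Dot product over n strided element pairs; zero increments
 * simply reuse the same element n times. */
double ref_ddot(int n, const double *x, int incx,
                const double *y, int incy) {
    double sum = 0.0;
    if (n <= 0) return sum;
    assert(x != NULL && y != NULL);
    ptrdiff_t ix = start_index(n, incx);
    ptrdiff_t iy = start_index(n, incy);
    for (int i = 0; i < n; i++) {
        sum += x[ix] * y[iy];
        ix += incx;
        iy += incy;
    }
    return sum;
}

/* Swap n strided element pairs. With both increments zero, x[0] and
 * y[0] are exchanged n times, so the net effect depends on n's parity. */
void ref_dswap(int n, double *x, int incx, double *y, int incy) {
    if (n <= 0) return;
    assert(x != NULL && y != NULL);
    ptrdiff_t ix = start_index(n, incx);
    ptrdiff_t iy = start_index(n, incy);
    for (int i = 0; i < n; i++) {
        double tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += incx;
        iy += incy;
    }
}
```

Under these assumptions, `ref_daxpy(3, 2.0, x, 0, y, 1)` adds `2.0 * x[0]` to each of the three elements of `y`. An optimized kernel that assumes a nonzero stride can easily get this case wrong, which is presumably why the unit tests exercise it.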
* fixed some more thread race and locking bugs
* added preliminary support for calling an OpenMP build of the library from multiple threads
* removed performance impact of thread locks added in 0.2.20 on OpenMP code
* general code cleanup
* optimized DSDOT implementation
* improved thread distribution for GEMM
* corrected IMATCOPY/OMATCOPY implementation
* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
* cmake build improvements
* pkgconfig file now contains build options
* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
* corrections and improvements for systems with more than 64 cpus
* LAPACK code updated to 3.8.0 including later fixes
* added ReLAPACK, a recursive implementation of several LAPACK functions
* Rewrote ROTMG to handle cases that the netlib code failed to address
* Disabled (broken) multithreading code for xTRMV
* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
* shared memory access failures on startup are now handled more gracefully
* restored utests from earlier releases (and made them pass on all affected systems)
* several fixes for cpu autodetection
* corrected vector register overwriting in several Power8 kernels
* optimized additional BLAS functions
* added support for CortexA53 and A72
* added autodetection for ThunderX2T99
* made most optimized kernels the default for generic ARMv8 targets
* parallelized DDOT kernel for Haswell
* changed alignment directives in assembly kernels to boost performance on OSX
* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
* added support for building on OpenBSD and Dragonfly
* updated compiler options to work with Intel release 2018
* support fully optimized build with clang/flang on Microsoft Windows
* fixed building on AIX
* added optimized BLAS 1/2 functions
* fixed cpu autodetection helper code
* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
* added mips64 I6500 cpu
* Improved CMake support
* Fixed several thread race and locking bugs
* Fixed default LAPACK optimization level
* Updated LAPACK to 3.7.0
* Added ReLAPACK (https://github.com/HPAC/ReLAPACK), make BUILD_RELAPACK=1
* Optimizations for Power9
* Fixed several Power8 assembly bugs
* New optimized Vulcan and ThunderX2T99 targets
* Support for ARMV7 SOFT_FP ABI (make ARM_SOFTFP_ABI=1)
* Detect all cpu cores including offline ones
* Fix compilation with CLANG
* Support building a shared library for Android
* Fixed several threading issues
* Fix compilation with CLANG
* Detect Intel Bay Trail and Apollo Lake
* Detect Intel Sky Lake and Kaby Lake
* Detect Intel Knights Landing
* Detect AMD A8, A10, A12 and Ryzen
* Support 64bit builds with Visual Studio
* Fix building with Intel and PGI compilers
* Fix building with MINGW and TDM-GCC
* Fix cmake builds for Haswell and related cpus
* Fix building for Sandybridge with CLANG 3.9
* Add support for the FLANG compiler
* New target z13 with BLAS3 optimizations
(https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS 0.2.20 version.zip/download)
* Improved cross compiling.
* Fix the bug on musl libc.
* Optimize BLAS on Power8
* Fixed Julia+OpenBLAS bugs on Power8
* Optimize BLAS on MIPS P5600 and I6400 (Thanks, Shivraj Patil, Kaustubh Raste)
* Improvements for ARM Cortex-A57. (Thanks, Ashwin Sekhar T K)
(https://sourceforge.net/projects/openblas/files/v0.2.19/OpenBLAS 0.2.19 version.zip/download)
- If you set the MAKE_NB_JOBS flag to a value less than or equal to zero, make will run without -j.
- Support building Visual Studio static library. (#813, Thanks, theoractice)
- Fix bugs to pass buildbot CI tests (http://build.openblas.net)
- Provide DGEMM 8x4 kernel for Cortex-A57 (Thanks, Ashwin Sekhar T K)
- Optimize S and C BLAS3 on Power8
- Optimize BLAS2/1 on Power8
(https://sourceforge.net/projects/openblas/files/v0.2.18/OpenBLAS 0.2.18 version.zip/download)
- Enable BUILD_LAPACK_DEPRECATED=1 by default.
(https://sourceforge.net/projects/openblas/files/v0.2.17/OpenBLAS 0.2.17 version.zip/download)