Info & discussion: NumPy and BLAS #271

Open
amitdo opened this issue Dec 9, 2017 · 2 comments

amitdo commented Dec 9, 2017

NumPy can delegate some of its operations, such as matrix products, to a BLAS library.

On Debian 9, installing numpy globally via apt-get also pulls in libblas3, the reference BLAS implementation.
Two other open source BLAS implementations, ATLAS and OpenBLAS, are generally better optimized for speed.
https://wiki.debian.org/DebianScience/LinearAlgebraLibraries

In my testing, ocropy runs noticeably faster with an accelerated BLAS library.
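
To compare BLAS libraries on your own machine, a rough timing of a dense matrix product is usually enough, since that operation is handed to BLAS. The sketch below is only an illustration: the function name `time_matmul`, the matrix size and the repeat count are my own choices, and it assumes Python 3 (for `time.perf_counter`). Run it once per installed BLAS and compare the numbers.

```python
import time

import numpy as np


def time_matmul(n=500, repeats=3):
    """Return the best wall-clock time in seconds for an n x n float64 product."""
    rng = np.random.RandomState(0)  # fixed seed so every run uses the same data
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a.dot(b)  # dense float64 matrix product, dispatched to BLAS when available
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_matmul()
print("500x500 matmul, best of 3: %.4f s" % elapsed)
```

This is a micro-benchmark, not a substitute for timing a real ocropus-rtrain run, but it shows quickly whether switching the BLAS library changed anything.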


Crabat commented Dec 15, 2017

I ran some tests on a MacBook (late 2013, 2.8 GHz Intel Core i7, macOS High Sierra 10.13.2, python-2.7.14, numpy-1.13.3, scipy-1.0.0).

Test setup: 50 MB of real-world training data, ocropus-rtrain -N 30000.

  1. original code: 398 min (100%)
  2. with @amitdo's changes from #265 (Replace native code with regular functions): 259 min (65%)
  3. numpy.einsum() with optimize=True: 266 min
  4. numpy.einsum() with optimize='optimal': 264 min
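
For reference, this is roughly how the two `optimize` settings from items 3 and 4 are passed to `numpy.einsum()`. The arrays and the subscript string are made up for illustration and are not taken from the ocropy code. With `optimize=True` NumPy uses its greedy contraction-order search, while `'optimal'` searches all orderings; both should give the same result as the plain call up to rounding.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical operands: a chain of three small matrices.
a = rng.rand(8, 16)
b = rng.rand(16, 32)
c = rng.rand(32, 4)

subscripts = "ij,jk,kl->il"  # equivalent to a.dot(b).dot(c)

plain = np.einsum(subscripts, a, b, c)
greedy = np.einsum(subscripts, a, b, c, optimize=True)
optimal = np.einsum(subscripts, a, b, c, optimize="optimal")

# einsum_path reports which contraction order the optimizer picked.
path, report = np.einsum_path(subscripts, a, b, c, optimize="optimal")
print(report)
```

The optimizer only pays off when there are several operands to reorder, which may be why it made little difference in the timings above.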

NumPy is linked against Apple's Accelerate framework:

$ python2 -c 'import numpy; numpy.show_config()'
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
openblas_lapack_info:
    NOT AVAILABLE
atlas_3_10_blas_threads_info:
    NOT AVAILABLE
atlas_threads_info:
    NOT AVAILABLE
atlas_3_10_threads_info:
    NOT AVAILABLE
atlas_blas_info:
    NOT AVAILABLE
atlas_3_10_blas_info:
    NOT AVAILABLE
atlas_blas_threads_info:
    NOT AVAILABLE
openblas_info:
    NOT AVAILABLE
blas_mkl_info:
    NOT AVAILABLE
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
blis_info:
    NOT AVAILABLE
atlas_info:
    NOT AVAILABLE
atlas_3_10_info:
    NOT AVAILABLE
lapack_mkl_info:
    NOT AVAILABLE
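
Most of that output is "NOT AVAILABLE" noise. If you want to pick out only the sections that are actually configured, a small parser along these lines might help. It is a sketch that assumes the legacy text layout shown above (newer NumPy releases print a different format), and `available_sections` and `SAMPLE` are names I made up; the sample is a shortened copy of the dump.

```python
def available_sections(config_text):
    """Map each configured section name to its settings.

    Sections whose body is 'NOT AVAILABLE' are dropped.
    """
    sections = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Section header such as "blas_opt_info:"
            current = line[:-1]
            sections[current] = {}
        elif line == "NOT AVAILABLE":
            sections.pop(current, None)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections


# Shortened copy of the show_config() output from this thread.
SAMPLE = """\
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
openblas_info:
    NOT AVAILABLE
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
"""

found = available_sections(SAMPLE)
for name in sorted(found):
    print(name, found[name])
```

On the dump above only lapack_opt_info and blas_opt_info survive, and both point at Accelerate.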

That is a 35% reduction in training time (398 min down to 259 min). Thank you, @amitdo!
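
The percentages can be recomputed from the timings listed above. The helper names below are just for illustration; the numbers are the four measured run times in minutes.

```python
def relative_percent(minutes, baseline):
    """Run time as a percentage of the baseline run time."""
    return 100.0 * minutes / baseline


def speedup(minutes, baseline):
    """How many times faster than the baseline a run is."""
    return float(baseline) / minutes


# Measured run times from the list above, in minutes.
timings_min = [
    ("original code", 398),
    ("#265 changes", 259),
    ("einsum optimize=True", 266),
    ("einsum optimize='optimal'", 264),
]

baseline = timings_min[0][1]
for label, minutes in timings_min:
    print("%-28s %3d min  %5.1f%%  %.2fx" % (
        label, minutes, relative_percent(minutes, baseline), speedup(minutes, baseline)))
```

Note that a 35% shorter run time corresponds to a speedup of about 1.5x, not 1.35x.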


amitdo commented Dec 15, 2017

You're welcome!

Apple's Accelerate framework contains a highly tuned BLAS implementation.
