release 2015-04 follow-up #73

Closed
32 tasks done
stonebig opened this issue Mar 28, 2015 · 17 comments

Comments

@stonebig
Contributor

WinPython 2015-04 (on 2015-05-12: IPython-3.1+, Qt5 minimal, mingwpy, Theano, Flask, ...)

features:

Infrastructure:

Other considerations:

  • Web GUI helpers: Vispy 0.4 and Qt 5.5 are hoped for in the next version,
  • Deep learning: Theano is included (the big graphics card is not),
  • Qt4 -> Qt5 move: PyQtGraph is added as a preparatory step toward removing the unmaintained PyQwt.

MD5 SHA-1 Binary
aa82fa67756bd1880ee7c20df1aecb66 249320fb396023ad4182fd89bf49792af0cc3965 winpython-32bit-3.4.3.3.exe
21837dda642c6c1ae3d011bf13b383f4 da6222a79e58bd9b5759b7be926b80776d423b3e winpython-64bit-3.4.3.3.exe
04849a7f9209fb6bdb05f9cf2f2ba50a 24c3a5d80e698a013d0c2e2792885373df4caf8b winpython-32bit-2.7.9.5.exe
cf4c4d064ddafeec898ab6100203beb4 834d7659ea5495528b2209d70be900740252f23c winpython-64bit-2.7.9.5.exe
48cbc498565492a1c21d53c573187818 17eaf6da91f812294401a1f9dc16c0eb22ac2265 winpython-32bit-3.3.5.8.exe
01c8c48709b65a74682a710cfb1908f0 4b0bbdf70ece3ddbdd74b02c68d2078be6e14dff winpython-64bit-3.3.5.8.exe

History of changes for WinPython 3.4.3.3

The following changes were made to WinPython distribution since version 3.4.3.2.

Python packages

New packages:

  • Babel 1.3 (Internationalization utilities)
  • Flask 0.10.1 (A microframework based on Werkzeug, Jinja2 and good intentions)
  • Theano 0.7.0 (Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.)
  • Werkzeug 0.10.4 (The Swiss Army knife of Python web development)
  • adodbapi 2.6.0.7 (A pure Python package implementing PEP 249 DB-API using Microsoft ADO.)
  • alabaster 0.7.3 (A configurable sidebar-enabled Sphinx theme)
  • click 4.0 (A simple wrapper around optparse for powerful command line utilities.)
  • docopt 0.6.2 (Pythonic argument parser, that will make you smile)
  • itsdangerous 0.24 (Various helpers to pass trusted data to untrusted environments and back.)
  • jedi 0.8.1 (An autocompletion tool for Python that can be used for text editors)
  • pkginfo 1.2.1 (Query metadatdata from sdists / bdists / installed packages.)
  • pymongo 3.0.1 (Python driver for MongoDB http://www.mongodb.org)
  • pyqtgraph 0.9.10 (Scientific Graphics and GUI Library for Python)
  • redis 2.10.3 (Python client for Redis key-value store)
  • snowballstemmer 1.2.0 (This package provides 16 stemmer algorithms (15 + Poerter English stemmer) generated from Snowball algorithms.)
  • sphinx_rtd_theme 0.1.8 (ReadTheDocs.org theme for Sphinx, 2013 version.)
  • twine 1.5.0 (Collection of utilities for interacting with PyPI)

Upgraded packages:

  • Pillow 2.7.0 → 2.8.1 (Python Imaging Library (fork))
  • PuLP 1.5.6 → 1.5.9 (PuLP is an LP modeler written in python. PuLP can generate MPS or LP files and call GLPK, COIN CLP/CBC, CPLEX, and GUROBI to solve linear problems)
  • SQLAlchemy 0.9.9 → 1.0.4 (SQL Toolkit and Object Relational Mapper)
  • XlsxWriter 0.7.1 → 0.7.2 (A Python module for creating Excel XLSX files.)
  • certifi 14.5.14 → 2015.4.28 (Python package for providing Mozilla's CA Bundle.)
  • h5py 2.4.0 → 2.5.0 (General-purpose Python interface to HDF5 files (unlike PyTables, h5py provides direct access to the full HDF5 C library))
  • husl 4.0.1 → 4.0.2 (Human-friendly HSL (Hue-Saturation-Lightness))
  • ipython 3.0.0 → 3.1.0 (Enhanced Python shell)
  • llvmlite 0.2.2 → 0.4.0 (lightweight wrapper around basic LLVM functionality)
  • lxml 3.4.2 → 3.4.4 (Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.)
  • nose 1.3.4 → 1.3.6 (nose is a discovery-based unittest extension (e.g. NumPy test module is using nose))
  • numba 0.17.0 → 0.18.2 (compiling Python code using LLVM)
  • numexpr 2.4 → 2.4.3 (Fast evaluation of array expressions elementwise by using a vector-based virtual machine)
  • pandas 0.16.0 → 0.16.1 (Powerful data structures for data analysis, time series and statistics)
  • pg8000 1.10.1 → 1.10.2 (PostgreSQL interface library)
  • pip 6.0.8 → 6.1.1 (A tool for installing and managing Python packages)
  • pycparser 2.10 → 2.12 (C parser in Python)
  • pyodbc 3.0.7 → 3.0.9 (DB API Module for ODBC)
  • python_dateutil 2.4.0 → 2.4.2 (Powerful extensions to the standard datetime module)
  • pyzmq 14.5.0 → 14.6.0 (Lightweight and super-fast messaging based on ZeroMQ library (required for IPython Qt console))
  • requests 2.6.0 → 2.7.0 (Requests is an Apache2 Licensed HTTP library, written in Python, for human beings.)
  • scikit_image 0.11.2 → 0.11.3 (Image processing toolbox for SciPy)
  • scikit_learn 0.16.0 → 0.16.1 (A set of Python modules for machine learning and data mining)
  • setuptools 14.3.1 → 15.2 (Download, build, install, upgrade, and uninstall Python packages - easily)
  • sqlite_bro 0.8.7.4 → 0.8.8 (a graphic SQLite Client in 1 Python file)
  • sqlparse 0.1.14 → 0.1.15 (Non-validating SQL parser)
  • tables 3.1.1 → 3.2.0 (Package based on HDF5 library for managing hierarchical datasets (extremely large amounts of data))

@stonebig stonebig added this to the 2015-04 milestone Mar 28, 2015
@stonebig
Contributor Author

build1 (2015-04-19: IPython3.1, WPPM rework, mingwpy Compiler Toolchain)

feedback wish: does the WinPython Package Manager still work well on Windows PCs that use non-Latin characters? (asked because of this pip workaround fix, 850c257)

MD5 SHA-1 Binary
4570294a17cb9a15da2a7bbc5b07314e de488ee968377521e20aa6094e20f43c220fe39b winpython-64bit-3.4.3.3_build1.exe

History of changes for WinPython 3.4.3.3

The following changes were made to WinPython distribution since version 3.4.3.2.

Python packages

New packages:

  • Babel 1.3 (Internationalization utilities)
  • Flask 0.10.1 (A microframework based on Werkzeug, Jinja2 and good intentions)
  • Theano 0.7.0 (Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.)
  • Werkzeug 0.10.4 (The Swiss Army knife of Python web development)
  • alabaster 0.7.3 (A configurable sidebar-enabled Sphinx theme)
  • click 4.0 (A simple wrapper around optparse for powerful command line utilities.)
  • itsdangerous 0.24 (Various helpers to pass trusted data to untrusted environments and back.)
  • pkginfo 1.2.1 (Query metadatdata from sdists / bdists / installed packages.)
  • pymongo 3.0 ()
  • redis 2.10.3 (Python client for Redis key-value store)
  • snowballstemmer 1.2.0 (This package provides 16 stemmer algorithms (15 + Poerter English stemmer) generated from Snowball algorithms.)
  • sphinx_rtd_theme 0.1.7 (ReadTheDocs.org theme for Sphinx, 2013 version.)
  • twine 1.5.0 (Collection of utilities for interacting with PyPI)

Upgraded packages:

  • Pillow 2.7.0 → 2.8.1 (Python Imaging Library (fork))
  • PuLP 1.5.6 → 1.5.8 (PuLP is an LP modeler written in python. PuLP can generate MPS or LP files and call GLPK, COIN CLP/CBC, CPLEX, and GUROBI to solve linear problems)
  • SQLAlchemy 0.9.9 → 1.0.0 (SQL Toolkit and Object Relational Mapper)
  • XlsxWriter 0.7.1 → 0.7.2 (A Python module for creating Excel XLSX files.)
  • h5py 2.4.0 → 2.5.0 (General-purpose Python interface to HDF5 files (unlike PyTables, h5py provides direct access to the full HDF5 C library))
  • ipython 3.0.0 → 3.1.0 (Enhanced Python shell)
  • llvmlite 0.2.2 → 0.4.0 (lightweight wrapper around basic LLVM functionality)
  • lxml 3.4.2 → 3.4.3 (Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.)
  • nose 1.3.4 → 1.3.6 (nose is a discovery-based unittest extension (e.g. NumPy test module is using nose))
  • numba 0.17.0 → 0.18.2 (compiling Python code using LLVM)
  • numexpr 2.4 → 2.4.1 (Fast evaluation of array expressions elementwise by using a vector-based virtual machine)
  • python_dateutil 2.4.0 → 2.4.2 (Powerful extensions to the standard datetime module)
  • scikit_image 0.11.2 → 0.11.3 (Image processing toolbox for SciPy)
  • scikit_learn 0.16.0 → 0.16.1 (A set of Python modules for machine learning and data mining)
  • setuptools 14.3.1 → 15.1 (Download, build, install, upgrade, and uninstall Python packages - easily)
  • sqlite_bro 0.8.7.4 → 0.8.8 (a graphic SQLite Client in 1 Python file)

@hiccup7

hiccup7 commented Apr 19, 2015

When you built OpenBLAS v0.2.14, did you use MAX_STACK_ALLOC=2048? See JuliaLang/julia#10780

@stonebig
Contributor Author

Hi @hiccup7,

Julia uses a different OpenBLAS DLL because of a NumPy compatibility issue, so the Python OpenBLAS won't speed up your Julia code unless you move that part back to Python.

Carl's release PDF (https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwpy-2015-04-readme.pdf) points to this commit of OpenBLAS: OpenMathLib/OpenBLAS@fb02cb0

This is 5 commits after the one you are waiting for: OpenMathLib/OpenBLAS@6c3a0b5

So this build of OpenBLAS may have MAX_STACK_ALLOC=2048

(but the commit after it could mean trouble: OpenMathLib/OpenBLAS@a4c96ec, see OpenMathLib/OpenBLAS#543)

@hiccup7

hiccup7 commented Apr 19, 2015

The Julia team is creating the libopenblas.dll I need for Julia. So my needs are well taken care of.

I brought up the issue of MAX_STACK_ALLOC=2048 because you are upgrading OpenBLAS to v0.2.14, and perhaps neither you nor @carlkl were aware of the need for this build option with that specific release version. I am trying to prevent a bug in Theano/Lasagne in the next WinPython release.

@carlkl

carlkl commented Apr 19, 2015

The latest OpenBLAS builds for mingwpy:

https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/openblas-fb02cb0_amd64.7z
https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/openblas-fb02cb0-win32.7z

have been built from commit 37b9033 (the archive file names are wrong) and with MAX_STACK_ALLOC=2048.

The DLLs have been renamed to libopenblaspy.dll to avoid a name clash with JuliaLang's OpenBLAS build.

@stonebig
Contributor Author

So I just need to replace the previous ones and I'm good?

mingw32\bin\libopenblaspy.dll
mingw32\x86_64-w64-mingw32\lib\libopenblaspy.dll.a
mingw32\x86_64-w64-mingw32\lib\libopenblaspy.a
mingw32\x86_64-w64-mingw32\include\openblas\f77blas.h (and friends)

@carlkl

carlkl commented Apr 20, 2015

that's correct.
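
A quick way to confirm that the replacement went through is to check that each of the files listed above exists under the toolchain directory. The sketch below assumes the relative paths exactly as they are written in this thread; `EXPECTED_FILES`, `missing_files` and `tools_root` are hypothetical names, and the other headers ("and friends") are not checked.

```python
from pathlib import Path

# Relative paths taken from the list above, split into parts so the
# check works with either path separator.
EXPECTED_FILES = [
    ("mingw32", "bin", "libopenblaspy.dll"),
    ("mingw32", "x86_64-w64-mingw32", "lib", "libopenblaspy.dll.a"),
    ("mingw32", "x86_64-w64-mingw32", "lib", "libopenblaspy.a"),
    ("mingw32", "x86_64-w64-mingw32", "include", "openblas", "f77blas.h"),
]


def missing_files(tools_root):
    """Return the expected files that are absent under `tools_root`."""
    root = Path(tools_root)
    return [
        Path(*parts)
        for parts in EXPECTED_FILES
        if not root.joinpath(*parts).is_file()
    ]
```

An empty list means every listed file is present; it says nothing about whether the files come from the newer build.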

@hiccup7

hiccup7 commented Apr 20, 2015

@carlkl, I'm glad to know that you already knew about the MAX_STACK_ALLOC=2048 detail. Thanks for your big contribution to the Python community!

@stonebig
Contributor Author

build2 (2015-04-23: mingwpy + theano ok, lasagne)

feedback wish:

MD5 SHA-1 Binary
0587704f33923f8df10b76eca6cd922c e80703d0b711551ad6b02b6009be8edf677dda20 winpython-64bit-3.4.3.3_build2.exe

History of changes for WinPython 3.4.3.3

The following changes were made to WinPython distribution since version 3.4.3.2.

Python packages

New packages:

  • Babel 1.3 (Internationalization utilities)
  • Flask 0.10.1 (A microframework based on Werkzeug, Jinja2 and good intentions)
  • Lasagne 0.0.1 ()
  • Theano 0.7.0 (Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.)
  • Werkzeug 0.10.4 (The Swiss Army knife of Python web development)
  • adodbapi 2.6.0.7 (A pure Python package implementing PEP 249 DB-API using Microsoft ADO.)
  • alabaster 0.7.3 (A configurable sidebar-enabled Sphinx theme)
  • click 4.0 (A simple wrapper around optparse for powerful command line utilities.)
  • itsdangerous 0.24 (Various helpers to pass trusted data to untrusted environments and back.)
  • pkginfo 1.2.1 (Query metadatdata from sdists / bdists / installed packages.)
  • pymongo 3.0.1 (Python driver for MongoDB http://www.mongodb.org)
  • redis 2.10.3 (Python client for Redis key-value store)
  • snowballstemmer 1.2.0 (This package provides 16 stemmer algorithms (15 + Poerter English stemmer) generated from Snowball algorithms.)
  • sphinx_rtd_theme 0.1.7 (ReadTheDocs.org theme for Sphinx, 2013 version.)
  • twine 1.5.0 (Collection of utilities for interacting with PyPI)

Upgraded packages:

  • Pillow 2.7.0 → 2.8.1 (Python Imaging Library (fork))
  • PuLP 1.5.6 → 1.5.9 (PuLP is an LP modeler written in python. PuLP can generate MPS or LP files and call GLPK, COIN CLP/CBC, CPLEX, and GUROBI to solve linear problems)
  • SQLAlchemy 0.9.9 → 1.0.0 (SQL Toolkit and Object Relational Mapper)
  • XlsxWriter 0.7.1 → 0.7.2 (A Python module for creating Excel XLSX files.)
  • h5py 2.4.0 → 2.5.0 (General-purpose Python interface to HDF5 files (unlike PyTables, h5py provides direct access to the full HDF5 C library))
  • husl 4.0.1 → 4.0.2 (Human-friendly HSL (Hue-Saturation-Lightness))
  • ipython 3.0.0 → 3.1.0 (Enhanced Python shell)
  • llvmlite 0.2.2 → 0.4.0 (lightweight wrapper around basic LLVM functionality)
  • lxml 3.4.2 → 3.4.3 (Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.)
  • nose 1.3.4 → 1.3.6 (nose is a discovery-based unittest extension (e.g. NumPy test module is using nose))
  • numba 0.17.0 → 0.18.2 (compiling Python code using LLVM)
  • numexpr 2.4 → 2.4.1 (Fast evaluation of array expressions elementwise by using a vector-based virtual machine)
  • pg8000 1.10.1 → 1.10.2 (PostgreSQL interface library)
  • pip 6.0.8 → 6.1.1 (A tool for installing and managing Python packages)
  • python_dateutil 2.4.0 → 2.4.2 (Powerful extensions to the standard datetime module)
  • scikit_image 0.11.2 → 0.11.3 (Image processing toolbox for SciPy)
  • scikit_learn 0.16.0 → 0.16.1 (A set of Python modules for machine learning and data mining)
  • setuptools 14.3.1 → 15.1 (Download, build, install, upgrade, and uninstall Python packages - easily)
  • sqlite_bro 0.8.7.4 → 0.8.8 (a graphic SQLite Client in 1 Python file)
  • sqlparse 0.1.14 → 0.1.15 (Non-validating SQL parser)

@carlkl

carlkl commented Apr 24, 2015

@stonebig, build2: there is no .theanorc file in settings. You should include one with the following content:

[blas]
ldflags = -lopenblaspy

With these settings, Theano automagically links BLAS/LAPACK-dependent binary extensions to <..>\tools\mingw32\bin\libopenblaspy.dll

You can check it with:

python check_blas.py   # run this script located in <...>\Lib\site-packages\theano\misc
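
If the file has to be generated by a build script rather than by hand, a minimal sketch using Python's standard configparser module could look like this. The function name `write_theanorc` and the target path are assumptions for illustration; only the `[blas]` section and the `ldflags = -lopenblaspy` value come from the comment above.

```python
import configparser
import os


def write_theanorc(path, ldflags="-lopenblaspy"):
    """Create or update a .theanorc so that [blas] ldflags is set.

    Existing sections and options in the file are kept.
    """
    config = configparser.ConfigParser()
    # Keep option names case-sensitive (for example floatX); by default
    # configparser would lowercase them when rewriting the file.
    config.optionxform = str
    if os.path.exists(path):
        config.read(path)
    if not config.has_section("blas"):
        config.add_section("blas")
    config.set("blas", "ldflags", ldflags)
    with open(path, "w") as fh:
        config.write(fh)
    return path
```

Calling `write_theanorc(r"<...>\settings\.theanorc")` with the real settings directory substituted for the placeholder would produce the two lines shown above.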

@stonebig
Contributor Author

OK, I'll do that.

@stonebig
Contributor Author

On the Lasagne MNIST test, it doesn't seem to make a difference compared with the previous run without parameters (#82 (comment)), on my old and limited i3-350M CPU.

Starting training...
Epoch 1 of 500 took 21.039s
  training loss:        1.382395
  validation loss:      0.479808
  validation accuracy:      87.24 %%
Epoch 2 of 500 took 20.673s
  training loss:        0.595778
  validation loss:      0.335485
  validation accuracy:      90.45 %%
Epoch 3 of 500 took 20.156s
  training loss:        0.469481
  validation loss:      0.282895
  validation accuracy:      91.88 %%

@stonebig
Contributor Author

python ..\Lib\site-packages\theano\misc\check_blas.py

        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32

        cuda version      6.5    6.0    5.5    5.0    4.2    4.1    4.0    3.2    3.0   # note
        gpu
        K6000/NOECC       0.06s         0.06s
        K40                             0.07s
        K20m/ECC          0.08s 0.08s          0.07s
        K20/NOECC                              0.07s
        M2090                           0.19s
        C2075                                         0.25s
        M2075                                  0.25s
        M2070                                  0.25s         0.27s         0.32s

        M2070-Q                                0.48s         0.27s         0.32s

        M2050(Amazon)                          0.25s
        C1060                                                              0.46s

        K600                            1.04s

        GTX Titan Black                 0.05s
        GTX Titan(D15U-50)              0.06s  0.06s  don't work
        GTX 780                         0.06s
        GTX 980           0.06s
        GTX 970           0.08s
        GTX 680                         0.11s  0.12s  0.154s               0.218s
        GRID K520         0.14s
        GTX 580                         0.16s  0.16s  0.164s               0.203s
        GTX 480                         0.19s  0.19s  0.192s               0.237s 0.27s
        GTX 750 Ti        0.20s
        GTX 470                         0.23s  0.23s  0.238s               0.297s 0.34s
        GTX 660                         0.18s  0.20s  0.23s
        GTX 560                                       0.30s
        GTX 650 Ti                             0.27s
        GTX 765M                 0.27s
        GTX 460                                0.37s                0.45s
        GTX 285                         0.42s         0.452s        0.452s        0.40s # cuda 3.0 seems faster? driver version?
        750M                                   0.49s
        GT 610            2.38s
        GTX 550 Ti                                                  0.57s
        GT 520                                        2.68s                3.06s

        520M                                   2.44s                       3.19s
        # with bumblebee on Ubuntu 12.04
        GT 220                                                             3.80s

        GT 210                                                      6.35s
        8500 GT                                                                   10.68s

Some Theano flags:
    blas.ldflags= -lopenblaspy
    compiledir= C:\Users\famille\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_37_Stepping_2_GenuineIntel-3.4.3-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= win32
    sys.version= 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)]
    sys.prefix= D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the Theano flag "blas.ldflags" is empty)
blas_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_lapack_info:
  NOT AVAILABLE
mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
Numpy dot module: numpy.core._dotblas
Numpy location: D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64\lib\site-packages\numpy\__init__.py
Numpy version: 1.9.2

We executed 10 calls to gemm with a and b matrices of shapes (2000, 2000) and (2000, 2000).

Total execution time: 12.95s on CPU (with direct Theano binding to blas).

Try to run this script a few times. Experience shows that the first time is not
as fast as followings calls. The difference is not big, but consistent.

@stonebig
Contributor Author

without .theanorc:

python ..\Lib\site-packages\theano\misc\check_blas.py

        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32

        cuda version      6.5    6.0    5.5    5.0    4.2    4.1    4.0    3.2    3.0   # note
        gpu
        K6000/NOECC       0.06s         0.06s
        K40                             0.07s
        K20m/ECC          0.08s 0.08s          0.07s
        K20/NOECC                              0.07s
        M2090                           0.19s
        C2075                                         0.25s
        M2075                                  0.25s
        M2070                                  0.25s         0.27s         0.32s

        M2070-Q                                0.48s         0.27s         0.32s

        M2050(Amazon)                          0.25s
        C1060                                                              0.46s

        K600                            1.04s

        GTX Titan Black                 0.05s
        GTX Titan(D15U-50)              0.06s  0.06s  don't work
        GTX 780                         0.06s
        GTX 980           0.06s
        GTX 970           0.08s
        GTX 680                         0.11s  0.12s  0.154s               0.218s
        GRID K520         0.14s
        GTX 580                         0.16s  0.16s  0.164s               0.203s
        GTX 480                         0.19s  0.19s  0.192s               0.237s 0.27s
        GTX 750 Ti        0.20s
        GTX 470                         0.23s  0.23s  0.238s               0.297s 0.34s
        GTX 660                         0.18s  0.20s  0.23s
        GTX 560                                       0.30s
        GTX 650 Ti                             0.27s
        GTX 765M                 0.27s
        GTX 460                                0.37s                0.45s
        GTX 285                         0.42s         0.452s        0.452s        0.40s # cuda 3.0 seems faster? driver version?
        750M                                   0.49s
        GT 610            2.38s
        GTX 550 Ti                                                  0.57s
        GT 520                                        2.68s                3.06s

        520M                                   2.44s                       3.19s
        # with bumblebee on Ubuntu 12.04
        GT 220                                                             3.80s

        GT 210                                                      6.35s
        8500 GT                                                                   10.68s

Some Theano flags:
    blas.ldflags=
    compiledir= C:\Users\famille\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_37_Stepping_2_GenuineIntel-3.4.3-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= win32
    sys.version= 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)]
    sys.prefix= D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the Theano flag "blas.ldflags" is empty)
lapack_mkl_info:
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
openblas_lapack_info:
  NOT AVAILABLE
blas_opt_info:
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
lapack_opt_info:
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
mkl_info:
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
blas_mkl_info:
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
Numpy dot module: numpy.core._dotblas
Numpy location: D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64\lib\site-packages\numpy\__init__.py
Numpy version: 1.9.2

We executed 10 calls to gemm with a and b matrices of shapes (2000, 2000) and (2000, 2000).

Total execution time: 12.03s on CPU (without direct Theano binding to blas but with numpy/scipy binding to blas).

Try to run this script a few times. Experience shows that the first time is not
as fast as followings calls. The difference is not big, but consistent.

@carlkl

carlkl commented Apr 30, 2015

Theano uses SciPy's gemv if no BLAS library is found. Linking to an external BLAS library seems to be necessary only if SciPy isn't installed. In the context of WinPython and Theano you may exclude OpenBLAS again, but to be sure, ask on the Theano mailing list.
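
To put a number on how close the two check_blas.py runs above are, the "Total execution time" lines can be pulled out of the saved outputs and compared. This is only a sketch: `total_seconds` and `relative_difference` are hypothetical helper names, and a single run per configuration is not a reliable benchmark (the script itself recommends running it several times).

```python
import re

# Matches the summary line printed at the end of check_blas.py,
# e.g. "Total execution time: 12.95s on CPU (...)".
_TOTAL_RE = re.compile(r"Total execution time:\s*([0-9.]+)s")


def total_seconds(output):
    """Extract the total gemm time in seconds from check_blas.py output."""
    match = _TOTAL_RE.search(output)
    if match is None:
        raise ValueError("no 'Total execution time' line found")
    return float(match.group(1))


def relative_difference(run_a, run_b):
    """Return (time_a - time_b) / time_b for two captured outputs."""
    a = total_seconds(run_a)
    b = total_seconds(run_b)
    return (a - b) / b
```

With the two totals reported in this thread (12.95s with .theanorc, 12.03s without), the relative difference is about 7.6%, which is small enough on an old laptop CPU to be consistent with the two paths performing about the same.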

@stonebig

OK. I'll remove the .theanorc file from the next build.

@stonebig

stonebig commented May 7, 2015

build3 (2015-05-07: 32bit with latest updates, + jedi + seaborn demo - lasagne)

feedback wish:

  • issues (if you find some),
  • jedi in spyder.

| MD5 | SHA-1 | Binary |
|---|---|---|
| 4dca5dabec277f7fcf18075d2275b2bb | 4c5c1ef9b07e3c2f169190780a313f2e3a21afe9 | winpython-32bit-3.4.3.3_build3.exe |
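
The published MD5 and SHA-1 values can be checked against a downloaded installer. Below is a minimal sketch using only the standard library; the helper names `file_digests` and `matches_published` are hypothetical and not part of WinPython.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks so that a large installer does not
    have to fit in memory.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_published(path, expected_md5, expected_sha1):
    """Compare a file's digests with the published hex strings (case-insensitive)."""
    md5, sha1 = file_digests(path)
    return md5 == expected_md5.lower() and sha1 == expected_sha1.lower()
```

For example, `matches_published("winpython-32bit-3.4.3.3_build3.exe", "4dca5dabec277f7fcf18075d2275b2bb", "4c5c1ef9b07e3c2f169190780a313f2e3a21afe9")` should return `True` for an intact download, assuming the file sits in the current directory.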

History of changes for WinPython 3.4.3.3

The following changes were made to WinPython distribution since version 3.4.3.2.

Python packages

New packages:

  • Babel 1.3 (Internationalization utilities)
  • Flask 0.10.1 (A microframework based on Werkzeug, Jinja2 and good intentions)
  • Theano 0.7.0 (Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.)
  • Werkzeug 0.10.4 (The Swiss Army knife of Python web development)
  • adodbapi 2.6.0.7 (A pure Python package implementing PEP 249 DB-API using Microsoft ADO.)
  • alabaster 0.7.3 (A configurable sidebar-enabled Sphinx theme)
  • click 4.0 (A simple wrapper around optparse for powerful command line utilities.)
  • docopt 0.6.2 (Pythonic argument parser, that will make you smile)
  • itsdangerous 0.24 (Various helpers to pass trusted data to untrusted environments and back.)
  • jedi 0.8.1 (An autocompletion tool for Python that can be used for text editors)
  • pkginfo 1.2.1 (Query metadata from sdists / bdists / installed packages.)
  • pymongo 3.0.1 (Python driver for MongoDB http://www.mongodb.org)
  • redis 2.10.3 (Python client for Redis key-value store)
  • snowballstemmer 1.2.0 (This package provides 16 stemmer algorithms (15 + Porter English stemmer) generated from Snowball algorithms.)
  • sphinx_rtd_theme 0.1.8 (ReadTheDocs.org theme for Sphinx, 2013 version.)
  • twine 1.5.0 (Collection of utilities for interacting with PyPI)

Upgraded packages:

  • Pillow 2.7.0 → 2.8.1 (Python Imaging Library (fork))
  • PuLP 1.5.6 → 1.5.9 (PuLP is an LP modeler written in python. PuLP can generate MPS or LP files and call GLPK, COIN CLP/CBC, CPLEX, and GUROBI to solve linear problems)
  • SQLAlchemy 0.9.9 → 1.0.3 (SQL Toolkit and Object Relational Mapper)
  • XlsxWriter 0.7.1 → 0.7.2 (A Python module for creating Excel XLSX files.)
  • certifi 14.5.14 → 2015.4.28 (Python package for providing Mozilla's CA Bundle.)
  • h5py 2.4.0 → 2.5.0 (General-purpose Python interface to HDF5 files (unlike PyTables, h5py provides direct access to the full HDF5 C library))
  • husl 4.0.1 → 4.0.2 (Human-friendly HSL (Hue-Saturation-Lightness))
  • ipython 3.0.0 → 3.1.0 (Enhanced Python shell)
  • llvmlite 0.2.2 → 0.4.0 (lightweight wrapper around basic LLVM functionality)
  • lxml 3.4.2 → 3.4.4 (Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.)
  • nose 1.3.4 → 1.3.6 (nose is a discovery-based unittest extension (e.g. NumPy test module is using nose))
  • numba 0.17.0 → 0.18.2 (compiling Python code using LLVM)
  • numexpr 2.4 → 2.4.3 (Fast evaluation of array expressions elementwise by using a vector-based virtual machine)
  • pg8000 1.10.1 → 1.10.2 (PostgreSQL interface library)
  • pip 6.0.8 → 6.1.1 (A tool for installing and managing Python packages)
  • pycparser 2.10 → 2.12 (C parser in Python)
  • pyodbc 3.0.7 → 3.0.9 (DB API Module for ODBC)
  • python_dateutil 2.4.0 → 2.4.2 (Powerful extensions to the standard datetime module)
  • pyzmq 14.5.0 → 14.6.0 (Lightweight and super-fast messaging based on ZeroMQ library (required for IPython Qt console))
  • requests 2.6.0 → 2.7.0 (Requests is an Apache2 Licensed HTTP library, written in Python, for human beings.)
  • scikit_image 0.11.2 → 0.11.3 (Image processing toolbox for SciPy)
  • scikit_learn 0.16.0 → 0.16.1 (A set of Python modules for machine learning and data mining)
  • setuptools 14.3.1 → 15.2 (Download, build, install, upgrade, and uninstall Python packages - easily)
  • sqlite_bro 0.8.7.4 → 0.8.8 (a graphic SQLite Client in 1 Python file)
  • sqlparse 0.1.14 → 0.1.15 (Non-validating SQL parser)
  • tables 3.1.1 → 3.2.0 (Package based on HDF5 library for managing hierarchical datasets (extremely large amounts of data))
