Threaded BLAS performance on Windows is terrible #284
Using threaded BLAS on Windows (32- or 64-bit) results in a BLAS that performs very, very poorly. For example, the LAPACK tests take several hours each to run.
Similarly, the Julia BLAS performance test gives the following timing results:
This is on the julia.mit.edu machine (Intel(R) Xeon(R) CPU E7- 8850 @ 2.00GHz, 80 cores), with hyper-threading disabled. Note that I have only run the tests under Wine and in a Windows 7 VirtualBox VM, but I got the same performance in both.
My suspicion is that BLAS is running just fine but is getting stuck at some checkpoints for an exceptionally long time, so that certain tests perform very poorly while others run just fine. If the LAPACK tests work fine for you, then perhaps this is a generic problem with emulators (I didn't previously have ready access to a Windows box for broader testing).
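The suspected failure mode (threads computing normally but occasionally stalling at synchronization checkpoints) can be made concrete with a back-of-the-envelope model. This is a hypothetical sketch, not OpenBLAS code: it assumes each BLAS call does `work` seconds of single-threaded compute, split evenly across `threads`, and that every multithreaded call also pays one checkpoint costing `sync_cost` seconds. All names and numbers here are illustrative assumptions. Under this model, a test issuing many small calls is dominated by checkpoint cost, while a test made of a few large calls is barely affected, which would be consistent with some tests being very slow while others run normally.

```python
def modeled_runtime(calls: int, work: float, threads: int, sync_cost: float) -> float:
    """Hypothetical model: total seconds for `calls` BLAS calls.

    Each call's compute (`work` seconds when single-threaded) is split evenly
    across `threads`; every multithreaded call also pays one checkpoint
    (barrier) costing `sync_cost` seconds. Single-threaded calls skip it.
    """
    per_call = work / threads
    if threads > 1:
        per_call += sync_cost
    return calls * per_call


def slowdown(calls: int, work: float, threads: int, sync_cost: float) -> float:
    """Ratio of threaded to single-threaded runtime (> 1 means threading hurts)."""
    threaded = modeled_runtime(calls, work, threads, sync_cost)
    single = modeled_runtime(calls, work, 1, sync_cost)
    return threaded / single


# Assumed numbers: a 10 ms stall per checkpoint and 80 threads (one per core).
small = slowdown(calls=1_000_000, work=1e-5, threads=80, sync_cost=1e-2)
large = slowdown(calls=10, work=10.0, threads=80, sync_cost=1e-2)
print(f"many small calls: {small:.1f}x slower than single-threaded")
print(f"few large calls:  {large:.4f}x the single-threaded time")
```

With these assumed parameters the many-small-calls case comes out roughly 1000x slower than single-threaded, while the few-large-calls case still gets close to the ideal 80x speedup, because the fixed checkpoint cost is amortized over much more work per call.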
Rerunning the test natively in the machine's primary OS (64-bit Ubuntu) shows roughly the same results as above:
I was instead able to use my dual-core mobile i7 (with hyper-threading) running Windows 7, both in VMware and natively, and the vast performance difference disappeared.
While the single-threaded version was still faster (by about a factor of 2.5), repeating this test on Linux gave identical performance. My suspicion is that memory allocation growth (for spawning threads) is implemented poorly in both VirtualBox and Wine, and thus this "bug" had nothing to do with OpenBLAS.
Thanks for looking into this, and sorry for wasting your time.