Segfault in sgesdd under Windows with the Barcelona architecture #603

Closed
ogrisel opened this issue Jul 11, 2015 · 41 comments

Comments

@ogrisel
Contributor

ogrisel commented Jul 11, 2015

I used the OpenBLAS build for Python / NumPy made with mingwpy by @carlkl. You can install it embedded in numpy with:

C:\Python3.4_x64\python -m pip install -i https://pypi.binstar.org/carlkl/simple numpy

I installed the 64 bit version of Python using the official installer from python.org.
If you want to install the compilers for debugging on that platform, you can use:

C:\Python3.4_x64\python -m pip install -i https://pypi.binstar.org/carlkl/simple mingwpy

Carl said on the numpy mailing list that the OpenBLAS version embedded in his numpy package has digest: fb02cb0 (from April).

The problem can be triggered with the following call:

C:\Python34_x64\python -c"import numpy as np; print(np.linalg.svd(np.ones((129, 129), dtype=np.float64)))"

I used this script and apparently the Barcelona architecture is detected by OpenBLAS on this VM.

With smaller data, e.g. (128, 128) it works as expected and if I force the Nehalem core type it works as well (it prints the results of the SVD):

$env:OPENBLAS_CORETYPE="Nehalem"
C:\Python34_x64\python -c"import numpy as np; print(np.linalg.svd(np.ones((129, 129), dtype=np.float64)))"
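
The comparison above (default core detection versus a forced Nehalem core, at shapes 128 and 129) can be scripted. The following is a minimal sketch, not something from the original report: `run_isolated`, `compare` and `SVD_SNIPPET` are hypothetical names, and it assumes only that `OPENBLAS_CORETYPE` is read from the environment when OpenBLAS is loaded, as the commands above rely on. Each case runs in a child process so that a segfault shows up as a non-zero exit status instead of killing the script.

```python
import os
import subprocess
import sys

# Hypothetical snippet template: run an SVD on an n x n matrix of ones.
SVD_SNIPPET = "import numpy as np; np.linalg.svd(np.ones(({n}, {n}), dtype=np.float64))"


def run_isolated(code, coretype=None):
    """Run `code` in a fresh Python process and return its exit status.

    If `coretype` is given, OPENBLAS_CORETYPE is set for the child only;
    otherwise the variable is removed so OpenBLAS auto-detects the core.
    """
    env = dict(os.environ)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    else:
        env.pop("OPENBLAS_CORETYPE", None)
    completed = subprocess.run([sys.executable, "-c", code], env=env)
    return completed.returncode


def compare(sizes=(128, 129), coretypes=(None, "Nehalem")):
    """Return {(n, coretype): exit_status} for every combination."""
    results = {}
    for n in sizes:
        for coretype in coretypes:
            code = SVD_SNIPPET.format(n=n)
            results[(n, coretype)] = run_isolated(code, coretype)
    return results
```

Going by the report, calling `compare()` on the affected VM would be expected to give a non-zero status only for `(129, None)`; the exact status value produced by a crash depends on the platform.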

To reproduce this I used the 2GB Standard instance on rackspace cloud. I can give you access to such a VM in private message if that helps.

I also tried to reproduce this issue by building OpenBLAS and numpy on the same instance type under Ubuntu 15.04 instead and I cannot reproduce the crash under Linux although I checked that the Barcelona core is detected there as well.

@jeromerobert
Contributor

This is probably because of 6c3a0b5. It was fixed here a4c96ec and here ab567d8 so you should just upgrade to the current develop branch.

@carlkl

carlkl commented Jul 12, 2015

I didn't use the latest develop branch for the x86-64 build due to another problem with numpy and Haswell that I have to track down. In the meantime I will rebuild numpy with a more recent OpenBLAS revision.

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

@carlkl let me know when you have a new build of openblas with mingwpy / numpy ready so that I can update this issue accordingly.

@carlkl

carlkl commented Jul 13, 2015

@ogrisel, I have a temporary upload at https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/openblas-3f1b576_2015-07-13_amd64.7z. This is not a debug build so far, but it is built from the latest xianyi develop trunk, compiled without the HASWELL kernel (NO_AVX2 flag) due to problems I encountered.
I added 3 libopenblaspy.dll variants linked against msvcrt, msvcr90 and msvcr100. Just use one of them as a drop-in replacement for numpy/core/libopenblaspy.dll.

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

I tried replacing the libopenblaspy.dll from the site-packages/numpy/core folder installed from the binstar repo with the one from the bin folder of the openblas-3f1b576_2015-07-13_amd64.7z archive and I still get the segfault when calling SVD on (129, 129) shaped data (while it still works on (128, 128) shaped data).

@carlkl

carlkl commented Jul 13, 2015

I will upload the debug build to bitbucket as well once it is built.
For the next numpy/scipy build I decided to build OpenBLAS again without
BARCELONA and HASWELL kernels


@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

I will upload the debug build to bitbucket as well once it is built.

Alright let me know when you have new stuff for me to test.

Alternatively I could try to write a pure C reproduction case, but I am not very familiar with the C LAPACKE API and Windows-based development environments, so it's not a trivial task for me. @jeromerobert do you think that is necessary for you or other OpenBLAS developers to reproduce and understand the cause of the crash?

For the next numpy/scipy build I decided to build OpenBLAS again without BARCELONA and HASWELL kernels

Sounds reasonable :)

@carlkl

carlkl commented Jul 13, 2015

See https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/openblas-3f1b576_2015-07-13_amd64_NO_AVX2_debug.7z (Barcelona kernels are included). Please consider using_backtrace.txt inside the archive or use gdb if available.

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

I did that (using backtrace.dll), here is the result:

0x3a32830 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dcabs1_ 
0x2f0d30f : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dnrm2 
0x45c16a8 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : DLARFG 
0x459dc44 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dlabrd 
0x4560397 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dgebrd_ 
0x4578d90 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dgesdd_ 
Failed to init bfd from (C:\Python34_x64\lib\site-packages\numpy\linalg\_umath_linalg.pyd): 1 1 0
0x666090d1 : C:\Python34_x64\lib\site-packages\numpy\linalg\_umath_linalg.pyd : [unknown file] 
Failed to init bfd from (C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd): 1 1 0
0x6c045683 : C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd : PyInit_umath 
Failed to init bfd from (C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd): 1 1 0
0x6c045c0d : C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd : PyInit_umath 
Failed to init bfd from (C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd): 1 1 0
0x6c047146 : C:\Python34_x64\lib\site-packages\numpy\core\umath.pyd : PyInit_umath 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6ceb86a1 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf73124 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf738ce : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf75c89 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf7770c : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf779fd : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf738c1 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf75c89 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf7770c : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cf77aae : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cfb14e3 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cfb2a9c : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6cfb3ef1 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6ce5c855 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6ce5d083 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
Failed to init bfd from (C:\Python34_x64\python.exe): 1 1 0
0x1c2c11ae : C:\Python34_x64\python.exe : [unknown file] 
Failed to init bfd from (C:\Windows\system32\KERNEL32.DLL): 1 1 0
0xa12b13d2 : C:\Windows\system32\KERNEL32.DLL : BaseThreadInitThunk 
Failed to init bfd from (C:\Windows\SYSTEM32\ntdll.dll): 1 1 0
0xa3aee954 : C:\Windows\SYSTEM32\ntdll.dll : RtlUserThreadStart 
Mon Jul 13 14:13:00 2015
---------------
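
To read the LAPACK/BLAS call chain out of a trace like the one above, the frames that belong to libopenblaspy.dll can be filtered out. This is a rough sketch that assumes the `address : module : symbol` layout shown here; the function names `parse_frames` and `openblas_call_chain` are hypothetical and not part of backtrace.dll.

```python
def parse_frames(text):
    """Return (address, module, symbol) tuples for each frame line.

    Lines such as 'Failed to init bfd from ...' do not match the
    'address : module : symbol' layout and are skipped.
    """
    frames = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(" : ", 2)]
        if len(parts) == 3 and parts[0].startswith("0x"):
            frames.append(tuple(parts))
    return frames


def openblas_call_chain(text, module_name="libopenblaspy.dll"):
    """Symbols inside `module_name`, outermost caller first."""
    symbols = [
        symbol
        for _, module, symbol in parse_frames(text)
        if module.lower().endswith(module_name.lower())
    ]
    # The trace lists the innermost frame first, so reverse it.
    return list(reversed(symbols))


SAMPLE = r"""
0x2f0d30f : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : dnrm2 
0x45c16a8 : C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll : DLARFG 
Failed to init bfd from (C:\Windows\SYSTEM32\python34.dll): 1 1 0
0x6ceb86a1 : C:\Windows\SYSTEM32\python34.dll : PyOS_URandom 
"""

print(openblas_call_chain(SAMPLE))  # ['DLARFG', 'dnrm2']
```

Applied to the full trace above, the chain reads dgesdd_, dgebrd_, dlabrd, DLARFG, dnrm2, dcabs1_.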

backtrace.dll is neat. What is the license? Is it compatible with the numpy license? I think it would be good to include it by default in your numpy builds so that users can report informative issues on GitHub. Also, does building with the debug symbols cause a performance penalty? If not, I think you should keep them by default to make debugging and profiling easier.

@carlkl

carlkl commented Jul 13, 2015

I took backtrace from http://code.google.com/p/backtrace-mingw/, and later on from http://dukeworld.duke4.net/eduke32/synthesis (64bit support). It has a BSD licence and is foreseen as an addendum to mingwpy (as are OpenBLAS and other supplementary libraries).
Another tool to emit stack traces is Dr. Mingw: https://github.com/jrfonseca/drmingw and its ExcHndl library. I wonder why the stack traces have these "Failed to init bfd from" lines. I have to take a look. Usually you get line numbers.

@carlkl

carlkl commented Jul 13, 2015

@ogrisel, the Barcelona kernels are about one year old. Can you test against target OPTERON_SSE3, as this is identical to BARCELONA but without the Barcelona-specific kernels?

@xianyi
Collaborator

xianyi commented Jul 13, 2015

@ogrisel, is it the dgesdd function? The issue title says sgesdd, but your code uses dtype=np.float64.

I think I can write the C code to test it on windows.

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

Indeed @xianyi, this was double-precision data, although I get the same segfault and backtrace when using np.ones(shape=(129, 129), dtype=np.float32).

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

@carlkl OPTERON_SSE3 does not seem to be a valid target:

$ OPENBLAS_CORETYPE='OPTERON_SSE3' /c/Python34_x64/python openblas_coretype.py
Prescott

(I am using the MSYS2 bash console to set the environment variable).

@ogrisel
Contributor Author

ogrisel commented Jul 13, 2015

I think I can write the C code to test it on windows.

@xianyi great! let me know if you want me to compile your C file and run it on the machine where I observed the segfault in the first place.

@carlkl

carlkl commented Jul 13, 2015

@ogrisel, I recompiled backtrace: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/backtrace_eduke32.zip. This version should be able to show line numbers as well.
BTW: You may use OPTERON as well. I have no idea why OPTERON_SSE3 doesn't work.
Correction: I was wrong, because BARCELONA itself is the fallback for Opteron CPUs without AVX. @xianyi: is there a fallback for BARCELONA? Should NEHALEM be used instead as a temporary workaround?

@carlkl

carlkl commented Jul 14, 2015

@ogrisel, @xianyi: a new build of libopenblaspy.dll (amd64 for now) is available as debug and release builds: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/openblas-3f1b576_2015-07-13_amd64-NO_AVX2-NO_BARCELONA_ALL.7z. I compiled with NO_AVX2 (no Haswell kernels) and manually replaced the fallback Barcelona target with Prescott.

@carlkl

carlkl commented Jul 20, 2015

@ogrisel
Contributor Author

ogrisel commented Jul 21, 2015

@carlkl I get a similar segfault with Prescott. Nehalem works for some reason I don't understand.

Here is the output of cpuid_features.exe on that host:

Vendor    = AuthenticAMD
Processor = AMD Opteron(tm) Processor 4332 HE

YES - FPU           (Floating-point Unit on-chip)
YES - VME           (Virtual Mode Extension)
YES - DE            (Debugging Extension)
YES - PSE           (Page Size Extension)
YES - TSC           (Time Stamp Counter)
YES - MSR           (Model Specific Registers)
YES - PAE           (Physical Address Extension)
YES - MCE           (Machine Check Exception)
YES - CX8           (CMPXCHG8 Instructions)
YES - APIC          (On-chip APIC hardware)
YES - SEP           (Fast System Call)
YES - MTRR          (Memory type Range Registers)
YES - PGE           (Page Global Enable)
YES - MCA           (Machine Check Architecture)
YES - CMOV          (Conditional Move Instruction)
YES - PAT           (Page Attribute Table)
NO  - PSE36         (36bit Page Size Extension
NO  - PSN           (Processor Serial Number)
YES - CLFSH         (CFLUSH Instruction)
NO  - DS            (Debug Store)
NO  - ACPI          (Thermal Monitor & Software Controlled Clock)
YES - MMX           (Multi-Media Extension)
YES - FXSR          (Fast Floating Point Save & Restore)
YES - SSE           (Streaming SIMD Extension 1)
YES - SSE2          (Streaming SIMD Extension 2)
NO  - SS            (Self Snoop)
NO  - HTT           (Hyper Threading Technology)
NO  - TM            (Thermal Monitor)
NO  - PBE           (Pend Break Enabled)
YES - SSE3          (Streaming SMD Extension 3)
NO  - MW            (Monitor Wait Instruction
NO  - CPL           (CPL-qualified Debug Store)
NO  - VMX           (Virtual Machine Extensions)
NO  - EST           (Enchanced Speed Test)
NO  - TM2           (Thermal Monitor 2)
YES - SSSE3         (Supplemental Streaming SIMD Extensions 3)
NO  - L1            (L1 Context ID)
NO  - FMA3          (Fused Multiply-Add 3-operand Form)
YES - CAE           (Compare And Exchange 16B)
YES - SSE41         (Streaming SIMD Extensions 4.1)
YES - SSE42         (Streaming SIMD Extensions 4.2)
YES - POPCNT        (Advanced Bit Manipulation - Bit Population Count Instruction)
YES - AES           (Advanced Encryption Standard)
NO  - AVX           (Advanced Vector Extensions)
NO  - RDRAND        (Random Number Generator)
NO  - AVX2          (Advanced Vector Extensions 2)
NO  - BMI1          (Bit Manipulations Instruction Set 1)
NO  - BMI2          (Bit Manipulations Instruction Set 2)
NO  - ADX           (Multi-Precision Add-Carry Instruction Extensions)
NO  - AVX512F       (512-bit extensions to Advanced Vector Extensions Foundation)
NO  - AVX512PFI     (512-bit extensions to Advanced Vector Extensions Prefetch Instructions)
NO  - AVX512ERI     (512-bit extensions to Advanced Vector Extensions Exponential and Reciprocal Instructions)
NO  - AVX512CDI     (512-bit extensions to Advanced Vector Extensions Conflict Detection Instructions)
NO  - SHA           (Secure Hash Algorithm)
YES - X64           (64-bit Extensions/Long mode)
YES - LZCNT         (Advanced Bit Manipulation - Leading Zero Bit Count Instruction)
YES - SSE4A         (Streaming SIMD Extensions 4a)
YES - FMA4          (Fused Multiply-Add 4-operand Form)
YES - XOP           (Extended Operations)
YES - TBM           (Trailing Bit Manipulation Instruction)
NO  - LWP           (Light Weight Profiling Support)
NO  - WDT           (Watchdog Timer Support)
NO  - IBS           (Instruction Based Sampling)
YES - 3DNOWPREFETCH (PREFETCH and PREFETCHW instruction support)
YES - MISALIGNSSE   (Misaligned SSE mode)
NO  - SVM           (Secure Virtual Machine)
YES - LAHFSAHF      (LAHF and SAHF instruction support in 64-bit mode)
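
A listing in this `YES/NO - NAME (description)` layout can be turned into a lookup table to check individual flags. The sketch below is illustrative only; `parse_cpuid_features` is a hypothetical helper, not part of cpuid_features.exe. The sample lines are copied from the output above, which reports FMA4 and XOP as present while AVX is absent.

```python
import re

# Matches lines such as "YES - SSE3          (Streaming ...)".
_FEATURE_LINE = re.compile(r"^(YES|NO)\s+-\s+(\S+)")


def parse_cpuid_features(text):
    """Map each feature name to True (YES) or False (NO)."""
    features = {}
    for line in text.splitlines():
        match = _FEATURE_LINE.match(line.strip())
        if match:
            features[match.group(2)] = match.group(1) == "YES"
    return features


SAMPLE = """
Vendor    = AuthenticAMD
YES - SSE42         (Streaming SIMD Extensions 4.2)
NO  - AVX           (Advanced Vector Extensions)
YES - FMA4          (Fused Multiply-Add 4-operand Form)
YES - XOP           (Extended Operations)
"""

flags = parse_cpuid_features(SAMPLE)
print(flags["AVX"], flags["FMA4"], flags["XOP"])  # False True True
```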

@ogrisel
Contributor Author

ogrisel commented Jul 21, 2015

From: http://www.cpu-world.com/CPUs/Bulldozer/AMD-Opteron%204332%20HE%20-%20OS4332OFU6KHK.html

Microarchitecture   Piledriver
Platform    San Marino / Adelaide
Processor core  Seoul

@carlkl

carlkl commented Jul 26, 2015

@ogrisel, I have new numpy, scipy builds available at anaconda.org:

pip install -i https://pypi.anaconda.org/carlkl/simple numpy
pip install -i https://pypi.anaconda.org/carlkl/simple scipy

I used @wernsaar's latest OpenBLAS trunk with some patches to use NEHALEM instead of BARCELONA as AMD kernel fallback and excluded three non-SSE2 kernels for X86.

The HASWELL segfaults I encountered are gone with this build.

BTW: Your processor should use the PILEDRIVER kernel; if not, one has to check driver/others/dynamic.c IMHO.

@ogrisel
Contributor Author

ogrisel commented Aug 16, 2015

Hi @carlkl, I tried again with the new numpy / scipy builds that you did and now I can do an SVD on my rackspace VM. If I set OPENBLAS_CORETYPE='PILEDRIVER' I get a crash with "Illegal instruction".

@xianyi
Collaborator

xianyi commented Aug 16, 2015

@ogrisel, if you set OPENBLAS_CORETYPE=PILEDRIVER, OpenBLAS will use AVX kernels by default. However, your VM does not support AVX instructions.

NO - AVX (Advanced Vector Extensions)

Therefore, illegal instruction.

@ogrisel
Contributor Author

ogrisel commented Aug 17, 2015

Alright, any idea which kernel type is the right one for this host based on the output of cpuid_features.exe?

@carlkl

carlkl commented Aug 17, 2015

@ogrisel, I guess the right kernel should be the one chosen by OpenBLAS itself. Check with openblas_get_corename().
I wonder why AVX is reported by cpuid_features.exe as unsupported. It should be supported by the CPU according to the CPU specifications.
Just a wild guess: is it due to problems with the VM? See e.g. http://www.aidanfinn.com/2011/08/kb2568088-hyper-v-vm-wont-start-on-amd-cpu-with-avx/

@ogrisel
Contributor Author

ogrisel commented Aug 17, 2015

I checked with openblas_get_corename() using this script: https://gist.github.com/ogrisel/ad4e547a32d0eb18b4ff and it originally returned Barcelona (before using the latest build from @carlkl) but that causes the crash whereas when I force Nehalem the code works (albeit probably slower than with an AVX kernel).
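
A query of this kind can be sketched with `ctypes`. This is an illustration and not the contents of the linked gist: the helper name `get_corename` is hypothetical, and the only OpenBLAS symbol used is `openblas_get_corename`, the function named above, which is assumed to take no arguments and return a C string.

```python
import ctypes


def get_corename(lib):
    """Return the core name reported by an already-loaded OpenBLAS library.

    `lib` is expected to be a ctypes library object (for example the
    result of ctypes.CDLL on the OpenBLAS DLL shipped with numpy).
    """
    func = lib.openblas_get_corename
    func.argtypes = []
    func.restype = ctypes.c_char_p  # the C function returns char*
    return func().decode("ascii")
```

On the machine from this report the call would look like `get_corename(ctypes.CDLL(r"C:\Python34_x64\lib\site-packages\numpy\core\libopenblaspy.dll"))`, using the DLL path that appears in the backtrace. `OPENBLAS_CORETYPE` has to be set before the library is loaded for it to affect the result.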

@carlkl

carlkl commented Aug 17, 2015

I guess in https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/dynamic.c support_avx() returns 0 and therefore BARCELONA is used as fallback kernel.
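
The hypothesis above can be written down as a toy model. This is not the code in driver/others/dynamic.c; the function `select_amd_core` and its arguments are hypothetical, and the only behavior modeled is what this thread states: the Piledriver kernels need AVX, BARCELONA is the fallback when AVX is unavailable, and the patched builds substitute NEHALEM for that fallback.

```python
def select_amd_core(detected, avx_supported, fallback="BARCELONA"):
    """Toy model of the fallback described in this thread.

    `detected` is the core implied by the CPU model, `avx_supported`
    says whether AVX is actually usable on this host, and `fallback`
    is the core used when an AVX-dependent core cannot be selected.
    """
    avx_cores = {"PILEDRIVER"}  # assumption: only the core discussed here
    if detected in avx_cores and not avx_supported:
        return fallback
    return detected


# Real Piledriver hardware with AVX exposed.
print(select_amd_core("PILEDRIVER", avx_supported=True))   # PILEDRIVER
# The VM in this report: AVX hidden, stock fallback.
print(select_amd_core("PILEDRIVER", avx_supported=False))  # BARCELONA
# Same VM with the patched fallback used in the newer builds.
print(select_amd_core("PILEDRIVER", avx_supported=False, fallback="NEHALEM"))  # NEHALEM
```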

@ogrisel
Contributor Author

ogrisel commented Aug 17, 2015

The question then is why Barcelona is a bad fallback for this host: what instruction set is used by the Barcelona kernel that is not available on this machine?

@carlkl

carlkl commented Aug 17, 2015

@ogrisel, concerning #603 (comment): does the segfault still happen with the newer numpy/scipy wheels from https://anaconda.org/carlkl?

I can again make an OpenBLAS debug build (I used https://github.com/wernsaar/OpenBLAS due to newer Haswell kernels) for these builds. Is 64bit only ok for you?

@ogrisel
Contributor Author

ogrisel commented Aug 17, 2015

@carlkl your latest build of openblas / numpy / scipy on https://anaconda.org/carlkl hides the problem successfully by making openblas detect the Nehalem architecture instead of the Barcelona architecture that causes the crash on that machine.

@carlkl

carlkl commented Aug 17, 2015

@ogrisel, that's correct. For these builds I used a patched wernsaar repo. I will put these patches for that build on https://github.com/carlkl/OpenBLAS.
Later patches will go to https://github.com/mingwpy/OpenBLAS.

@ogrisel
Contributor Author

ogrisel commented Aug 17, 2015

Ideally it would be great to have the upstream OpenBLAS architecture detection mechanism robust to such a wonky platform (probably caused by the virtualization layer that hides the support for the AVX and maybe other instruction sets).

@xianyi
Collaborator

xianyi commented Aug 19, 2015

Does sgesdd call gemv? I suspect gemv may cause this segfault problem.

@ogrisel
Contributor Author

ogrisel commented Aug 19, 2015

Does sgesdd call gemv? I suspect gemv may cause this segfault problem.

I tried to reproduce the crash with a direct matrix-vector multiplication but I cannot reproduce it. Here is what I tried (among variants):

OPENBLAS_CORETYPE=Barcelona python -c"import numpy as np; print(np.dot(np.random.randn(1000, 10000), np.random.randn(10000)))"

@carlkl

carlkl commented Jan 21, 2019

@ogrisel, I guess this issue can be closed?

@martin-frbg
Collaborator

@carlkl it is not clear to me what (if anything) was fixed in OpenBLAS to avoid this problem, and I do not have any Barcelona or similar older AMD hardware. BARCELONA still appears to be the default fallback for any AMD target that lacks AVX capability, although this ticket suggests this may be problematic. If I am reading this correctly, you have/had some private fork for anaconda where you replaced some of these instances with NEHALEM as a workaround?

@carlkl

carlkl commented Jan 21, 2019

@martin-frbg, my fork you mentioned is not used anymore. In this fork I replaced the BARCELONA fallback with NEHALEM.

However, this issue is quite old now. If the problem still occurs with a newer version of numpy+openblas tested against Barcelona (@ogrisel, is it possible to test this combination?) a new issue should be created IMHO.

@martin-frbg
Collaborator

I'm actually quite happy to get rid of these old tickets, I just did not want to close anything when I have no means of verifying. If you still can test on Barcelona, perhaps we could also kill #494 (a supposedly benign valgrind warning for ddot). Not sure about #607 as that was on Bulldozer, but I suspect that it may have been solved by later fixes as well.

@martin-frbg
Collaborator

@ogrisel @carlkl so do you see any chance to test this in the near future, or should this old issue be closed due to obsolete hardware ?

@carlkl

carlkl commented Feb 6, 2019

@martin-frbg, if no one has the technical requirements for testing this configuration (AMD on an AVX-disabled VM) against a recent OpenBLAS version, I propose to close this issue. A new issue should be opened if this "BARCELONA against NEHALEM fallback" problem pops up again.
Another option would be to exchange BARCELONA_FALLBACK with NEHALEM_FALLBACK right now as it seems not harmful to do that.

@martin-frbg
Collaborator

Closing without changes to the fallback as the issue appears to have been seen only in some unspecified VM that did not expose the actual Piledriver hardware, and only when running Windows in this VM.
