Illegal instruction on Opteron (detects SSE3 when no sse3 present) #2794

Closed
clemej opened this issue Aug 25, 2020 · 39 comments

@clemej

clemej commented Aug 25, 2020

I run MLC@Home, and recently recompiled the client to use OpenBLAS instead of MKL. However, volunteers running on older Opterons reported crashes with SIGILL. This is easy to reproduce: simply launch a new VM (virt-manager/KVM includes a CPU profile for an Opteron 240 (Gen 1)). Here's the lscpu output for this VM:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       40 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           4
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          15
Model:               6
Model name:          AMD Opteron 240 (Gen 1 Class Opteron)
Stepping:            1
CPU MHz:             3593.248
BogoMIPS:            7186.49
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni x2apic hypervisor 3dnowprefetch vmmcall

Then run my client, with libopenblas compiled into pytorch, under GDB. Here's the output:

Thread 1 "mlds" received signal SIGILL, Illegal instruction.                                                                                        
0x00007ffff4d9fe04 in sgemm_oncopy_OPTERON_SSE3 () from /home/build/squashfs-root/usr/bin/../lib/libtorch_cpu.so

Obviously, there's no SSE3 in this generation of Opteron. Setting OPENBLAS_CORETYPE=GENERIC allows it to run fine. However, setting OPENBLAS_CORETYPE=OPTERON still crashes in the same OPTERON_SSE3 function. So something is very messed up with Opteron detection.
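
For anyone who needs the GENERIC workaround in the meantime, here is a minimal sketch of a launcher in Python. It is not the actual MLC@Home wrapper: the helper names and the `./mlds` command line are assumptions for illustration. The only thing taken from above is the `OPENBLAS_CORETYPE` variable, which has to be in the environment before the library is loaded.

```python
import os
import subprocess


def env_with_coretype(coretype, base=None):
    """Return a copy of the environment with OPENBLAS_CORETYPE set."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_client(cmd, coretype="GENERIC"):
    """Run cmd (a hypothetical client command line) with the core type forced."""
    return subprocess.run(cmd, env=env_with_coretype(coretype), check=False)
```

Usage would look like `run_client(["./mlds"])`; the process return code is left for the caller to inspect.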

OpenBLAS master is compiled with BINARY=64 TARGET=GENERIC USE_THREAD=1 USE_OPENMP=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 MAX_THREADS=64 NO_AFFFINITY=1 NO_WARMUP=1 NO_SHARED=1.

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

Probably not with detection as such - at the very least it is consistent between compile-time (cpuid_x86.c) and runtime (driver/others/dynamic.c), both using ecx&1 as the flag for SSE3 capability. More likely an SSE3 instruction crept into an OPTERON-only BLAS kernel, or the same SSE3-using kernel is configured for both OPTERON and OPTERON-SSE3. Does the backtrace for the crash with OPENBLAS_CORETYPE=OPTERON actually lead back to the exact same sgemm_oncopy_OPTERON_SSE3 function (then perhaps "only" the string matching for that particular option is broken)?
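
To illustrate the ecx&1 check: CPUID leaf 1 reports SSE3 in bit 0 of ECX. The sketch below decodes a handful of feature bits from raw register values. It is not OpenBLAS code; the function name and the idea of passing the registers in as plain integers are assumptions made to keep the example portable.

```python
# Bit positions in CPUID leaf 1 (standard feature flags).
LEAF1_ECX = {"sse3": 0, "ssse3": 9, "sse4_1": 19, "sse4_2": 20}
LEAF1_EDX = {"mmx": 23, "sse": 25, "sse2": 26}
# Bit positions in CPUID leaf 0x80000001 EDX (AMD extended flags).
EXT_EDX = {"3dnowext": 30, "3dnow": 31}


def decode_features(ecx1, edx1, ext_edx=0):
    """Return the set of feature names whose bits are set in the given registers."""
    features = set()
    for table, reg in ((LEAF1_ECX, ecx1), (LEAF1_EDX, edx1), (EXT_EDX, ext_edx)):
        for name, bit in table.items():
            if (reg >> bit) & 1:
                features.add(name)
    return features
```

With this, `decode_features(ecx1, edx1)` containing `"sse3"` is the same test as `ecx & 1` on leaf 1.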

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

Hmm. WRT detection, it would not be the first time a VM environment does not emulate all details of cpuid correctly.
Also both OPTERON and OPTERON_SSE3 use the same gemm_ncopy_4_opteron.S from the original GotoBLAS, and I see nothing that looks like SSE3 in there. (In fact the only difference in kernels appears to be in ZGEMV, using the MOVDDUP instruction on newer Opterons).

@clemej
Author

clemej commented Aug 25, 2020

Ok, this is weird... I could have sworn it did, but it's the reverse...?

OPENBLAS_CORETYPE=OPTERON_SSE3 : SIGILL in sgemm_oncopy_OPTERON ()
OPENBLAS_CORETYPE=OPTERON : SIGILL in sgemm_oncopy_OPTERON_SSE3 ()
OPENBLAS_CORETYPE=PENRYN : Works fine.
GENERIC, of course, also works fine.

So you're right, it's correctly interpreting the system as Opteron, but running SSE3 code regardless... Why the reverse still gives a SIGILL, I don't know. Opteron isn't the only machine having SIGILL issues after we switched from MKL to OpenBLAS; Phenoms and Core2 Duos also fail, but I don't have the time or hardware to get backtraces on those. There's a thread over in our forums with the full list, which includes a DB dump of all the CPUs I've seen errors on: https://www.mlcathome.org/mlcathome/forum_thread.php?id=63 .

@clemej
Author

clemej commented Aug 25, 2020

The systems listed on the forum from our internal DB were running a client linked against the libopenblas.so shipped with Ubuntu 14.04 (so very old), so perhaps those issues are fixed in a newer release. What you see above with the Opteron testing is a new client I'm testing, which compiles the latest libopenblas (from git) statically into the binary in the hope of fixing these issues. This Opteron issue is the only one I can test and verify is still an issue.

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

Thanks. Seems the logic (if one can even call it that) which maps the environment variable to the list of supported core types simply has the two Opteron models reversed. This does not explain the SIGILL though, as the sgemm_oncopy maps to the same code in both. An unmet alignment requirement seems more likely than an actual unsupported instruction. I will set up a qemu VM as I do not have the actual hardware. (Same goes for Phenom; I may be able to revive a Core2Duo later. Chances are that the OpenBLAS code for these is unchanged from the original GotoBLAS of ten years ago, while compilers have advanced to expose formerly harmless mistakes.)
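
To make the "reversed" mapping concrete, here is a toy model in Python. It is purely illustrative and is not how OpenBLAS is written: the two lists and the function name are invented, and only the observed behaviour (OPTERON selecting the OPTERON_SSE3 kernels and vice versa) comes from the reports above.

```python
# Hypothetical, heavily simplified: option names and kernel tables as parallel lists.
CORE_NAMES = ["GENERIC", "OPTERON", "OPTERON_SSE3", "PENRYN"]
# The two Opteron entries are swapped relative to CORE_NAMES (the suspected bug).
KERNEL_TABLES = ["generic", "opteron_sse3", "opteron", "penryn"]


def select_kernels(coretype):
    """Look up the kernel table by the position of the option name."""
    return KERNEL_TABLES[CORE_NAMES.index(coretype)]
```

An index-based lookup like this fails silently when the two lists drift apart, which would match the swapped function names in the backtraces while leaving every other core type unaffected.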

@clemej
Author

clemej commented Aug 25, 2020

An alignment issue would make sense. I'm building with gcc9 and gfortran9. Good luck and let me know if I can help.

The code is open source at https://gitlab.com/clemej/mlds ; the embedded openblas build is in the openblas-build branch, but be prepared: it downloads and compiles all of pytorch, so it's a pain to build. Also, I'd need to post/send you a data file.

Thanks again for looking into it.

@brada4
Contributor

brada4 commented Aug 25, 2020

CPUID is from 2nd generation
Authentic SSE3-less is here:
https://www.linuxjournal.com/article/6711

It has 3dnow, so another kind of sigill will (maybe) hit.
You can use only GEN2 from libvirt, due to absence of 3DNOW and 3DNOWEXT ISA-s
3DNOWPREFETCH is a partial thing that does no computation

It is a bug in libvirt that

  • does not force 3DNOW ISA from the host CPU like it does with IBRS or AVX2
  • Gives false CPUID that originally has SSE3
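
A quick way to check both points against the lscpu output in the first comment: Linux lists SSE3 under the name `pni`, and 3DNow! as `3dnow`/`3dnowext`. The sketch below is only an illustration; the helper name is made up and the flags string is copied from that output.

```python
# Flags line copied from the lscpu output of the Opteron_G1 VM.
FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid extd_apicid "
    "tsc_known_freq pni x2apic hypervisor 3dnowprefetch vmmcall"
)


def isa_summary(flags_line):
    """Report a few ISA extensions from a Linux cpuinfo/lscpu flags line."""
    flags = set(flags_line.split())
    return {
        "sse3": "pni" in flags,  # Linux names SSE3 "pni"
        "3dnow": "3dnow" in flags,
        "3dnowext": "3dnowext" in flags,
        "3dnowprefetch": "3dnowprefetch" in flags,
        "sse4_1": "sse4_1" in flags,
        "hypervisor": "hypervisor" in flags,
    }
```

For the VM above this reports SSE3 present, 3DNow! and 3DNowExt absent, and the hypervisor bit set.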

@brada4
Contributor

brada4 commented Aug 25, 2020

/usr/share/libvirt/cpu_map/x86_Opteron_G1.xml
Should have
model="5"
To be detected as SSE3-less piece

and to be pedantic (and not start on later Opterons)
feature="3dnow"
feature="3dnowext"

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

@brada4 not sure if any of this is actually relevant when (a) it does not look like the SIGILL involves SSE3 instructions at all and (b) as I understand it the problem was/is observed on actual hardware.

@brada4
Contributor

brada4 commented Aug 25, 2020

Just an observation that the emulator has wrong bits and may not reflect the real hardware, though Opteron crashes are reported on both sides.
CPUID and a backtrace from the real thing are needed. Maybe it has been wrong for the last 15 years and is only noticed now.

@brada4
Contributor

brada4 commented Aug 25, 2020

Actually toplevel CPUID never detects OPTERON_SSE3
So no idea where is the truth.

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

The only non-portable instruction in that code is an mmx or 3dnow "emms/femms" instruction towards the end of sgemm_ncopy_4_opteron.S - Ubuntu happens to have a stale ticket with something similar https://answers.launchpad.net/ubuntu/+question/284671 where disassembling the crash address in gdb pointed at just that statement.
Unfortunately I cannot reproduce the SIGILL in qemu with the BLAS-Tester testsuite, which would have made things much easier.

@clemej
Author

clemej commented Aug 25, 2020

I'm posting the binary and instructions on how to run it if that would help. It's a multi-step process since it's an appimage bundle, but in case it's helpful:

To debug under gdb, run:

  • ./mlds.appimage --appimage-extract to extract the files into the directory "squashfs-root".
  • cd squashfs-root/usr/lib
  • copy or link dataset.hdf5 to this directory with all the libs.
  • gdb ../bin/mlds

NOTE: This binary includes libopenblas 0.2.18 as a shared library, not the latest git embedded version compiled statically. Since you said the code hadn't changed, I hope that's not an issue. Library is in the same lib runtime directory.

I'm now experiencing another issue where the build sometimes(?) fails with a segv when compiling openblas master as part of my full build, but I haven't determined whether that's pilot error or an actual openblas issue (binutils in Ubuntu 16.04 might be too old).

@martin-frbg
Collaborator

martin-frbg commented Aug 25, 2020

Depends - is it a segv in the compiler, or in one of the tests that are run at the end of the build ? (also this might just be semantics, but "master" branch is outdated, you'll want "develop").
I'll see later if getting gdb to disassemble the highlighted address in your libopenblas tells me anything - ideally it will also point to the femms instruction.

@clemej
Author

clemej commented Aug 26, 2020

The segv is in the tests I'm pretty sure, but I'm too distracted to run that down at the moment. I'll open another issue if that turns out to be a problem. And yes, I meant (and have been using) the default develop branch.

@brada4
Contributor

brada4 commented Aug 26, 2020

zen 2300U with Opteron_G1 VM - no crash to get backtrace from, and detects prescott, EDIT no crash with OPTERON_SSE3 forced either

@brada4
Contributor

brada4 commented Aug 26, 2020

@clemej - can you run mlds inside gdb, i.e. gdb usr/bin/mlds
then when it crashes capture the following
t a a bt
t a a disa

@clemej
Author

clemej commented Aug 26, 2020

Might take a day or so to get back to you on this. day job needs attention at the moment.

@brada4
Contributor

brada4 commented Aug 26, 2020

No rush, if there is a bug, it is 15+ years old.

@martin-frbg
Collaborator

martin-frbg commented Aug 26, 2020

@clemej if/when you get back to this and you are in gdb, could you please run disassemble /r 0x00007ffff4d9fe04,+32 similar to what was done in the Ubuntu ticket I linked ? This should tell which assembly instruction inside sgemm_ncopy failed. (My qemu experiment has not turned up anything useful so far, and unfortunately the old distribution I picked to get something that still boots on Opteron is too old to run your appimage so I will have to redo this)

@clemej
Author

clemej commented Aug 29, 2020

Hi, sorry for the delay. Yes, the instruction points directly to the femms instruction.

[2020-08-29 12:46:29                    main:448]       :       INFO    :       Starting Training

Thread 1 "mlds" received signal SIGILL, Illegal instruction.
0x00007ffff4da0004 in sgemm_oncopy_OPTERON_SSE3 () from /home/john/squashfs-root/usr/bin/../lib/libtorch_cpu.so
(gdb) disassemble /r 0x00007ffff4da0004,+32
Dump of assembler code from 0x7ffff4da0004 to 0x7ffff4da0024:
=> 0x00007ffff4da0004 <sgemm_oncopy_OPTERON_SSE3+4>:    0f 0e   femms  
   0x00007ffff4da0006 <sgemm_oncopy_OPTERON_SSE3+6>:    48 8d 0c 8d 00 00 00 00 lea    0x0(,%rcx,4),%rcx
   0x00007ffff4da000e <sgemm_oncopy_OPTERON_SSE3+14>:   49 89 f2        mov    %rsi,%r10
   0x00007ffff4da0011 <sgemm_oncopy_OPTERON_SSE3+17>:   49 c1 fa 02     sar    $0x2,%r10
   0x00007ffff4da0015 <sgemm_oncopy_OPTERON_SSE3+21>:   0f 8e 65 01 00 00       jle    0x7ffff4da0180 <sgemm_oncopy_OPTERON_SSE3+384>
   0x00007ffff4da001b <sgemm_oncopy_OPTERON_SSE3+27>:   0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
   0x00007ffff4da0020 <sgemm_oncopy_OPTERON_SSE3+32>:   49 89 d3        mov    %rdx,%r11
   0x00007ffff4da0023 <sgemm_oncopy_OPTERON_SSE3+35>:   4c 8d 24 0a     lea    (%rdx,%rcx,1),%r12
End of assembler dump.
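
For picking the faulting instruction out of a dump like this automatically, here is a small Python sketch. The regular expression and helper names are my own and only assume the `disassemble /r` layout shown above (address, `<symbol+offset>`, raw bytes, instruction); the two opcodes in the table are the standard encodings of FEMMS (0F 0E) and EMMS (0F 77).

```python
import re

# One line of `disassemble /r` output: optional "=>", address, <sym+off>, bytes, insn.
LINE_RE = re.compile(
    r"^(?P<current>=>)?\s*(?P<addr>0x[0-9a-f]+)\s+<(?P<sym>[^>]+)>:\s+"
    r"(?P<bytes>(?:[0-9a-f]{2}\s)+)\s*(?P<insn>.*)$"
)

# Opcodes of interest here; anything else is simply not labelled.
KNOWN_OPCODES = {b"\x0f\x0e": "femms (3DNow!)", b"\x0f\x77": "emms (MMX)"}


def parse_line(line):
    """Parse one dump line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    raw = bytes.fromhex("".join(m.group("bytes").split()))
    return {
        "current": m.group("current") is not None,  # True for the "=>" line
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("sym"),
        "bytes": raw,
        "insn": " ".join(m.group("insn").split()),
        "label": KNOWN_OPCODES.get(raw),
    }
```

Feeding each line of the dump through `parse_line` and keeping the entry with `current` set gives the address, raw bytes and mnemonic of the instruction that raised the SIGILL.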

This leads to an interesting (if frustrating) question:

  • I get a report that users are getting crashes (illegal instruction) when running openblas on their opteron
  • I load up a KVM VM in virt manager and set the cpu to emulate an opteron (yet it is still running on a Ryzen CPU)
  • I run my code and get an illegal instruction, and assume it is the same sigill the user sees on real hardware.
  • Is this sigill a result of my emulation environment, or is it the same one they see on real hardware?

Note that I pushed out an update to the client that sets OPENBLAS_CORETYPE=generic, and users are still reporting sigill issues, this time with CORE2-based systems.

I'm still swamped with my dayjob, so I'll be sporadic in responding over the next few days, but thank you again for looking into it.

@clemej
Author

clemej commented Aug 29, 2020

I've dug a little deeper and we might be chasing a bit of a red herring here. Here's a list of the CPUs from my database that have shown SIGILL errors; note that the Opterons I see are not G1 based. Also, obviously not all are caused by this issue. Perhaps these systems are too new for 3dnow?

+------+---------------------------------------------------------------------------------+
| id   | p_model                                                                         |
+------+---------------------------------------------------------------------------------+
|  293 | Six-Core AMD Opteron(tm) Processor 2431 [Family 16 Model 8 Stepping 0]          |
|  345 | Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz [Family 6 Model 15 Stepping 13]      |
|  349 | AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]            |
|  477 | AMD Opteron(tm) Processor 6128 HE [Family 16 Model 9 Stepping 1]                |
|  573 | AMD A8-3820 APU with Radeon(tm) HD Graphics [Family 18 Model 1 Stepping 0]      |
| 1359 | AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]            |
| 1473 | Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [Family 6 Model 15 Stepping 6]            |
| 1588 | Intel(R) Pentium(R) 4 CPU 3.00GHz [Family 15 Model 4 Stepping 9]                |
| 1646 | Six-Core AMD Opteron(tm) Processor 8425 HE [Family 16 Model 8 Stepping 0]       |
| 1988 | Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz [Family 6 Model 15 Stepping 11]      |
| 2032 | Six-Core AMD Opteron(tm) Processor 2431 [Family 16 Model 8 Stepping 0]          |
| 2066 | Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz [Family 6 Model 15 Stepping 11] |
| 2169 | AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]            |
| 2170 | AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]            |
| 2171 | AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]            |
| 2172 | AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]            |
| 2173 | AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]            |
| 2190 | Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz [Family 6 Model 15 Stepping 11] |
| 2214 | Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz [Family 6 Model 15 Stepping 10] |
| 2221 | AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ [Family 15 Model 107 Stepping 1] |
| 2230 | Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz [Family 6 Model 15 Stepping 13]      |
| 2232 | Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz [Family 6 Model 15 Stepping 11] |
+------+---------------------------------------------------------------------------------+

I apologize for not looking this deep sooner... I just assumed the SIGILL I was seeing was the same. Now I suspect the one I see is a matter of using KVM to emulate an old Opteron on a system without 3dnow.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Yes it is frustrating - I cannot answer that question as I do not have old Opteron hardware. Ideally "we" would coerce one of your users into running your software under gdb to see if they get the same location in their traceback. Or you could supply them a modified build with the #define EMMS in common_x86_64.h changed to return either emms or an empty string and see what happens.
Of course the SIGILL on Core2 is likely to have a different cause (but I notice that while KERNEL.CORE2 names a different source file for the sgemm_oncopy operation, the copy.S used for scopy references the EMMS macro as well.)

@clemej
Author

clemej commented Aug 29, 2020

Agreed. Without a stack trace on real hardware it's likely impossible. I own a Core2 system I can try it on (thanks, ThinkPad T400!), and I may have access to one of the newer Opterons to get a stack trace and experiment a little on real hardware. Give me another day or so.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

The Opteron 2431 in your list looks new enough to expect it to handle all of MMX, 3DNOW and SSE3, in fact it has its separate ISTANBUL target in OpenBLAS. The gcc compile farm has an Opteron 2212 system that would probably correspond to OpenBLAS' OPTERON_SSE3, I can see if I can repeat the SIGILL there (hope their OS is new enough to have glibc >=2.14 as required by your appimage).

@clemej
Author

clemej commented Aug 29, 2020

Also related: https://community.amd.com/thread/159993

@clemej
Author

clemej commented Aug 29, 2020

Not that it matters for this bug, but I can't reproduce an issue on my penryn-based Core2 system. However, I note that all the core2 systems in that list are Merom, one (tiny) generation behind.

@brada4
Contributor

brada4 commented Aug 29, 2020

FEMMS is a 3dnow instruction. It works on an authentic old Opteron, but not on the emulator. So what instruction fails on that old Opteron?

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

@clemej interesting find but 3DNOW is probed via cpuid instructions in a dedicated build, and only assumed to be available in Athlon and Opteron cpus in DYNAMIC_ARCH builds.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Unfortunately one of the two Opteron-2212 hosts in the gcc compile farm runs ancient debian5 (glibc 2.7, no problems seen there with BLAS-Tester though), the other currently does not accept my login.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Update - user error w.r.t. my gcc login on the other Opteron. However mlds.appimage --extract fails with a SIGILL in _static_initialization_and_destruction_0 from its libtorch.so.0, without any apparent involvement of OpenBLAS.
/proc/cpuinfo on that system claims sse, sse2, 3dnow, 3dnowext, 3dnowprefetch capabilities; the OpenBLAS build system detects it as "OPTERON".

@clemej
Author

clemej commented Aug 29, 2020

you mean mlds.appimage --appimage-extract, right?

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Right, just caught this mistake... However trying to run the mlds binary from gdb after successful extraction, with dataset.hdf5 placed in squashfs-root/usr/lib, again results in a SIGILL in the same location.
0x00007ffff1da3420 in __static_initialization_and_destruction_0 () from /home/<myusername>/squashfs-root/usr/bin/../lib/libtorch_cpu.so

@clemej
Author

clemej commented Aug 29, 2020

Alright, then this might not actually be a libopenblas bug. If you would be so kind as to post the disassembler output and then close this as not-a-bug, I would appreciate it. And my sincerest apologies for leading you on a wild goose chase.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Full backtrace (same in single-threaded mode, i.e. OMP_NUM_THREADS=1, and with OPENBLAS_CORETYPE=GENERIC)

Program received signal SIGILL, Illegal instruction.
0x00007ffff1da3420 in __static_initialization_and_destruction_0 () from /home/<username>/squashfs-root/usr/bin/../lib/libtorch_cpu.so
(gdb) bt
#0  0x00007ffff1da3420 in __static_initialization_and_destruction_0 () from /home/<username>/squashfs-root/usr/bin/../lib/libtorch_cpu.so
#1  0x00007ffff7de879a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe558, env=env@entry=0x7fffffffe568) at dl-init.c:72
#2  0x00007ffff7de88ab in call_init (env=0x7fffffffe568, argv=0x7fffffffe558, argc=1, l=<optimized out>) at dl-init.c:30
#3  _dl_init (main_map=0x7ffff7ffe170, argc=1, argv=0x7fffffffe558, env=0x7fffffffe568) at dl-init.c:120
#4  0x00007ffff7dd9c5a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5  0x0000000000000001 in ?? ()
#6  0x00007fffffffe781 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) quit

gdb disassemble /r at this point gives

Dump of assembler code for function _Z41__static_initialization_and_destruction_0ii.constprop.0:
   0x00007ffff1da33c0 <+0>:     41 57   push   %r15
   0x00007ffff1da33c2 <+2>:     48 8d 0d e7 d6 39 03    lea    0x339d6e7(%rip),%rcx        # 0x7ffff5140ab0 <_ZN4dnnl4impl3cpu16simple_reorder_tIL16dnnl_data_type_t3EL17dnnl_format_tag_t1ELS3_2ELS4_52ELb1EvE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSB_PK18dnnl_memory_desc_tSB_SH_>
   0x00007ffff1da33c9 <+9>:     48 8d 15 a0 bf 39 03    lea    0x339bfa0(%rip),%rdx        # 0x7ffff513f370 <_ZN4dnnl4impl3cpu16simple_reorder_tIL16dnnl_data_type_t3EL17dnnl_format_tag_t5ELS3_2ELS4_68ELb1EvE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSB_PK18dnnl_memory_desc_tSB_SH_>
   0x00007ffff1da33d0 <+16>:    41 56   push   %r14
   0x00007ffff1da33d2 <+18>:    41 55   push   %r13
   0x00007ffff1da33d4 <+20>:    41 54   push   %r12
   0x00007ffff1da33d6 <+22>:    55      push   %rbp
   0x00007ffff1da33d7 <+23>:    53      push   %rbx
   0x00007ffff1da33d8 <+24>:    48 8d 1d d1 c7 39 03    lea    0x339c7d1(%rip),%rbx        # 0x7ffff513fbb0 <_ZN4dnnl4impl3cpu16simple_reorder_tIL16dnnl_data_type_t3EL17dnnl_format_tag_t5ELS3_2ELS4_74ELb1EvE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSB_PK18dnnl_memory_desc_tSB_SH_>
   0x00007ffff1da33df <+31>:    48 81 ec 68 0e 00 00    sub    $0xe68,%rsp
   0x00007ffff1da33e6 <+38>:    64 48 8b 04 25 28 00 00 00      mov    %fs:0x28,%rax
   0x00007ffff1da33ef <+47>:    48 89 84 24 58 0e 00 00 mov    %rax,0xe58(%rsp)
   0x00007ffff1da33f7 <+55>:    31 c0   xor    %eax,%eax
   0x00007ffff1da33f9 <+57>:    48 8d 05 50 da 39 03    lea    0x339da50(%rip),%rax        # 0x7ffff5140e50 <_ZN4dnnl4impl3cpu21rnn_weights_reorder_tIL16dnnl_data_type_t3ELS3_2EE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSA_PK18dnnl_memory_desc_tSA_SG_>
   0x00007ffff1da3400 <+64>:    48 8d ac 24 70 02 00 00 lea    0x270(%rsp),%rbp
   0x00007ffff1da3408 <+72>:    48 c7 84 24 08 05 00 00 00 00 00 00     movq   $0x0,0x508(%rsp)
   0x00007ffff1da3414 <+84>:    66 48 0f 6e c0  movq   %rax,%xmm0
   0x00007ffff1da3419 <+89>:    48 8d 05 f0 d2 39 03    lea    0x339d2f0(%rip),%rax        # 0x7ffff5140710 <_ZN4dnnl4impl3cpu16simple_reorder_tIL16dnnl_data_type_t3EL17dnnl_format_tag_t1ELS3_2ELS4_52ELb0EvE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSB_PK18dnnl_memory_desc_tSB_SH_>
=> 0x00007ffff1da3420 <+96>:    66 48 0f 3a 22 05 1d 1c d6 05 01        pinsrq $0x1,0x5d61c1d(%rip),%xmm0        # 0x7ffff7b05048
   0x00007ffff1da342b <+107>:   48 89 ef        mov    %rbp,%rdi
   0x00007ffff1da342e <+110>:   4c 8d a4 24 90 04 00 00 lea    0x490(%rsp),%r12
   0x00007ffff1da3436 <+118>:   4c 8d ac 24 10 0c 00 00 lea    0xc10(%rsp),%r13
   0x00007ffff1da343e <+126>:   0f 29 84 24 90 04 00 00 movaps %xmm0,0x490(%rsp)
   0x00007ffff1da3446 <+134>:   66 48 0f 6e c1  movq   %rcx,%xmm0
   0x00007ffff1da344b <+139>:   4c 89 e6        mov    %r12,%rsi
   0x00007ffff1da344e <+142>:   48 8d 0d 1b cf 39 03    lea    0x339cf1b(%rip),%rcx        # 0x7ffff5140370 <_ZN4dnnl4impl3cpu16simple_reorder_tIL16dnnl_data_type_t3EL17dnnl_format_tag_t1ELS3_2ELS4_91ELb1EvE4pd_t6createEPPNS0_12reorder_pd_tEP11dnnl_enginePK19dnnl_primitive_attrSB_PK18dnnl_memory_desc_tSB_SH_>

no idea what is going wrong, but I do notice that you ship a lot of libraries but rely on the system libpthread.so.0
Just let me know if and when you want me to try any modified appimage on that system.

@clemej
Author

clemej commented Aug 29, 2020

That's an sse4_1 instruction. It is being emitted by my compiler, not openblas. :( Thank you for the help, I'll take it from here.

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Ugh. gcc assuming -march=native or something ? Did not really expect that so did not bother to look up the mnemonic.

@clemej
Author

clemej commented Aug 29, 2020

It's from the Intel DNNL library, which is now oneAPI, and which is embedded in the pytorch build by default. I'm gonna assume they just don't bother to a) check for and b) support CPUs that don't have SSE4.1 or higher... which was introduced with... drumroll... Penryn Core2 Duos! That would explain all the Merom and P4 failures in the above list, and probably most of the K10 AMD failures too.
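
As a rough cross-check of that theory against the DB list, here is a Python sketch that pulls the family/model numbers out of the `p_model` strings and labels the pre-SSE4.1 generations. The lookup table is my own assumption from general CPU knowledge (Merom, NetBurst, K8, K10 and Llano all predate SSE4.1), not something taken from the database, and it only covers the families that appear in the list.

```python
import re

ID_RE = re.compile(r"\[Family (\d+) Model (\d+) Stepping (\d+)\]")

# (vendor, family, model) -> generation; model None means "any model of that family".
# Assumed table, limited to the families seen in the list above.
PRE_SSE41 = {
    ("Intel", 6, 15): "Core 2 (Merom)",
    ("Intel", 15, None): "Pentium 4 (NetBurst)",
    ("AMD", 15, None): "K8",
    ("AMD", 16, None): "K10",
    ("AMD", 18, None): "Llano (K10-derived)",
}


def parse_ids(p_model):
    """Return (family, model, stepping) as integers from a p_model string."""
    m = ID_RE.search(p_model)
    if m is None:
        raise ValueError("no [Family/Model/Stepping] block in: " + p_model)
    return tuple(int(x) for x in m.groups())


def vendor(p_model):
    """Guess the vendor from the marketing name."""
    if "Intel" in p_model:
        return "Intel"
    if "AMD" in p_model:
        return "AMD"
    return "unknown"


def pre_sse41_generation(p_model):
    """Return the generation name if the CPU predates SSE4.1, else None."""
    family, model, _ = parse_ids(p_model)
    v = vendor(p_model)
    return PRE_SSE41.get((v, family, model)) or PRE_SSE41.get((v, family, None))
```

A `None` result only means the CPU is not in this small table, so it should be read as "unknown" rather than as proof that SSE4.1 is available.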

@martin-frbg
Collaborator

martin-frbg commented Aug 29, 2020

Ah ok. Assuming oneDNN is basically the same thing, the build options section of its documentation mentions a build option DNNL_ARCH_OPT_FLAGS which indeed defaults to requiring sse4.1
https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html (see Warning just above headline for "Runtime CPU dispatcher control" in the middle of that page)
