OpenBLAS on dSPACE realtime hardware #2102
This file is included from common.h when _MSC_VER is not defined (that is, when the compiler is not recognized as MSVC, some kind of Unix-like environment like mingw/msys is assumed). You can try to
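The branch selection described above can be sketched as follows. This is a minimal illustration, not OpenBLAS's actual common.h: the function names are hypothetical, and the only rule it mirrors is the one stated above (no `_MSC_VER` means the Unix-like branch is taken). The second function shows why that rule can misfire for a vendor compiler such as the Microtec one, which may define neither macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the rule described above: any compiler that
   does not define _MSC_VER is routed to the Unix-like (GNU-style) branch. */
static const char *common_h_branch(void)
{
#if defined(_MSC_VER)
    return "msvc";
#else
    return "unix-like";
#endif
}

/* Hypothetical helper: does the compiler itself claim GNU C compatibility?
   A vendor compiler that defines neither macro still lands in the
   "unix-like" branch above, even though GNU-style inline assembly and
   extensions may not work with it. */
static bool claims_gnu_c(void)
{
#if defined(__GNUC__)
    return true;
#else
    return false;
#endif
}
```

Printing both values with the PPC toolchain is a cheap way to see which assumptions the header will make before chasing individual error messages.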
Matlab includes MKL BLAS. You do not need OpenBLAS at that point.
Answering @martin-frbg
When using #define _MSC_VER I now get the following when compiling the code for PPC.
I did some more searching and the full compiler name is "Microtec PowerPC C/C++ Compiler 3.7"
For @brada4's question
From what I know, MKL is really only optimized for x86 systems, and I did not get much, if any, performance benefit between reference BLAS and MKL on this PPC system. I am trying to push the limits of this system, and I am hoping OpenBLAS will help me do so.
You build (cross-building from Linux to Win32) on your laptop for use with the last 32-bit MATLAB release, which is five years old. What does that have to do with some BSP compiler failing in a completely independent build?
It does have something to do with the previous build. The mex file is used to construct a Simulink model that the PPC compiler uses, in combination with the original source code, to create the final compiled software that is sent to the real-time system. This is standard for the system: the dSPACE software uses MATLAB to interface with the dSPACE hardware. The only thing I am doing differently than normal is trying to incorporate OpenBLAS. The five-year-old 32-bit MATLAB release is required (I would love to upgrade if I could). I have included the output and the makefile that is called when running
I see neither an OpenBLAS build in your log nor the error you encountered...
Lines 73 and 77 in output.txt.
Does it compile when you simply change the
@martin-frbg Setting
@brada4 This is an error message from the PPC compiler
Also, linking with MATLAB's libmwlapack and libmwblas on their own works fine (this is MATLAB's built-in MKL). I am going step by step in implementing OpenBLAS, replacing BLAS first before trying LAPACK.
Could you try cross-building the PPC component outside MATLAB? A single-line error about an unknown command is not really helpful.
I see now that the
@martin-frbg @brada4
It should be possible to compile using common
The "new" errors from common.h appear to be overuse of the phrase
The errors say you are building x86 assembly code for a supposedly PowerPC target with a PowerPC compiler.
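A compile-time check of the target architecture, based on the usual predefined macros, can confirm which code path a given compiler will take before anything reaches the assembler. This is a sketch: `target_arch` and `HYPOTHETICAL_EXPECT_PPC` are made-up names, and whether the Microtec PPC compiler defines `__powerpc__`, `__PPC__` or `_M_PPC` is an assumption to check against its documentation.

```c
#include <assert.h>

/* Hypothetical switch for this sketch: define it when the build is meant
   for the PowerPC board, so that a host (x86) compiler fails loudly here
   instead of later in the assembler. */
#if defined(HYPOTHETICAL_EXPECT_PPC) && \
    !(defined(__powerpc__) || defined(__PPC__) || defined(_M_PPC))
#error "Expected a PowerPC target, but this compiler is not targeting PowerPC"
#endif

/* Report the architecture the compiler is generating code for, based on
   commonly predefined macros (gcc/clang and MSVC spellings). */
static const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__powerpc64__) || defined(__ppc64__)
    return "ppc64";
#elif defined(__powerpc__) || defined(__PPC__) || defined(_M_PPC)
    return "ppc";
#else
    return "other";
#endif
}
```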
@brada4 mind your words please...
Maybe I am confused about what you are trying to compile here with the PPC compiler - is it OpenBLAS itself, or is it your own code? (I expect you will need to compile both, as you will probably need a PPC version of the library if you want the BLAS calls to be executed on the PPC440 board - the mingw compile on your laptop created the windows-style library for the Xeon cpu only). In your own code, you would probably include openblas_config.h and cblas.h, but not OpenBLAS' internal
I have tried making some diagrams to show what is going on. This is what I am currently trying to do, and if I am understanding correctly, @martin-frbg, you are proposing this? I would update the library for the new target?
I am not sure if I understand the mex workflow: where you currently have "Reference BLAS", is this the source code or a binary? (I expect this would have to be either source code or a precompiled PPC binary if you want the BLAS calls to be executed on the dSPACE board.)
It is the source.
OK, then you would probably start from the OpenBLAS source as well, and build it for TARGET=PPC440 rather than relying on the autodetection, which would only see the Xeon and Windows. But whatever you do, you should not include common.h in your own MYCODE.c.
In principle, at the point where OpenBLAS is mixed into the picture, it could be a blob produced outside mex, just like the "other code", in the same static library format. This is almost the same as the Win32 DLL you produced earlier.
Ok, so I am now trying to make a static library for the powerpc portion of the compilation. Would this be the correct command?
I receive the following output. Compilation fails at line 1784, complaining about junk at the end of the line.
CFLAGS should be replaced with CCOMMON_OPT, and FFLAGS likewise, because the "standard" names are changed internally (for example, a different set is used for building LAPACK). I think you need to use your dSPACE compiler (gcc will work if the target really is some kind of Linux). OpenBLAS does not use libstdc++, and it does build a static library; see Makefile.rule for all possible options. (Technically it is an "ar" archive on Linux, but check with the "file" command that it is the same format as the dSPACE static components.)
The "junk at end of line" messages apparently come from the assembler not understanding register names like r14, or, more likely, misinterpreting the DCBT macro that is defined (as L1_PREFETCH) in common_power.h. I am not sure whether this is due to the "wrong" compiler (or rather, assembler) or some other problem.
Probably it hits C comments that are normally stripped out by the C compiler, but maybe not everywhere in long-dormant code, like:
@mohseninima could you try building with
@brada4 no, it is stumbling over instructions like
I'm hitting this issue trying to build with target PPC970. After looking at this carefully, and running the affected files through the preprocessor (ex: The IBM Documentation claims:
The example assembly code on that page goes on to show them using the extended mnemonic, but does not make use of the basic mnemonic. My assembler
I'm not sure which assembler this code was written for or tested on, but it appears that the GNU assembler is not one of them. Additionally, I question the macro definition of I have hardware that I can readily test fixes on. Any help that can be provided in fixing this issue would be greatly appreciated! For reference, my host machine is an IBM Power9, which should be backwards compatible with the PPC970.
I actually have an update! I found this bug in sourceware's tracker. They mention that the 3 argument form of the I've applied some changes to my source tree which seems to cause the package to build successfully at least, but now tests are failing:
Here is a complete build log: buildlog.txt I haven't traced The patch I provided in the link above is fairly naive, and I'd imagine that there are other older PowerPC chips that might require the processor revision to be set explicitly. I'm also still curious about the set of defines that determines the format for the DCBT instruction. It seems to indicate that the two-argument format should be used on PPC970 ONLY on FreeBSD or Darwin. Why is this the case?
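On the question of two- versus three-operand dcbt: the touch hint occupies its own instruction field, so `dcbt RA,RB` and `dcbt RA,RB,0` encode to the same 32-bit word. The sketch below (the function name is hypothetical) packs the X-form fields by hand. As far as I understand the binutils opcode table, the assembler reads the three-operand text as `dcbt RA,RB,TH` only when it targets POWER4 or later, and otherwise expects an older hint-first operand order. That would fit the observation that setting the processor revision explicitly fixes the build, but treat it as my reading rather than a verified statement.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the X-form "dcbt RA,RB,TH" as a raw 32-bit word.
   Field layout (IBM bit numbering, bit 0 = most significant):
     bits 0-5   primary opcode 31
     bits 6-10  TH (touch hint; 0 gives the plain two-operand dcbt)
     bits 11-15 RA
     bits 16-20 RB
     bits 21-30 extended opcode 278
     bit  31    0 */
static uint32_t encode_dcbt(unsigned ra, unsigned rb, unsigned th)
{
    assert(ra < 32 && rb < 32 && th < 32);
    return (31u << 26) | (th << 21) | (ra << 16) | (rb << 11) | (278u << 1);
}
```

For example, `encode_dcbt(3, 4, 0)` yields 0x7C03222C, the same word an assembler should produce for `dcbt r3,r4`, whichever operand syntax it accepts.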
I've determined that the test failure is likely related to issue #1469. I believe that the implementation that is failing on PPC970 is contained in I've patched KERNEL.PPC970 with some modifications that are similar to this commit, following the rationale from the previously mentioned issue, and it seems to build and pass tests. It would be fantastic to have the assembly intrinsics for the older PowerPC chips; considering that the hardware is less powerful than modern chips, getting more performance out of the software would be a nice benefit. I should probably add that our Power9 build host runs in Big Endian mode. Please let me know how you're going to proceed with fixing the issue with the PPC970 target. As I said before, I'm happy to help out in any way that I can! :)
It seems you are already on the right track with your changes. There is no telling if/how/where the DCBT macro ever worked on PPC970 -
@martin-frbg Interestingly enough, the POWER8 kernel seems to build and pass all tests on Big Endian. The utest log is here. However, the POWER9 kernel does not pass the tests. If you'd like, I could open another issue for the POWER9 test failures on Big Endian. Despite IBM's messaging, there are distributions out there that use and actively develop Big Endian POWER9. Strangely enough, I seem to run into an intermittent test failure on some (
If you could re-run the failing sample in gdb and
For completeness, here is the full patch that I'm currently using to build and test the PPC970 target. I seem to have gotten another build where sblat3 failed with SIGILL on the initial test run. Infuriatingly enough, it doesn't seem to be doing it again when the same test is rerun. I've run it over and over again in GDB hoping to catch the issue, but the process is continuously exiting successfully. GDB's environment has
Here are the contents of I'm not familiar enough with Fortran to understand where or why it could be failing. I assume that Fortran is calling back to PowerPC ASM/C implementations of these methods. I'm also not very experienced at using GDB beyond running programs and setting breakpoints in binaries that have debug symbols. Any help in further debugging this issue would be much appreciated!
This is an assembly instruction that is not necessarily illegal in the sense of being unsupported by the CPU; maybe it is getting unaligned arguments to an otherwise legitimate instruction. It does not matter which compiler generated it; it is wrongly permitted by the -m(arch) flags. The test probably needs to run a few more times to crash again, given that it did crash once.
I think for POWER9 you would need to try the current
I seem to have caught the issue in gdb! Logs are here. I'm pretty sure that's what you're looking for, but I've got gdb still open just in case. I realized that I don't actually have debug symbols enabled... I'm going to rebuild another copy with them enabled and see if I can get it to croak in the meantime. Also of note, considering we're using musl libc here:
ssyr2k_kernel_U is common code (built from driver/level3/syr2k_kernel.c), so if something is trashing the stack, it is probably the sgemm kernel.
Try building with MAX_STACK_ALLOC=0.
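The idea behind MAX_STACK_ALLOC=0, as I understand it, is that OpenBLAS places small temporary buffers on the stack only when they fit under that byte limit and otherwise takes them from the heap, so a limit of 0 removes the stack from the picture. Below is a self-contained sketch of that pattern. `HYPOTHETICAL_MAX_STACK_ALLOC`, `scratch_on_heap` and `scaled_sum` are illustrative names, not OpenBLAS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a build-time byte limit such as MAX_STACK_ALLOC.
   0 means "never use the stack for scratch space". */
#ifndef HYPOTHETICAL_MAX_STACK_ALLOC
#define HYPOTHETICAL_MAX_STACK_ALLOC 0
#endif

/* True when a scratch buffer of nbytes exceeds the limit and must come
   from the heap instead of the stack. */
static bool scratch_on_heap(size_t nbytes)
{
    return nbytes > (size_t)HYPOTHETICAL_MAX_STACK_ALLOC;
}

/* Returns the sum of alpha * x[i], staging the scaled values in a scratch
   buffer the way a BLAS interface routine might before calling a kernel. */
static double scaled_sum(const double *x, size_t n, double alpha)
{
    /* Fixed-size stack buffer; +1 keeps the size legal when the limit is 0. */
    double stack_buf[HYPOTHETICAL_MAX_STACK_ALLOC / sizeof(double) + 1];
    double *buf = stack_buf;
    bool heap = scratch_on_heap(n * sizeof *buf);
    double sum = 0.0;

    assert(n == 0 || x != NULL);
    if (heap) {
        buf = malloc(n * sizeof *buf);
        if (buf == NULL)
            abort();
    }
    for (size_t i = 0; i < n; i++)
        buf[i] = alpha * x[i];
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    if (heap)
        free(buf);
    return sum;
}
```

If a crash disappears with the limit at 0, that points toward stack exhaustion rather than a wrong instruction.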
So I've tried a few different things:
This didn't have any effect. It still seems to crash in the same place. musl libc's default stack size is much smaller (128k by default) than glibc's. I tried building with Lastly, I disabled all compiler optimizations by removing all optimization flags and adding The un-optimized code appears to be spending most of its time around here in the callstack:
I found that I can get a full annotated dump of the assembly for these binaries with a command like I'm not exactly sure what my next steps should be.
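Since musl's default thread stack (128k, per the comment above) is much smaller than glibc's, one way to test the stack-size theory independently of OpenBLAS is to give a thread an explicit stack size with `pthread_attr_setstacksize`. This is a minimal sketch with hypothetical function names. The 256 KiB local array is an arbitrary stand-in for a large stack frame, and the sketch says nothing about how OpenBLAS creates its own worker threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker with a deliberately large stack frame (256 KiB),
   bigger than musl's 128k default thread stack. */
static void *big_frame_worker(void *arg)
{
    volatile char frame[256 * 1024];
    frame[0] = 1;
    frame[sizeof frame - 1] = 2;
    *(int *)arg = frame[0] + frame[sizeof frame - 1];
    return NULL;
}

/* Runs big_frame_worker on a thread whose stack size is set explicitly
   instead of inheriting the libc default. Returns 0 on success. */
static int run_with_stack_size(size_t stack_bytes, int *result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(result != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, big_frame_worker, result);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

Build with `-pthread`. With a 1 MiB stack the worker completes normally; the same frame on a default musl thread would not fit in 128k.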
In the first backtrace, after 0x0000000100028e94 starts fancy The call profile is the same as on x86_64.
Building with TARGET=PPC970 on POWER9 (ppc64le, gcc-9.2.1) does not give me any crashes,
Changing the SGEMM and CGEMM kernels in KERNEL.PPC970 to the non-altivec versions as used for DGEMM and ZGEMM (with the corresponding blanking of the INCOPY/ITCOPY and -OBJ entries, and changing of the GEMM_UNROLL_M to 4 and 2 respectively) fixed "my" problem, and will probably work for you as well.
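To illustrate what a non-AltiVec kernel with GEMM_UNROLL_M set to 4 means, here is a plain-C sketch of a micro-kernel that computes four rows of C per inner loop and handles any leftover rows one at a time. It is deliberately simplified (no panel packing, no N unrolling, hypothetical names) and is not the generic kernel OpenBLAS actually ships.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative M-unroll factor, analogous in spirit to GEMM_UNROLL_M = 4. */
#define SKETCH_UNROLL_M 4

/* C(m x n) += A(m x k) * B(k x n); all column-major, element (i,j) of X at
   X[i + j*ldx]. Rows are processed four at a time with scalar accumulators,
   then any leftover rows one at a time. No packing, no vector instructions. */
static void sgemm_unroll4_sketch(size_t m, size_t n, size_t k,
                                 const float *a, size_t lda,
                                 const float *b, size_t ldb,
                                 float *c, size_t ldc)
{
    const size_t m_main = m - m % SKETCH_UNROLL_M;

    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; j++) {
        size_t i = 0;
        for (; i < m_main; i += SKETCH_UNROLL_M) {
            float c0 = 0.0f, c1 = 0.0f, c2 = 0.0f, c3 = 0.0f;
            for (size_t p = 0; p < k; p++) {
                const float bpj = b[p + j * ldb];
                const float *ap = a + i + p * lda; /* column p, rows i..i+3 */
                c0 += ap[0] * bpj;
                c1 += ap[1] * bpj;
                c2 += ap[2] * bpj;
                c3 += ap[3] * bpj;
            }
            c[i + 0 + j * ldc] += c0;
            c[i + 1 + j * ldc] += c1;
            c[i + 2 + j * ldc] += c2;
            c[i + 3 + j * ldc] += c3;
        }
        for (; i < m; i++) { /* remainder rows */
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] += acc;
        }
    }
}
```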
As noted earlier and in the binutils bug, that macro issue won't happen on LE because binutils
@awilfox thanks for the reminder, I had indeed lost track of the context of the dcbt problem. Unfortunately, even the old GotoBLAS snapshots do not provide much insight into this choice of macro - GotoBLAS-1.00 from 2006 already had PPC970 support, and DCBT at that time was defined (unconditionally) as
This is what I have come up with so far (excluding the DCBT bug: I do not see it on big-endian openSUSE, so recent binutils have probably changed their baseline to POWER4. Still, it probably makes sense to drop the "Darwin or BSD" requirement on the conditional for two-operand dcbt).
Hello,
I am currently trying to update some code to use OpenBLAS and implement it on a dSPACE 1103 PowerPC board, but I am having some issues. The build steps are a little confusing, and I will try my best to explain. There are 3 devices in this setup:
A laptop where OpenBLAS is compiled
-Haswell i7-4700HQ
-Ubuntu 18.04 WSL 64 bit
The host for the dSPACE system
-Sandy Bridge-E Xeon E5-1620
-Windows 7 64 bit
-MATLAB 32 bit
Real-time board
-dSPACE 1103
-Power PC 750 GX
-Receives final compiled code from the host
I first compile OpenBLAS on the laptop using
make DYNAMIC_ARCH=1 BINARY=32 HOSTCC=gcc CC=i686-w64-mingw32-gcc FC=i686-w64-mingw32-gfortran CFLAGS='-static-libgcc -static-libstdc++ -static -ggdb' FFLAGS='-static' && mv -f libopenblas.dll.a libopenblas.lib
I then copy over the lib/dll.a/include files to the host PC. In my existing code MYCODE.c I include cblas.h and update my functions to use cblas. I then use MATLAB to compile a mex file by using the following command
mex -v MYCODE.c libopenblas.lib -g -I'openblas/include' -lmwlapack
This compiles successfully and I am able to run my mex file and obtain correct results.
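When porting MYCODE.c to cblas, note that MATLAB stores array data column-major, so calls should pass `CblasColMajor` with each leading dimension equal to that matrix's row count. The reference function below is a sketch of what `cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)` computes. `ref_dgemm_colmajor_nn` is a hypothetical name, and comparing its output with the library call on small inputs is one way to check an argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for cblas_dgemm(CblasColMajor, CblasNoTrans,
   CblasNoTrans, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc):
   C = alpha*A*B + beta*C, with element (i,j) of a column-major matrix X at
   X[i + j*ldx]. For whole MATLAB arrays, ldx is simply the row count. */
static void ref_dgemm_colmajor_nn(size_t m, size_t n, size_t k, double alpha,
                                  const double *A, size_t lda,
                                  const double *B, size_t ldb,
                                  double beta, double *C, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* BLAS convention: with beta == 0, C's old contents are ignored. */
            C[i + j * ldc] = (beta == 0.0) ? alpha * acc
                                           : alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```

For whole MATLAB matrices A (m x k), B (k x n) and C (m x n), the matching library call would use lda = m, ldb = k and ldc = m.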
Now to upload the code to the real-time board I first create a model in Simulink that uses the mex file and call that using
rtwbuild('MYMODEL')
which will take my C files and compile them using the PPCTools37 compiler for the real-time board's PPC architecture. I then receive an error. Any idea why I would be receiving this error?
Thank you