Compiler Failure on POWER8 with /kernel/power/i{c,d,s,z}a{min,max}.c #2254

Closed
grisuthedragon opened this issue Sep 12, 2019 · 132 comments · Fixed by #2263

Comments

@grisuthedragon
Contributor

I updated my installations on my POWER8 machine and since version 0.3.6 up to the current development branch, the build process fails with:

$ make  MAKE_NB_JOBS=1
...
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8.dev\" -DASMNAME=isamax_k -DASMFNAME=isamax_k_ -DNAME=isamax_k_ -DCNAME=isamax_k -DCHAR_NAME=\"isamax_k_\" -DCHAR_CNAME=\"isamax_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX -UDOUBLE -DUSE_ABS  -UUSE_MIN ../kernel/power/isamax.c -o isamax_k.o
../kernel/power/isamax.c: In function ‘siamax_kernel_64’:
../kernel/power/isamax.c:288:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccriHyPV.out file, please attach this to your bugreport.
make[1]: *** [isamax_k.o] Error 1
make[1]: Leaving directory `/root/OpenBLAS/kernel'
make: *** [libs] Error 1

OS Details:
CentOS 7.6 ppc64el, gcc 4.8.5, gfortran 4.8.5, IBM POWER8 LC822

@grisuthedragon grisuthedragon changed the title Compiler Failure on POWER8 with /kernel/power/ica{min,max}.c Compiler Failure on POWER8 with /kernel/power/i{c,d,s,z}a{min,max}.c Sep 12, 2019
@martin-frbg
Collaborator

Interesting - this file is indeed new in 0.3.6, but (from debugging #2233) the ICE in question appears to be fixed in more recent compilers. (Unfortunately the source line 288 from the error message tells us nothing as it is simply the closing brace on the last line of the file).

@grisuthedragon
Contributor Author

If I find some time I will take a closer look inside the routines and why they are crashing.

@martin-frbg
Collaborator

martin-frbg commented Sep 12, 2019

I see now that this has already been reported (against gcc 7.1.0) by susilehtola as https://bugzilla.redhat.com/show_bug.cgi?id=1740539
(From the tests for #2233, the code compiles with gcc 5.4 and again from 7.3 onwards (7.3.0/8.2.1/9.1), so it could be that something was fixed, broken, and fixed again)

@martin-frbg
Collaborator

martin-frbg commented Sep 12, 2019

Reproduced in the unicamp.br minicloud. Preprocessed source of isamax.c is here:
isamax_preprocessed.txt if somebody wants to add it to the redhat bugzilla entry (@susilehtola)
Also the problem goes away at -O0 (instead of our default -Ofast) so can be worked around with
a #pragma GCC optimize "O0" in the affected files.

@martin-frbg
Collaborator

Also of note is that in my tests (gcc-4.8.5 20150623 Red Hat 4.8.5-36 on CentOS 7) it is only the
single precision versions (isamin/isamax.c and icamin/icamax.c) that cause an ICE.

@susilehtola
Contributor

@martin-frbg thanks, uploaded. However, in my experience there's very little movement on RHEL bugzilla tickets.

If someone has a RHEL subscription, they should make an issue about the bug with Red Hat; things get moving when a paying customer complains.

@grisuthedragon
Contributor Author

grisuthedragon commented Sep 13, 2019

I did some experiments with isamax.c and found that around line 111 the snippets:

temp0 += temp1;
temp0 += temp1; //temp0+32

and around line 170

    quadruple_indices = vec_sel( quadruple_indices,ind2,r3);
    quadruple_values= vec_sel(quadruple_values,vv0,r3);      
    
    temp0+=temp1;
    temp0+=temp1; //temp0+32

seem to disturb the compiler.

Update: narrowed the range a bit.

@quickwritereader
Contributor

Could you comment/remove some lines to find which parts cause this error?

@grisuthedragon
Contributor Author

@quickwritereader That was exactly what I did. Commenting out the above mentioned lines, the code compiles.

@quickwritereader
Contributor

quickwritereader commented Sep 13, 2019

The addition? Could you remove that C++-style comment or replace it with a C-style /* */ comment?

@grisuthedragon
Contributor Author

The C++-style comment is already in OpenBLAS and not from me. I tried changing the "+=" to "temp0 = temp0 + temp1", because I had this problem in ancient times with gcc and the vectorization of a piece of code, but that did not change anything.

@quickwritereader
Contributor

Maybe try renaming temp0 to some other name (feel free to refactor), and change the comment blocks to C style. For example:
temp0 = vec_add( temp1 , vec_add(temp0,temp1));

@grisuthedragon
Contributor Author

Renaming and replacing the addition does not help.

@quickwritereader
Contributor

From which version of gcc does it disappear?
Remove all the register hints too.
Also rewrite the + operations in this style: result=vec_add(operand1,operand2)

@grisuthedragon
Contributor Author

I tried it with gcc 5.4 and there it works. The problem is that as long as RHEL7 with gcc 4.8.5 is the OS recommended by IBM for the POWER8, it should work with the old gcc as well.

Rewriting the + to vec_add does not help either.

@martin-frbg
Collaborator

Strangely I saw no effect of removing these lines (on the gcc 4.8.5 build that is), and have not managed to narrow it down to less than the entire siamax_kernel_64 routine. My current impression is that this is some more general issue with the register __vector declarations or perhaps an interaction with OpenMP (at least I found some resolved gcc/gfortran issues about failure of build_int_cst_wide in conjunction with OpenMP from the gcc 4.x timeframe).

@grisuthedragon
Contributor Author

Since the gcc 4.x series will pass away, a possible fix would be to deactivate the kernels for GCC < 5.x and bypass the problem that way.

@quickwritereader
Contributor

OK, let's replace the double addition with a single one by introducing a new variable before the for loop, after the temp1=temp1 <<1 ; line:
__vector unsigned int add_32= temp1<<1;
or
__vector unsigned int add_32= {32,32,32,32};
Then replace the two additions with one:
temp0=vec_add(temp0,add_32);
AFAIK there should be enough registers to hold the additional value.
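
The suggestion above amounts to folding two vector additions into one. A minimal sketch of that equivalence follows. It is written with the GCC/Clang generic vector extensions instead of the AltiVec intrinsics so that it builds on any host; that substitution, the type name v4u and both function names are assumptions made for illustration and are not taken from isamax.c.

```c
/* Portable stand-in for `__vector unsigned int`, using GCC/Clang vector extensions. */
typedef unsigned int v4u __attribute__((vector_size(16)));

/* Shape of the existing code: the index vector is advanced by temp1 twice. */
v4u advance_twice(v4u temp0, v4u temp1)
{
    temp0 += temp1;
    temp0 += temp1; /* temp0 + 32 when every lane of temp1 is 16 */
    return temp0;
}

/* Proposed shape: compute the doubled step once, then add it in a single operation. */
v4u advance_once(v4u temp0, v4u temp1)
{
    v4u add_32 = temp1 + temp1; /* same lanes as temp1 << 1 */
    return temp0 + add_32;
}
```

Both functions return the same lanes for the same inputs. Whether the single addition avoids the ICE in gcc 4.8.5 is not established by this sketch.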

@martin-frbg
Collaborator

martin-frbg commented Sep 13, 2019

As long as we do not suspect anything fundamentally wrong with these files, perhaps the easiest
solution is to wrap the #pragma I suggested above in a GCC version check - probably
#if defined(__GNUC__) && __GNUC__ < 6 to be on the safe side.
There is no need to deactivate these kernels as such, just to deactivate the default -Ofast compiler optimization for them.
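
A minimal sketch of such a guard, assuming the ICE is limited to GCC releases older than 6 and using the parenthesized form of the pragma from the GCC manual. The function iamax_ref is a hypothetical scalar stand-in for the affected kernel, not the OpenBLAS code. Clang is excluded explicitly because it also defines __GNUC__.

```c
#include <stddef.h>

/* Assumption: only GCC < 6 needs the workaround. The pragma applies to
 * every function defined after it in this file. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 6)
#pragma GCC optimize ("O0")
#endif

/* Hypothetical stand-in: 1-based index of the first element with the
 * largest absolute value, or 0 when n == 0. */
size_t iamax_ref(size_t n, const float *x)
{
    size_t best = 0;
    size_t i;
    float best_abs = -1.0f;

    for (i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > best_abs) {
            best_abs = a;
            best = i + 1;
        }
    }
    return best;
}
```

Placing the guard at the top of each affected file leaves the kernel enabled and only drops the optimization level for the compilers that need it.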

@quickwritereader
Contributor

With the recent issues, I see that inline assembly would have been the best choice, but it doubles the work for each new architecture.

@martin-frbg
Collaborator

I now see that a gcc 8.3.0 build on power8 (Ubuntu 19.04) returns 200+ errors in the single-precision complex LAPACK tests. Disabling optimization of icamin/icamax.c works around this as well (and incidentally also removes the single failure in CTFSM that we have been seeing across all architectures)

@quickwritereader
Contributor

I will try to check those this week.

@martin-frbg
Collaborator

martin-frbg commented Sep 17, 2019

More importantly (and thanks to isuruf's help with debugging our Travis script), DYNAMIC_ARCH=1 with gcc 5.4 fails in the power9 part with another ICE:

./kernel/power/caxpy.c: In function ‘caxpy_kernel_16’:
../kernel/power/caxpy.c:62:33: internal compiler error: in rs6000_emit_le_vsx_move, at config/rs6000/rs6000.c:9157
         register __vector float vy_0 = vec_vsx_ld( offset_0 ,vptr_y ) ;
                                 ^

and this time the #pragma GCC optimize "O0" does not help. (This is probably why the CI for #2243 had failed - I had merged your PR as I could not reproduce the error in the unicamp minicloud, but now I realize I had only built for the power8 target with this old gcc)
Update: this error does not occur with (at least) gcc 8.3.0.

@edelsohn

Disabling features to work-around issues in GCC 4.8.5 is okay, although IBM should work with RH to backport the fixes. But ICEs in GCC 7/8 should be reported and fixed in GCC. None of this was reported to the GCC community.

@quickwritereader
Contributor

@martin-frbg how about using the gcc-generated optimized assembly files for both power8 and power9 directly?
I intended to convert them to inline assembly but had to postpone.

@martin-frbg
Collaborator

Yes I guess that would work (assuming the generated assembly is not totally unreadable from a human perspective). Once we are convinced that we have found a working version, that is - 8.3.0 miscompiling icamin/icamax certainly complicated matters, and I suspect creating self-contained test cases for the GCC folks may be a non-trivial task as well.

@edelsohn

edelsohn commented Sep 19, 2019

It's not great to create too much hand-written assembly if the compiler produces efficient code. The compiler will adapt to future processors. Hand-written assembly code creates another location to update.

If the testcase cannot be reduced, please open a GCC Bugzilla with the pre-processed source file.

@quickwritereader
Contributor

@martin-frbg could you check if it passes after reverting the caxpy fix?
I had to use intrinsics because of the power8 error, if you remember.

@martin-frbg
Collaborator

I have now dumped the A matrix in SGEHRD after the GEMM and TRMM calls during the SHST01 run for all three scenarios (both SGEMM and TRMM using power9 code, and either of the two replaced with the corresponding power8 version). Matrix elements here are in the e+10 range, so differences in accuracy in the fifth or sixth decimal place are quite significant in absolute numbers. What I do not understand right now is that switching only the STRMMKERNEL to its power8 version has an effect on some elements of the GEMM result in the second cycle - could there be an uninitialized storage area at play? (Attachments contain dumps of a(i,j) in the form of a double loop for i=1 to lda { for j=1 to n; print a(i,j)} )
sgehrd_p9both.txt
sgehrd_p8trmm.txt
sgehrd_p8gemm.txt

@quickwritereader
Contributor

quickwritereader commented Dec 21, 2019

@martin-frbg thanks. I am upset that all this code has turned into a headache.
Could you dump all the input information for the sgemm case, so that we can also test it
against reference cases?

@quickwritereader
Contributor

gemm_calc
to compile

export DIR_P8=_path_to_power8_blas__
export DIR_P9=_path_to_power9_blas__
export CC=gcc
${CC}  gemm_calc.c -I${DIR_P8} ${DIR_P8}/libopenblas.a -lm -fopenmp -o calc_power8
${CC} gemm_calc.c -I${DIR_P9} ${DIR_P9}/libopenblas.a -lm -fopenmp -o calc_power9

export PARAM_M=1
export PARAM_N=16
export PARAM_K=16
export BLAS_LDB=132
export BLAS_BETTA=1
export BLAS_ALPHA=-1

./calc_power8
./calc_power9

here

@quickwritereader
Contributor

quickwritereader commented Dec 21, 2019

Dumped values could be copied using

    f_order_copy (  AA,16,16,16, a, nrowa ,  ncola, lda );
    f_order_copy (  BB,16,16,16, b, nrowb ,  ncolb,  ldb );
    f_order_copy (  CC,16,16,16,  c,m ,  n, ldc ); 
    f_order_copy (  CC,16,16,16,  cref,m ,  n, ldc ); 

For the above case all dumps were 16x16 with lda=16,
so we can copy them to different ldas and different m, n,
and use different alphas and betas to see what happens.
I did not add an equality check against the reference, but with the previous code,
if beta was small or zero, it passed the equality check. For the C - A*B case it did not, as
it was hard to compare them by relative error because of the impact of C.
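
The attached f_order_copy is not shown in the thread. As a rough sketch of what such a helper could look like, assuming column-major (Fortran-order) storage and an argument order guessed from the calls above, the hypothetical copy_colmajor below copies the overlapping part of a source matrix into a destination that has a different leading dimension:

```c
#include <stddef.h>

/* Hypothetical helper. The argument order mirrors the f_order_copy calls
 * above, but the real implementation is not shown, so this is an assumption.
 * Both matrices are column-major: element (i, j) lives at [j * ld + i]. */
void copy_colmajor(const float *src, int src_rows, int src_cols, int src_ld,
                   float *dst, int dst_rows, int dst_cols, int dst_ld)
{
    int rows = src_rows < dst_rows ? src_rows : dst_rows;
    int cols = src_cols < dst_cols ? src_cols : dst_cols;
    int i, j;

    for (j = 0; j < cols; j++) {
        for (i = 0; i < rows; i++) {
            dst[(size_t)j * dst_ld + i] = src[(size_t)j * src_ld + i];
        }
    }
}
```

Rows between dst_rows and dst_ld in each destination column are padding and are left untouched, which is what allows the same 16x16 dump to be replayed with a larger ldb such as 132.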

@quickwritereader
Contributor

quickwritereader commented Dec 21, 2019

For example, when I pass M=1
I get for power9
20480
but for power8
22876.5684
whereas when M=16
both power8 and power9 give 20480.
The reason is that power8 uses double precision for 1x16.

So I concluded that nothing is missing on the calculation side.
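
The effect described above, where the same products summed in single or in double precision give visibly different results, can be reproduced in isolation. This is a sketch with hypothetical function names and made-up inputs, not the OpenBLAS kernels:

```c
/* Accumulate a dot product entirely in single precision. */
float dot_float_acc(const float *a, const float *b, int k)
{
    float acc = 0.0f;
    int i;

    for (i = 0; i < k; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Accumulate in double precision and round to float once at the end. */
float dot_double_acc(const float *a, const float *b, int k)
{
    double acc = 0.0;
    int i;

    for (i = 0; i < k; i++) {
        acc += (double)a[i] * (double)b[i];
    }
    return (float)acc;
}
```

Take a[0] = 1e8 followed by sixteen entries of 1.0, with b all ones. On a target that evaluates float arithmetic in single precision, the first function stays at 1e8, because each added 1.0 is less than half the spacing between neighbouring floats at that magnitude (8), while the second returns 100000016. Neither result is "wrong"; they are different roundings of the same sum.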

@quickwritereader
Contributor

quickwritereader commented Dec 21, 2019

So I hope that I can find some typo or a missing condition or calculation. But if it turns out that it's just a precision thing,
then we don't have any option other than either to ignore it or to switch to power8-style double operations.

@quickwritereader
Contributor

quickwritereader commented Dec 22, 2019

@martin-frbg
I think we could play this trick for your second gemm call: before that call, set the input matrices to an exact copy of some known numbers,
and check how far the resulting values differ.

I think it would be better than checking separately as I noted above.

@quickwritereader
Contributor

quickwritereader commented Dec 22, 2019

@martin-frbg could we ask the LAPACK team about the severity? With a threshold of 31 it passes, as the failing test got 30, and the 22 others pass even with 3.0.

@quickwritereader
Contributor

Plus, if I change the seed numbers it passes.

@martin-frbg
Collaborator

Certainly I would not have any doubts about this if it did not look like a regression compared to the previous code. (But compiling for TARGET=POWER6 in the same environment shows some failures that exceed the threshold by something in the range of 1e6, so probably our time and energy is better spent elsewhere.)

@RajalakshmiSR

@quickwritereader @martin-frbg Can this be closed? Do you prefer opening a separate issue for lapack threshold change?

@sh1ng

sh1ng commented Feb 12, 2020

Still reproducible on 0.3.8

../kernel/power/isamax.c: In function 'siamax_kernel_64':
../kernel/power/isamax.c:288:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^
../kernel/power/icamax.c: In function 'ciamax_kernel_32':
../kernel/power/icamax.c:326:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^
../kernel/power/icamin.c: In function 'ciamin_kernel_32':
../kernel/power/icamin.c:264:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

NAME="CentOS Linux"
VERSION="7 (AltArch)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (AltArch)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7:server"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

@martin-frbg
Collaborator

Is that big-endian POWER8? On little-endian it is supposed to pick up isamax_power8.S (isamax.c as processed by a newer gcc that does not ICE), unless I messed up the "if not big-endian" in KERNEL.POWER8.

@sh1ng

sh1ng commented Feb 12, 2020

ppc64le
cpu : POWER8E (raw), altivec supported

@edelsohn

Use IBM AT12.0 or AT13.0 or RH Developer Toolset or something newer than GCC 4.8.5 RH.

@martin-frbg
Collaborator

martin-frbg commented Feb 12, 2020

It should not even try to compile isamax.c in 0.3.8 - are you building with cmake by any chance ? (The simple KERNEL file parser in utils.cmake simply skips over conditional expressions.) Unfortunately I have lost OpenPower cloud access again so will need to reapply.

@edelsohn

@martin-frbg When you have the account, please contact me so that we can set it not to expire.

@sh1ng

sh1ng commented Feb 12, 2020

@martin-frbg no I'm using vanilla make. Will try with cmake

@sh1ng

sh1ng commented Feb 12, 2020

@edelsohn I'm aware of the options, but prefer to use default gcc

@edelsohn

If you want to create complications for yourself, that's your choice.

@martin-frbg
Collaborator

@sh1ng cmake definitely won't be better, it would only have explained why my dirty trick with the conditionals in KERNEL.POWER8 does not work. Unfortunately too many related problems came up during the lifetime of this issue, ppc64be, power9, the relevance of lapack-test deviations... in particular I seem to have bungled my original gcc48 workaround with a later hack for ppc64be.

@RajalakshmiSR

I verified that 0.3.8 builds fine on POWER8BE/ POWER8LE and POWER9 using AT12.

@martin-frbg I think the check for not big endian in KERNEL.POWER8 needs a correction
-ifneq ($(BYTE_ORDER),$(ORDER_BIG_ENDIAN))
+ifneq ($(BYTE_ORDER),"ORDER_BIG_ENDIAN")

@martin-frbg
Collaborator

@RajalakshmiSR thanks - silly me. And I guess I should combine that fix with a "GCC<9" version check.

@amelvill-umich

amelvill-umich commented May 20, 2020

Can still reproduce with gcc 8.2.0, OpenBLAS v0.3.8, RHEL 7.5, ppc64le.

For what it's worth, I can compile this without any problems on AMD64 GCC 9.3.0, Ubuntu 20.04.

../kernel/power/isamax.c: In function ‘siamax_kernel_64’:
../kernel/power/isamin.c:288:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^
../kernel/power/isamax.c:288:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=cgemv_o -DASMFNAME=cgemv_o_ -DNAME=cgemv_o_ -DCNAME=cgemv_o -DCHAR_NAME=\"cgemv_o_\" -DCHAR_CNAME=\"cgemv_o\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -UDOUBLE -DCOMPLEX -UTRANS -UCONJ -DXCONJ ../kernel/power/cgemv_n.c -o cgemv_o.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=cgemv_u -DASMFNAME=cgemv_u_ -DNAME=cgemv_u_ -DCNAME=cgemv_u -DCHAR_NAME=\"cgemv_u_\" -DCHAR_CNAME=\"cgemv_u\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -UDOUBLE -DCOMPLEX -DTRANS -UCONJ -DXCONJ ../kernel/power/cgemv_t.c -o cgemv_u.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=cgemv_s -DASMFNAME=cgemv_s_ -DNAME=cgemv_s_ -DCNAME=cgemv_s -DCHAR_NAME=\"cgemv_s_\" -DCHAR_CNAME=\"cgemv_s\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -UDOUBLE -DCOMPLEX -UTRANS -DCONJ -DXCONJ ../kernel/power/cgemv_n.c -o cgemv_s.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=cgemv_d -DASMFNAME=cgemv_d_ -DNAME=cgemv_d_ -DCNAME=cgemv_d -DCHAR_NAME=\"cgemv_d_\" -DCHAR_CNAME=\"cgemv_d\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -UDOUBLE -DCOMPLEX -DTRANS -DCONJ -DXCONJ ../kernel/power/cgemv_t.c -o cgemv_d.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=csymv_U -DASMFNAME=csymv_U_ -DNAME=csymv_U_ -DCNAME=csymv_U -DCHAR_NAME=\"csymv_U_\" -DCHAR_CNAME=\"csymv_U\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UDOUBLE -ULOWER ../kernel/power/../generic/zsymv_k.c -o csymv_U.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=csymv_L -DASMFNAME=csymv_L_ -DNAME=csymv_L_ -DCNAME=csymv_L -DCHAR_NAME=\"csymv_L_\" -DCHAR_CNAME=\"csymv_L\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UDOUBLE -DLOWER ../kernel/power/../generic/zsymv_k.c -o csymv_L.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=chemv_U -DASMFNAME=chemv_U_ -DNAME=chemv_U_ -DCNAME=chemv_U -DCHAR_NAME=\"chemv_U_\" -DCHAR_CNAME=\"chemv_U\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UDOUBLE -ULOWER -DHEMV ../kernel/power/../generic/zhemv_k.c -o chemv_U.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=chemv_L -DASMFNAME=chemv_L_ -DNAME=chemv_L_ -DCNAME=chemv_L -DCHAR_NAME=\"chemv_L_\" -DCHAR_CNAME=\"chemv_L\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UDOUBLE -DLOWER -DHEMV ../kernel/power/../generic/zhemv_k.c -o chemv_L.o
cc -c -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=160 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -DASMNAME=chemv_V -DASMFNAME=chemv_V_ -DNAME=chemv_V_ -DCNAME=chemv_V -DCHAR_NAME=\"chemv_V_\" -DCHAR_CNAME=\"chemv_V\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -DCOMPLEX -UDOUBLE -ULOWER -DHEMV -DHEMVREV ../kernel/power/../generic/zhemv_k.c -o chemv_V.o
../kernel/power/icamax.c: In function ‘ciamax_kernel_32’:
../kernel/power/icamax.c:326:1: internal compiler error: in build_int_cst_wide, at tree.c:1210
 }

With 0.3.9 it gives me... a segfault?

gfortran -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power8 -mtune=power8 -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -fno-optimize-sibling-calls  -o zblat3 zblat3.o ../libopenblas_power8p-r0.3.9.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/4.8.5 -L/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../..  -lc
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x3FFFA2167193
#1  0x3FFFA2167D13
#2  0x3FFFA23C0477
#3  0x3FFFA2309C2C
#4  0x3FFFA23261B7
#5  0x100036DF in MAIN__ at sblat1.f:?
make[1]: *** [level1] Segmentation fault

I notice it seems to be still trying to link against the GCC 4.8.5 libraries, so there might be something I still have to look at on my end.

@martin-frbg
Collaborator

The fix from #2411 mentioned above went into what became 0.3.9, not 0.3.8.
The 0.3.9 release definitely builds and passes tests with both gcc 9.3.1 and gcc 8.3.1 (fedora 30 & 28). A mixup between compiler versions (mismatched gcc and gfortran, or picking up an older runtime) is likely to cause a crash.

@quickwritereader
Contributor

@martin-frbg did a really good job fixing those issues by generating assembly files.
In my forked project, I have updated the icamax and icamin C source files, but never opened a pull request for them, as better and tested fixes had already been supplied.
People having these types of issues could also check it.

@martin-frbg
Collaborator

(Re)checked 0.3.9 on Centos7 with a gcc8.2.1 from developer toolset (yum install devtoolset-8-gcc-8.2.1 devtoolset-8-gcc-gfortran-8.2.1) now as well, no crash, no test failures, nothing suspicious in lapack-test.

@amelvill-umich

It appears there was something wrong with the cluster's configuration, it seems to be working now. Sorry for the mixup!

9 participants