Segfault with eigen in R #703

Closed
kyleam opened this issue Nov 30, 2015 · 26 comments

@kyleam

kyleam commented Nov 30, 2015

Hello,

The Guix package manager contains an OpenBLAS recipe. When used with
R, calling eigen with a matrix over some size results in a segfault:

R --vanilla -d valgrind
> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))
==12518== Invalid read of size 8
==12518==    at 0x8E400E0: dgemv_t_SANDYBRIDGE (in /gnu/store/hw9p1zyn1nh8pbm1cl69nm0i391lk6c7-openblas-0.2.15/lib/libopenblasp-r0.2.15.so)
==12518==    by 0x16BAED48: dlatrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==12518==    by 0x16C61F92: dsytrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==12518==    by 0x16CB9540: dsyevr_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==12518==    by 0x19F42D5E: La_rs (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==12518==    by 0x19F45B96: mod_do_lapack (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==12518==    by 0x4F35635: bcEval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==    by 0x4F432DF: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==    by 0x4F48F4B: Rf_applyClosure (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==    by 0x4F4345E: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==    by 0x4F46BBD: do_set (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==    by 0x4F4367C: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==12518==  Address 0xfb0 is not stack'd, malloc'd or (recently) free'd
==12518==

 *** caught segfault ***
address 0xfb0, cause 'memory not mapped'

Traceback:
 1: eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))

This does not occur when R is built without OpenBLAS.

System details:

  • Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
  • GNU/Linux (GuixSD and Guix-build on Arch Linux)
  • gcc 4.9

OpenBLAS has been configured with NO_LAPACK=1,DYNAMIC_ARCH=1.

R is built with the following flags:

  • --with-blas=openblas
  • --with-lapack
  • --with-cairo
  • --with-libpng
  • --with-jpeglib
  • --with-libtiff
  • --with-ICU
  • --enable-R-shlib
  • --enable-BLAS-shlib
  • --with-system-zlib
  • --with-system-bzlib
  • --with-system-pcre
  • --with-system-tre
  • --with-system-xz

Any ideas on what the source of this problem might be? (While
discussing this on the Guix ticket, it was suggested that the
issue may be on the R side.)

Thanks in advance.

@brada4
Contributor

brada4 commented Nov 30, 2015

Does it happen with OMP_NUM_THREADS=1?
It could happen that your valgrind does not understand your pthread library

@brada4
Contributor

brada4 commented Nov 30, 2015

By default valgrind assumes a 2 MB stack; common defaults on Linux systems are 4, 8, 10, or 16 MB.
You must raise it to $(ulimit -s) * 1024 via the --max-stackframe= parameter for valgrind.
It would be valuable to capture the FULL R output under valgrind (starting with the R banner):
$ valgrind --max-stackframe=8388608 --trace-children=yes R --vanilla 2>&1 | tee console.out
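
As a rough sketch of the arithmetic in the suggestion above: `ulimit -s` reports the soft stack limit in KiB, so multiplying by 1024 gives bytes. The helper name stack_limit_bytes is made up for illustration, and the snippet only prints the resulting option rather than running valgrind.

```shell
#!/bin/sh
# Convert a stack limit in KiB (the unit `ulimit -s` reports) to bytes.
# Prints nothing and returns 1 for "unlimited" or any non-numeric input.
stack_limit_bytes() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo $(( $1 * 1024 ))
}

# Report the option value for the current shell's soft stack limit.
if bytes=$(stack_limit_bytes "$(ulimit -s)"); then
    echo "--max-stackframe=$bytes"
else
    echo "stack limit is not a plain number; pick an explicit value" >&2
fi
```

With the common 8192 KiB limit this gives 8388608, the value used in the command above.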

@kyleam
Author

kyleam commented Nov 30, 2015

Thanks for your comments.

> Does it happen with OMP_NUM_THREADS=1?
> It could happen that your valgrind does not understand your pthread library

I still get the error if I set this as an environment variable
before starting R.

If I run this without debugging with valgrind,

R --vanilla 2>&1 | tee console.out.3

I get

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))

 *** caught segfault ***
address 0xfb0, cause 'memory not mapped'

Traceback:
 1: eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
aborting ...

Apparently, I haven't set things up to do a core dump. On the Guix ticket,
someone reproduced the problem and included a core dump.

> You must raise it to $(ulimit -s) * 1024 via the --max-stackframe= parameter for valgrind.
> It would be valuable to capture the FULL R output under valgrind (starting with the R banner):
> $ valgrind --max-stackframe=8388608 --trace-children=yes R --vanilla 2>&1 | tee console.out

Setting --max-stackframe based on my system's ulimit output, I still
get the error. When I run

valgrind --max-stackframe=8388608 --trace-children=yes R --vanilla 2>&1 | tee console.out

this is the output captured in console.out:

==2120== Memcheck, a memory error detector
==2120== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2120== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2120== Command: /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/bin/R --vanilla
==2120==
==2122== Memcheck, a memory error detector
==2122== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2122== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2122== Command: /gnu/store/q6b4jg9nhsxb6kvn87nzr2w6f2vi1gx3-coreutils-8.24/bin/uname -m
==2122==
==2122==
==2122== HEAP SUMMARY:
==2122==     in use at exit: 0 bytes in 0 blocks
==2122==   total heap usage: 3 allocs, 3 frees, 116 bytes allocated
==2122==
==2122== All heap blocks were freed -- no leaks are possible
==2122==
==2122== For counts of detected and suppressed errors, rerun with: -v
==2122== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==2120== Memcheck, a memory error detector
==2120== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2120== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2120== Command: /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/bin/exec/R --vanilla
==2120==

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))
==2120== Invalid read of size 8
==2120==    at 0x8E400E0: dgemv_t_SANDYBRIDGE (in /gnu/store/hw9p1zyn1nh8pbm1cl69nm0i391lk6c7-openblas-0.2.15/lib/libopenblasp-r0.2.15.so)
==2120==    by 0x20BB2D48: dlatrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==2120==    by 0x20C65F92: dsytrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==2120==    by 0x20CBD540: dsyevr_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==2120==    by 0x23F46D5E: La_rs (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==2120==    by 0x23F49B96: mod_do_lapack (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==2120==    by 0x4F35635: bcEval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==    by 0x4F432DF: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==    by 0x4F48F4B: Rf_applyClosure (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==    by 0x4F4345E: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==    by 0x4F46BBD: do_set (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==    by 0x4F4367C: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==2120==  Address 0xfb0 is not stack'd, malloc'd or (recently) free'd
==2120==

 *** caught segfault ***
address 0xfb0, cause 'memory not mapped'

Traceback:
 1: eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
aborting ...
==2132== Memcheck, a memory error detector
==2132== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2132== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2132== Command: /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash -c rm\ -rf\ /tmp/RtmpkBr2xO
==2132==
==2132== Conditional jump or move depends on uninitialised value(s)
==2132==    at 0x4E20B6: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x496D83: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x4970DD: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x4EBAEA: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x4EDF5E: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x4EF307: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x468544: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0x402FE8: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132==    by 0xFFF000497: ???

[Edit: Truncated long, unhelpful output.]

@brada4
Contributor

brada4 commented Nov 30, 2015

That was huge. You should not abort R: that just aborts the process, and valgrind does not get a chance to see the memory access error. Just choose a normal exit and cancel.
Please leave out the bash errors; they are a clear sign that your stack size does not match the one traced by valgrind.

Please just post the valgrind findings after you cancel SIGABRT, and only those related to R and BLAS.

@kyleam
Author

kyleam commented Nov 30, 2015

Sorry. Thanks for your patience. Here is the output if I kill the
process with ABRT.

==5727== Memcheck, a memory error detector
==5727== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5727== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5727== Command: /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/bin/R --vanilla
==5727==
==5729== Memcheck, a memory error detector
==5729== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5729== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5729== Command: /gnu/store/q6b4jg9nhsxb6kvn87nzr2w6f2vi1gx3-coreutils-8.24/bin/uname -m
==5729==
==5729==
==5729== HEAP SUMMARY:
==5729==     in use at exit: 0 bytes in 0 blocks
==5729==   total heap usage: 3 allocs, 3 frees, 116 bytes allocated
==5729==
==5729== All heap blocks were freed -- no leaks are possible
==5729==
==5729== For counts of detected and suppressed errors, rerun with: -v
==5729== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5727== Memcheck, a memory error detector
==5727== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5727== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5727== Command: /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/bin/exec/R --vanilla
==5727==

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))
==5727== Invalid read of size 8
==5727==    at 0x8E400E0: dgemv_t_SANDYBRIDGE (in /gnu/store/hw9p1zyn1nh8pbm1cl69nm0i391lk6c7-openblas-0.2.15/lib/libopenblasp-r0.2.15.so)
==5727==    by 0x20DE2D48: dlatrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==5727==    by 0x20E95F92: dsytrd_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==5727==    by 0x20EED540: dsyevr_ (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libRlapack.so)
==5727==    by 0x24176D5E: La_rs (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==5727==    by 0x24179B96: mod_do_lapack (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/modules/lapack.so)
==5727==    by 0x4F35635: bcEval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F432DF: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F48F4B: Rf_applyClosure (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F4345E: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F46BBD: do_set (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F4367C: Rf_eval (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==  Address 0xfb0 is not stack'd, malloc'd or (recently) free'd
==5727==

 *** caught segfault ***
address 0xfb0, cause 'memory not mapped'

Traceback:
 1: eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: ==5727==
==5727== Process terminating with default action of signal 6 (SIGABRT)
==5727==    at 0x5D2A313: ??? (in /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libc-2.22.so)
==5727==    by 0x502829E: R_SelectEx (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x50285C3: R_checkActivityEx (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x5028B04: Rstd_ReadConsole (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x4F69219: sigactionSegv (in /gnu/store/jb11p396a277rndb52da20ygdksccji8-r-3.2.2/lib/R/lib/libR.so)
==5727==    by 0x5A3BDCF: ??? (in /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libpthread-2.22.so)
==5727==    by 0x8E400DF: dgemv_t_SANDYBRIDGE (in /gnu/store/hw9p1zyn1nh8pbm1cl69nm0i391lk6c7-openblas-0.2.15/lib/libopenblasp-r0.2.15.so)
==5727==
==5727== HEAP SUMMARY:
==5727==     in use at exit: 40,549,043 bytes in 12,076 blocks
==5727==   total heap usage: 31,025 allocs, 18,949 frees, 71,666,249 bytes allocated
==5727==
==5727== LEAK SUMMARY:
==5727==    definitely lost: 0 bytes in 0 blocks
==5727==    indirectly lost: 0 bytes in 0 blocks
==5727==      possibly lost: 2,128 bytes in 7 blocks
==5727==    still reachable: 40,546,915 bytes in 12,069 blocks
==5727==         suppressed: 0 bytes in 0 blocks
==5727== Rerun with --leak-check=full to see details of leaked memory
==5727==
==5727== For counts of detected and suppressed errors, rerun with: -v
==5727== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

@martin-frbg
Collaborator

This reminds me of #644 (a mix-up of C-style and Fortran-style array index counting). Do we know what R/Rlapack does between calls to BLAS/LAPACK functions? I think we would need a debuggable build of OpenBLAS here to determine whether the access is just before or just behind an array, and perhaps even run valgrind with the --db-attach=yes option to start a gdb debugger session at the point of failure.

@brada4
Contributor

brada4 commented Nov 30, 2015

First we need a debuggable glibc and a clean valgrind result for bash running a simple R startup script.
After that we can toss a coin over whose fault it is.

@brada4
Contributor

brada4 commented Dec 2, 2015

Your R build parameters are wrong:
--with-blas=openblas
This is a command-line parameter like -lopenblas; configure just drops it.
--with-lapack
Uses the system LAPACK (something provided by Arch Linux, not OpenBLAS or Guix).
--enable-BLAS-shlib
Builds R's own libRblas.so from the Netlib source.
I'd keep just the last one and symlink to OpenBLAS if needed.
On top of GUIX being not production ready and arch linux providing official R-3.2.2 and openblas 0.2.14 builds+ as bonus production quality bash and valgrind packages.

@kyleam
Author

kyleam commented Dec 3, 2015

Thanks for the comment.

I'm aware of the state of Guix. My interest is in fixing its R
build. That'd be great if the issue ends up being in the build
configuration rather than somewhere upstream.

@brada4
Contributor

brada4 commented Dec 3, 2015

I don't see how OpenBLAS (or Tux Racer, for instance) could help there:
You have broken valgrind making all debug attempts useless (or you want to report a problem of your miscompilation to bash maintainers?).
You said building without OpenBLAS works. If that is the blas-shlib version, good: you found the way most distributions package R, so the user can then choose the right BLAS for their CPU, be it OpenBLAS, ATLAS, MKL, or ACML, whichever is current on their system at that moment.

@martin-frbg
Collaborator

It should then be easy to see whether the problem is gone when one builds R with blas-shlib and makes sure that it uses OpenBLAS at runtime. (Also, I am not sure why we would want a debuggable libc rather than a debuggable OpenBLAS, or where the conviction comes from that valgrind is broken rather than the stack trashed.)

@brada4
Contributor

brada4 commented Dec 3, 2015

Here:
==2132== Command: /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash -c rm\ -rf\ /tmp/RtmpkBr2xO
==2132==
==2132== Conditional jump or move depends on uninitialised value(s)
==2132== at 0x4E20B6: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)
==2132== by 0x496D83: ??? (in /gnu/store/7jhakv1r1nbs2sr2f7ammq256w7niarh-bash-static-4.3.39/bin/bash)

@rekado

rekado commented Dec 3, 2015

@brada4

The documentation makes clear that --with-blas=openblas is a configure time parameter. It is an instruction to link with -lopenblas.

--with-lapack links against the available lapack at build time. The documentation says:

> Please do bear in mind that using --with-lapack is ‘definitely not recommended’: it is provided only because it is necessary on some platforms and because some users want to experiment with claimed performance improvements. Reporting problems where it is used unnecessarily will simply irritate the R helpers.

We could drop it in our R build, but I'd first like to know if this is related to the reported error.

> You have broken valgrind making all debug attempts useless (or you want to report a problem of your miscompilation to bash maintainers?).
> ...
> On top of GUIX being not production ready and arch linux providing official R-3.2.2 and openblas 0.2.14 builds+ as bonus production quality bash and valgrind packages.

I find your aggressiveness irritating and unhelpful. Please try to be less antagonistic.

@brada4
Contributor

brada4 commented Dec 3, 2015

No need to spit flames. One screen further down the document you posted (TL;DR) you will see the proper flag to link against OpenBLAS.
Valgrind is built with some -O99 flag. It just cannot happen that bash has a corrupt stack while running a 1k-sized script that runs on millions of other Linux computers just fine, and that two OpenBLAS threads first read memory and then write to the same place, which means the compiler used for valgrind and/or BLAS and/or R optimizes out the pthread mutex.
It is a clear miscompilation, and I am not the one to judge where it is.

@brada4
Contributor

brada4 commented Dec 3, 2015

The latest build log for R-3.2.2 shows that you link to both libRblas.so (the old Netlib BLAS included in R) and libopenblas.so (minus lapack/e), and then, seconds later, you build libRlapack.so and link it to both libraries.
Later on, your R build does not include pthread support; I'd bet it crashes in 3/4 of tries at eigen().

Split the R "example" into 5 lines, so you can see for yourself whether it is eigen crashing first or something else (the backtrace does not match the R eigen() call).

Tell me if you want me to continue on your build logs while I have scrolled to the best places in them.
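
One possible way to do the split suggested above is to put each step on its own line and announce it before it runs, so the last message printed points at the call that crashed. This is only a sketch: the helper write_split_repro and the file name split_repro.R are hypothetical, and the R code is just the reporter's one-liner broken into its component calls.

```shell
#!/bin/sh
# Write the one-line reproducer as separate steps. Each step is announced
# with message() (which prints to stderr) before it runs, so the last name
# shown before a segfault identifies the failing call.
write_split_repro() {
    cat > "$1" <<'EOF'
message("rnorm");     v  <- rnorm(50 * 500)
message("matrix");    m  <- matrix(v, 50, 500)
message("crossprod"); cp <- crossprod(m)
message("eigen");     x  <- eigen(cp)
message("done")
EOF
}

# Hypothetical output location; adjust as needed.
repro="${TMPDIR:-/tmp}/split_repro.R"
write_split_repro "$repro"
echo "wrote $repro"
```

The script can then be run with R --vanilla -f on the generated file, with or without valgrind in front.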

@rekado

rekado commented Feb 5, 2016

I dropped --with-blas=openblas and removed OpenBLAS from the build environment and can confirm that the crash is gone.

I don't really know if this is an issue with OpenBLAS, R, eigen, or the way we build R. All I see is that R segfaults when linked with OpenBLAS.

@brada4
Contributor

brada4 commented Feb 5, 2016

The problem is that your build confuses ld.so: instead of starting an OpenBLAS thread, it jumps somewhere into the text of the Netlib BLAS. For example, Fedora builds with --enable-BLAS-shlib (i.e. building libRblas), while Ubuntu builds with --with-blas=-lblas and afterwards uses the alternatives framework to switch the global libblas implementation.
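
To see which BLAS and LAPACK libraries R's LAPACK module actually resolves at load time, something along these lines may help. It is a sketch under assumptions: R_LIB_DIR is a placeholder for the lib/R directory of the R installation under test (the default below is a guess), and blas_deps is a hypothetical helper that merely filters ldd output.

```shell
#!/bin/sh
# Keep only the BLAS/LAPACK entries from `ldd` output read on stdin.
# `|| true` keeps the exit status clean when nothing matches.
blas_deps() {
    grep -i -E 'blas|lapack' || true
}

# ASSUMPTION: R_LIB_DIR points at the lib/R directory of the R build
# being examined; the fallback path is only a placeholder.
R_LIB_DIR="${R_LIB_DIR:-/usr/lib/R}"

for so in "$R_LIB_DIR/modules/lapack.so" "$R_LIB_DIR/lib/libRlapack.so"; do
    if [ -f "$so" ]; then
        echo "$so:"
        ldd "$so" | blas_deps
    fi
done
```

If both libRblas.so and libopenblas appear for the same object, the build is in the mixed state described earlier in the thread.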

@martin-frbg
Collaborator

Could you try building with a current development snapshot of OpenBLAS rather than 0.2.15, just in case it is one of the bugs fixed since then (in particular crashes due to NaNs appearing in intermediate computations, though I am not sure whether this plays a role here)? It is unfortunate that there is no "simple" C test case, and it would probably require an OpenBLAS built for debugging to see exactly where in dgemv_t it blows up.

@brada4
Contributor

brada4 commented Feb 8, 2016

R checks for NaNs and other anomalies before calling BLAS/LAPACK
(output from R 3.2.3 and OpenBLAS 0.2.15 from EPEL, (4xbig) Haswell, CentOS 7):

R> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))
R> x <- eigen(crossprod(matrix(0, 50, 500)))
R> x <- eigen(crossprod(matrix(1, 50, 500)))
R> x <- eigen(crossprod(matrix(NA, 50, 500)))
Error in eigen(crossprod(matrix(NA, 50, 500))) :
infinite or missing values in 'x'
R> x <- eigen(crossprod(matrix(NaN, 50, 500)))
Error in eigen(crossprod(matrix(NaN, 50, 500))) :
infinite or missing values in 'x'

xianyi added the Bug label Mar 1, 2016
xianyi self-assigned this Mar 1, 2016
@xianyi
Collaborator

xianyi commented Mar 1, 2016

Is it similar to #783 ?

@brada4
Contributor

brada4 commented Mar 1, 2016

R does not crash in that test on Ubuntu 14.04, CentOS 7.2, or Debian 8 with all 3 randomize_va_space values, nor on Windows 7 or 10 with or without EMET: 10k sequential runs each (using the default package available for the particular OS, old and new depending on the distribution), looping through the kernel sets in #783.
There is some regression with threading with small samples up to 0.2.15 that I will try against -dev.

@brada4
Contributor

brada4 commented Mar 1, 2016

crossprod is dgemm
eigen is dsyevr
++ input validation before calling them.

@pjotrp

pjotrp commented Jul 31, 2016

On Guix I hit a similar segfault with numpy.linalg.eigh using OpenBLAS 0.2.15. Upgrading to 0.2.18 made the segfault disappear. It may be worth trying R again with the later OpenBLAS.

@brada4
Contributor

brada4 commented Aug 1, 2016

You can post that to the GuixSD bug list, though keep your expectations low.
Compiling R with a mix of different compilers never works great.

@pjotrp

pjotrp commented Aug 1, 2016

@brada4 I posted to the Guix ML yesterday. Thanks, OpenBLAS, for fixing that eigen bug, because it was (indeed) an OpenBLAS bug.

I don't think your statement on compilers is correct. Guix does not mix compilers. It may be, however, that blas libraries were intermixed in the build like you found earlier in the thread, mostly due to the fact that the R build system allows for that (?!) Anyway, my take is that OpenBLAS has better optimizations, so we should be using that over the R libs.

GNU Guix and Nix are the only distributions that give full control over the dependency graph. I am not surprised Guix found this bug.

@martin-frbg
Collaborator

If my reading of the discussion at http://lists.gnu.org/archive/html/guix-devel/2016-09/msg00661.html is correct, GUIX is back to using (the newer) OpenBLAS with R with no negative results. (Please reopen if necessary)
