Segfault with eigen in R #703
Comments
Does it happen with OMP_NUM_THREADS=1? |
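A minimal shell sketch for trying the suggestion above. The helper name `run_single_threaded` is hypothetical (not from this thread), and the check for status 139 assumes a Linux shell where SIGSEGV is signal 11 and death by signal N is reported as exit status 128 + N.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): run a command with
# OMP_NUM_THREADS=1 and report whether it was killed by SIGSEGV.
run_single_threaded() {
    # The assignment applies only to the environment of this one command.
    OMP_NUM_THREADS=1 "$@"
    status=$?
    # Assumption: SIGSEGV is signal 11, so the shell reports 128 + 11 = 139.
    if [ "$status" -eq 139 ]; then
        echo "segfault"
    else
        echo "exit status $status"
    fi
    return "$status"
}
```

For example, `run_single_threaded Rscript -e 'x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500)))'` would run the R one-liner quoted later in this thread with a single thread. If the segfault persists, threading is unlikely to be the only cause.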
By default valgrind assumes 2MB stack, common defaults on Linux systems are 4..8..10..16MB |
Thanks for your comments.
I still get the error if I set this as an environment variable. If I run this without debugging with valgrind,
I get
Apparently, I haven't set things up to do a core dump. On the Guix ticket,
Setting --max-stackframe based on my system's ulimit output, I still
this is the output captured in console.out:
[Edit: Truncated long, unhelpful output.] |
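The stack-size point raised above can be sketched in shell. In bash and dash, `ulimit -s` prints the soft stack limit in KiB (or `unlimited`), while valgrind's `--max-stackframe` and `--main-stacksize` options take a byte count. The helper name `stack_limit_bytes` is hypothetical and only does the unit conversion.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the soft stack limit
# in bytes, or "unlimited" if the shell reports no limit.
stack_limit_bytes() {
    limit_kb=$(ulimit -s)   # KiB in bash and dash, or the word "unlimited"
    if [ "$limit_kb" = "unlimited" ]; then
        echo "unlimited"
    else
        echo $((limit_kb * 1024))
    fi
}
```

The printed value could then be passed to valgrind, for instance as `--main-stacksize="$(stack_limit_bytes)"`. An `unlimited` result is not a valid number for valgrind and would have to be replaced by an explicit byte count.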
That was huge. You should not abort R: that just aborts the process, and valgrind does not get a chance to see the memory access error. Just make a normal exit and cancel. Please just post the valgrind findings after you cancel SIGABRT, and only those related to R and BLAS. |
Sorry. Thanks for your patience. Here is the output if I kill the
|
This reminds me of #644 (mixup of C and FORTRAN-style array index counting). Do we know what R/rlapack does between calls to BLAS/LAPACK functions? I think we would need a debuggable build of OpenBLAS here to determine if the access is just before or just behind an array, and perhaps even run |
First we need a debuggable glibc and a clean mark on bash running a simple R startup script. |
Your R build parameters are wrong |
Thanks for the comment. I'm aware of the state of Guix. My interest is in fixing its R |
I don't see how OpenBLAS (or Tux Racer, for instance) could help there: |
Should be easy to see then if the problem is gone when one builds R with blas-shlib and then makes sure that it uses OpenBLAS at runtime. (Also not sure why we would want a debuggable libc rather than a debuggable OpenBLAS, and where the conviction that valgrind is broken rather than the stack trashed comes from?) |
Here: |
The documentation makes clear that
We could drop it in our R build, but I'd first like to know if this is related to the reported error.
I find your aggressiveness irritating and unhelpful. Please try to be less antagonistic. |
No need to spit flames. One screen down in the document you posted (TL;DR) you will see the proper flag to link against OpenBLAS. |
The latest build log for R-3.2.2 shows that you link to both libRblas.so (the old Netlib BLAS included in R) and libopenblas.so (minus lapack/e), and in turn, seconds later, you build libRlapack.so and link to both libraries. Split the R "example" into 5 lines so you can see for yourself whether it is eigen crashing first or something else (the backtrace does not match the R eigen() call). Tell me if you want me to continue on your build logs, now that I have scrolled to the best places in them. |
I dropped I don't really know if this is an issue with OpenBLAS, R, eigen, or the way we build R. All I see is that R segfaults when linked with OpenBLAS. |
It is a problem that your build confuses ld.so, and instead of starting an OpenBLAS thread it jumps somewhere into the text of Netlib BLAS. E.g. Fedora builds --with-blas-shlib (i.e. building libRblas),
Could you try building with a current development snapshot of OpenBLAS rather than 0.2.15, just in case it may be one of the bugs fixed since then (in particular crashes due to NaNs appearing in intermediate computations, though not sure if this may play a role here). It is unfortunate that there is no "simple" C test case, and it would probably require an OpenBLAS built for debugging to see exactly where in dgemv_t it blows up. |
R checks for NaNs and other deviations before calling BLAS/LAPACK:
R> x <- eigen(crossprod(matrix(rnorm(50 * 500), 50, 500))) |
Is it similar to #783 ? |
R does not crash in that test on Ubuntu 14.04, CentOS 7.2, or Debian 8 with all 3 randomize_va_space values, nor on Windows 7 or 10 with or without EMET; 10k sequential runs each (default package available for the particular OS, old and new depending on distribution), looping through the kernel sets in #783. |
crossprod is dgemm |
On Guix I hit a similar segfault with numpy.linalg.eigh using OpenBLAS 0.2.15. Upgrading to 0.2.18 made the segfault disappear. It may be worth trying R again with the later OpenBLAS. |
You can post that to the GuixSD bug list, though keep your expectations low. |
@brada4 I posted to the Guix ML yesterday. Thanks, OpenBLAS, for fixing that eigen bug, because it was (indeed) an OpenBLAS bug. I don't think your statement on compilers is correct. Guix does not mix compilers. It may be, however, that BLAS libraries were intermixed in the build, as you found earlier in the thread, mostly because the R build system allows for that (?!). Anyway, my take is that OpenBLAS has better optimizations, so we should be using that over the R libs. GNU Guix and Nix are the only distributions that give full control over the dependency graph. I am not surprised Guix found this bug. |
If my reading of the discussion at http://lists.gnu.org/archive/html/guix-devel/2016-09/msg00661.html is correct, Guix is back to using (the newer) OpenBLAS with R with no negative results. (Please reopen if necessary.) |
Hello,
The Guix package manager contains an OpenBLAS recipe. When used with
R, calling eigen with a matrix over some size results in a segfault:
This does not occur when R is built without OpenBLAS.
System details:
OpenBLAS has been configured with NO_LAPACK=1,DYNAMIC_ARCH=1.
R is built with the following flags:
Any ideas on what the source of this problem might be? (While
discussing this on the Guix ticket, it was suggested that the
issue may be on the R side.)
Thanks in advance.