Possible segmentation violation in dgemv_t #154

Closed
jimwright opened this issue Nov 8, 2012 · 10 comments

@jimwright

Hi,

I think I may have discovered a potential issue in kernel/x86_64/dgemv_t.S. Specifically, there seems to be an overflow of some internal buffer whose size appears to be exactly 32MB. This can be demonstrated by running the sample program below with the given parameters:

./crash 4194304 2
./crash 4194305 2

For values <= 4194304 the program succeeds, and for values >= 4194305 the program fails. (NB 4194304 in hex is 0x400000.) This issue does not appear to affect non-transposed A (i.e. dgemv_n).
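
To make the link between the 4194304 threshold and the 32MB figure explicit, here is a minimal sketch of the arithmetic. It assumes the internal buffer holds the vector x as contiguous doubles; `vector_bytes`, `fits_in_buffer` and `BUFFER_BYTES` are hypothetical names used only for illustration and are not part of OpenBLAS.

```c
#include <stddef.h>

/* Bytes needed to hold m doubles, e.g. a packed copy of the vector x.
   Assumption: the internal buffer stores x as contiguous doubles. */
static size_t vector_bytes(size_t m)
{
    return m * sizeof(double);
}

/* Hypothetical name for the 32MB limit inferred from the report. */
static const size_t BUFFER_BYTES = (size_t)32 * 1024 * 1024;

/* Returns 1 if m doubles fit in the buffer, 0 otherwise. */
static int fits_in_buffer(size_t m)
{
    return vector_bytes(m) <= BUFFER_BYTES;
}
```

With 8-byte doubles, 4194304 elements fill the 32MB exactly, so 4194305 is the first length that would spill past the end, which matches the observed boundary.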

Output from valgrind and gdb is also given.

Let me know if there is any additional information that would be of use.

Regards,

Jim

Ubuntu 12.04 64 bit
OpenBlas 0.2.4
USE_THREAD = 0
NUM_THREADS = 1
NO_AFFINITY = 1
COMMON_OPT = -O2 -g

#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int
main(int argc, char** argv)
{
    if (argc < 3)
    {
        fprintf(stderr, "usage: %s M N\n", argv[0]);
        return 1;
    }

    const size_t M = atoi(argv[1]);
    const size_t N = atoi(argv[2]);

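    double* A = (double*) calloc(M * N, sizeof(double));
    double* X = (double*) calloc(M, sizeof(double));
    double* Y = (double*) calloc(N, sizeof(double));

    /* Rule out allocation failure as the cause of any crash. */
    if (A == NULL || X == NULL || Y == NULL)
    {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }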

    cblas_dgemv(CblasColMajor,          /* Order                */
                CblasTrans,             /* TransA               */
                M,                      /* M                    */
                N,                      /* N                    */
                1.0,                    /* alpha                */
                A,                      /* A                    */
                M,                      /* lda                  */
                X,                      /* X                    */
                1,                      /* incX                 */
                0.0,                    /* beta                 */
                Y,                      /* Y                    */
                1);                     /* incY                 */

    free(A);
    free(X);
    free(Y);

    return 0;
}

$ valgrind ./crash 4194304 2
==26104== Memcheck, a memory error detector
==26104== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==26104== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==26104== Command: ./crash 4194304 2
==26104==
==26104==
==26104== HEAP SUMMARY:
==26104== in use at exit: 0 bytes in 0 blocks
==26104== total heap usage: 22 allocs, 22 frees, 100,675,221 bytes allocated
==26104==
==26104== All heap blocks were freed -- no leaks are possible
==26104==
==26104== For counts of detected and suppressed errors, rerun with: -v
==26104== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

$ valgrind ./crash 4194305 2
==26107== Memcheck, a memory error detector
==26107== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==26107== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==26107== Command: ./crash 4194305 2
==26107==
==26107== Invalid write of size 8
==26107== at 0x5088134: dgemv_t (dgemv_t.S:240)
==26107== by 0x1: ???
==26107== by 0x400000: ??? (in ./crash)
==26107== Address 0x8def000 is not stack'd, malloc'd or (recently) free'd
==26107==
==26107== Invalid read of size 8
==26107== at 0x50893A9: dgemv_t (dgemv_t.S:2249)
==26107== by 0x1: ???
==26107== by 0x400000: ??? (in ./crash)
==26107== Address 0x8def000 is not stack'd, malloc'd or (recently) free'd
==26107==
==26107==
==26107== HEAP SUMMARY:
==26107== in use at exit: 0 bytes in 0 blocks
==26107== total heap usage: 22 allocs, 22 frees, 100,675,245 bytes allocated
==26107==
==26107== All heap blocks were freed -- no leaks are possible
==26107==
==26107== For counts of detected and suppressed errors, rerun with: -v
==26107== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)

$ gdb --args crash 4194305 2
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://bugs.launchpad.net/gdb-linaro/...
Reading symbols from ./crash...done.
(gdb) r
Starting program: ./crash 4194305 2

Program received signal SIGSEGV, Segmentation fault.
?? () at ../kernel/x86_64/dgemv_t.S:240 from /usr/lib/libopenblas.so.0
240 movsd %xmm0, 0 * SIZE(X1)

@ghost ghost assigned xianyi Nov 9, 2012
@xianyi
Collaborator

xianyi commented Nov 9, 2012

Hi @jimwright ,

Thank you for the report.

I will investigate this bug this weekend.

Thank you

Xianyi

@jimwright
Author

Hi @xianyi,

I was wondering, did you have any success in reproducing/investigating the issue?

Kind regards,

Jim.

@xianyi
Collaborator

xianyi commented Nov 14, 2012

Hi Jim,

We have reproduced this SEGFAULT bug.
You are right: the cause is a buffer overflow. We are working on modifying the assembly code of gemv_t to fix it.

Xianyi

xianyi added a commit that referenced this issue Nov 19, 2012
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.

Thanks to @wangqian for this patch.
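
The idea in the commit message, splitting x into blocks when m is very large, can be illustrated with a plain C sketch. This is not the actual patch, which changes the OpenBLAS gemv_t code itself. The functions `gemv_t_ref` and `gemv_t_blocked` and the `block` parameter are hypothetical names, and the reference kernel stands in for the optimized assembly kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Reference kernel (stand-in for the assembly kernel):
   y += alpha * A^T * x for a column-major m-by-n matrix A. */
static void gemv_t_ref(size_t m, size_t n, double alpha,
                       const double *a, size_t lda,
                       const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < m; i++)
            sum += a[j * lda + i] * x[i];
        y[j] += alpha * sum;
    }
}

/* Blocked driver: walk the rows of A (and the entries of x) in chunks
   of at most `block` elements, so the kernel never sees more than
   `block` entries of x at once. Partial results accumulate in y. */
static void gemv_t_blocked(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y, size_t block)
{
    assert(block > 0); /* a zero block size would never advance */

    for (size_t i0 = 0; i0 < m; i0 += block) {
        size_t mb = (m - i0 < block) ? (m - i0) : block;
        /* a + i0 points at row i0 of every column, since lda is unchanged. */
        gemv_t_ref(mb, n, alpha, a + i0, lda, x + i0, y);
    }
}
```

Because each output entry of the transposed product is a dot product over the rows, the row range can be split freely and the partial sums added into y. Choosing `block` no larger than the number of doubles the internal buffer can hold would keep every kernel call within bounds.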
@xianyi
Collaborator

xianyi commented Nov 19, 2012

Hi @jimwright ,

Could you test the develop branch?

Thank you

Xianyi

@staticfloat
Contributor

@xianyi I have confirmed this fixes the error on Ubuntu 12.04 64-bit. With version 0.2.3 the error is present; with the develop branch it is fixed.

@juliantaylor

The test case here still crashes on Barcelona in i386 mode with gcc-4.7 (Debian unstable) at current git head 99d1978, even with the fix from gh-173.
The crash is in x86/gemv_t_sse2.S:195.

@xianyi
Collaborator

xianyi commented Jan 14, 2013

Hi @juliantaylor ,

Is it single-threaded or multi-threaded?

Xianyi

@juliantaylor

The Debian test was non-threaded, but I can also reproduce it on i386 Fedora 11 (gcc 4.4) using threaded OpenBLAS.

@xianyi
Collaborator

xianyi commented Jan 20, 2013

Hi @juliantaylor ,

I think I fixed this bug on x86. Could you try it on develop branch?

Xianyi

@juliantaylor

The develop branch fixes the issue on all my platforms, thanks.
