New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug symbols missing when running with valgrind #896

Open
andrewrk opened this Issue Apr 5, 2018 · 8 comments

Comments

Projects
None yet
2 participants
@andrewrk
Member

andrewrk commented Apr 5, 2018

pub fn main() void {
    *foo() += 1;
}

fn foo() &i32 {
    return @intToPtr(&i32, 10000000);
}
andy@xps ~/tmp> valgrind ./test
==15761== Memcheck, a memory error detector
==15761== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15761== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==15761== Command: ./test
==15761== 
==15761== Invalid read of size 4
==15761==    at 0x21DDA8: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD88: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD1A: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DBAB: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DB2F: ??? (in /home/andy/tmp/test)
==15761==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd
==15761== 
==15761== 
==15761== Process terminating with default action of signal 11 (SIGSEGV)
==15761==  Access not within mapped region at address 0x989680
==15761==    at 0x21DDA8: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD88: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD1A: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DBAB: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DB2F: ??? (in /home/andy/tmp/test)
==15761==  If you believe this happened as a result of a stack
==15761==  overflow in your program's main thread (unlikely but
==15761==  possible), you can try to increase the size of the
==15761==  main thread stack using the --main-stacksize= flag.
==15761==  The main thread stack size used in this run was 8388608.
==15761== 
==15761== HEAP SUMMARY:
==15761==     in use at exit: 0 bytes in 0 blocks
==15761==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==15761== 
==15761== All heap blocks were freed -- no leaks are possible
==15761== 
==15761== For counts of detected and suppressed errors, rerun with: -v
==15761== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
fish: “valgrind ./test” terminated by signal SIGSEGV (Address boundary error)

however, with gdb the symbols are read just fine.

@andrewrk andrewrk added this to the 0.3.0 milestone Apr 5, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

I can reproduce this with trunk clang:

static int *foo(void) {
    return (int *)10000000;
}

void _start(void) {
    int *x = foo();
    *x += 1;
}
[nix-shell:~/tmp]$ ~/local/llvm7-debug/bin/clang -o test test.c  -nostdlib -g
[nix-shell:~/tmp]$ valgrind ./test
==18185== Memcheck, a memory error detector
==18185== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18185== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==18185== Command: ./test
==18185== 
==18185== Invalid read of size 4
==18185==    at 0x400135: ??? (in /home/andy/tmp/test)
==18185==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd
@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Apr 5, 2018

Does it work if you run objcopy --decompress-debug-sections on the binary afterwards?

Ref: https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1567219

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

Same behavior after --decompress-debug-sections.

More info: The above C code reproduces the problem with GCC as well as clang.

From #valgrind-dev on freenode:

<tomhughes> that is (more or less by accident) producing a statically linked executable and those don't generally work very well with valgrind
<tomhughes> in particular in this case I think something about it is stopping valgrind recognising it as ELF when it gets mapped so it never tries to read symbols or debug data from it

I believe this is a valgrind issue. Perhaps we can send a patch.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

I've narrowed this down to linking with LLD vs linking with binutils ld.

static int *foo(void) {
    return (int *)10000000;
}

int main(void) {
    int *x = foo();
    *x += 1;
}

Create an .o file:

~/local/llvm7-debug/bin/clang -c test.c -g

Find out link command zig is using:

zig build-exe test.zig  --library c --verbose-link

Use the same link command printed from above step, but replace LLD with binutils ld, and replace zig-cache/test.o with the test.o produced from clang above:

ld --gc-sections -m elf_x86_64 -o test /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/Scrt1.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crti.o /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtbegin.o -L /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib -L /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0 -dynamic-linker /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/ld-linux-x86-64.so.2 test.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lm -lgcc --as-needed -lgcc_s --no-as-needed /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtend.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crtn.o

Note that valgrind sees debug info:

==23143==    at 0x400577: main (test.c:7)

Now repeat link but with LLD:

~/local/llvm7-debug/bin/ld.lld --gc-sections -m elf_x86_64 -o test /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/Scrt1.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crti.o /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtbegin.o -L /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib -L /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0 -dynamic-linker /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/ld-linux-x86-64.so.2 test.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lm -lgcc --as-needed -lgcc_s --no-as-needed /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtend.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crtn.o

Note that valgrind does not see debug info:

==23136==    at 0x2010F7: ??? (in /home/andy/tmp/test)
@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

Workaround suggested by llvm-dev mailing list:

diff --git a/src/link.cpp b/src/link.cpp
index 3c6e27e3..14e8cb52 100644
--- a/src/link.cpp
+++ b/src/link.cpp
@@ -217,6 +217,7 @@ static void construct_linker_job_elf(LinkJob *lj) {
         lj->args.append(g->linker_script);
     }
 
+    lj->args.append("--no-rosegment");
     lj->args.append("--gc-sections");
 
     lj->args.append("-m");

I'm considering if we should use this. I don't fully understand the implications yet, but it does fix valgrind's debug info.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 5, 2018

--rosegment puts read-only data and executable text in a single segment and make it executable, so your read-only code becomes executable. I think that shouldn't be a default behavior. If valgrind needs that, we can't fix it by changing the default.

But I can't think of a reason that valgrind needs executable data segment. It is likely that valgrind assumes some specific segment layout that the GNU linker creates and can't handle other layout even if it is valid.

I think this makes sense. I'm now convinced this is a bug in valgrind and I will work with the valgrind devs to fix the issue.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Apr 8, 2018

https://sourceforge.net/p/valgrind/mailman/message/36286103/

It looks like the devs would appreciate a patch.

This patch does some of the work and should be a clue as to what further changes need to be made for valgrind to support executables compiled with clang and linked with LLD:

diff --git a/coregrind/m_debuginfo/debuginfo.c b/coregrind/m_debuginfo/debuginfo.c
index c8a6124a2..dbe6da10a 100644
--- a/coregrind/m_debuginfo/debuginfo.c
+++ b/coregrind/m_debuginfo/debuginfo.c
@@ -1126,9 +1126,7 @@ ULong VG_(di_notify_mmap)( Addr a, Bool allow_SkFileV, Int use_fd )
 #    error "Unknown platform"
 #  endif
 
-#  if defined(VGP_x86_darwin) && DARWIN_VERS >= DARWIN_10_7
    is_ro_map = seg->hasR && !seg->hasW && !seg->hasX;
-#  endif
 
 #  if defined(VGO_solaris)
    is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
diff --git a/coregrind/m_debuginfo/readelf.c b/coregrind/m_debuginfo/readelf.c
index 70c28e629..106a6ec0c 100644
--- a/coregrind/m_debuginfo/readelf.c
+++ b/coregrind/m_debuginfo/readelf.c
@@ -1797,6 +1797,12 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
          TRACE_SYMTAB("rw_map:  avma %#lx   size %lu  foff %ld\n",
                       map->avma, map->size, map->foff);
    }
+   for (i = 0; i < VG_(sizeXA)(di->fsm.maps); i++) {
+      const DebugInfoMapping* map = VG_(indexXA)(di->fsm.maps, i);
+      if (map->ro)
+         TRACE_SYMTAB("ro_map:  avma %#lx   size %lu  foff %ld\n",
+                      map->avma, map->size, map->foff);
+   }
 
    if (phdr_mnent == 0
        || !ML_(img_valid)(mimg, phdr_mioff, phdr_mnent * phdr_ment_szB)) {
@@ -1877,7 +1883,7 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
                Bool loaded = False;
                for (j = 0; j < VG_(sizeXA)(di->fsm.maps); j++) {
                   const DebugInfoMapping* map = VG_(indexXA)(di->fsm.maps, j);
-                  if (   (map->rx || map->rw)
+                  if (   (map->rx || map->rw || map->ro)
                       && map->size > 0 /* stay sane */
                       && a_phdr.p_offset >= map->foff
                       && a_phdr.p_offset <  map->foff + map->size
@@ -1908,6 +1914,16 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
                            i, (UWord)item.bias);
                         loaded = True;
                      }
+                     if (map->ro
+                         && (a_phdr.p_flags & (PF_R | PF_W | PF_X))
+                            == PF_R) {
+                        item.exec = False;
+                        VG_(addToXA)(svma_ranges, &item);
+                        TRACE_SYMTAB(
+                           "PT_LOAD[%ld]:   acquired as ro, bias 0x%lx\n",
+                           i, (UWord)item.bias);
+                        loaded = True;
+                     }
                   }
                }
                if (!loaded) {

@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Apr 8, 2018

andrewrk added a commit that referenced this issue Apr 18, 2018

add --no-rosegment cli option
this provides a workaround for #896
until valgrind adds support for clang/LLD
(equivalent to gcc/gold -rosegment)

@andrewrk andrewrk added downstream and removed upstream labels Aug 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment