New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NULL pointer deref of vt->backend in vbp_poke() #2278

Closed
Dridi opened this Issue Mar 24, 2017 · 13 comments

Comments

Projects
None yet
5 participants
@Dridi
Member

Dridi commented Mar 24, 2017

Originally reported on #2263 by @joshuaspence:

Panic at: Fri, 24 Mar 2017 09:39:28 GMT
Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
  Condition(Segmentation fault by instruction at 0x74) not true.
version = varnish-5.1.1 revision de38712, vrt api = 6.0
ident = Linux,4.4.0-62-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 211663.478702 (mono), 1490348368.506800 (real)
Backtrace:
  0x438b37: /usr/sbin/varnishd() [0x438b37]
  0x463d1a: /usr/sbin/varnishd() [0x463d1a]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [0x4197cb]
  0x419c32: /usr/sbin/varnishd() [0x419c32]
  0x451b69: /usr/sbin/varnishd() [0x451b69]
  0x451feb: /usr/sbin/varnishd() [0x451feb]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
thread = (cache-worker)
thr.req = (nil) {
},
thr.busyobj = (nil) {
},
@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 26, 2017

Contributor

@joshuaspence, could you please install debuginfo packages?
To still be able to get anything from your report, we need to manually resolve the addresses. Can you please tell us which packages you used?

Contributor

nigoroll commented Mar 26, 2017

@joshuaspence, could you please install debuginfo packages?
To still be able to get anything from your report, we need to manually resolve the addresses. Can you please tell us which packages you used?

@joshuaspence

This comment has been minimized.

Show comment
Hide comment
@joshuaspence

joshuaspence Mar 26, 2017

Sorry, what do you mean by debuginfo packages?

joshuaspence commented Mar 26, 2017

Sorry, what do you mean by debuginfo packages?

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 26, 2017

Contributor

@joshuaspence debuginfo packages contain symbol information useful for debugging in separate files. But I just realized that the https://packagecloud.io/varnishcache/varnish5 does not contain them, sorry.
Have you used packages from packagecloud?

Contributor

nigoroll commented Mar 26, 2017

@joshuaspence debuginfo packages contain symbol information useful for debugging in separate files. But I just realized that the https://packagecloud.io/varnishcache/varnish5 does not contain them, sorry.
Have you used packages from packagecloud?

@joshuaspence

This comment has been minimized.

Show comment
Hide comment
@joshuaspence

joshuaspence Mar 26, 2017

Yep, PackageCloud.

joshuaspence commented Mar 26, 2017

Yep, PackageCloud.

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 27, 2017

Contributor

@joshuaspence could please tell me exactly which package you are using?

Contributor

nigoroll commented Mar 27, 2017

@joshuaspence could please tell me exactly which package you are using?

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 27, 2017

Contributor

inspecting the binaries from https://packagecloud.io/varnishcache/varnish5/packages/el/7/varnish-5.1.1-1.el7.x86_64.rpm/download with gdb and assuming that the first function called is pool_thread, I get the following estimate of the functions we on the stack (function offset in decimal as returned by gdb disassemble).
This doesn't seem to make any sense, so either I was looking at the wrong binaries, or made a stupid mistake, or the pointers from the backtrace were garbled.

Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic+2087]
  0x463d1a: /usr/sbin/varnishd() [MAC_Arg+298 VAS_Fail]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [VBP_Remove ?? invalid]
  0x419c32: /usr/sbin/varnishd() [tcp_handle+580 ?? invalid]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread+1241]
  0x451feb: /usr/sbin/varnishd() [pool_thread+43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
Contributor

nigoroll commented Mar 27, 2017

inspecting the binaries from https://packagecloud.io/varnishcache/varnish5/packages/el/7/varnish-5.1.1-1.el7.x86_64.rpm/download with gdb and assuming that the first function called is pool_thread, I get the following estimate of the functions we on the stack (function offset in decimal as returned by gdb disassemble).
This doesn't seem to make any sense, so either I was looking at the wrong binaries, or made a stupid mistake, or the pointers from the backtrace were garbled.

Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic+2087]
  0x463d1a: /usr/sbin/varnishd() [MAC_Arg+298 VAS_Fail]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [VBP_Remove ?? invalid]
  0x419c32: /usr/sbin/varnishd() [tcp_handle+580 ?? invalid]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread+1241]
  0x451feb: /usr/sbin/varnishd() [pool_thread+43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
@Dridi

This comment has been minimized.

Show comment
Hide comment
@Dridi

Dridi Mar 27, 2017

Member

@nigoroll, see my latest comment in #2269, I don't think installing the debuginfo sub-package will help the back-trace. I'm busy and can't work on the packaging but suggested a fix for future RPM packages.

This however, looks like a Debian system.

@joshuaspence could you install de38712 from source? This way if the segfault occurs again the back-trace should look similar, but with symbols.

Member

Dridi commented Mar 27, 2017

@nigoroll, see my latest comment in #2269, I don't think installing the debuginfo sub-package will help the back-trace. I'm busy and can't work on the packaging but suggested a fix for future RPM packages.

This however, looks like a Debian system.

@joshuaspence could you install de38712 from source? This way if the segfault occurs again the back-trace should look similar, but with symbols.

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 27, 2017

Contributor

OK, for https://packagecloud.io/varnishcache/varnish5/packages/debian/jessie/varnish_5.1.1-1_amd64.deb/download things seem to match with no offset for pool_thread

(gdb) set debug-file-directory ./usr/lib/debug
(gdb) file usr/sbin/varnishd
Reading symbols from usr/sbin/varnishd...Reading symbols from /tmp/2278/deb/usr/lib/debug/.build-id/e7/668b427ee5ed44f0fe85335d7c604b36f82040.debug...done.
done.
(gdb) info symbol 0x451feb
pool_thread + 43 in section .text of /tmp/2278/deb/usr/sbin/varnishd
(gdb) info symbol 0x451b69
WRK_Thread + 1241 in section .text of /tmp/2278/deb/usr/sbin/varnishd
...

so the annotated stack trace is then:

Panic at: Fri, 24 Mar 2017 09:39:28 GMT
Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
  Condition(Segmentation fault by instruction at 0x74) not true.
version = varnish-5.1.1 revision de38712, vrt api = 6.0
ident = Linux,4.4.0-62-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 211663.478702 (mono), 1490348368.506800 (real)
Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic + 375]
  0x463d1a: /usr/sbin/varnishd() [?? no symbol]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [vbp_poke + 187 ]
  0x419c32: /usr/sbin/varnishd() [vbp_task + 98]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread + 1241 ]
  0x451feb: /usr/sbin/varnishd() [pool_thread + 43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
thread = (cache-worker)
thr.req = (nil) {
},
thr.busyobj = (nil) {
},

vbp_poke + 187 is:

   0x00000000004197b4 <+164>:	callq  0x411840 <round@plt>
   0x00000000004197b9 <+169>:	cvttsd2si %xmm0,%r12d
   0x00000000004197be <+174>:	test   %r12d,%r12d
   0x00000000004197c1 <+177>:	jle    0x419998 <vbp_poke+648>
   0x00000000004197c7 <+183>:	mov    0x28(%rbp),%rax
   0x00000000004197cb <+187>:	mov    0x74(%rax),%eax    <<<===
   0x00000000004197ce <+190>:	cmp    $0x2,%eax
   0x00000000004197d1 <+193>:	ja     0x419b0b <vbp_poke+1019>
   0x00000000004197d7 <+199>:	cmp    $0x1,%eax
   0x00000000004197da <+202>:	je     0x4199a0 <vbp_poke+656>

which matches the segmentation fault if %rax == 0

so I think we're here:

        /* Send the PROXY header */
        assert(vt->backend->proxy_header <= 2);
        if (vt->backend->proxy_header == 1) {
                if (vbp_write_proxy_v1(vt, &s) != 0)
                        return;

so it looks like we're running into vt->backend == NULL
To confirm:

(gdb) print /x offsetof(struct vbp_target, backend)
$2 = 0x28
(gdb) print /x offsetof(struct backend, proxy_header)
$3 = 0x74

-> it all works out

Can't continue with this now, anyone feel free to take it, otherwise I might get back later
@joshuaspence no response needed any more

Contributor

nigoroll commented Mar 27, 2017

OK, for https://packagecloud.io/varnishcache/varnish5/packages/debian/jessie/varnish_5.1.1-1_amd64.deb/download things seem to match with no offset for pool_thread

(gdb) set debug-file-directory ./usr/lib/debug
(gdb) file usr/sbin/varnishd
Reading symbols from usr/sbin/varnishd...Reading symbols from /tmp/2278/deb/usr/lib/debug/.build-id/e7/668b427ee5ed44f0fe85335d7c604b36f82040.debug...done.
done.
(gdb) info symbol 0x451feb
pool_thread + 43 in section .text of /tmp/2278/deb/usr/sbin/varnishd
(gdb) info symbol 0x451b69
WRK_Thread + 1241 in section .text of /tmp/2278/deb/usr/sbin/varnishd
...

so the annotated stack trace is then:

Panic at: Fri, 24 Mar 2017 09:39:28 GMT
Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
  Condition(Segmentation fault by instruction at 0x74) not true.
version = varnish-5.1.1 revision de38712, vrt api = 6.0
ident = Linux,4.4.0-62-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 211663.478702 (mono), 1490348368.506800 (real)
Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic + 375]
  0x463d1a: /usr/sbin/varnishd() [?? no symbol]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [vbp_poke + 187 ]
  0x419c32: /usr/sbin/varnishd() [vbp_task + 98]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread + 1241 ]
  0x451feb: /usr/sbin/varnishd() [pool_thread + 43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
thread = (cache-worker)
thr.req = (nil) {
},
thr.busyobj = (nil) {
},

vbp_poke + 187 is:

   0x00000000004197b4 <+164>:	callq  0x411840 <round@plt>
   0x00000000004197b9 <+169>:	cvttsd2si %xmm0,%r12d
   0x00000000004197be <+174>:	test   %r12d,%r12d
   0x00000000004197c1 <+177>:	jle    0x419998 <vbp_poke+648>
   0x00000000004197c7 <+183>:	mov    0x28(%rbp),%rax
   0x00000000004197cb <+187>:	mov    0x74(%rax),%eax    <<<===
   0x00000000004197ce <+190>:	cmp    $0x2,%eax
   0x00000000004197d1 <+193>:	ja     0x419b0b <vbp_poke+1019>
   0x00000000004197d7 <+199>:	cmp    $0x1,%eax
   0x00000000004197da <+202>:	je     0x4199a0 <vbp_poke+656>

which matches the segmentation fault if %rax == 0

so I think we're here:

        /* Send the PROXY header */
        assert(vt->backend->proxy_header <= 2);
        if (vt->backend->proxy_header == 1) {
                if (vbp_write_proxy_v1(vt, &s) != 0)
                        return;

so it looks like we're running into vt->backend == NULL
To confirm:

(gdb) print /x offsetof(struct vbp_target, backend)
$2 = 0x28
(gdb) print /x offsetof(struct backend, proxy_header)
$3 = 0x74

-> it all works out

Can't continue with this now, anyone feel free to take it, otherwise I might get back later
@joshuaspence no response needed any more

@nigoroll nigoroll changed the title from Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271: Condition(Segmentation fault by instruction at 0x74) not true. to NULL pointer deref of vt->backend in vbp_poke() Mar 27, 2017

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Mar 27, 2017

Contributor

credit for offsetof macro
macro define offsetof(t, f) &((t *) 0)->f)
http://stackoverflow.com/questions/39607565/how-to-define-offsetof-macro-in-gdb

Contributor

nigoroll commented Mar 27, 2017

credit for offsetof macro
macro define offsetof(t, f) &((t *) 0)->f)
http://stackoverflow.com/questions/39607565/how-to-define-offsetof-macro-in-gdb

@Dridi

This comment has been minimized.

Show comment
Hide comment
@Dridi

Dridi Mar 27, 2017

Member

@nigoroll it looks like I made the assumption that vbp_poke couldn't overlap an update of the probe (eg. going cold). I don't see how to write a test case on this one, I suppose acquiring the lock to read proxy_header once but bailing out if vt->backend is NULL would be the way to go.

Thoughts? I can write a patch this evening.

Member

Dridi commented Mar 27, 2017

@nigoroll it looks like I made the assumption that vbp_poke couldn't overlap an update of the probe (eg. going cold). I don't see how to write a test case on this one, I suppose acquiring the lock to read proxy_header once but bailing out if vt->backend is NULL would be the way to go.

Thoughts? I can write a patch this evening.

@joshuaspence

This comment has been minimized.

Show comment
Hide comment
@joshuaspence

joshuaspence Mar 27, 2017

Thanks for the great work here :)

joshuaspence commented Mar 27, 2017

Thanks for the great work here :)

@Dridi

This comment has been minimized.

Show comment
Hide comment
@Dridi

Dridi Mar 28, 2017

Member

So, yesterday evening wasn't quiet enough to have a look so I had a quick one now. This may happen when the backend is deleted, not going cold. I will send a patch today, I expect (again) quiet time ahead.

Member

Dridi commented Mar 28, 2017

So, yesterday evening wasn't quiet enough to have a look so I had a quick one now. This may happen when the backend is deleted, not going cold. I will send a patch today, I expect (again) quiet time ahead.

@hermunn

This comment has been minimized.

Show comment
Hide comment
@hermunn

hermunn May 16, 2017

Contributor

Proxy support for backend probes is not in 4.1, so no backport is necessary.

Contributor

hermunn commented May 16, 2017

Proxy support for backend probes is not in 4.1, so no backport is necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment