Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NULL pointer deref of vt->backend in vbp_poke() #2278

Closed
Dridi opened this issue Mar 24, 2017 · 13 comments
Closed

NULL pointer deref of vt->backend in vbp_poke() #2278

Dridi opened this issue Mar 24, 2017 · 13 comments

Comments

@Dridi
Copy link
Member

@Dridi Dridi commented Mar 24, 2017

Originally reported on #2263 by @joshuaspence:

Panic at: Fri, 24 Mar 2017 09:39:28 GMT
Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
  Condition(Segmentation fault by instruction at 0x74) not true.
version = varnish-5.1.1 revision de38712, vrt api = 6.0
ident = Linux,4.4.0-62-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 211663.478702 (mono), 1490348368.506800 (real)
Backtrace:
  0x438b37: /usr/sbin/varnishd() [0x438b37]
  0x463d1a: /usr/sbin/varnishd() [0x463d1a]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [0x4197cb]
  0x419c32: /usr/sbin/varnishd() [0x419c32]
  0x451b69: /usr/sbin/varnishd() [0x451b69]
  0x451feb: /usr/sbin/varnishd() [0x451feb]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
thread = (cache-worker)
thr.req = (nil) {
},
thr.busyobj = (nil) {
},
@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 26, 2017

@joshuaspence, could you please install debuginfo packages?
To still be able to get anything from your report, we need to manually resolve the addresses. Can you please tell us which packages you used?

@joshuaspence
Copy link

@joshuaspence joshuaspence commented Mar 26, 2017

Sorry, what do you mean by debuginfo packages?

@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 26, 2017

@joshuaspence debuginfo packages contain symbol information useful for debugging in separate files. But I just realized that the https://packagecloud.io/varnishcache/varnish5 does not contain them, sorry.
Have you used packages from packagecloud?

@joshuaspence
Copy link

@joshuaspence joshuaspence commented Mar 26, 2017

Yep, PackageCloud.

@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 27, 2017

@joshuaspence could please tell me exactly which package you are using?

@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 27, 2017

inspecting the binaries from https://packagecloud.io/varnishcache/varnish5/packages/el/7/varnish-5.1.1-1.el7.x86_64.rpm/download with gdb and assuming that the first function called is pool_thread, I get the following estimate of the functions we on the stack (function offset in decimal as returned by gdb disassemble).
This doesn't seem to make any sense, so either I was looking at the wrong binaries, or made a stupid mistake, or the pointers from the backtrace were garbled.

Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic+2087]
  0x463d1a: /usr/sbin/varnishd() [MAC_Arg+298 VAS_Fail]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [VBP_Remove ?? invalid]
  0x419c32: /usr/sbin/varnishd() [tcp_handle+580 ?? invalid]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread+1241]
  0x451feb: /usr/sbin/varnishd() [pool_thread+43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
@Dridi
Copy link
Member Author

@Dridi Dridi commented Mar 27, 2017

@nigoroll, see my latest comment in #2269, I don't think installing the debuginfo sub-package will help the back-trace. I'm busy and can't work on the packaging but suggested a fix for future RPM packages.

This however, looks like a Debian system.

@joshuaspence could you install de38712 from source? This way if the segfault occurs again the back-trace should look similar, but with symbols.

@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 27, 2017

OK, for https://packagecloud.io/varnishcache/varnish5/packages/debian/jessie/varnish_5.1.1-1_amd64.deb/download things seem to match with no offset for pool_thread

(gdb) set debug-file-directory ./usr/lib/debug
(gdb) file usr/sbin/varnishd
Reading symbols from usr/sbin/varnishd...Reading symbols from /tmp/2278/deb/usr/lib/debug/.build-id/e7/668b427ee5ed44f0fe85335d7c604b36f82040.debug...done.
done.
(gdb) info symbol 0x451feb
pool_thread + 43 in section .text of /tmp/2278/deb/usr/sbin/varnishd
(gdb) info symbol 0x451b69
WRK_Thread + 1241 in section .text of /tmp/2278/deb/usr/sbin/varnishd
...

so the annotated stack trace is then:

Panic at: Fri, 24 Mar 2017 09:39:28 GMT
Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
  Condition(Segmentation fault by instruction at 0x74) not true.
version = varnish-5.1.1 revision de38712, vrt api = 6.0
ident = Linux,4.4.0-62-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 211663.478702 (mono), 1490348368.506800 (real)
Backtrace:
  0x438b37: /usr/sbin/varnishd() [pan_ic + 375]
  0x463d1a: /usr/sbin/varnishd() [?? no symbol]
  0x7f08daeab390: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f08daeab390]
  0x4197cb: /usr/sbin/varnishd() [vbp_poke + 187 ]
  0x419c32: /usr/sbin/varnishd() [vbp_task + 98]
  0x451b69: /usr/sbin/varnishd() [WRK_Thread + 1241 ]
  0x451feb: /usr/sbin/varnishd() [pool_thread + 43]
  0x7f08daea16ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f08daea16ba]
  0x7f08dabd782d: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f08dabd782d]
thread = (cache-worker)
thr.req = (nil) {
},
thr.busyobj = (nil) {
},

vbp_poke + 187 is:

   0x00000000004197b4 <+164>:	callq  0x411840 <round@plt>
   0x00000000004197b9 <+169>:	cvttsd2si %xmm0,%r12d
   0x00000000004197be <+174>:	test   %r12d,%r12d
   0x00000000004197c1 <+177>:	jle    0x419998 <vbp_poke+648>
   0x00000000004197c7 <+183>:	mov    0x28(%rbp),%rax
   0x00000000004197cb <+187>:	mov    0x74(%rax),%eax    <<<===
   0x00000000004197ce <+190>:	cmp    $0x2,%eax
   0x00000000004197d1 <+193>:	ja     0x419b0b <vbp_poke+1019>
   0x00000000004197d7 <+199>:	cmp    $0x1,%eax
   0x00000000004197da <+202>:	je     0x4199a0 <vbp_poke+656>

which matches the segmentation fault if %rax == 0

so I think we're here:

        /* Send the PROXY header */
        assert(vt->backend->proxy_header <= 2);
        if (vt->backend->proxy_header == 1) {
                if (vbp_write_proxy_v1(vt, &s) != 0)
                        return;

so it looks like we're running into vt->backend == NULL
To confirm:

(gdb) print /x offsetof(struct vbp_target, backend)
$2 = 0x28
(gdb) print /x offsetof(struct backend, proxy_header)
$3 = 0x74

-> it all works out

Can't continue with this now, anyone feel free to take it, otherwise I might get back later
@joshuaspence no response needed any more

@nigoroll nigoroll changed the title Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271: Condition(Segmentation fault by instruction at 0x74) not true. NULL pointer deref of vt->backend in vbp_poke() Mar 27, 2017
@nigoroll
Copy link
Member

@nigoroll nigoroll commented Mar 27, 2017

credit for offsetof macro
macro define offsetof(t, f) &((t *) 0)->f)
http://stackoverflow.com/questions/39607565/how-to-define-offsetof-macro-in-gdb

@Dridi
Copy link
Member Author

@Dridi Dridi commented Mar 27, 2017

@nigoroll it looks like I made the assumption that vbp_poke couldn't overlap an update of the probe (eg. going cold). I don't see how to write a test case on this one, I suppose acquiring the lock to read proxy_header once but bailing out if vt->backend is NULL would be the way to go.

Thoughts? I can write a patch this evening.

@joshuaspence
Copy link

@joshuaspence joshuaspence commented Mar 27, 2017

Thanks for the great work here :)

@Dridi
Copy link
Member Author

@Dridi Dridi commented Mar 28, 2017

So, yesterday evening wasn't quiet enough to have a look so I had a quick one now. This may happen when the backend is deleted, not going cold. I will send a patch today, I expect (again) quiet time ahead.

@hermunn
Copy link
Member

@hermunn hermunn commented May 16, 2017

Proxy support for backend probes is not in 4.1, so no backport is necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.