VCL->conf goes away #1933

Closed
litux opened this Issue Apr 29, 2016 · 15 comments

Projects

None yet

6 participants

@litux
litux commented Apr 29, 2016 edited

Used varnish 4.1.1 Using 4.1.2-1 did the same

Varnish config passed the test. Varnish started, but after some short time varnish stopped. These were the log messages

"kernel: varnishd[8485]: segfault at 209390 ip 00000000004406af sp 00007f02e753d670 error 4 in varnishd[400000+93000]"

"varnishd[8432]: Child (8434) Last panic at: Fri, 29 Apr 2016 07:42:59 GMT
"Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 282:
  Condition(Segmentation fault by instruction at 0x209500) not true.
errno = 11 (Resource temporarily unavailable)
thread = (cache-worker)
version = varnish-4.1.1 revision 66bb824
ident = Linux,2.6.32-573.22.1.el6.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x4328b3: varnishd() [0x4328b3]
  0x451f09: varnishd() [0x451f09]
  0x36f780f7e0: libpthread.so.0() [0x36f780f7e0]
  0x43c729: varnishd(VCL_DefaultDirector+0x19) [0x43c729]
  0x436fce: varnishd(CNT_Request+0x74e) [0x436fce]
  0x44dc5b: varnishd(HTTP1_Session+0x11b) [0x44dc5b]
  0x439dbd: varnishd(SES_Proto_Req+0x5d) [0x439dbd]
  0x448a1d: varnishd() [0x448a1d]
  0x448e4b: varnishd() [0x448e4b]
  0x36f7807aa1: libpthread.so.0() [0x36f7807aa1]
req = 0x7f02dcca6020 {
  vxid = 98329, step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7f02dc80f220 {
    fd = 19, vxid = 98328,
    client = 192.168.1.19 13385,
    step = S_STP_H1PROC,
  },
  worker = 0x7f02e753ebc0 {
    stack = {0x7f02e753f000 -> 0x7f02e7533000},
    ws = 0x7f02e753edb8 {
      id = \"wrk\",
      {s,f,r,e} = {0x7f02e753e380,0x7f02e753e380,(nil),+2040},
    },
    VCL::method = DELIVER,
    VCL::return = deliver,
    VCL::methods = {},
  },
  ws = 0x7f02dcca6200 {
    id = \"req\",
    {s,f,r,e} = {0x7f02dcca8000,+480,(nil),+57336},
  },
  http_conn = 0x7f02dcca6128 {
    fd = 19,
    doclose = NULL,
    ws = 0x7f02dcca6200,
    {rxbuf_b, rxbuf_e} = {0x7f02dcca8000, 0x7f02dcca81aa},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = -1,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7f02dcca6298 {
    ws[req] = 0x7f02dcca6200,
    hdr
Apr 29 09:42:59 ssd varnishd[8432]: Child (9448) Pushing vcls failed:
Could not load compiled VCL.
#011dlopen(vcl_boot/vgc.so) = vcl_boot/vgc.so: cannot open shared object file: No such file or directory"

Using Centos 6.7 installed varnish, varnish-libs, varnish-libs-devel from yum repository repo.varnish-cache.org

To keep using varnish without problems downgraded to 4.0.3-1 and the problem has been solved

@nigoroll
Contributor

may be a dup of #1920 and #1888
@litux are you using the PROXY protocol?

@litux
litux commented Apr 29, 2016

There's a difference. The varnish server works for some time (20-30 seconds) and then dies with these messages.

I've not used the PROXY protocol in varnish. Varnish sends the traffic to a HAProxy server which load balances traffic to nginx

@fgsch
Member
fgsch commented May 2, 2016 edited

Apr 29 09:42:59 ssd varnishd[8432]: Child (9448) Pushing vcls failed:
Could not load compiled VCL.
#011dlopen(vcl_boot/vgc.so) = vcl_boot/vgc.so: cannot open shared object file: No such file or directory"

This looks like #1802 but that was fixed in 4.1.1.

Were you reloading a VCL when this happened and could you reproduce it?
Anything special in your VCL?

@litux
litux commented May 3, 2016

When this happened I was visiting webpages to check working of varnish 4.1.1 + .

After some time (20-30sec) this happens and it happens everytime. I have to "kill" all varnish processes (just 1 left), start varnish again and it happens again after some time. So it's reproducible.

The vcl configuration from the unchanged varnish configuration file works in the 4.0.3-1 version.

@fgsch
Member
fgsch commented May 3, 2016

Do you have any jobs reloading the VCL?
I doubt it's related to the VCL but can you share it?

@litux
litux commented May 3, 2016

There is no job reloading the VCL

@bsdphk bsdphk added a commit that referenced this issue May 4, 2016
@bsdphk bsdphk Add a bunch of protective asserts on vcl->conf, to help flush out
a family of tickets related to it (#1933 #1920 #1898 (?) #1888 and #1865 (?))
71be412
@bsdphk
Contributor
bsdphk commented May 4, 2016

Can you please use "varnishadm panic.show" to try to get a complete panic message ?

Also, if you could test this on -trunk it whould be a big help.

@bsdphk
Contributor
bsdphk commented May 4, 2016

Are you using any vmod directors? If so, which?

@litux
litux commented May 6, 2016 edited

Using no mod directors.

I've upgraded to version 4.1.2 to get the panic message (then downgraded again) There's also a kernel message when panic appears.

varnishd[30537]: segfault at 2094b0 ip 0000000000440b4f sp 00007fef95119670 error 4 in varnishd[400000+93000]

Complete panic message I've xxxxxx'd the hostnames and / or IP addresses

`
Last panic at: Fri, 06 May 2016 13:23:20 GMT
"Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 282:
  Condition(Segmentation fault by instruction at 0x209620) not true.
thread = (cache-worker)
version = varnish-4.1.2 revision 0d7404e
ident = Linux,2.6.32-573.26.1.el6.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x432d03: varnishd() [0x432d03]
  0x452659: varnishd() [0x452659]
  0x36f780f7e0: libpthread.so.0() [0x36f780f7e0]
  0x43cbc9: varnishd(VCL_DefaultDirector+0x19) [0x43cbc9]
  0x43743e: varnishd(CNT_Request+0x74e) [0x43743e]
  0x44e27b: varnishd(HTTP1_Session+0x11b) [0x44e27b]
  0x43a22d: varnishd(SES_Proto_Req+0x5d) [0x43a22d]
  0x44903d: varnishd() [0x44903d]
  0x44946b: varnishd() [0x44946b]
  0x36f7807aa1: libpthread.so.0() [0x36f7807aa1]
req = 0x7fef8a896020 {
  vxid = 2, step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7fef8a40f420 {
    fd = 17, vxid = 1,
    client = xxxxxx 17906,
    step = S_STP_H1PROC,
  },
  worker = 0x7fef9511abc0 {
    stack = {0x7fef9511b000 -> 0x7fef9510f000},
    ws = 0x7fef9511adb8 {
      id = \"wrk\",
      {s,f,r,e} = {0x7fef9511a380,0x7fef9511a380,(nil),+2040},
    },
    VCL::method = none,
    ws = 0x7fef8a896200 {
      id = \"req\",
      {s,f,r,e} = {0x7fef8a898000,+376,(nil),+57336},
    },
    http_conn = 0x7fef8a896128 {
      fd = 17,
      doclose = NULL,
      ws = 0x7fef8a896200,
      {rxbuf_b, rxbuf_e} = {0x7fef8a898000, 0x7fef8a898145},
      {pipeline_b, pipeline_e} = {(nil), (nil)},
      content_length = -1,
      body_status = none,
      first_byte_timeout = 0.000000,
      between_bytes_timeout = 0.000000,
    },
    http[req] = 0x7fef8a896298 {
      ws[req] = 0x7fef8a896200,
      hdrs {
        \"GET\",
        \"/2016/05/06/\",
        \"HTTP/1.1\",
        \"Host: xxxxxx\",
        \"Accept-Encoding: gzip\",
        \"Accept: */*\",
        \"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\",
        \"From: magdolna.gerendai@meltwater.com\",
        \"X-Forwarded-Host: xxxxxx\",
        \"X-Forwarded-Server: xxxxxxx\",
        \"Connection: Keep-Alive\",
        \"X-Forwarded-For: xxxxxxx, xxxxxxxx\",
      },
    },
    vcl = {
      temp = warm
      srcname = {
"
`
@bsdphk bsdphk changed the title from segfault error 4 in varnish 4.1.1+ to VCL->conf goes away May 9, 2016
@bsdphk
Contributor
bsdphk commented May 9, 2016

I have merged four other issues into this one (See above), there is some information to be gleaned in those other tickets, but I have not been able to distill any wisdom apart from vcl->conf going NULL.

@nigoroll
Contributor

Anyone running into this, please try to get hold of a core dump

@nigoroll nigoroll closed this May 27, 2016
@nigoroll nigoroll reopened this May 27, 2016
@nigoroll
Contributor

close was accidental, sorry

@lkarsten lkarsten removed the priority label May 30, 2016
@lkarsten
Contributor

the priority label has special magic powers, please don't use it.

@lkarsten
Contributor
lkarsten commented Jun 14, 2016 edited

Removing backport label so it doesn't clutter up the todo list for 4.1.3. Will be added again automatically when the issue is resolved.

@lkarsten lkarsten pushed a commit that referenced this issue Jun 14, 2016
@bsdphk bsdphk + Lasse Karstensen Add a bunch of protective asserts on vcl->conf, to help flush out
a family of tickets related to it (#1933 #1920 #1898 (?) #1888 and #1865 (?))
2e624aa
@bsdphk bsdphk added a commit that closed this issue Jun 18, 2016
@bsdphk bsdphk Some time ago I decided that it was more convenient if it were
actually possible to locate the compiled VCL shared library based
on the loaded VCL name, and removed the part of the subdirectory
name which made it unique.

Bad idea.

Add a comment to make sure I don't get that Idea again.

Fixes #1933
68ac5a6
@bsdphk bsdphk closed this in 68ac5a6 Jun 18, 2016
@lkarsten lkarsten pushed a commit that referenced this issue Jun 27, 2016
@bsdphk bsdphk + Lasse Karstensen Don't trust dlopen refcounting.
Some time ago I decided that it was more convenient if it were
actually possible to locate the compiled VCL shared library based
on the loaded VCL name, and removed the part of the subdirectory
name which made it unique.

Bad idea.

Add a comment to make sure I don't get that Idea again.

Fixes #1933

Merge notes:
* Removed label check, as that doesn't exist in 4.1
* Rewrote commit title.

Conflicts:
	bin/varnishd/mgt/mgt_vcl.c
3692c22
@lkarsten lkarsten pushed a commit that referenced this issue Jun 27, 2016
Lasse Karstensen Add 1933 to changelog.
Also fix incorrect-but-working links to issues. Cut&paste.

Ref: #1933
201051e
@lkarsten
Contributor

Applied to 4.1 now. Not reviewed for 4.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment