New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCL->conf goes away #1933

Closed
litux opened this Issue Apr 29, 2016 · 15 comments

Comments

Projects
None yet
6 participants
@litux

litux commented Apr 29, 2016

Used varnish 4.1.1 Using 4.1.2-1 did the same

Varnish config passed the test. Varnish started, but after some short time varnish stopped. These were the log messages

"kernel: varnishd[8485]: segfault at 209390 ip 00000000004406af sp 00007f02e753d670 error 4 in varnishd[400000+93000]"

"varnishd[8432]: Child (8434) Last panic at: Fri, 29 Apr 2016 07:42:59 GMT
"Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 282:
  Condition(Segmentation fault by instruction at 0x209500) not true.
errno = 11 (Resource temporarily unavailable)
thread = (cache-worker)
version = varnish-4.1.1 revision 66bb824
ident = Linux,2.6.32-573.22.1.el6.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x4328b3: varnishd() [0x4328b3]
  0x451f09: varnishd() [0x451f09]
  0x36f780f7e0: libpthread.so.0() [0x36f780f7e0]
  0x43c729: varnishd(VCL_DefaultDirector+0x19) [0x43c729]
  0x436fce: varnishd(CNT_Request+0x74e) [0x436fce]
  0x44dc5b: varnishd(HTTP1_Session+0x11b) [0x44dc5b]
  0x439dbd: varnishd(SES_Proto_Req+0x5d) [0x439dbd]
  0x448a1d: varnishd() [0x448a1d]
  0x448e4b: varnishd() [0x448e4b]
  0x36f7807aa1: libpthread.so.0() [0x36f7807aa1]
req = 0x7f02dcca6020 {
  vxid = 98329, step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7f02dc80f220 {
    fd = 19, vxid = 98328,
    client = 192.168.1.19 13385,
    step = S_STP_H1PROC,
  },
  worker = 0x7f02e753ebc0 {
    stack = {0x7f02e753f000 -> 0x7f02e7533000},
    ws = 0x7f02e753edb8 {
      id = \"wrk\",
      {s,f,r,e} = {0x7f02e753e380,0x7f02e753e380,(nil),+2040},
    },
    VCL::method = DELIVER,
    VCL::return = deliver,
    VCL::methods = {},
  },
  ws = 0x7f02dcca6200 {
    id = \"req\",
    {s,f,r,e} = {0x7f02dcca8000,+480,(nil),+57336},
  },
  http_conn = 0x7f02dcca6128 {
    fd = 19,
    doclose = NULL,
    ws = 0x7f02dcca6200,
    {rxbuf_b, rxbuf_e} = {0x7f02dcca8000, 0x7f02dcca81aa},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = -1,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7f02dcca6298 {
    ws[req] = 0x7f02dcca6200,
    hdr
Apr 29 09:42:59 ssd varnishd[8432]: Child (9448) Pushing vcls failed:
Could not load compiled VCL.
#011dlopen(vcl_boot/vgc.so) = vcl_boot/vgc.so: cannot open shared object file: No such file or directory"

Using Centos 6.7 installed varnish, varnish-libs, varnish-libs-devel from yum repository repo.varnish-cache.org

To keep using varnish without problems downgraded to 4.0.3-1 and the problem has been solved

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll Apr 29, 2016

Contributor

may be a dup of #1920 and #1888
@litux are you using the PROXY protocol?

Contributor

nigoroll commented Apr 29, 2016

may be a dup of #1920 and #1888
@litux are you using the PROXY protocol?

@litux

This comment has been minimized.

Show comment
Hide comment
@litux

litux Apr 29, 2016

There's a difference. The varnish server works for some time (20-30 seconds) and then dies with these messages.

I've not used the PROXY protocol in varnish. Varnish sends the traffic to a HAProxy server which load balances traffic to nginx

litux commented Apr 29, 2016

There's a difference. The varnish server works for some time (20-30 seconds) and then dies with these messages.

I've not used the PROXY protocol in varnish. Varnish sends the traffic to a HAProxy server which load balances traffic to nginx

@fgsch

This comment has been minimized.

Show comment
Hide comment
@fgsch

fgsch May 2, 2016

Member

Apr 29 09:42:59 ssd varnishd[8432]: Child (9448) Pushing vcls failed:
Could not load compiled VCL.
#011dlopen(vcl_boot/vgc.so) = vcl_boot/vgc.so: cannot open shared object file: No such file or directory"

This looks like #1802 but that was fixed in 4.1.1.

Were you reloading a VCL when this happened and could you reproduce it?
Anything special in your VCL?

Member

fgsch commented May 2, 2016

Apr 29 09:42:59 ssd varnishd[8432]: Child (9448) Pushing vcls failed:
Could not load compiled VCL.
#011dlopen(vcl_boot/vgc.so) = vcl_boot/vgc.so: cannot open shared object file: No such file or directory"

This looks like #1802 but that was fixed in 4.1.1.

Were you reloading a VCL when this happened and could you reproduce it?
Anything special in your VCL?

@litux

This comment has been minimized.

Show comment
Hide comment
@litux

litux May 3, 2016

When this happened I was visiting webpages to check working of varnish 4.1.1 + .

After some time (20-30sec) this happens and it happens everytime. I have to "kill" all varnish processes (just 1 left), start varnish again and it happens again after some time. So it's reproducible.

The vcl configuration from the unchanged varnish configuration file works in the 4.0.3-1 version.

litux commented May 3, 2016

When this happened I was visiting webpages to check working of varnish 4.1.1 + .

After some time (20-30sec) this happens and it happens everytime. I have to "kill" all varnish processes (just 1 left), start varnish again and it happens again after some time. So it's reproducible.

The vcl configuration from the unchanged varnish configuration file works in the 4.0.3-1 version.

@fgsch

This comment has been minimized.

Show comment
Hide comment
@fgsch

fgsch May 3, 2016

Member

Do you have any jobs reloading the VCL?
I doubt it's related to the VCL but can you share it?

Member

fgsch commented May 3, 2016

Do you have any jobs reloading the VCL?
I doubt it's related to the VCL but can you share it?

@litux

This comment has been minimized.

Show comment
Hide comment
@litux

litux May 3, 2016

There is no job reloading the VCL

litux commented May 3, 2016

There is no job reloading the VCL

bsdphk added a commit that referenced this issue May 4, 2016

@bsdphk

This comment has been minimized.

Show comment
Hide comment
@bsdphk

bsdphk May 4, 2016

Contributor

Can you please use "varnishadm panic.show" to try to get a complete panic message ?

Also, if you could test this on -trunk it whould be a big help.

Contributor

bsdphk commented May 4, 2016

Can you please use "varnishadm panic.show" to try to get a complete panic message ?

Also, if you could test this on -trunk it whould be a big help.

@bsdphk

This comment has been minimized.

Show comment
Hide comment
@bsdphk

bsdphk May 4, 2016

Contributor

Are you using any vmod directors? If so, which?

Contributor

bsdphk commented May 4, 2016

Are you using any vmod directors? If so, which?

@litux

This comment has been minimized.

Show comment
Hide comment
@litux

litux May 6, 2016

Using no mod directors.

I've upgraded to version 4.1.2 to get the panic message (then downgraded again) There's also a kernel message when panic appears.

varnishd[30537]: segfault at 2094b0 ip 0000000000440b4f sp 00007fef95119670 error 4 in varnishd[400000+93000]

Complete panic message I've xxxxxx'd the hostnames and / or IP addresses

`
Last panic at: Fri, 06 May 2016 13:23:20 GMT
"Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 282:
  Condition(Segmentation fault by instruction at 0x209620) not true.
thread = (cache-worker)
version = varnish-4.1.2 revision 0d7404e
ident = Linux,2.6.32-573.26.1.el6.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x432d03: varnishd() [0x432d03]
  0x452659: varnishd() [0x452659]
  0x36f780f7e0: libpthread.so.0() [0x36f780f7e0]
  0x43cbc9: varnishd(VCL_DefaultDirector+0x19) [0x43cbc9]
  0x43743e: varnishd(CNT_Request+0x74e) [0x43743e]
  0x44e27b: varnishd(HTTP1_Session+0x11b) [0x44e27b]
  0x43a22d: varnishd(SES_Proto_Req+0x5d) [0x43a22d]
  0x44903d: varnishd() [0x44903d]
  0x44946b: varnishd() [0x44946b]
  0x36f7807aa1: libpthread.so.0() [0x36f7807aa1]
req = 0x7fef8a896020 {
  vxid = 2, step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7fef8a40f420 {
    fd = 17, vxid = 1,
    client = xxxxxx 17906,
    step = S_STP_H1PROC,
  },
  worker = 0x7fef9511abc0 {
    stack = {0x7fef9511b000 -> 0x7fef9510f000},
    ws = 0x7fef9511adb8 {
      id = \"wrk\",
      {s,f,r,e} = {0x7fef9511a380,0x7fef9511a380,(nil),+2040},
    },
    VCL::method = none,
    ws = 0x7fef8a896200 {
      id = \"req\",
      {s,f,r,e} = {0x7fef8a898000,+376,(nil),+57336},
    },
    http_conn = 0x7fef8a896128 {
      fd = 17,
      doclose = NULL,
      ws = 0x7fef8a896200,
      {rxbuf_b, rxbuf_e} = {0x7fef8a898000, 0x7fef8a898145},
      {pipeline_b, pipeline_e} = {(nil), (nil)},
      content_length = -1,
      body_status = none,
      first_byte_timeout = 0.000000,
      between_bytes_timeout = 0.000000,
    },
    http[req] = 0x7fef8a896298 {
      ws[req] = 0x7fef8a896200,
      hdrs {
        \"GET\",
        \"/2016/05/06/\",
        \"HTTP/1.1\",
        \"Host: xxxxxx\",
        \"Accept-Encoding: gzip\",
        \"Accept: */*\",
        \"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\",
        \"From: magdolna.gerendai@meltwater.com\",
        \"X-Forwarded-Host: xxxxxx\",
        \"X-Forwarded-Server: xxxxxxx\",
        \"Connection: Keep-Alive\",
        \"X-Forwarded-For: xxxxxxx, xxxxxxxx\",
      },
    },
    vcl = {
      temp = warm
      srcname = {
"
`

litux commented May 6, 2016

Using no mod directors.

I've upgraded to version 4.1.2 to get the panic message (then downgraded again) There's also a kernel message when panic appears.

varnishd[30537]: segfault at 2094b0 ip 0000000000440b4f sp 00007fef95119670 error 4 in varnishd[400000+93000]

Complete panic message I've xxxxxx'd the hostnames and / or IP addresses

`
Last panic at: Fri, 06 May 2016 13:23:20 GMT
"Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 282:
  Condition(Segmentation fault by instruction at 0x209620) not true.
thread = (cache-worker)
version = varnish-4.1.2 revision 0d7404e
ident = Linux,2.6.32-573.26.1.el6.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x432d03: varnishd() [0x432d03]
  0x452659: varnishd() [0x452659]
  0x36f780f7e0: libpthread.so.0() [0x36f780f7e0]
  0x43cbc9: varnishd(VCL_DefaultDirector+0x19) [0x43cbc9]
  0x43743e: varnishd(CNT_Request+0x74e) [0x43743e]
  0x44e27b: varnishd(HTTP1_Session+0x11b) [0x44e27b]
  0x43a22d: varnishd(SES_Proto_Req+0x5d) [0x43a22d]
  0x44903d: varnishd() [0x44903d]
  0x44946b: varnishd() [0x44946b]
  0x36f7807aa1: libpthread.so.0() [0x36f7807aa1]
req = 0x7fef8a896020 {
  vxid = 2, step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7fef8a40f420 {
    fd = 17, vxid = 1,
    client = xxxxxx 17906,
    step = S_STP_H1PROC,
  },
  worker = 0x7fef9511abc0 {
    stack = {0x7fef9511b000 -> 0x7fef9510f000},
    ws = 0x7fef9511adb8 {
      id = \"wrk\",
      {s,f,r,e} = {0x7fef9511a380,0x7fef9511a380,(nil),+2040},
    },
    VCL::method = none,
    ws = 0x7fef8a896200 {
      id = \"req\",
      {s,f,r,e} = {0x7fef8a898000,+376,(nil),+57336},
    },
    http_conn = 0x7fef8a896128 {
      fd = 17,
      doclose = NULL,
      ws = 0x7fef8a896200,
      {rxbuf_b, rxbuf_e} = {0x7fef8a898000, 0x7fef8a898145},
      {pipeline_b, pipeline_e} = {(nil), (nil)},
      content_length = -1,
      body_status = none,
      first_byte_timeout = 0.000000,
      between_bytes_timeout = 0.000000,
    },
    http[req] = 0x7fef8a896298 {
      ws[req] = 0x7fef8a896200,
      hdrs {
        \"GET\",
        \"/2016/05/06/\",
        \"HTTP/1.1\",
        \"Host: xxxxxx\",
        \"Accept-Encoding: gzip\",
        \"Accept: */*\",
        \"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\",
        \"From: magdolna.gerendai@meltwater.com\",
        \"X-Forwarded-Host: xxxxxx\",
        \"X-Forwarded-Server: xxxxxxx\",
        \"Connection: Keep-Alive\",
        \"X-Forwarded-For: xxxxxxx, xxxxxxxx\",
      },
    },
    vcl = {
      temp = warm
      srcname = {
"
`
@bsdphk

This comment has been minimized.

Show comment
Hide comment
@bsdphk

bsdphk May 9, 2016

Contributor

I have merged four other issues into this one (See above), there is some information to be gleaned in those other tickets, but I have not been able to distill any wisdom apart from vcl->conf going NULL.

Contributor

bsdphk commented May 9, 2016

I have merged four other issues into this one (See above), there is some information to be gleaned in those other tickets, but I have not been able to distill any wisdom apart from vcl->conf going NULL.

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll May 27, 2016

Contributor

Anyone running into this, please try to get hold of a core dump

Contributor

nigoroll commented May 27, 2016

Anyone running into this, please try to get hold of a core dump

@nigoroll nigoroll closed this May 27, 2016

@nigoroll nigoroll reopened this May 27, 2016

@nigoroll

This comment has been minimized.

Show comment
Hide comment
@nigoroll

nigoroll May 27, 2016

Contributor

close was accidental, sorry

Contributor

nigoroll commented May 27, 2016

close was accidental, sorry

@lkarsten lkarsten removed the priority label May 30, 2016

@lkarsten

This comment has been minimized.

Show comment
Hide comment
@lkarsten

lkarsten May 30, 2016

Contributor

the priority label has special magic powers, please don't use it.

Contributor

lkarsten commented May 30, 2016

the priority label has special magic powers, please don't use it.

@lkarsten

This comment has been minimized.

Show comment
Hide comment
@lkarsten

lkarsten Jun 14, 2016

Contributor

Removing backport label so it doesn't clutter up the todo list for 4.1.3. Will be added again automatically when the issue is resolved.

Contributor

lkarsten commented Jun 14, 2016

Removing backport label so it doesn't clutter up the todo list for 4.1.3. Will be added again automatically when the issue is resolved.

lkarsten pushed a commit that referenced this issue Jun 14, 2016

@bsdphk bsdphk closed this in 68ac5a6 Jun 18, 2016

lkarsten pushed a commit that referenced this issue Jun 27, 2016

Don't trust dlopen refcounting.
Some time ago I decided that it was more convenient if it were
actually possible to locate the compiled VCL shared library based
on the loaded VCL name, and removed the part of the subdirectory
name which made it unique.

Bad idea.

Add a comment to make sure I don't get that Idea again.

Fixes #1933

Merge notes:
* Removed label check, as that doesn't exist in 4.1
* Rewrote commit title.

Conflicts:
	bin/varnishd/mgt/mgt_vcl.c

lkarsten pushed a commit that referenced this issue Jun 27, 2016

Lasse Karstensen
Add 1933 to changelog.
Also fix incorrect-but-working links to issues. Cut&paste.

Ref: #1933
@lkarsten

This comment has been minimized.

Show comment
Hide comment
@lkarsten

lkarsten Jun 27, 2016

Contributor

Applied to 4.1 now. Not reviewed for 4.0.

Contributor

lkarsten commented Jun 27, 2016

Applied to 4.1 now. Not reviewed for 4.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment