4.1.3: Varnish crashes with "Assert error in CNT_Request(), cache/cache_req_fsm.c line 820" #2106

Closed
Lutacon opened this Issue Oct 3, 2016 · 33 comments

@Lutacon
Lutacon commented Oct 3, 2016 edited

Hi there,

We have this problem every day: Varnish crashes and we lose everything we have in the cache. We thought upgrading from 4.1.2 to 4.1.3 would fix it, but the problem persists.

Your Environment

  • Version used: varnish-4.1.3 revision 5e3b6d2
  • Operating System and version: Debian Jessie 8.4, kernel 3.14.32-xxxx-grs-ipv6-64
"Assert error in CNT_Request(), cache/cache_req_fsm.c line 820:
  Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.
errno = 2 (No such file or directory)
thread = (cache-worker)
version = varnish-4.1.3 revision 5e3b6d2
ident = Linux,3.14.32-xxxx-grs-ipv6-64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x434025: varnishd() [0x434025]
  0x438779: varnishd(CNT_Request+0x1bd9) [0x438779]
  0x44fc6f: varnishd(HTTP1_Session+0xef) [0x44fc6f]
  0x43a5ad: varnishd(SES_Proto_Req+0x5d) [0x43a5ad]
  0x44a522: varnishd() [0x44a522]
  0x44a98b: varnishd() [0x44a98b]
  0x74c77a6fb0a4: libpthread.so.0(+0x80a4) [0x74c77a6fb0a4]
  0x74c77a43087d: libc.so.6(clone+0x6d) [0x74c77a43087d]
req = 0x74c6e4c06020 {
  vxid = 0, step = R_STP_RESTART,
  req_body = R_BODY_INIT,
  restarts = 0, esi_level = 0,
  sp = 0x74c715f8b220 {
    fd = 79, vxid = 26927013,
    client = X.X.X.X 27212,
    t_open = 1475250171.316174,
    t_idle = 1475250171.645251,
    step = S_STP_H1PROC,
  },
  ws = 0x74c6e4c06200 {
    id = \"req\",
    {s,f,r,e} = {0x74c6e4c08000,0x74c6e4c08000,+32768,+57336},
  },
  http_conn = 0x74c6e4c06128 {
    fd = 79,
    doclose = NULL,
    ws = 0x74c6e4c06200,
    {rxbuf_b, rxbuf_e} = {0x74c6e4c08000, 0x74c6e4c08000},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = 0,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x74c6e4c06298 {
    ws[] = (nil),
    hdrs {
    },
  },
  flags = {
  },
},

The config in /etc/default/varnish:

START=yes
NFILES=131072
MEMLOCK=82000

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/drupal.vcl \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=8 \
             -p thread_pool_min=100 \
             -p thread_pool_max=4000 \
             -s malloc,16G"
@Lutacon Lutacon changed the title from Varnish crashes with "Assert error in CNT_Request(), cache/cache_req_fsm.c line 820" to 4.1.3: Varnish crashes with "Assert error in CNT_Request(), cache/cache_req_fsm.c line 820" Oct 3, 2016
@Dridi
Member
Dridi commented Oct 3, 2016

This is a known and fixed issue, see #2000.

@Dridi Dridi closed this Oct 3, 2016
@Lutacon
Lutacon commented Oct 3, 2016

@Dridi We are running varnish-4.1.3 revision 5e3b6d2, which is newer than the beta2 version mentioned in #2000.

@Dridi
Member
Dridi commented Oct 3, 2016

My bad, reopening.

@Dridi Dridi reopened this Oct 3, 2016
@nigoroll
Contributor
nigoroll commented Oct 3, 2016

This looks like an incorrect restart. Can you share the VCL (at least those bits where you're restarting)? Are you using any inline-C or vmods?

@Lutacon
Lutacon commented Oct 3, 2016 edited

We have the softpurge and variable vmods. The variable vmod is not official; it is this one:
http://download.gnu.org.ua/release/vmod-variable/vmod-variable-1.0.tar.gz

We are using softpurge alongside the standard purge because we have some cached objects that we want to refresh quickly and others that we don't.

We have object variants per user type. Sometimes we override the Vary header to make cache objects global (static files).

We are using Drupal 7.50 with Authcache.

http://pastebin.com/m36S01yQ --> drupal.vcl
http://pastebin.com/ZgKBcfr3 --> core.vcl

@Lutacon
Lutacon commented Oct 13, 2016

We removed the variable vmod and the panics are now less frequent: we see roughly one panic every 7 days.

@fgsch
Member
fgsch commented Oct 13, 2016

So you are seeing the same panic, just not as often?
Can you provide the running VCL and an updated panic, please?

@nigoroll
Contributor

@Lutacon because the issue you are seeing is so strange, I think we should exclude other sources of error:

  • which varnish binaries are you running? If you built them yourselves, could you check your build? If they are from a different source, can you please name it?
  • you redacted the kernel version as 3.14.32-xxxx-grs-ipv6-64, so this appears to be a custom build. Can you check if the kernel changes could be related? Can you check if this issue appears on a vanilla kernel?
  • Are you using any kind of virtualization? If yes, which?
@fgsch fgsch added bug 4.1 labels Oct 13, 2016
@Lutacon
Lutacon commented Oct 24, 2016 edited

First of all, sorry for the late response; we are having a busy month >_<

@fgsch Yes. We are seeing the same panic but with an interval of 3-4 days.

@nigoroll The answers to your questions :

  • Are you asking about the binaries used to compile the vmods, or the Varnish installation itself? The vmods were compiled against v4.1.2; the Varnish installation is v4.1.3 from the Debian repository.
  • We have a custom kernel from OVH (our hosting company), so yes, this is a custom build. Testing a vanilla kernel is not something we can do right now on our production server.
  • No.

Updated VCL : http://pastebin.com/AcxtSTrQ

Updated panic:

Assert error in CNT_Request(), cache/cache_req_fsm.c line 820:
  Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.
thread = (cache-worker)
version = varnish-4.1.3 revision 5e3b6d2
ident = Linux,3.14.32-xxxx-grs-ipv6-64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x434025: varnishd() [0x434025]
  0x438779: varnishd(CNT_Request+0x1bd9) [0x438779]
  0x44fc6f: varnishd(HTTP1_Session+0xef) [0x44fc6f]
  0x43a5ad: varnishd(SES_Proto_Req+0x5d) [0x43a5ad]
  0x44a522: varnishd() [0x44a522]
  0x44a98b: varnishd() [0x44a98b]
  0x74c77a6fb0a4: libpthread.so.0(+0x80a4) [0x74c77a6fb0a4]
  0x74c77a43087d: libc.so.6(clone+0x6d) [0x74c77a43087d]
req = 0x74c712e9f020 {
  vxid = 0, step = R_STP_RESTART,
  req_body = R_BODY_INIT,
  restarts = 0, esi_level = 0,
  sp = 0x74c739110220 {
    fd = 54, vxid = 40569635,
    client = 37.35.239.195 50468,
    t_open = 1477255774.476773,
    t_idle = 1477255778.367042,
    step = S_STP_H1PROC,
  },
  ws = 0x74c712e9f200 {
    id = \"req\",
    {s,f,r,e} = {0x74c712ea1000,0x74c712ea1000,+32768,+57336},
  },
  http_conn = 0x74c712e9f128 {
    fd = 54,
    doclose = NULL,
    ws = 0x74c712e9f200,
    {rxbuf_b, rxbuf_e} = {0x74c712ea1000, 0x74c712ea1000},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = 0,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x74c712e9f298 {
    ws[] = (nil),
    hdrs {
    },
  },
  flags = {
  },
},


@nigoroll
Contributor
nigoroll commented Oct 25, 2016 edited

Do I understand correctly that

The binaries to compile vmod are from the version v4.1.2. The installation of varnish it's from the Debian repository v4.1.3

This does not sound good. Please ensure that you (re-)compile your vmods against exactly the same Varnish installation you are running, and report back.

@Lutacon
Lutacon commented Oct 25, 2016 edited

Yes, that's correct.

We upgraded from 4.1.2 to 4.1.3 and didn't recompile the vmods after the update.

OK, we are going to recompile the vmods against 4.1.3 and I'll report back.

@Lutacon
Lutacon commented Oct 26, 2016

We are going to reinstall Varnish and recompile the vmods.

In any case, we had this panic before upgrading to 4.1.3.

@kixelated

Hey guys, we're seeing the same issue a few times a day here at Twitch. I haven't really started digging into it, but we're running 4.1.3. I can try to gather more info if it would be helpful.

@fgsch
Member
fgsch commented Nov 5, 2016

In 4.1.3, RESTART is mapped to 0, so this is most likely not a restart but a req that was not properly (re)initialized.

Do you know or could you capture if these requests are being put in the waiting list?
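fgsch's observation (that a zeroed request decodes as a restart) can be illustrated with a small sketch. This is not Varnish source; the step names are taken from the panic output above, and the only assumption carried over from the comment is that R_STP_RESTART gets enum value 0:

```python
from enum import IntEnum

# Hypothetical mirror of the request-step values; the assumption here
# (from the comment above) is only that R_STP_RESTART maps to 0.
class ReqStep(IntEnum):
    R_STP_RESTART = 0
    R_STP_RECV = 1
    R_STP_LOOKUP = 2

# A freshly zeroed or never-(re)initialized step field...
raw_step = 0

# ...decodes as R_STP_RESTART even though no restart ever happened,
# which matches "step = R_STP_RESTART, restarts = 0" in the panics.
print(ReqStep(raw_step).name)  # prints "R_STP_RESTART"
```

This is why the panic shows step = R_STP_RESTART while restarts = 0 and the VCLs involved do not even call restart.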

@Lutacon
Lutacon commented Nov 5, 2016 edited

Hello,

An update on our situation: we had the std vmod in our VCL and were logging some requests. We didn't include that in the bug report because we thought it wasn't relevant...

We have removed the std vmod and haven't had a crash since 23/10.

We are going to try the same configuration, with std, on our development server to try to reproduce the issue.

@llavaud
llavaud commented Nov 16, 2016

Hello,

Same bug here, with exactly the same Varnish version, on Debian Wheezy.
We use the following vmods, compiled from the varnish-modules repository on GitHub:

import std;
import directors;
import saintmode;
import header;
import softpurge;

We don't use "restart" but "retry" in our VCL.

Tell me if you need more information

@Lutacon
Lutacon commented Nov 16, 2016

@llavaud We disabled the std vmod and haven't had a crash since 23/10. You could try that.

@llavaud
llavaud commented Nov 17, 2016

OK, great to know there is a workaround, but the std vmod is very helpful... :(

@fgsch
Member
fgsch commented Nov 19, 2016

@Lutacon I checked your VCL and I don't see where you are/were using std.
Can you let us know?

@Lutacon
Lutacon commented Nov 19, 2016 edited

@fgsch When we posted our VCL we did some cleanup to remove internal comments and our IPs. We also removed the std log calls.

This is where we were using std:

sub authcache_key_cid {
  // Note: this sub relies on "import std;" earlier in the VCL for std.log().
  if (req.http.Cookie ~ "(^|;)\s*S?SESS[a-z0-9]+=") {
    // Use the whole session cookie to differentiate between authenticated
    // users.
    set req.http.X-Authcache-Key-CID = "sess:" + regsuball(req.http.Cookie, "^(.*;\s*)?(S?SESS[a-z0-9]+=[^;]*).*$", "\2");
  }
  else {
    // If authcache key retrieval was enforced for anonymous traffic, the HTTP
    // host is used in order to keep apart anonymous users of different
    // domains.
    set req.http.X-Authcache-Key-CID = "host:" + req.http.host;
  }

  if (req.http.Cookie ~ "(^|;\s*)(aucp14=1)(;|$)") {
    set req.http.X-Authcache-Key-CID = req.http.X-Authcache-Key-CID + "-iva1";
  } else if (req.http.Cookie ~ "(^|;\s*)(aucp14=0)(;|$)") {
    set req.http.X-Authcache-Key-CID = req.http.X-Authcache-Key-CID + "-iva0";
  } else {
    if (req.http.Cookie ~ "(^|;\s*)(aucp14=)(;|$)") {
      // Rename an empty aucp14 cookie so a fresh one can be appended below.
      std.log("maclog has_js antes [" + req.http.Cookie + "]");
      set req.http.Cookie = regsuball(req.http.Cookie, "aucp14", "cancel_aucp14");
    }
    set req.http.X-Authcache-Key-CID = req.http.X-Authcache-Key-CID + "-iva1";
    set req.http.Cookie = req.http.Cookie + ";aucp14=1";
  }

  std.log("maclog req.http.X-Authcache-Key-CID[" + req.http.X-Authcache-Key-CID + "] ");
}
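The regexes in the sub above can be sanity-checked outside Varnish. Below is a hedged translation into Python's re module (Varnish uses PCRE, but these patterns stick to syntax the two engines share); the cookie value is a made-up example:

```python
import re

# Made-up cookie header for illustration.
cookie = "foo=1; SESS0123abc=tokenvalue; bar=2"

# Equivalent of:
#   regsuball(req.http.Cookie, "^(.*;\s*)?(S?SESS[a-z0-9]+=[^;]*).*$", "\2")
# i.e. reduce the whole header to just the session cookie.
cid = re.sub(r"^(.*;\s*)?(S?SESS[a-z0-9]+=[^;]*).*$", r"\2", cookie)
print("sess:" + cid)  # prints "sess:SESS0123abc=tokenvalue"

# Equivalent of the aucp14 detection used in the if/else chain above.
assert re.search(r"(^|;\s*)(aucp14=1)(;|$)", "a=1; aucp14=1") is not None
```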
@fgsch
Member
fgsch commented Nov 20, 2016

@Lutacon thanks. So having those 2 lines is enough to cause the crash?
Can you share the varnishadm param.show output?

@fgsch
Member
fgsch commented Nov 20, 2016

@Lutacon also any luck trying to reproduce it?

@Lutacon
Lutacon commented Nov 20, 2016 edited

@fgsch We were not able to reproduce it on our development server with the same setup, but we think that's because it doesn't get the same traffic.

And since we removed those 2 lines we haven't had a crash.

varnishadm param.show output:

accept_filter              off [bool] (default)
acceptor_sleep_decay       0.9 (default)
acceptor_sleep_incr        0.000 [seconds] (default)
acceptor_sleep_max         0.050 [seconds] (default)
auto_restart               on [bool] (default)
backend_idle_timeout       60.000 [seconds] (default)
ban_dups                   on [bool] (default)
ban_lurker_age             60.000 [seconds] (default)
ban_lurker_batch           1000 (default)
ban_lurker_sleep           0.010 [seconds] (default)
between_bytes_timeout      60.000 [seconds] (default)
cc_command                 "exec gcc -std=gnu99 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wall -Werror -Wno-error=unused-result -pthread -fpic -shared -Wl,-x -o %o %s" (default)
cli_buffer                 8k [bytes] (default)
cli_limit                  48k [bytes] (default)
cli_timeout                60.000 [seconds] (default)
clock_skew                 10 [seconds] (default)
connect_timeout            3.500 [seconds] (default)
critbit_cooloff            180.000 [seconds] (default)
debug                      none (default)
default_grace              10.000 [seconds] (default)
default_keep               0.000 [seconds] (default)
default_ttl                120.000 [seconds] (default)
feature                    +esi_disable_xml_check,+esi_ignore_other_elements
fetch_chunksize            16k [bytes] (default)
fetch_maxchunksize         0.25G [bytes] (default)
first_byte_timeout         60.000 [seconds] (default)
gzip_buffer                32k [bytes] (default)
gzip_level                 6 (default)
gzip_memlevel              8 (default)
http_gzip_support          on [bool] (default)
http_max_hdr               64 [header lines] (default)
http_range_support         on [bool] (default)
http_req_hdr_len           8k [bytes] (default)
http_req_size              32k [bytes] (default)
http_resp_hdr_len          8k [bytes] (default)
http_resp_size             32k [bytes] (default)
idle_send_timeout          60.000 [seconds] (default)
listen_depth               1024 [connections] (default)
lru_interval               2.000 [seconds] (default)
max_esi_depth              5 [levels] (default)
max_restarts               4 [restarts] (default)
max_retries                4 [retries] (default)
nuke_limit                 50 [allocations] (default)
pcre_match_limit           10000 (default)
pcre_match_limit_recursion 20 (default)
ping_interval              3 [seconds] (default)
pipe_timeout               60.000 [seconds] (default)
pool_req                   10,100,10 (default)
pool_sess                  10,100,10 (default)
pool_vbo                   10,100,10 (default)
prefer_ipv6                off [bool] (default)
rush_exponent              3 [requests per request] (default)
send_timeout               600.000 [seconds] (default)
session_max                100000 [sessions] (default)
shm_reclen                 255b [bytes] (default)
shortlived                 10.000 [seconds] (default)
sigsegv_handler            on [bool] (default)
syslog_cli_traffic         on [bool] (default)
tcp_fastopen               off [bool] (default)
tcp_keepalive_intvl        75.000 [seconds] (default)
tcp_keepalive_probes       9 [probes] (default)
tcp_keepalive_time         7200.000 [seconds] (default)
thread_pool_add_delay      2.000 [seconds]
thread_pool_destroy_delay  1.000 [seconds] (default)
thread_pool_fail_delay     0.200 [seconds] (default)
thread_pool_max            4000 [threads]
thread_pool_min            100 [threads] (default)
thread_pool_stack          48k [bytes] (default)
thread_pool_timeout        300.000 [seconds] (default)
thread_pools               8 [pools]
thread_queue_limit         20 (default)
thread_stats_rate          10 [requests] (default)
timeout_idle               5.000 [seconds] (default)
timeout_linger             0.050 [seconds] (default)
vcc_allow_inline_c         off [bool] (default)
vcc_err_unref              on [bool] (default)
vcc_unsafe_path            on [bool] (default)
vcl_cooldown               600.000 [seconds] (default)
vcl_dir                    /etc/varnish (default)
vmod_dir                   /usr/lib/varnish/vmods (default)
vsl_buffer                 4k [bytes] (default)
vsl_mask                   -VCL_trace,-WorkThread,-Hash,-VfpAcct (default)
vsl_reclen                 255b [bytes] (default)
vsl_space                  80M [bytes] (default)
vsm_free_cooldown          60.000 [seconds] (default)
vsm_space                  1M [bytes] (default)
workspace_backend          64k [bytes] (default)
workspace_client           64k [bytes] (default)
workspace_session          0.50k [bytes] (default)
workspace_thread           2k [bytes] (default)
@fgsch
Member
fgsch commented Nov 21, 2016

@Lutacon OK. Do you happen to have the varnishstat output at the time of the crash?
Also, what's the reason behind the 2 seconds thread_pool_add_delay setting? That looks like a mistake/leftover from a V3 to V4 migration. Can you remove that change and try re-enabling std?

@Lutacon
Lutacon commented Nov 21, 2016

@fgsch I don't have the varnishstat output from that time. About the thread_pool settings, we are thinking about removing them from the daemon options, but we took those parameters from the Varnish Book. At the moment we have this in the daemon options:

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/drupal.vcl \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=8 \
             -p thread_pool_min=100 \
             -p thread_pool_max=4000 \
             -s malloc,16G"

We can't try that at the moment because of Black Friday and Cyber Monday; if we have a crash on those dates we are fucked. I'll let you know when we try those options.

@fgsch
Member
fgsch commented Nov 21, 2016

@Lutacon you can remove it entirely. Where in the Varnish Book is this?
Please note this parameter used to be in milliseconds (in 3.0) and was changed to seconds.
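To see why this unit change matters, here is a hedged back-of-the-envelope sketch using the per-pool thread_pool_min/max values from this thread. It assumes, simplistically, that a pool adds one thread per thread_pool_add_delay interval while ramping up; this is an illustration, not Varnish's actual scheduler:

```python
# Settings taken from the DAEMON_OPTS posted above.
threads_to_add = 4000 - 100  # thread_pool_max - thread_pool_min, per pool

# 4.x semantics: thread_pool_add_delay=2 means 2 seconds per thread.
ramp_4x_seconds = threads_to_add * 2.0
print(ramp_4x_seconds / 3600)  # roughly 2.2 hours to fully ramp up a pool

# 3.0 semantics: the same "2" meant 2 milliseconds per thread.
ramp_30_seconds = threads_to_add * 0.002
print(ramp_30_seconds)         # roughly 7.8 seconds
```

Under this simplified model, keeping the 3.0-era value after migrating makes thread-pool growth about a thousand times slower, which is why removing the leftover setting is advised.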

@mbgrydeland
Contributor

Having looked at this case again, @nigoroll and I have a theory that this could be another way the bug from #2117 can present itself. #2117 was fixed by @nigoroll in #2133, and the fix has been backported to Varnish 4.1 by @hermunn in commit 49ce7f8. Please try the HEAD of the 4.1 branch.

@mbgrydeland
Contributor

Writing the commit message of 713be80, I got the ticket reference wrong: instead of referencing this ticket (#2106), I specified #2117.

@mbgrydeland mbgrydeland closed this Dec 9, 2016
@hermunn
Contributor
hermunn commented Dec 9, 2016

@mbgrydeland I will fix the changelog

@hermunn
Contributor
hermunn commented Dec 16, 2016

Backport review: This is fixed in the latest version of 4.1, and @mbgrydeland also fixed the corresponding problem in master.

@jansonz
jansonz commented Feb 13, 2017 edited

It seems the issue is still there in varnish-4.1.4, though the assertion has moved from line 820 to line 824.
Before the upgrade:
Feb 5 20:46:17 hostname varnishd[26356]: Child (29271) not responding to CLI, killing it. Feb 5 20:46:17 hostname varnishd[26356]: Child (29271) not responding to CLI, killing it. Feb 5 20:46:18 hostname varnishd[26356]: Child (29271) died signal=6 Feb 5 20:46:18 hostname varnishd[26356]: Child (29271) Last panic at: Sun, 05 Feb 2017 20:46:18 GMT#012"Assert error in CNT_Request(), cache/cache_req_fsm.c line 820:#012 Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.#012thread = (cache-worker)#012version = varnish-4.1.3 revision 5e3b6d2#012ident = Linux,3.10.0-327.28.3.el7.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012Backtrace:#012 0x4342f6: varnishd() [0x4342f6]#012 0x438be1: varnishd(CNT_Request+0x1c91) [0x438be1]#012 0x45018f: varnishd(HTTP1_Session+0xef) [0x45018f]#012 0x43a91d: varnishd(SES_Proto_Req+0x5d) [0x43a91d]#012 0x44aa61: varnishd() [0x44aa61]#012 0x44aecb: varnishd() [0x44aecb]#012 0x7fdc29e4adc5: libpthread.so.0(+0x7dc5) [0x7fdc29e4adc5]#012 0x7fdc29b77ced: libc.so.6(clone+0x6d) [0x7fdc29b77ced]#012req = 0x7fd59bf74020 {#012 vxid = 0, step = R_STP_RESTART,#012 req_body = R_BODY_INIT,#012 restarts = 0, esi_level = 0,#012 sp = 0x7fce98928620 {#012 fd = 198, vxid = 338328832,#012 client = XXX.XXX.XXX.XXX 36720,#012 t_open = 1486327570.443593,#012 t_idle = 1486327570.573333,#012 step = S_STP_H1PROC,#012 },#012 ws = 0x7fd59bf74200 {#012 id = \"req\",#012 {s,f,r,e} = {0x7fd59bf76000,0x7fd59bf76000,+32768,+57336},#012 },#012 http_conn = 0x7fd59bf74128 {#012 fd = 198,#012 doclose = NULL,#012 ws = 0x7fd59bf74200,#012 {rxbuf_b, rxbuf_e} = {0x7fd59bf76000, 0x7fd59bf76000},#012 {pipeline_b, pipeline_e} = {(nil), (nil)},#012 content_length = 0,#012 body_status = none,#012 first_byte_timeout = 0.000000,#012 between_bytes_timeout = 0.000000,#012 },#012 http[req] = 0x7fd59bf74298 {#012 ws[] = (nil),#012 hdrs {#012 },#012 },#012 flags = {#012 },#012},#012#012"

After the upgrade:
Feb 10 21:45:57 hostname varnishd[6804]: Child (6805) not responding to CLI, killed it.
Feb 10 21:45:57 hostname varnishd[6804]: Unexpected reply from ping: 400 CLI communication error (hdr)
Feb 10 21:45:57 hostname varnishd[6804]: Child (6805) not responding to CLI, killed it.
Feb 10 21:45:57 hostname varnishd[6804]: Unexpected reply from ping: 400 CLI communication error
Feb 10 21:45:58 hostname varnishd[6804]: Child (6805) died signal=6
Feb 10 21:45:58 hostname varnishd[6804]: Child (6805) Last panic at: Fri, 10 Feb 2017 21:45:58 GMT#012"Assert error in CNT_Request(), cache/cache_req_fsm.c line 824:#12 Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.#012thread = (cache-worker)#012version = varnish-4.1.4 revision 4529ff7#012ident = Linux,3.10.0-327.28.3.el7.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012now = 13437574.716160 (mono), 1486763150.990211 (real)#012Backtrace:#12 0x434612: pan_ic+0x192#012 0x438f36: CNT_Request+0x1cc6#012 0x450a7f: HTTP1_Session+0xef#012 0x43ac6d: SES_Proto_Req+0x5d#012 0x44b089: WRK_Thread+0x4c9#012 0x44b4fb: pool_thread+0x2b#012 0x7fcbd3bd8dc5: libpthread.so.0(+0x7dc5) [0x7fcbd3bd8dc5]#12 0x7fcbd3905ced: libc.so.6(clone+0x6d) [0x7fcbd3905ced]#012req = 0x7fc13cff0020 {#12 vxid = 0, step = R_STP_RESTART,#12 req_body = R_BODY_INIT,#12 restarts = 0, esi_level = 0,#12 sp = 0x7fbb95e46c20 {#12 fd = 111, vxid = 953949037,#12 client = XXX.XXX.XXX.XXX 60106,#12 t_open = 1486763150.806245,#12 t_idle = 1486763150.812608,#12 step = S_STP_H1PROC,#12 },#12 ws = 0x7fc13cff0200 {#12 id = "req",#12 {s,f,r,e} = {0x7fc13cff2000,0x7fc13cff2000,+32768,+57336},#12 },#12 http_conn = 0x7fc13cff0128 {#12 fd = 111,#12 doclose = NULL,#12 ws = 0x7fc13cff0200,#12 {rxbuf_b, rxbuf_e} = {0x7fc13cff2000, 0x7fc13cff2000},#12 {pipeline_b, pipeline_e} = {(nil), (nil)},#12 content_length = 0,#12 body_status = none,#12 first_byte_timeout = 0.000000,#12 between_bytes_timeout = 0.000000,#12 },#12 http[req] = 0x7fc13cff0298 {#12 ws[] = (nil),#12 hdrs {#12 },#12 },#12 flags = {#12 },#12},#12#012"

Feb 13 05:12:14 hostname varnishd[6804]: Child (14618) not responding to CLI, killed it.
Feb 13 05:12:14 hostname varnishd[6804]: Unexpected reply from ping: 400 CLI communication error (hdr)
Feb 13 05:12:15 hostname varnishd[6804]: Child (14618) died signal=6
Feb 13 05:12:15 hostname varnishd[6804]: Child (14618) Last panic at: Mon, 13 Feb 2017 05:12:15 GMT#012"Assert error in CNT_Request(), cache/cache_req_fsm.c line 824:#12 Condition(req->req_step == R_STP_LOOKUP || req->req_step == R_STP_RECV) not true.#012thread = (cache-worker)#012version = varnish-4.1.4 revision 4529ff7#012ident = Linux,3.10.0-327.28.3.el7.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012now = 13637155.584010 (mono), 1486962731.858061 (real)#012Backtrace:#12 0x434612: pan_ic+0x192#012 0x438f36: CNT_Request+0x1cc6#012 0x450a7f: HTTP1_Session+0xef#012 0x43ac6d: SES_Proto_Req+0x5d#012 0x44b089: WRK_Thread+0x4c9#012 0x44b4fb: pool_thread+0x2b#012 0x7fcbd3bd8dc5: libpthread.so.0(+0x7dc5) [0x7fcbd3bd8dc5]#12 0x7fcbd3905ced: libc.so.6(clone+0x6d) [0x7fcbd3905ced]#012req = 0x7fbb8917e020 {#12 vxid = 0, step = R_STP_RESTART,#12 req_body = R_BODY_INIT,#12 restarts = 0, esi_level = 0,#12 sp = 0x7fbbe7187220 {#12 fd = 41, vxid = 628272487,#12 client = XXX.XXX.XXX.XXX 44323,#12 t_open = 1486962731.698359,#12 t_idle = 1486962731.700768,#12 step = S_STP_H1PROC,#12 },#12 ws = 0x7fbb8917e200 {#12 id = "req",#12 {s,f,r,e} = {0x7fbb89180000,0x7fbb89180000,+32768,+57336},#12 },#12 http_conn = 0x7fbb8917e128 {#12 fd = 41,#12 doclose = NULL,#12 ws = 0x7fbb8917e200,#12 {rxbuf_b, rxbuf_e} = {0x7fbb89180000, 0x7fbb89180000},#12 {pipeline_b, pipeline_e} = {(nil), (nil)},#12 content_length = 0,#12 body_status = none,#12 first_byte_timeout = 0.000000,#12 between_bytes_timeout = 0.000000,#12 },#12 http[req] = 0x7fbb8917e298 {#12 ws[] = (nil),#12 hdrs {#12 },#12 },#12 flags = {#12 },#12},#12#012"

OS:
[root@hostname ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

Packages:
[root@hostname ~]$ rpm -qa|grep varnish
varnish-4.1.4-1.el7.x86_64
varnish-release-4.1-2.el7.noarch
varnish-agent-4.1.1-1.el7.centos.x86_64

Let me know if there is any more debug information I need to add.

Thanks,
J

@scoof
Contributor
scoof commented Feb 13, 2017

The fix is in 4.1.5, not 4.1.4

@jansonz
jansonz commented Feb 13, 2017

Thanks @scoof. Have upgraded to 4.1.5 now. Fingers crossed. :)
