
WS_Assert(), cache/cache_ws.c line 59 #1834

Closed
bsdphk opened this issue Mar 4, 2016 · 13 comments
Comments

@bsdphk
Contributor

bsdphk commented Mar 4, 2016

Old ticket imported from Trac
See archived copy here: https://varnish-cache.org/trac/ticket/1834

@nigoroll
Member

@lkarsten close?

@lkarsten
Contributor

lkarsten commented May 27, 2016

I don't think we've squashed this one. I asked the reporter to check if it still happens with 4.1 HEAD; timeout in 10 days if nothing is heard.

@lkarsten lkarsten self-assigned this May 30, 2016
@porcospino

porcospino commented Jun 16, 2016

Just had it happen on 4.1.2
Last panic at: Thu, 16 Jun 2016 12:02:44 GMT
"Assert error in WS_Assert(), cache/cache_ws.c line 59:
Condition(*ws->e == 0x15) not true.
thread = (cache-worker)
version = varnish-4.1.2 revision 0d7404e
ident = Linux,3.13.0-87-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
0x433976: varnishd() [0x433976]
0x44a5ab: varnishd(WS_Assert+0x18b) [0x44a5ab]
0x44af71: varnishd(WS_ReleaseP+0x11) [0x44af71]
0x43a865: varnishd(SES_RxStuff+0x205) [0x43a865]
0x44ea78: varnishd(HTTP1_Session+0x178) [0x44ea78]
0x439d0d: varnishd(SES_Proto_Req+0x5d) [0x439d0d]
0x4493d1: varnishd() [0x4493d1]
0x44983b: varnishd() [0x44983b]
0x7f897fec9184: libpthread.so.0(+0x8184) [0x7f897fec9184]
0x7f897fbf637d: libc.so.6(clone+0x6d) [0x7f897fbf637d]
req = 0x7f89660ec020 {
vxid = 0, step = R_STP_RESTART,
req_body = R_BODY_INIT,
restarts = 0, esi_level = 0,
sp = 0x7f8957ad2a20 {
fd = 25, vxid = 29036,
client = 192.168.69.189 59741,
step = S_STP_H1NEWREQ,
},
ws = 0x7f89660ec200 {
id = "req",
{s,f,r,e} = {0x7f89660ee000,+32768,+57336,+57336},
},
http_conn = 0x7f89660ec128 {
fd = 25,
doclose = NULL,
ws = 0x7f89660ec200,
{rxbuf_b, rxbuf_e} = {0x7f89660f6000, 0x7f89660fdfe3},
{pipeline_b, pipeline_e} = {(nil), (nil)},
content_length = 0,
body_status = none,
first_byte_timeout = 0.000000,
between_bytes_timeout = 0.000000,
},
http[req] = 0x7f89660ec298 {
ws[] = (nil),
hdrs {
},
},
flags = {
},
},

@rnsanchez
Contributor

We do see this panic on our custom 4.1 branch (some internal functionality on top of upstream/4.1). As suggested in the old ticket (Trac), we're checking whether 96 KB for both client and backend workspaces helps.

Last panic at: Thu, 16 Jun 2016 06:00:27 GMT
"Assert error in WS_Assert(), cache/cache_ws.c line 59:
  Condition(*ws->e == 0x15) not true.
thread = (cache-worker)
version = varnish-4.1.3-dev revision 96c00a5
ident = Linux,4.4.11,x86_64,-junix,-sfile,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x434502: pan_ic+0x152
  0x44b7c6: WS_Assert+0x176
  0x44c171: WS_ReleaseP+0x11
  0x43b56d: SES_RxStuff+0x1fd
  0x4500c2: HTTP1_Session+0x1b2
  0x43aa2d: SES_Proto_Req+0x5d
  0x44a62a: WRK_Thread+0x47a
  0x44aa7b: pool_thread+0x2b
  0x7f33fa0ca654: libpthread.so.0(+0x7654) [0x7f33fa0ca654]
  0x7f33f9e007dd: libc.so.6(clone+0x6d) [0x7f33f9e007dd]
req = 0x7c459fa47020 {
      vxid = 0, step = R_STP_RESTART,
      req_body = R_BODY_INIT,
      restarts = 0, esi_level = 0,
      sp = 0x7c45a844d620 {
        fd = 114, vxid = 67983,
        client = 186.250.8.9 11882,
        t_open = 1466056827.205797,
        t_idle = 1466056827.205797,
    step = S_STP_H1NEWREQ,
  },
  ws = 0x7c459fa47208 {
    id = \"req\",
    {s,f,r,e} = {0x7c459fa49008,+32768,+57328,+57328},
  },
  http_conn = 0x7c459fa47130 {
    fd = 114,
    doclose = NULL,
    ws = 0x7c459fa47208,
    {rxbuf_b, rxbuf_e} = {0x7c459fa51008, 0x7c459fa58feb},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = 0,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7c459fa472a0 {
    ws[] = (nil),
    hdrs {
    },
  },
  flags = {
  },
},

" 

@lkarsten
Contributor

@rnsanchez and @porcospino - Can you please try the 4.1.3-beta1 version and see if that helps? See https://www.varnish-cache.org/releases/index.html#releases .

There isn't a specific fix for this in there, but it may be that one of the many other bugs fixed was what provoked this.

@rnsanchez
Contributor

@lkarsten I need to review the commits up to 4.1.3-beta1 before rolling out to this server, but I will keep you posted.

@lkarsten
Contributor

@rnsanchez Is this still current? Did you rebase your branch onto 4.1.3? Do you still see it with the raised workspace sizes you describe in #2008?

@rnsanchez
Contributor

@lkarsten we're still using 96 KB, and our deployed branch is based on commit 1d73aa7. We have not seen anything specific to this ticket since new binaries were deployed (around July 9th).

Sorry for the delay.

@bsdphk
Contributor Author

bsdphk commented Sep 12, 2016

timing this out.

@hermunn
Member

hermunn commented Sep 13, 2016

backport review: Timed out, nothing to backport

@Dridi
Member

Dridi commented Nov 25, 2016

We have a theory and a likely culprit for this problem. If planets^Wpointers align, session initialization [1,2] may overwrite the end-of-workspace marker with a NUL char in 4.1. This is also true in current master [3] and possibly even more unsafe for H2. The root cause is that we disregard the result of the workspace reservation.

[1] https://github.com/varnishcache/varnish-cache/blob/9dc2ec5/bin/varnishd/cache/cache_session.c#L154-L157
[2] https://github.com/varnishcache/varnish-cache/blob/9dc2ec5/bin/varnishd/cache/cache_session.c#L174-L185
[3] https://github.com/varnishcache/varnish-cache/blob/3f0b634/bin/varnishd/cache/cache_session.c#L193-L204
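
To illustrate the failure mode, here is a toy model with made-up helper names, not Varnish source (the {s,f,r,e} fields mirror the workspace layout shown in the panics above): if the code ignores how many bytes a reservation actually yielded and writes a terminating NUL anyway, that NUL lands on the canary byte at ws->e when the workspace is full, which is exactly the *ws->e == 0x15 condition WS_Assert() later finds violated.

    /* Toy model only -- not Varnish code.  Shows how ignoring the size of
     * a workspace reservation can clobber the end-of-workspace canary. */
    #include <assert.h>

    #define WS_CANARY 0x15

    struct toy_ws {
        char *s, *f, *r, *e;    /* start, free, reservation, end (canary) */
    };

    /* Hand out whatever is left; the return value says how much that is. */
    static unsigned
    toy_reserve(struct toy_ws *ws, char **p)
    {
        *p = ws->f;
        return (unsigned)(ws->e - ws->f);
    }

    /* BUG: the reservation size is ignored.  With zero bytes left, buf
     * points at ws->e and the NUL overwrites the 0x15 canary. */
    static void
    buggy_rx_init(struct toy_ws *ws)
    {
        char *buf;

        (void)toy_reserve(ws, &buf);
        buf[0] = '\0';
    }

    int
    main(void)
    {
        char mem[65];
        struct toy_ws ws = { mem, mem + 64, mem + 64, mem + 64 };

        *ws.e = WS_CANARY;
        buggy_rx_init(&ws);            /* workspace already completely full */
        assert(*ws.e == WS_CANARY);    /* the check from the panics; fails */
        return (0);
    }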

@mbgrydeland
Contributor

mbgrydeland commented Nov 25, 2016

FWIW, the attached test case triggers the assertion on my machine. I am not suggesting this test case for inclusion, as the sizes used are way too fragile to be useful as a regression test. The size of the bogus allocation needs to be exactly large enough to leave zero workspace at the right point in order to trigger the bug. I ended up on 208 bytes by trial and success ;-)
Edit: 192 is the magic number on my computer for Varnish 4.1 HEAD.
hunch-ws.vtc.txt

mbgrydeland added a commit to mbgrydeland/varnish-cache that referenced this issue Nov 29, 2016
HTC_RxInit could write a single '\0' NUL character outside of the
workspace when it is called and there are zero bytes left in the
workspace. This would trip the workspace canary, causing a subsequent
assertion failure.

Fix by relieving HTC_RxInit of adding the '\0' character.

HTC_RxStuff now returns HTC_S_Overflow early if the available buffer
space is zero. The '\0' character is inserted just before calling the
completion check function.

Also fix an off-by-one error in the http_{req|resp}_size calculations,
where the maximum number of bytes accepted was one less than the
parameter indicated. c00039.vtc and c00040.vtc have been edited to
reflect that and to be more expressive about the sizes they generate.

Fixes: varnishcache#1834
mbgrydeland added a commit that referenced this issue Dec 5, 2016
HTC_RxInit and HTC_RxReInit could write a single '\0' NUL character
outside of the workspace when they are called and there are zero bytes
left in the workspace. This would trip the workspace canary, causing a
subsequent assertion failure.

Fix by relieving HTC_RxInit and HTC_RxReInit of adding the '\0'
character.

HTC_RxStuff and V1F_FetchRespHdr return HTC_S_OVERFLOW if the
available buffer space is zero. Both make sure to insert the '\0'
character just before calling the completion check function.

Note that this fix does not change the fact that we have exhausted
the workspace and are unable to continue. Varnishd will panic
nonetheless, but at least we have not stepped out of our boundaries.

Ref: #1834
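
Roughly, the shape of the fix described above, again as a simplified sketch rather than the actual patch (the struct and status names below are condensed; only HTC_RxInit, HTC_RxStuff and HTC_S_OVERFLOW come from the commit message): the init path no longer writes the NUL, and the receive path reports overflow while there is still room to stop safely, writing the terminator only once it knows the byte fits.

    /* Sketch of the fix, not the actual diff. */

    enum toy_htc_status { TOY_HTC_S_OK, TOY_HTC_S_OVERFLOW };

    struct toy_htc {
        char *rxbuf_b, *rxbuf_e;    /* received bytes so far */
        char *buf_end;              /* end of the reserved buffer */
    };

    /* After the fix: init only records pointers; it no longer writes a
     * '\0', so a zero-byte reservation cannot push a write past the
     * workspace. */
    void
    fixed_rx_init(struct toy_htc *htc, char *buf, unsigned len)
    {
        htc->rxbuf_b = htc->rxbuf_e = buf;
        htc->buf_end = buf + len;
    }

    /* The receive loop now bails out with an overflow status when no
     * buffer space remains, and NUL-terminates just before invoking the
     * completion check, when the terminator is known to fit inside the
     * reservation. */
    enum toy_htc_status
    fixed_rx_stuff(struct toy_htc *htc)
    {
        if (htc->rxbuf_e >= htc->buf_end)
            return (TOY_HTC_S_OVERFLOW);

        /* ... read more bytes into [rxbuf_e, buf_end) here ... */

        *htc->rxbuf_e = '\0';
        return (TOY_HTC_S_OK);
    }
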
@hermunn
Member

hermunn commented Jan 27, 2017

FYI: The final commit, c1fa324, is a backport to 4.1. Thanks again @mbgrydeland!
