first_byte_timeout is ignored when re-using a backend connection (HTTP Keep Alive) #1772

bsdphk · 2016-03-04T13:27:10Z

Old ticket imported from Trac
See archived copy here: https://varnish-cache.org/trac/ticket/1772

mbgrydeland · 2016-03-18T12:41:50Z

I've been pondering this and #1806 a bit.

To me it seems that the base problem relates to the inability to forcefully remove a vbc from the waiter code. We used to have that option in the API, but it proved racy and was removed at some point during the 4.1 development cycle.

To fix the race issue, reused vbc's would start off in a STOLEN state. STOLEN here means that the waiter has the fd and has not yet released it from its in-kernel event monitoring. The VBT_Wait function would wait for the waiter to release it, and the fetch code would take care to call VBT_Wait before starting to read or poll on the socket. But for the waiter release to happen, some event would need to register on the fd. The event could be incoming bytes, a remote hangup or the waiter-implemented timeout. The waiter timeout set for vbc's is backend_idle_timeout (default 60s). Lacking any incoming bytes or hangup, this means first_byte_timeout is not used; instead, a timeout of up to backend_idle_timeout applies.

To remedy this I suggest taking a slightly different approach. Instead of forcing a waiter exit first and then doing the reads on the vbc, we should use the waiter to do the first read timeout. To do this, VBT_Wait would go away and be replaced with a VBT_Read that takes a read timeout as an argument and is tasked with both reading bytes and implementing the read timeout. Fetch code would then call VBT_Read to read bytes, providing first_byte_timeout on the first call and between_bytes_timeout on subsequent calls.

VBT_Read would, upon finding that the vbc is still on the waiter, call into the waiter API to change the timeout. This would require changes to the waiter API, but looks doable (it would need to trigger a loop in the waiter thread through the control socket, like when inserting new entries). VBT_Read then does a pthread_cond_wait like before, waiting for the waiter event. The waiter event would remove the fd from its set, so this slower path would only happen once. When the vbc is not on the waiter, a poll with timeout is done instead.

The waiter API for changing the timeout would have to be able to return failure. This could happen if the call to change the timeout happens at the same time as a socket event happens. The calling code in VBT_Read would need to handle that.
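To make the non-waiter path concrete, here is a minimal sketch of a read with a timeout built on poll(), plus a fetch loop that passes first_byte_timeout on the first call and between_bytes_timeout afterwards. The names sketch_read_timeout and sketch_fetch, the return codes and the EINTR handling are assumptions for illustration only; this is not the proposed VBT_Read nor any existing Varnish function.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Varnish code): wait at most timeout_s seconds
 * for fd to become readable, then read once.
 * Returns >0 bytes read, 0 on EOF/remote hangup, -1 on error, -2 on timeout.
 */
static ssize_t
sketch_read_timeout(int fd, void *buf, size_t len, double timeout_s)
{
	struct pollfd pfd;
	int r;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Simplification: an interrupted poll restarts with the full timeout */
	do {
		r = poll(&pfd, 1, (int)(timeout_s * 1e3));
	} while (r < 0 && errno == EINTR);

	if (r == 0)
		return (-2);		/* nothing arrived in time */
	if (r < 0)
		return (-1);
	return (read(fd, buf, len));
}

/*
 * Hypothetical fetch loop: the first read is bounded by first_byte_timeout,
 * every later read by between_bytes_timeout.
 * Returns total bytes read, or a negative code from sketch_read_timeout.
 */
static ssize_t
sketch_fetch(int fd, char *buf, size_t len,
    double first_byte_timeout, double between_bytes_timeout)
{
	double tmo = first_byte_timeout;
	size_t got = 0;
	ssize_t n;

	while (got < len) {
		n = sketch_read_timeout(fd, buf + got, len - got, tmo);
		if (n < 0)
			return (n);
		if (n == 0)
			break;		/* peer closed the connection */
		got += (size_t)n;
		tmo = between_bytes_timeout;
	}
	return ((ssize_t)got);
}
```

The slower path, where the vbc is still on the waiter and its timeout has to be changed first, is not covered by this sketch.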

It is my belief that these changes would also fix #1806.

Martin

bsdphk · 2016-03-18T14:17:23Z

Yeah, I've had thoughts along the same lines myself, but it's more complexity than I like...

This avoids getting stuck with a stalled backend connection until reaching backend_idle_timeout. We now wait for the waiter to release the fd for a limited amount of time (first_byte_timeout). In case we reach the timeout, the backend connection is killed to make sure we are not going to reuse it. Fixes varnishcache#1772.

fgaillot · 2017-06-16T10:17:02Z

Hi,

I ran into the same issue described above. I'm not familiar with the varnish code but I gave it a try and wrote a patch for this. I tried a different approach from the one proposed by @mbgrydeland: the VBT_Wait() function now takes a timeout argument (set to first_byte_timeout) to abort before reaching backend_idle_timeout. If the timeout is reached, the backend connection is then killed to avoid reusing it.

This approach looks simpler but may be unsuitable for some reason I didn't see. Please feel free to give any feedback or remarks on this patch.

See #2347.

Thx.

PS: I tried this patch on the 5.1 and 4.1 branches only, but the cherry-pick onto master applies without conflicts. make check is successful on both branches.


fgsch mentioned this issue Mar 12, 2016

one minute delay on return (pipe) and a POST-Request #1806

Closed

jpastuszek mentioned this issue Nov 3, 2016

Varnish stuck on VBT_Wait after reused connection (VBC_STATE_STOLEN) failed V1F_SendReq #2126

Closed

ema mentioned this issue Nov 15, 2016

Introduce a parameter value for extrachance retries #2135

Closed

dridi added b=bug c=varnishd r=4.1 r=trunk labels Apr 18, 2017

fgaillot mentioned this issue Jun 16, 2017

Make sure first_byte_timeout is applied on reused backend connection #2347

Closed

daghf mentioned this issue Nov 16, 2017

Backend "extrachance" retry fixes #2490

Merged

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 24, 2017

Honor first_byte_timeout for recycled backend connections

0aaf2b5

Fixes: varnishcache#1772

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 24, 2017

Honor first_byte_timeout for recycled backend connections

7837fd8

Fixes: varnishcache#1772

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 24, 2017

Honor first_byte_timeout for recycled backend connections

79ab625

Fixes: varnishcache#1772

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 24, 2017

Honor first_byte_timeout for recycled backend connections

cfae478

Fixes: varnishcache#1772

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 24, 2017

Honor first_byte_timeout for recycled backend connections

b953320

Fixes: varnishcache#1772

daghf added a commit to daghf/varnish-cache that referenced this issue Nov 26, 2017

Honor first_byte_timeout for recycled backend connections

2cf212b

Fixes: varnishcache#1772

daghf mentioned this issue Nov 26, 2017

First byte timeout fixes #2504

Merged

daghf closed this as completed in #2504 Nov 27, 2017

daghf added a commit that referenced this issue Nov 27, 2017

Honor first_byte_timeout for recycled backend connections

eecd409

Fixes: #1772

vssync added the a=backport-review-4.1 label Nov 27, 2017

daghf added a commit that referenced this issue Nov 27, 2017

Honor first_byte_timeout for recycled backend connections

9754715

Fixes: #1772

daghf removed the a=backport-review-4.1 label Nov 27, 2017

dmatetelki pushed a commit to dmatetelki/varnish-cache that referenced this issue Mar 14, 2019

Honor first_byte_timeout for recycled backend connections

68bdb81

Fixes: varnishcache#1772
