
reserve a fraction of idle threads for each priority class #2796

Merged 2 commits on Jan 15, 2020

Conversation

@nigoroll (Member) commented Oct 9, 2018

Previously, we used a minimum number of idle threads (the reserve) to ensure that we did not assign all threads to client requests, leaving none available for backend requests.

This was actually only a special case of the more general issue exposed by h2: Lower priority tasks depend on higher priority tasks (for h2, sessions need streams, which need requests, which may need backend requests).

To solve this problem, we divide the reserve by the number of priority classes and schedule lower priority tasks only if enough idle threads remain to eventually run higher priority tasks.

This change does not guarantee any upper limit on the amount of time it can take for a task to be scheduled (e.g. backend requests could be blocking on arbitrarily long timeouts), so the thread pool watchdog is still warranted. But this change should guarantee that we do make progress eventually.

@nigoroll (Member, Author) commented Oct 9, 2018

Maybe I could have chosen a cleaner wording: The reserve is proportional to the priority, so TASK_QUEUE_BO can use all of the reserve, TASK_QUEUE_RUSH 4/5th, TASK_QUEUE_REQ 3/5th etc.
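The proportional reserve described above can be sketched as a small standalone model (illustrative only: the enum follows the patch, but `task_reserve()` and `may_schedule()` are hypothetical helper names, not the actual Varnish code):

```c
#include <assert.h>

/* Priority classes as introduced by the patch, highest priority first. */
enum task_prio {
	TASK_QUEUE_BO,		/* backend requests */
	TASK_QUEUE_RUSH,
	TASK_QUEUE_REQ,		/* client requests */
	TASK_QUEUE_STR,
	TASK_QUEUE_VCA,
	TASK_QUEUE__END
};

/* Portion of the reserve a class may still consume: BO gets all of it,
 * RUSH 4/5, REQ 3/5, and so on down the priority ladder. */
static unsigned
task_reserve(enum task_prio prio, unsigned reserve)
{
	return (reserve * (TASK_QUEUE__END - (unsigned)prio) / TASK_QUEUE__END);
}

/* A task is handed an idle thread only if enough idle threads remain
 * to cover the reserve portions of all higher priority classes. */
static int
may_schedule(enum task_prio prio, unsigned nidle, unsigned reserve)
{
	return (nidle > reserve - task_reserve(prio, reserve));
}
```

With a reserve of 5, a backend (BO) task can be scheduled whenever any thread is idle, while a client (REQ) task needs more than 2 idle threads, so the two highest classes always retain headroom.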

@nigoroll (Member, Author):

bugwash conclusion: phk to ponder a bit longer

@nigoroll (Member, Author):

rebased, force pushed

@nigoroll force-pushed the prio_class_reserve branch 4 times, most recently from 3683f1b to 438a138 on November 6, 2018 10:57
@nigoroll force-pushed the prio_class_reserve branch 2 times, most recently from 4006203 to d238ffe on November 15, 2018 12:57
@nigoroll force-pushed the prio_class_reserve branch 8 times, most recently from fedb594 to 72b46da on November 28, 2018 09:25
@nigoroll force-pushed the prio_class_reserve branch 3 times, most recently from 44d7734 to 139f777 on December 7, 2018 10:52
@bsdphk (Contributor) commented Jan 14, 2019

Bugwash generally in favour.

@dridi to give it a whirl and feedback

@dridi (Member) commented Oct 29, 2019

I managed to salvage r2418.vtc from the grave, following my intuition that the new reserve logic would not remove the deadlock.

Test case:
varnishtest "h2 queuing deadlock"

barrier b1 cond 2

# dridi> I didn't work out exactly how I tickled the watchdog, because unlike
# the current reserve system on the master branch I run into a deadlock
# before even involving a 3rd client. Analysis for later, I was sidetracked
# by a bug (049ce0e73) in varnishtest.

varnish v1 -cliok "param.set thread_pools 1"
varnish v1 -cliok "param.set thread_pool_min 5"
varnish v1 -cliok "param.set thread_pool_max 5"
varnish v1 -cliok "param.set thread_pool_reserve 0"
varnish v1 -cliok "param.set thread_pool_watchdog 2"
varnish v1 -cliok "param.set feature +http2"
varnish v1 -cliok "param.set feature +no_coredump"

varnish v1 -vcl {
	backend be none;
	sub vcl_recv {
		return (synth(200));
	}
} -start

logexpect l1 -v v1 -g raw -q "vxid == 0" {
	expect * 0	Error	"Pool Herder: Queue does not move"
} -start

client c1 {
	txpri
	stream 0 rxsettings -run

	barrier b1 sync

	stream 1 {
		txreq
		# dridi> not sure how relevant this still is
	} -run
} -start

client c2 {
	barrier b1 sync
	txpri
	delay 3
} -start

client c1 -wait
client c2 -wait

logexpect l1 -wait

varnish v1 -cliok panic.show
varnish v1 -cliok panic.clear

varnish v1 -expectexit 0x20

@nigoroll force-pushed the prio_class_reserve branch 2 times, most recently from 1dcc1bf to 54bed57 on November 13, 2019 16:49
@daghf (Member) commented Nov 21, 2019

I'm no longer able to deadlock Varnish with this patch set in place. Good job :-)

@dridi the test case does not actually deadlock; we trigger the watchdog because the delay 3 in the test is longer than the watchdog timeout (thread_pool_watchdog is set to 2 seconds).

I wonder if we might still have a theoretical deadlock possibility for h/2 via Upgrade: h2c, where the upgraded request gets dispatched as a stream with prio TASK_QUEUE_REQ in h2_ou_session(). We should probably turn this into TASK_QUEUE_STR.

@dridi (Member) commented Nov 21, 2019

Thank you @daghf for your analysis. When I produced the test case I wasn't sure yet what was happening, and when I tried to analyze it I couldn't focus properly.

Besides your suggestion, do you have an idea how to fix the spurious watchdog trigger? Since a client can cause it, we should fix it either as part of this pull request or soon after, if we decide to merge it in its current form.

But this is not something we should allow to fall on the back burner.

@nigoroll (Member, Author):

@daghf a big thank you also from my side. As @dridi has passed the maintainer hat on to you for this one, would you mind leaving an approval review?

@dridi (Member) left a review comment:

I think I managed to reproduce the h2 panic in #3142 so there is nothing holding my approval back, except maybe that you might as well squash squashme commits when you rebase against master.

@daghf (Member) left a review comment:

LGTM

Review comment on bin/varnishd/cache/cache.h (outdated, resolved)
@nigoroll force-pushed the prio_class_reserve branch 4 times, most recently from c34388f to 1f3c5e0 on December 17, 2019 16:10
@nigoroll (Member, Author):

review session with @bsdphk and @dridi :

  • rename TASK_QUEUE_END -> TASK_QUEUE__END
  • add an assert that thread_pool_min.min is >= TASK_QUEUE__END

OK otherwise
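A minimal sketch of the agreed startup assertion (illustrative; the function name and parameter plumbing here are assumptions — in Varnish the value comes from the thread_pool_min parameter):

```c
/* Priority classes as introduced by the patch. */
enum task_prio {
	TASK_QUEUE_BO, TASK_QUEUE_RUSH, TASK_QUEUE_REQ,
	TASK_QUEUE_STR, TASK_QUEUE_VCA, TASK_QUEUE__END
};

/* With a per-class reserve, each of the TASK_QUEUE__END priority
 * classes needs at least one thread it can fall back on, so the
 * configured per-pool minimum must not fall below the number of
 * classes. Hypothetical validation helper for illustration. */
static int
thread_pool_min_valid(unsigned wthread_min)
{
	return (wthread_min >= TASK_QUEUE__END);
}
```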

Previously, we used a minimum number of idle threads (the reserve) to
ensure that we do not assign all threads with client requests and no
threads left over for backend requests.

This was actually only a special case of the more general issue
exposed by h2: Lower priority tasks depend on higher priority tasks
(for h2, sessions need streams, which need requests, which may need
backend requests).

To solve this problem, we divide the reserve by the number of priority
classes and schedule lower priority tasks only if there are enough
idle threads to run higher priority tasks eventually.

This change does not guarantee any upper limit on the amount of time
it can take for a task to be scheduled (e.g. backend requests could be
blocking on arbitrarily long timeouts), so the thread pool watchdog is
still warranted. But this change should guarantee that we do make
progress eventually.

With the reserves, thread_pool_min needs to be no smaller than the
number of priority classes (TASK_QUEUE__END). Ideally, we should have
an even higher minimum (@dridi rightly suggested to make it 2 *
TASK_QUEUE__END), but that would prevent the very useful test
t02011.vtc.

For now, the value of TASK_QUEUE__END (5) is hardcoded as such for the
parameter configuration and documentation because auto-generating it
would require include/macro dances which I consider over the top for
now. Instead, the respective places are marked and an assert is in
place to ensure we do not start a worker with too small a number of
workers. I decided against checks in the manager to avoid include
pollution from the worker (cache.h) into the manager.

Fixes varnishcache#2418 for real
This test is to detect a deadlock which does not exist any more. IMHO,
the only sensible way to test for the lack of it now is to do a load
test, which is not what we want in vtc.
@nigoroll nigoroll merged commit 5fe2a46 into varnishcache:master Jan 15, 2020
@nigoroll (Member, Author):

FTR:

add an assert thread_pool_min.min is >= TASK_QUEUE__END

from the commit message:

the respective places are marked and an assert is in place to ensure we do not start a worker with too small a number of workers. I decided against checks in the manager to avoid include pollution from the worker (cache.h) into the manager.
