
reserve a fraction of idle threads for each priority class #2796

Merged 2 commits on Jan 15, 2020

Conversation

@nigoroll (Member) commented Oct 9, 2018

Previously, we used a minimum number of idle threads (the reserve) to ensure that we did not assign all threads to client requests, leaving none available for backend requests.

This was actually only a special case of the more general issue exposed by h2: Lower priority tasks depend on higher priority tasks (for h2, sessions need streams, which need requests, which may need backend requests).

To solve this problem, we divide the reserve by the number of priority classes and schedule lower priority tasks only if enough idle threads remain to eventually run higher priority tasks.

This change does not guarantee any upper limit on the amount of time it can take for a task to be scheduled (e.g. backend requests could be blocking on arbitrarily long timeouts), so the thread pool watchdog is still warranted. But this change should guarantee that we do make progress eventually.

@nigoroll (Member, Author) commented Oct 9, 2018

Maybe I could have chosen a cleaner wording: The reserve is proportional to the priority, so TASK_QUEUE_BO can use all of the reserve, TASK_QUEUE_RUSH 4/5th, TASK_QUEUE_REQ 3/5th etc.
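The proportional reserve described above can be sketched as a small standalone model (illustrative only: the enum follows the patch, but `task_reserve()` and `may_schedule()` are hypothetical helper names, not the actual Varnish code):

```c
#include <assert.h>

/* Priority classes as introduced by the patch, highest priority first. */
enum task_prio {
	TASK_QUEUE_BO,		/* backend requests */
	TASK_QUEUE_RUSH,
	TASK_QUEUE_REQ,		/* client requests */
	TASK_QUEUE_STR,
	TASK_QUEUE_VCA,
	TASK_QUEUE__END
};

/* Portion of the reserve a class may still consume: BO gets all of it,
 * RUSH 4/5, REQ 3/5, and so on down the priority ladder. */
static unsigned
task_reserve(enum task_prio prio, unsigned reserve)
{
	return (reserve * (TASK_QUEUE__END - (unsigned)prio) / TASK_QUEUE__END);
}

/* A task is handed an idle thread only if enough idle threads remain
 * to cover the reserve portions of all higher priority classes. */
static int
may_schedule(enum task_prio prio, unsigned nidle, unsigned reserve)
{
	return (nidle > reserve - task_reserve(prio, reserve));
}
```

With a reserve of 5, a backend (BO) task can be scheduled whenever any thread is idle, while a client (REQ) task needs more than 2 idle threads, so the two highest classes always retain headroom.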

@nigoroll (Member, Author):

bugwash conclusion: phk to ponder a bit longer

@nigoroll (Member, Author):

rebased, force pushed

@nigoroll force-pushed the prio_class_reserve branch 4 times, most recently from 3683f1b to 438a138 on November 6, 2018 10:57
@nigoroll force-pushed the prio_class_reserve branch 2 times, most recently from 4006203 to d238ffe on November 15, 2018 12:57
@nigoroll force-pushed the prio_class_reserve branch 8 times, most recently from fedb594 to 72b46da on November 28, 2018 09:25
@nigoroll force-pushed the prio_class_reserve branch 3 times, most recently from 44d7734 to 139f777 on December 7, 2018 10:52
@bsdphk (Contributor) commented Jan 14, 2019

Bugwash generally in favour.

@dridi to give it a whirl and feedback

@dridi (Member) commented Oct 29, 2019

I managed to salvage r2418.vtc from the grave, following my intuition that the new reserve logic would not remove the deadlock.

Test case:
varnishtest "h2 queuing deadlock"

barrier b1 cond 2

# dridi> I didn't work out exactly how I tickled the watchdog, because unlike
# the current reserve system on the master branch I run into a deadlock
# before even involving a 3rd client. Analysis for later, I was sidetracked
# by a bug (049ce0e73) in varnishtest.

varnish v1 -cliok "param.set thread_pools 1"
varnish v1 -cliok "param.set thread_pool_min 5"
varnish v1 -cliok "param.set thread_pool_max 5"
varnish v1 -cliok "param.set thread_pool_reserve 0"
varnish v1 -cliok "param.set thread_pool_watchdog 2"
varnish v1 -cliok "param.set feature +http2"
varnish v1 -cliok "param.set feature +no_coredump"

varnish v1 -vcl {
	backend be none;
	sub vcl_recv {
		return (synth(200));
	}
} -start

logexpect l1 -v v1 -g raw -q "vxid == 0" {
	expect * 0	Error	"Pool Herder: Queue does not move"
} -start

client c1 {
	txpri
	stream 0 rxsettings -run

	barrier b1 sync

	stream 1 {
		txreq
		# dridi> not sure how relevant this still is
	} -run
} -start

client c2 {
	barrier b1 sync
	txpri
	delay 3
} -start

client c1 -wait
client c2 -wait

logexpect l1 -wait

varnish v1 -cliok panic.show
varnish v1 -cliok panic.clear

varnish v1 -expectexit 0x20

@nigoroll force-pushed the prio_class_reserve branch 2 times, most recently from 1dcc1bf to 54bed57 on November 13, 2019 16:49
@daghf (Member) commented Nov 21, 2019

I'm no longer able to deadlock Varnish with this patch set in place. Good job :-)

@dridi the test case does not actually deadlock; we trigger the watchdog because the delay 3 in the test is longer than the watchdog timeout (thread_pool_watchdog is set to 2 seconds).

I wonder if we might still have a theoretical deadlock possibility for h/2 via Upgrade: h2c, where the upgraded request gets dispatched as a stream with prio TASK_QUEUE_REQ in h2_ou_session(). We should probably turn this into TASK_QUEUE_STR.

@dridi (Member) commented Nov 21, 2019

Thank you @daghf for your analysis. When I produced the test case I wasn't sure yet what was happening, and when I tried to analyze it I couldn't focus properly.

Besides your suggestion, do you have an idea how to fix the spurious watchdog trigger? Since a client can cause it, we should fix it either as part of this pull request or soon after, if we decide to merge it in its current form.

But this is not something we should allow to fall on the back burner.

@nigoroll (Member, Author):

@daghf a big thank you also from my side. As @dridi has passed the maintainer hat on to you for this one, would you mind leaving an approval review?

@dridi (Member) left a review comment:

I think I managed to reproduce the h2 panic in #3142 so there is nothing holding my approval back, except maybe that you might as well squash squashme commits when you rebase against master.

@daghf (Member) left a review comment:

LGTM

Review comment on bin/varnishd/cache/cache.h (outdated, resolved)
@nigoroll force-pushed the prio_class_reserve branch 4 times, most recently from c34388f to 1f3c5e0 on December 17, 2019 16:10
@nigoroll (Member, Author):

review session with @bsdphk and @dridi :

  • rename TASK_QUEUE_END -> TASK_QUEUE__END
  • add an assert that thread_pool_min.min is >= TASK_QUEUE__END

OK otherwise
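A minimal sketch of the agreed startup assertion (illustrative; the function name and parameter plumbing here are assumptions — in Varnish the value comes from the thread_pool_min parameter):

```c
/* Priority classes as introduced by the patch. */
enum task_prio {
	TASK_QUEUE_BO, TASK_QUEUE_RUSH, TASK_QUEUE_REQ,
	TASK_QUEUE_STR, TASK_QUEUE_VCA, TASK_QUEUE__END
};

/* With a per-class reserve, each of the TASK_QUEUE__END priority
 * classes needs at least one thread it can fall back on, so the
 * configured per-pool minimum must not fall below the number of
 * classes. Hypothetical validation helper for illustration. */
static int
thread_pool_min_valid(unsigned wthread_min)
{
	return (wthread_min >= TASK_QUEUE__END);
}
```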

Previously, we used a minimum number of idle threads (the reserve) to
ensure that we do not assign all threads with client requests and no
threads left over for backend requests.

This was actually only a special case of the more general issue
exposed by h2: Lower priority tasks depend on higher priority tasks
(for h2, sessions need streams, which need requests, which may need
backend requests).

To solve this problem, we divide the reserve by the number of priority
classes and schedule lower priority tasks only if there are enough
idle threads to run higher priority tasks eventually.

This change does not guarantee any upper limit on the amount of time
it can take for a task to be scheduled (e.g. backend requests could be
blocking on arbitrarily long timeouts), so the thread pool watchdog is
still warranted. But this change should guarantee that we do make
progress eventually.

With the reserves, thread_pool_min needs to be no smaller than the
number of priority classes (TASK_QUEUE__END). Ideally, we should have
an even higher minimum (@dridi rightly suggested to make it 2 *
TASK_QUEUE__END), but that would prevent the very useful test
t02011.vtc.

For now, the value of TASK_QUEUE__END (5) is hardcoded as such for the
parameter configuration and documentation because auto-generating it
would require include/macro dances which I consider over the top for
now. Instead, the respective places are marked and an assert is in
place to ensure we do not start a worker with too small a number of
workers. I decided against checks in the manager to avoid include
pollution from the worker (cache.h) into the manager.

Fixes varnishcache#2418 for real
This test is to detect a deadlock which does not exist any more. IMHO,
the only sensible way to test for the lack of it now is to do a load
test, which is not what we want in vtc.
@nigoroll nigoroll merged commit 5fe2a46 into varnishcache:master Jan 15, 2020
@nigoroll (Member, Author):

FTR:

add an assert thread_pool_min.min is >= TASK_QUEUE__END

from the commit message:

the respective places are marked and an assert is in place to ensure we do not start a worker with too small a number of workers. I decided against checks in the manager to avoid include pollution from the worker (cache.h) into the manager.
