Worker scheduling back-ports for 6.0 #3290

Closed
wants to merge 7 commits

Conversation

dridi
Member

@dridi dridi commented Apr 14, 2020

This is a back-port of #2942 and #2796 with their dependencies to avoid conflicts.

Conflicts still exist, mostly because I didn't port #3099 in the process.

nigoroll and others added 7 commits April 14, 2020 17:53
It should remove the worker it returns from the idle queue, for clarity.
When getidleworker signals the pool herder to spawn a thread, it increments
the dry counter, and the herder resets the dry counter when spawning a single
thread. In many cases this creates only a single thread even though the
herder was signaled dry multiple times, and it may cause a situation where
the cache acceptor is queued and no new thread is created. Together with
long-lasting tasks (i.e. endless pipelines) and all other tasks having
higher priority, this prevents the cache acceptor from being
rescheduled.

c00096.vtc demonstrates how this can lock up.

To fix this, spawn threads if we have queued tasks and we are below the
thread maximum level.
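
For illustration, here is a minimal, self-contained sketch of that herding
idea. The struct fields and the pool_breed()/pool_herd_once() names are
stand-ins chosen for this sketch, not the actual cache_pool.c code.

    #include <stdbool.h>

    struct pool {
        unsigned dry;    /* "dry" signals from getidleworker */
        unsigned lqueue; /* tasks waiting in the queue */
        unsigned nthr;   /* worker threads currently alive */
    };

    /* Stand-in for the real thread-spawning routine. */
    static bool
    pool_breed(struct pool *pp)
    {
        pp->nthr++;      /* pretend the spawn succeeded */
        return (true);
    }

    /*
     * Old behaviour: breed one thread per wakeup, so several dry signals
     * could collapse into a single new thread.  Sketched new behaviour:
     * keep breeding while tasks are queued and we are below the maximum.
     */
    static void
    pool_herd_once(struct pool *pp, unsigned wthread_max)
    {
        while ((pp->dry > 0 || pp->lqueue > 0) && pp->nthr < wthread_max) {
            if (!pool_breed(pp))
                break;   /* spawn failed, back off and retry later */
            pp->dry = 0;
        }
    }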
Fix a small memory leak when failing to spawn a new thread.
Previously, we used a minimum number of idle threads (the reserve) to
ensure that we do not assign all threads to client requests, leaving no
threads for backend requests.

This was actually only a special case of the more general issue
exposed by h2: lower-priority tasks depend on higher-priority tasks
(for h2, sessions need streams, which need requests, which may need
backend requests).

To solve this problem, we divide the reserve by the number of priority
classes and schedule lower-priority tasks only if there are enough
idle threads left to eventually run higher-priority tasks.

This change does not guarantee any upper limit on the amount of time
it can take for a task to be scheduled (e.g. backend requests could be
blocking on arbitrarily long timeouts), so the thread pool watchdog is
still warranted. But this change should guarantee that we do make
progress eventually.

With the reserves, thread_pool_min needs to be no smaller than the
number of priority classes (TASK_QUEUE__END). Ideally, we should have
an even higher minimum (@dridi rightly suggested making it 2 *
TASK_QUEUE__END), but that would prevent the very useful test
t02011.vtc.

For now, the value of TASK_QUEUE__END (5) is hardcoded in the
parameter configuration and documentation, because auto-generating it
would require include/macro dances which I consider over the top for
now. Instead, the respective places are marked and an assert is in
place to ensure we do not start the worker with too small a number of
threads. I decided against checks in the manager to avoid include
pollution from the worker (cache.h) into the manager.

Fixes varnishcache#2418 for real

Conflicts:
	bin/varnishd/cache/cache_wrk.c
	bin/varnishd/mgt/mgt_pool.c
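
As a rough illustration of the reserve scheme described above (not the
actual cache_wrk.c code; the names and the exact formula here are
assumptions): with the reserve split across the priority classes, a task
of a given class may only take an idle thread if enough idle threads
remain for the classes above it.

    #include <stdbool.h>

    #define N_PRIO_CLASSES 5    /* stand-in for TASK_QUEUE__END */

    /*
     * prio 0 is the highest-priority class.  Each lower class must leave
     * a larger share of the reserve untouched, so the higher classes can
     * always make progress eventually.
     */
    static bool
    task_may_use_idle_thread(unsigned prio, unsigned nidle, unsigned reserve)
    {
        unsigned needed_for_higher = reserve * prio / N_PRIO_CLASSES;

        return (nidle > needed_for_higher);
    }

For example, with reserve == 5 the lowest class (prio 4) only runs while
more than 4 threads are idle, whereas the highest class (prio 0) is never
held back by the reserve.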
... introduced with 3bb8b84:

in Pool_Work_Thread(), we could break out of the for (i = 0; i <
TASK_QUEUE__END; i++) loop with tp set to the value from the previous
iteration of the top while() loop, where it should have been NULL (for no
task found).

Noticed while staring at varnishcache#3192; unclear yet if it is related.
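
A minimal sketch of that fix (hypothetical types and names; the real code
is Pool_Work_Thread() in bin/varnishd/cache/cache_wrk.c): resetting tp at
the top of every pass of the outer loop means that a scan which finds no
task leaves tp NULL instead of pointing at the previous pass's task.

    #include <stddef.h>

    #define N_PRIO_CLASSES 5        /* stand-in for TASK_QUEUE__END */

    struct pool_task;               /* opaque for this sketch */

    struct task_queue {
        struct pool_task *first;
    };

    static struct pool_task *
    pick_task(struct task_queue queues[N_PRIO_CLASSES], int passes)
    {
        struct pool_task *tp = NULL;
        int i;

        while (passes-- > 0) {      /* stands in for the top while() loop */
            tp = NULL;              /* the fix: reset on every pass */
            for (i = 0; i < N_PRIO_CLASSES; i++) {
                if (queues[i].first != NULL) {
                    tp = queues[i].first;
                    break;          /* highest-priority task found */
                }
            }
            if (tp != NULL)
                break;
            /* ...the real loop waits on a condition variable here... */
        }
        return (tp);
    }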
This test was meant to detect a deadlock which does not exist any more.
IMHO, the only sensible way to test for its absence now is to do a load
test, which is not what we want in vtc.

Conflicts:
	bin/varnishtest/tests/r02418.vtc
@rezan
Member

rezan commented Jun 16, 2020

OK to close? This was completed in #3323.

@dridi dridi closed this Jun 16, 2020
@dridi dridi deleted the dry-reserve branch June 16, 2020 18:56