Worker scheduling back-ports for 6.0 #3290

Closed
wants to merge 7 commits

Conversation

dridi
Member

@dridi dridi commented Apr 14, 2020

This is a back-port of #2942 and #2796 with their dependencies to avoid conflicts.

Conflicts still exist, mostly because I didn't port #3099 in the process.

nigoroll and others added 7 commits April 14, 2020 17:53
It should remove the worker it returns from the idle queue, for clarity.
When getidleworker signals the pool herder to spawn a thread, it increments
the dry counter, and the herder resets the dry counter when spawning a single
thread. In many cases this creates only a single thread even though the
herder was signaled dry multiple times, and it may cause a situation where
the cache acceptor is queued and no new thread is created. Together with
long-lasting tasks (i.e. endless pipelines) and all other tasks having
higher priority, this prevents the cache acceptor from being
rescheduled.

c00096.vtc demonstrates how this can lock up.

To fix this, spawn threads if we have queued tasks and we are below the
thread maximum level.
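
For illustration, here is a minimal, self-contained sketch of that herding
idea. The struct fields and the pool_breed()/pool_herd_once() names are
stand-ins chosen for this sketch, not the actual cache_pool.c code.

    #include <stdbool.h>

    struct pool {
        unsigned dry;    /* "dry" signals from getidleworker */
        unsigned lqueue; /* tasks waiting in the queue */
        unsigned nthr;   /* worker threads currently alive */
    };

    /* Stand-in for the real thread-spawning routine. */
    static bool
    pool_breed(struct pool *pp)
    {
        pp->nthr++;      /* pretend the spawn succeeded */
        return (true);
    }

    /*
     * Old behaviour: breed one thread per wakeup, so several dry signals
     * could collapse into a single new thread.  Sketched new behaviour:
     * keep breeding while tasks are queued and we are below the maximum.
     */
    static void
    pool_herd_once(struct pool *pp, unsigned wthread_max)
    {
        while ((pp->dry > 0 || pp->lqueue > 0) && pp->nthr < wthread_max) {
            if (!pool_breed(pp))
                break;   /* spawn failed, back off and retry later */
            pp->dry = 0;
        }
    }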
Fix a small memory leak when failing to spawn a new thread.
Previously, we used a minimum number of idle threads (the reserve) to
ensure that we do not assign all threads to client requests, leaving no
threads for backend requests.

This was actually only a special case of the more general issue
exposed by h2: lower-priority tasks depend on higher-priority tasks
(for h2, sessions need streams, which need requests, which may need
backend requests).

To solve this problem, we divide the reserve by the number of priority
classes and schedule lower-priority tasks only if there are enough
idle threads left to eventually run higher-priority tasks.

This change does not guarantee any upper limit on the amount of time
it can take for a task to be scheduled (e.g. backend requests could be
blocking on arbitrarily long timeouts), so the thread pool watchdog is
still warranted. But this change should guarantee that we do make
progress eventually.

With the reserves, thread_pool_min needs to be no smaller than the
number of priority classes (TASK_QUEUE__END). Ideally, we should have
an even higher minimum (@dridi rightly suggested making it 2 *
TASK_QUEUE__END), but that would prevent the very useful test
t02011.vtc.

For now, the value of TASK_QUEUE__END (5) is hardcoded in the
parameter configuration and documentation, because auto-generating it
would require include/macro dances which I consider over the top for
now. Instead, the respective places are marked and an assert is in
place to ensure we do not start the worker with too small a number of
threads. I decided against checks in the manager to avoid include
pollution from the worker (cache.h) into the manager.

Fixes varnishcache#2418 for real

Conflicts:
	bin/varnishd/cache/cache_wrk.c
	bin/varnishd/mgt/mgt_pool.c
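
As a rough illustration of the reserve scheme described above (not the
actual cache_wrk.c code; the names and the exact formula here are
assumptions): with the reserve split across the priority classes, a task
of a given class may only take an idle thread if enough idle threads
remain for the classes above it.

    #include <stdbool.h>

    #define N_PRIO_CLASSES 5    /* stand-in for TASK_QUEUE__END */

    /*
     * prio 0 is the highest-priority class.  Each lower class must leave
     * a larger share of the reserve untouched, so the higher classes can
     * always make progress eventually.
     */
    static bool
    task_may_use_idle_thread(unsigned prio, unsigned nidle, unsigned reserve)
    {
        unsigned needed_for_higher = reserve * prio / N_PRIO_CLASSES;

        return (nidle > needed_for_higher);
    }

For example, with reserve == 5 the lowest class (prio 4) only runs while
more than 4 threads are idle, whereas the highest class (prio 0) is never
held back by the reserve.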
... introduced with 3bb8b84:

in Pool_Work_Thread(), we could break out of the for (i = 0; i <
TASK_QUEUE__END; i++) loop with tp set to the value from the previous
iteration of the top while() loop, where it should have been NULL (for no
task found).

Noticed while staring at varnishcache#3192; unclear yet if it is related.
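
A minimal sketch of that fix (hypothetical types and names; the real code
is Pool_Work_Thread() in bin/varnishd/cache/cache_wrk.c): resetting tp at
the top of every pass of the outer loop means that a scan which finds no
task leaves tp NULL instead of pointing at the previous pass's task.

    #include <stddef.h>

    #define N_PRIO_CLASSES 5        /* stand-in for TASK_QUEUE__END */

    struct pool_task;               /* opaque for this sketch */

    struct task_queue {
        struct pool_task *first;
    };

    static struct pool_task *
    pick_task(struct task_queue queues[N_PRIO_CLASSES], int passes)
    {
        struct pool_task *tp = NULL;
        int i;

        while (passes-- > 0) {      /* stands in for the top while() loop */
            tp = NULL;              /* the fix: reset on every pass */
            for (i = 0; i < N_PRIO_CLASSES; i++) {
                if (queues[i].first != NULL) {
                    tp = queues[i].first;
                    break;          /* highest-priority task found */
                }
            }
            if (tp != NULL)
                break;
            /* ...the real loop waits on a condition variable here... */
        }
        return (tp);
    }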
This test was meant to detect a deadlock which does not exist any more.
IMHO, the only sensible way to test for its absence now is to do a load
test, which is not what we want in vtc.

Conflicts:
	bin/varnishtest/tests/r02418.vtc
@rezan
Member

rezan commented Jun 16, 2020

OK to close? This was completed in #3323.

@dridi dridi closed this Jun 16, 2020
@dridi dridi deleted the dry-reserve branch June 16, 2020 18:56