fix(threadpool): per-task nursery Scope so spawned coroutines can't outlive the task snapshot (UAF)#159
Closed
EdmondDantes wants to merge 3 commits into
Closed
fix(threadpool): per-task nursery Scope so spawned coroutines can't outlive the task snapshot (UAF)#159EdmondDantes wants to merge 3 commits into
EdmondDantes wants to merge 3 commits into
Conversation
…shot UAF) submit() deep-copies the task closure and every nested closure into a per-task snapshot arena. The non-coroutine worker freed that arena the instant the task body returned, but any coroutine the task had spawn()ed was still pending in the scheduler — its op_array lived in the just-freed arena, so the scheduler ran it against freed memory: 0xC0000005 on the Windows debug heap, latent (ASAN-detectable) on Linux. Fix: run each task body under its own per-task Scope in NOT-safe mode (a nursery). Async\spawn uses the current coroutine's scope, so coroutines the task spawns now land in task_scope. On task exit the worker restores its own scope, cancels task_scope and awaits its teardown (cancel + waker-resume on the scope event + SUSPEND, mirroring Scope::awaitAfterCancellation at the C level) BEFORE the snapshot is freed — so no spawned coroutine can outlive the snapshot. NOT-safe cancel means un-awaited children are cancelled at task exit, never awaited as zombies (bounded by the runtime grace). Verified on Windows: true-async/server core/007-server-transfer-handler- dispatch goes red->green; a minimal repro (ThreadPool task that spawns an un-awaited coroutine) no longer crashes; ext/async thread_pool suite 65 pass / 0 fail (incl. the #146/#154 lifetime tests). Regression test tests/thread_pool/065-task_scope_nursery_no_uaf.phpt. Follow-up: coroutine-mode submit already runs each task in its own scope but frees the snapshot on the task coroutine's dispose rather than on scope completion — same latent UAF, not yet fixed here.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
…outine count) The first cut gated the post-cancel await on IS_COMPLETELY_DONE, which (can_be_disposed with_zombies=true) flips true the moment children are cancelled — while their coroutine objects, holding op_arrays that live in the snapshot arena, are still queued for disposal on a later scheduler tick. The worker skipped the await and freed the arena; the deferred coroutine_dispose then ran destroy_op_array against freed memory (ASAN heap-use-after-free on Linux; masked by timing on the Windows debug heap). It also called resume_when on an already-terminated scope event when the task body had run its own children to completion (core/007: "event cannot be used after it has been terminated"). Mirror Scope::awaitAfterCancellation: skip when the scope is already CLOSED (children disposed, event gone), otherwise await while live coroutines or child scopes remain (async_scope_t.coroutines.length). The SUSPEND now lets each child run to full disposal while the arena is alive; cooperative scheduling guarantees they finish before the worker resumes and frees it. core/007 green 3/3; ext/async thread_pool 66/0.
…free ASAN caught a second UAF — this one on the task scope itself. When the last child coroutine finishes during the worker's SUSPEND, async_scope_notify_ coroutine_finished -> scope_try_to_dispose -> scope_dispose frees the scope; the worker then resumes and touches the freed scope (CLR_OWNER_PINNED / RELEASE) -> heap-use-after-free at thread_pool.c. Pin the scope with ZEND_ASYNC_SCOPE_SET_OWNER_PINNED before the cancel/await (same pattern pool_scope uses to outlive its tasks) and clear it right before RELEASE, so the scope survives the await and the worker owns its disposal. core/007 green 3/3; ext/async thread_pool 66/0.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ThreadPool::submit()deep-copies the task closure and every nested closure into a per-task snapshot arena (the bytecode/op_array). The non-coroutine worker freed that arena the instant the task body returned (thread_pool.ctask_cleanup) — but any coroutine the task hadspawn()ed was still pending in the scheduler. That coroutine'sop_arraylives in the just-freed arena, so the scheduler later ran it against freed memory:0xC0000005(hard crash) on the Windows debug heap (0xFEEEFEEEfreed-fill).free()doesn't clobber) — and the server suite has no ASAN job, so it went unnoticed there.Minimal, server-free repro (crashes before this fix):
Surfaced by true-async/server
core/007-server-transfer-handler-dispatch(its task doesspawn(stop-coroutine)+$server->start()).Fix — structured concurrency (nursery)
Run each task body under its own per-task
Scopein NOT-safe mode.Async\spawnparents to the current coroutine's scope, so coroutines the task spawns now land intask_scope. On task exit the worker:task_scope(NOT-safe → stragglers are cancelled, never awaited as zombies),ZEND_ASYNC_WAKER_NEW+zend_async_resume_when(&task_scope->event, …)+SUSPEND, mirroringScope::awaitAfterCancellationat the C level,…all before the snapshot is freed. So no spawned coroutine can outlive the snapshot. Un-awaited children are cancelled at task exit (bounded by the runtime cancellation grace); a task that wants a child to finish must
awaitit.Verification (Windows Debug_TS)
core/007-server-transfer-handler-dispatch: red → greenext/async/tests/thread_pool/: 65 pass / 0 fail (incl. the flock(): stack-use-after-return when coroutine is cancelled mid-flock (real ASAN finding) #146/ThreadPool/ThreadChannel: no test coverage for transferring $this-bound closures (incl. method-as-closure) #154 lifetime tests) — no regressionstests/thread_pool/065-task_scope_nursery_no_uaf.phpt(will trip ASAN on Linux without the fix)Follow-up (not in this PR)
Coroutine-mode
submitalready runs each task in its owntask_scope, but frees the snapshot on the task coroutine's dispose rather than on scope completion — the same latent UAF (a coroutine-mode minimal repro also crashes today). Tracking separately.