Skip to content

fix(threadpool): per-task nursery Scope so spawned coroutines can't outlive the task snapshot (UAF)#159

Closed
EdmondDantes wants to merge 3 commits into
mainfrom
windows-threadpool-snapshot-uaf
Closed

fix(threadpool): per-task nursery Scope so spawned coroutines can't outlive the task snapshot (UAF)#159
EdmondDantes wants to merge 3 commits into
mainfrom
windows-threadpool-snapshot-uaf

Conversation

@EdmondDantes
Copy link
Copy Markdown
Contributor

Problem

ThreadPool::submit() deep-copies the task closure and every nested closure into a per-task snapshot arena (the bytecode/op_array). The non-coroutine worker freed that arena the instant the task body returned (thread_pool.c task_cleanup) — but any coroutine the task had spawn()ed was still pending in the scheduler. That coroutine's op_array lives in the just-freed arena, so the scheduler later ran it against freed memory:

  • 0xC0000005 (hard crash) on the Windows debug heap (0xFEEEFEEE freed-fill).
  • Latent on Linux (glibc free() doesn't clobber) — and the server suite has no ASAN job, so it went unnoticed there.

Minimal, server-free repro (crashes before this fix):

$pool = new Async\ThreadPool(1);
$f = $pool->submit(function () {
    Async\spawn(function () { echo "child\n"; });  // pending when the body returns
});
Async\await($f);

Surfaced by true-async/server core/007-server-transfer-handler-dispatch (its task does spawn(stop-coroutine) + $server->start()).

Fix — structured concurrency (nursery)

Run each task body under its own per-task Scope in NOT-safe mode. Async\spawn parents to the current coroutine's scope, so coroutines the task spawns now land in task_scope. On task exit the worker:

  1. restores its own scope,
  2. cancels task_scope (NOT-safe → stragglers are cancelled, never awaited as zombies),
  3. awaits the scope's teardown — ZEND_ASYNC_WAKER_NEW + zend_async_resume_when(&task_scope->event, …) + SUSPEND, mirroring Scope::awaitAfterCancellation at the C level,

…all before the snapshot is freed. So no spawned coroutine can outlive the snapshot. Un-awaited children are cancelled at task exit (bounded by the runtime cancellation grace); a task that wants a child to finish must await it.

Verification (Windows Debug_TS)

Follow-up (not in this PR)

Coroutine-mode submit already runs each task in its own task_scope, but frees the snapshot on the task coroutine's dispose rather than on scope completion — the same latent UAF (a coroutine-mode minimal repro also crashes today). Tracking separately.

…shot UAF)

submit() deep-copies the task closure and every nested closure into a
per-task snapshot arena. The non-coroutine worker freed that arena the
instant the task body returned, but any coroutine the task had spawn()ed
was still pending in the scheduler — its op_array lived in the just-freed
arena, so the scheduler ran it against freed memory: 0xC0000005 on the
Windows debug heap, latent (ASAN-detectable) on Linux.

Fix: run each task body under its own per-task Scope in NOT-safe mode (a
nursery). Async\spawn uses the current coroutine's scope, so coroutines the
task spawns now land in task_scope. On task exit the worker restores its own
scope, cancels task_scope and awaits its teardown (cancel + waker-resume on
the scope event + SUSPEND, mirroring Scope::awaitAfterCancellation at the C
level) BEFORE the snapshot is freed — so no spawned coroutine can outlive
the snapshot. NOT-safe cancel means un-awaited children are cancelled at
task exit, never awaited as zombies (bounded by the runtime grace).

Verified on Windows: true-async/server core/007-server-transfer-handler-
dispatch goes red->green; a minimal repro (ThreadPool task that spawns an
un-awaited coroutine) no longer crashes; ext/async thread_pool suite 65
pass / 0 fail (incl. the #146/#154 lifetime tests). Regression test
tests/thread_pool/065-task_scope_nursery_no_uaf.phpt.

Follow-up: coroutine-mode submit already runs each task in its own scope
but frees the snapshot on the task coroutine's dispose rather than on scope
completion — same latent UAF, not yet fixed here.
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…outine count)

The first cut gated the post-cancel await on IS_COMPLETELY_DONE, which
(can_be_disposed with_zombies=true) flips true the moment children are
cancelled — while their coroutine objects, holding op_arrays that live in
the snapshot arena, are still queued for disposal on a later scheduler tick.
The worker skipped the await and freed the arena; the deferred
coroutine_dispose then ran destroy_op_array against freed memory (ASAN
heap-use-after-free on Linux; masked by timing on the Windows debug heap).
It also called resume_when on an already-terminated scope event when the
task body had run its own children to completion (core/007: "event cannot
be used after it has been terminated").

Mirror Scope::awaitAfterCancellation: skip when the scope is already CLOSED
(children disposed, event gone), otherwise await while live coroutines or
child scopes remain (async_scope_t.coroutines.length). The SUSPEND now lets
each child run to full disposal while the arena is alive; cooperative
scheduling guarantees they finish before the worker resumes and frees it.

core/007 green 3/3; ext/async thread_pool 66/0.
…free

ASAN caught a second UAF — this one on the task scope itself. When the last
child coroutine finishes during the worker's SUSPEND, async_scope_notify_
coroutine_finished -> scope_try_to_dispose -> scope_dispose frees the scope;
the worker then resumes and touches the freed scope (CLR_OWNER_PINNED /
RELEASE) -> heap-use-after-free at thread_pool.c.

Pin the scope with ZEND_ASYNC_SCOPE_SET_OWNER_PINNED before the cancel/await
(same pattern pool_scope uses to outlive its tasks) and clear it right
before RELEASE, so the scope survives the await and the worker owns its
disposal. core/007 green 3/3; ext/async thread_pool 66/0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant