#141 Pool: wake parked waiter when broken-release destroys resource #142

Merged
EdmondDantes merged 2 commits into main from 141-pdo-pool-deadlocks-when-over-saturated-against-a-permanently-failing-connection-chaos-finding on May 24, 2026

Conversation

@EdmondDantes
Contributor

Summary

Closes #141.

Root cause in zend_async_pool_release: when beforeRelease returned false (PDO marks conn_broken=true after a Toxiproxy reset_peer), the pool destroyed the resource and decremented active_count but never woke a parked waiter. The slot was free in principle, but waiters are only woken from release(), and the broken-resource branch skipped that wake-up. With every in-flight connection failing, parked coroutines stayed blocked until the global Async\DeadlockError detector fired.

Fix

  • zend_async_pool_release (primary): after destroying a broken resource, call pool_wake_waiter so a parked coroutine retries the factory. If that factory also fails, the cascade continues — each waiter propagates its own exception cleanly.
  • zend_async_pool_acquire (defensive): factory failure on the slot-reservation path now also wakes a waiter, and throws PoolException when the factory returns false without setting an exception — fail-fast instead of falling through to pool_wait_for_resource with nothing able to wake it.
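
To make the two changes above easier to follow, here is a minimal sketch of the bookkeeping in plain C. It is a toy model, not the code in ext/async: `toy_pool`, `toy_acquire`, `toy_release`, `toy_wake_waiter`, and the `with_fix` flag are all hypothetical names introduced for illustration, parked coroutines are reduced to a counter, and "throwing" is reduced to a return code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the pool counters; NOT the real ext/async structures. */
typedef struct {
    int max_size;      /* upper bound on resources checked out at once */
    int active_count;  /* slots currently reserved or in use */
    int parked;        /* coroutines parked waiting for a slot */
    int wakeups;       /* number of waiters woken so far */
} toy_pool;

typedef enum { ACQ_OK, ACQ_PARKED, ACQ_FAILED } toy_acquire_result;

/* Wake one parked waiter, if any. Returns true if somebody was woken. */
static bool toy_wake_waiter(toy_pool *p) {
    if (p->parked == 0) {
        return false;
    }
    p->parked--;
    p->wakeups++;
    return true;
}

/* Acquire a slot. `factory_ok` models whether the factory can connect. */
static toy_acquire_result toy_acquire(toy_pool *p, bool factory_ok) {
    if (p->active_count >= p->max_size) {
        p->parked++;               /* pool is saturated: park the caller */
        return ACQ_PARKED;
    }
    p->active_count++;             /* reserve the slot before calling the factory */
    if (factory_ok) {
        return ACQ_OK;
    }
    p->active_count--;             /* factory failed: give the slot back */
    toy_wake_waiter(p);            /* defensive change: let the next waiter retry */
    return ACQ_FAILED;             /* models failing fast instead of parking */
}

/* Release a slot. `healthy` models the result of beforeRelease;
 * `with_fix` toggles the wake-up added for the broken-resource branch. */
static void toy_release(toy_pool *p, bool healthy, bool with_fix) {
    assert(p->active_count > 0);
    p->active_count--;
    if (healthy) {
        toy_wake_waiter(p);        /* normal path: a waiter can reuse the slot */
    } else if (with_fix) {
        toy_wake_waiter(p);        /* broken resource destroyed: still wake a waiter */
    }
    /* Without the fix, the broken branch falls through and nobody is woken. */
}
```

Replaying the reproducer's shape in this model (max_size = 1, one holder, two parked waiters): a broken release with `with_fix` set to false leaves `active_count` at 0 with both waiters still parked, which is the deadlock. With `with_fix` set to true, one waiter wakes, its factory call fails, and the failure path wakes the next waiter, so each waiter ends with its own error rather than hanging.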

Test plan

  • Unit regression: tests/pool/054-pool_broken_release_wakes_waiters.phpt (fails without fix, passes with — verified by reverting fix and re-running)
  • Full pool suite: 54/54 pass (ext/async/tests/pool/)
  • Issue reproducer (toxiproxy + MySQL, max=1, 3 coroutines, reset_peer): all 3 coroutines get clean PDOException, no DeadlockError, no global deadlock detector firing. 5/5 stable runs.
  • Healthy-pool sanity (5 coroutines on max=2): all succeed.

beforeRelease=false (e.g. PDO conn_broken after Toxiproxy reset_peer)
destroyed the resource and decremented active_count but never woke a
parked waiter. The slot was conceptually free but only release() wakes
waiters — with every connection in flight failing, parked coroutines
deadlocked until the global Async\DeadlockError detector fired.

Fix in zend_async_pool_release: after destroying a broken resource,
call pool_wake_waiter so a parked coroutine retries the factory (and
either succeeds or propagates its own exception, cascading to the next
waiter).

Defensive fix in zend_async_pool_acquire: factory failure on the
slot-reservation path now also wakes a waiter and throws PoolException
when the factory returns false without setting an exception — fail-fast
instead of falling through to pool_wait_for_resource with nothing able
to wake it.

Regression test: tests/pool/054-pool_broken_release_wakes_waiters.phpt
(fails without fix, passes with).

Both LINUX_X64_DEBUG_ZTS and LINUX_X64_RELEASE_NTS on PR #142 failed at
"Install build dependencies" with a 502 Bad Gateway from github.com while
fetching curl-8.5.0.tar.gz. Pure infra flake — but the bare wget gave up
on first 5xx with no retry. Add --tries=5 --waitretry=10
--retry-connrefused --retry-on-http-error=429,500,502,503,504 so the next
transient GitHub hiccup doesn't tank an otherwise-green PR.
@EdmondDantes EdmondDantes merged commit fc29741 into main May 24, 2026
7 checks passed
@EdmondDantes EdmondDantes deleted the 141-pdo-pool-deadlocks-when-over-saturated-against-a-permanently-failing-connection-chaos-finding branch May 24, 2026 14:15

Development

Successfully merging this pull request may close these issues.

PDO pool deadlocks when over-saturated against a permanently-failing connection (chaos finding)
