Summary
The PDO connection pool can deadlock waiters when the underlying connection establishment fails permanently. With pool max=N and M > N concurrent coroutines, the first N coroutines acquire a slot, hit the failure, and release the slot, but the remaining M-N waiters never acquire one, because every new connection attempt fails too. The pool reports Pool(idle=0, active=0, max=N): the slot is conceptually free, but the pool can't establish a connection to back it. Waiters stay parked until the global Async\DeadlockError detector trips.
Surfaced while writing db/pool_max_reset_chaos.feature for #138 (see #140) and documented in that feature's header as out-of-scope for the chaos backstop.
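The failure mode can be modeled in a few lines. The sketch below is a Python asyncio model, not the PHP pool, and it rests on an unconfirmed assumption: parked waiters are woken only when a connection is returned to the idle list, so a slot freed by a failed connect wakes nobody. BuggyPool, ConnectError, failing_connect and the 0.5 s timeout are all hypothetical; the timeout stands in for the global deadlock detector.

```python
import asyncio


class ConnectError(Exception):
    """Stand-in for the PDOException raised on connection establishment."""


async def failing_connect():
    # Hypothetical connect that always fails, like MySQL behind a dead proxy.
    await asyncio.sleep(0)
    raise ConnectError("2006 MySQL server has gone away")


class BuggyPool:
    """Hypothetical model: waiters wake only when release() returns a connection."""

    def __init__(self, max_size, connect):
        self.max_size = max_size
        self.connect = connect
        self.idle = []
        self.active = 0
        self.cond = asyncio.Condition()

    async def acquire(self):
        async with self.cond:
            while not self.idle and self.active >= self.max_size:
                await self.cond.wait()  # parked until release() notifies
            self.active += 1
            if self.idle:
                return self.idle.pop()
        try:
            return await self.connect()  # back the reserved slot
        except Exception:
            async with self.cond:
                self.active -= 1  # slot is free again, but no waiter is notified
            raise

    async def release(self, conn):
        async with self.cond:
            self.active -= 1
            self.idle.append(conn)
            self.cond.notify()


async def main(m=3, max_size=1, timeout=0.5):
    pool = BuggyPool(max_size, failing_connect)
    tasks = [asyncio.create_task(pool.acquire()) for _ in range(m)]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    failed = sum(isinstance(t.exception(), ConnectError) for t in done)
    for t in pending:  # roughly what the deadlock detector ends up doing
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return failed, len(pending), (len(pool.idle), pool.active, pool.max_size)


print(asyncio.run(main()))  # prints (1, 2, (0, 0, 1))
```

With m=3 and max_size=1 this mirrors the report: one coroutine sees the connect error, two stay parked, and the pool state is idle=0, active=0, max=1.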
Observed output

    coro 1: PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
    === DEADLOCK REPORT START ===
    Coroutines waiting: 3, active_events: 0
    Coroutine 5 spawned at :0, suspended at /tmp/pool_deadlock_repro.php:58 (main)
      waiting for:
        - Coroutine 9 spawned at … line 50 ({closure})
        - Coroutine 11 spawned at … line 50 ({closure})
    Coroutine 9 …
      waiting for:
        - Pool(idle=0, active=0, max=1)
    Coroutine 11 …
      waiting for:
        - Pool(idle=0, active=0, max=1)
    === DEADLOCK REPORT END ===
    coro 2: PDOException: SQLSTATE[HY000]: General error: Failed to acquire connection from pool
    coro 3: PDOException: SQLSTATE[HY000]: General error: Failed to acquire connection from pool
    Fatal error: Uncaught Async\DeadlockError: Deadlock detected …
Expected
Either:
1. Fail-fast on the pool acquire path: when the pool tries to back a free slot with a fresh connection and that connection establishment fails, the waiter should receive the same "Failed to acquire connection from pool" PDOException immediately, not after the global deadlock detector trips.
2. Bounded retry budget: a limited number of retries with backoff, then fail-fast.
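Both options can share one acquire path. In the sketch below (again a Python asyncio model, not the PHP pool), a failed connect hands the freed slot to the next waiter via notify(), so each waiter wakes, makes its own bounded attempts, and raises the pool error itself, well before any global watchdog. FailFastPool, PoolAcquireError, retries, base_delay and the doubling backoff schedule are illustrative assumptions; retries=0 corresponds to option 1 and retries > 0 to option 2.

```python
import asyncio


class PoolAcquireError(Exception):
    """Stand-in for 'Failed to acquire connection from pool'."""


async def failing_connect():
    # Hypothetical connect that always fails, like MySQL behind a dead proxy.
    await asyncio.sleep(0)
    raise ConnectionError("2006 MySQL server has gone away")


class FailFastPool:
    def __init__(self, max_size, connect, retries=2, base_delay=0.01):
        self.max_size = max_size
        self.connect = connect
        self.retries = retries  # extra attempts after the first (0 = pure fail-fast)
        self.base_delay = base_delay  # backoff: base, 2*base, 4*base, ...
        self.idle = []
        self.active = 0
        self.cond = asyncio.Condition()

    async def _connect_with_budget(self):
        last_error = None
        for attempt in range(self.retries + 1):
            try:
                return await self.connect()
            except Exception as exc:
                last_error = exc
                if attempt < self.retries:
                    await asyncio.sleep(self.base_delay * 2 ** attempt)
        raise PoolAcquireError("Failed to acquire connection from pool") from last_error

    async def acquire(self):
        async with self.cond:
            while not self.idle and self.active >= self.max_size:
                await self.cond.wait()
            self.active += 1
            if self.idle:
                return self.idle.pop()
        try:
            return await self._connect_with_budget()
        except BaseException:
            # Any failure, including cancellation, frees the slot and
            # wakes the next waiter so it can fail fast on its own.
            async with self.cond:
                self.active -= 1
                self.cond.notify()
            raise

    async def release(self, conn):
        async with self.cond:
            self.active -= 1
            self.idle.append(conn)
            self.cond.notify()


async def main(m=3, max_size=1):
    pool = FailFastPool(max_size, failing_connect)
    # wait_for plays the role of the global detector: it should never fire.
    results = await asyncio.wait_for(
        asyncio.gather(*(pool.acquire() for _ in range(m)), return_exceptions=True),
        timeout=1.0,
    )
    return [type(r).__name__ for r in results], pool.active


print(asyncio.run(main()))
```

Every waiter ends with PoolAcquireError and the slot count returns to 0; the watchdog timeout is never reached. Because the error is raised with "from", the original connect error stays reachable as __cause__, which keeps the root cause visible to application code.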
Crucially, the Async\DeadlockError thrown at process level shouldn't be the mechanism that wakes the waiters: it's a system-wide signal, not a per-pool one, and it makes the failure look like a runtime bug to the application code.
Note that coro 2/coro 3 do eventually print their "Failed to acquire connection from pool" — meaning the pool already has the fail-fast code path. The bug is the ordering: the per-waiter fail-fast triggers after the deadlock detector, instead of being the proximate cause of the waiter waking up.
Reproduction environment (reproducer runs in ~1 s; requires Toxiproxy + MySQL)
- true-async-stable / php-async main (post-#138 merge; #138: Chaos test coverage gaps: Layer 1 IO + UDP/TLS/DNS/signal/watcher)
- Toxiproxy: 127.0.0.1:8474
- MySQL: 127.0.0.1:3306, user test/test, db chaos_test
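One way to make connection establishment fail permanently with the Toxiproxy instance above is to disable the proxy in front of MySQL through Toxiproxy's HTTP API (POST /proxies/{name} with {"enabled": false}); a disabled proxy stops listening, so new connects through it fail. This is a sketch under assumptions: the proxy name "mysql" and the helper set_proxy_enabled are hypothetical, and the actual reproducer may break the link differently (for example with a toxic). The helper only builds the request, so it can be inspected without a running Toxiproxy.

```python
import json
import urllib.request

TOXIPROXY_API = "http://127.0.0.1:8474"  # Toxiproxy address from the list above
PROXY_NAME = "mysql"  # hypothetical name of the proxy in front of MySQL


def set_proxy_enabled(name, enabled, api=TOXIPROXY_API):
    """Build a Toxiproxy 'update proxy' request: POST /proxies/{name}."""
    return urllib.request.Request(
        f"{api}/proxies/{name}",
        data=json.dumps({"enabled": enabled}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Against a live Toxiproxy, send it with:
#   urllib.request.urlopen(set_proxy_enabled(PROXY_NAME, False))
# and restore the link afterwards with set_proxy_enabled(PROXY_NAME, True).
req = set_proxy_enabled(PROXY_NAME, False)
print(req.get_method(), req.full_url, req.data.decode())
```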
Related
- fuzzy-tests/db/pool_max_reset_chaos.feature header (#140, PR #138 chaos test coverage gaps: Layer 1 IO + DB). … N coroutines == pool_size) precisely because of this deadlock.
Scope
This is the pool-acquire path; it does not affect in-flight query teardown, which raises PDOException normally, as coro 1 above demonstrates.