PoolInner::acquire() does not try the idle queue after a transient connection failure #2848

@abonander


While thinking about potential sources of the infamous PoolTimedOut error, I realized that there's an interesting failure mode to acquire().

Once it decides to open a new connection, that's all it tries to do: https://github.com/launchbadge/sqlx/blob/e1ac3881734293cb33674a8b0b1d983132b9c2b1/sqlx-core/src/pool/inner.rs#L283-L284

If a nonfatal connection error happens, it just continues in the backoff loop in connect() and never touches the idle queue again: https://github.com/launchbadge/sqlx/blob/e1ac3881734293cb33674a8b0b1d983132b9c2b1/sqlx-core/src/pool/inner.rs#L348

It will continue to do this until the timeout if the transient error does not resolve itself.
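To make the failure mode concrete, here is a minimal, synchronous sketch of a retry loop that re-checks the idle queue before every connection attempt. It is not sqlx's actual code: `Conn`, `ConnectError`, `AcquireError`, and `acquire_with_idle_recheck` are hypothetical names, the idle queue is modeled as a `Mutex<VecDeque<_>>`, and the backoff constants are arbitrary. It only illustrates the behavior one might expect from `acquire()`.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Stand-in for a live database connection (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Conn(pub u32);

/// Outcome of a failed connection attempt (hypothetical, not sqlx's error type).
pub enum ConnectError {
    /// Worth retrying, e.g. "too many connections".
    Transient,
    /// Retrying will not help.
    Fatal,
}

#[derive(Debug, PartialEq)]
pub enum AcquireError {
    TimedOut,
    Fatal,
}

/// Retry loop that looks at the idle queue before each connection attempt,
/// so a connection released by another task is picked up even while the
/// server keeps rejecting new connections.
pub fn acquire_with_idle_recheck(
    idle: &Mutex<VecDeque<Conn>>,
    mut try_connect: impl FnMut() -> Result<Conn, ConnectError>,
    timeout: Duration,
) -> Result<Conn, AcquireError> {
    let deadline = Instant::now() + timeout;
    let mut backoff = Duration::from_millis(1);

    loop {
        // The step the loop described above never repeats:
        // check for an idle connection on every iteration.
        let reclaimed = idle.lock().unwrap().pop_front();
        if let Some(conn) = reclaimed {
            return Ok(conn);
        }

        match try_connect() {
            Ok(conn) => return Ok(conn),
            Err(ConnectError::Fatal) => return Err(AcquireError::Fatal),
            Err(ConnectError::Transient) => {
                let now = Instant::now();
                if now >= deadline {
                    return Err(AcquireError::TimedOut);
                }
                // Sleep for the backoff, but never past the deadline.
                std::thread::sleep(backoff.min(deadline.saturating_duration_since(now)));
                backoff = (backoff * 2).min(Duration::from_millis(50));
            }
        }
    }
}
```

With this shape, a task whose connection attempts keep failing transiently still succeeds as soon as another task returns a connection to the idle queue, instead of spinning until the timeout.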

Right now, only the Postgres driver overrides DatabaseError::is_transient_in_connect_phase(), but one of the error codes it considers transient is the "too many connections" error: https://github.com/launchbadge/sqlx/blob/e1ac3881734293cb33674a8b0b1d983132b9c2b1/sqlx-postgres/src/error.rs#L192-L195
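For illustration, a classification of this kind could look roughly like the sketch below. This is a standalone function over a SQLSTATE string, not the driver's actual trait method, and the name `is_transient_sqlstate` is hypothetical. `53300` is PostgreSQL's `too_many_connections` code; including `57P03` (`cannot_connect_now`) is an assumption about what else might be treated as transient.

```rust
/// Returns true if a PostgreSQL SQLSTATE seen while connecting is worth retrying.
/// Hypothetical helper for illustration; see the linked source for the real list.
pub fn is_transient_sqlstate(sqlstate: &str) -> bool {
    matches!(
        sqlstate,
        // too_many_connections: the server has no free connection slots right now.
        "53300"
        // cannot_connect_now: e.g. the server is still starting up (assumed here).
        | "57P03"
    )
}
```

Treating `53300` as retryable is what keeps a task in the connect loop while the server is at its connection limit.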

This means that if the pool's max_connections exceeds the number of connections the server can currently accept, tasks can get stuck in a loop trying to open new connections even though idle connections are available in the pool, leading to surprising PoolTimedOut errors.

This may be the cause of some of the reported PoolTimedOut issues, although it is only likely to occur with the Postgres driver.

Labels: bug, pool (Related to SQLx's included connection pool)
