fix(db_utils): make acquire_with_retry yield exactly once #1880
Merged
nicoloboschi merged 1 commit on Jun 1, 2026
acquire_with_retry's retry loop wrapped the yield, violating
@asynccontextmanager's single-yield contract. When user code inside
the async with block raised a retryable exception, the loop iterated
and tried to yield again, producing RuntimeError("generator didn't
stop after athrow()") on every retryable inner error. This masked
the real cause and was the root of 1,934 identical failed
consolidation ops on shurick-memory in production since 2026-03-30.
Retry now wraps only the acquire (via AsyncExitStack). User-code
exceptions inside the block propagate as their real types — strictly
better for observability, since the prior retry-of-user-code branch
was already non-functional (always crashed with the RuntimeError above).
Includes a regression unit test asserting (a) the original retryable
exception propagates unchanged and (b) the connection is released
exactly once.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`acquire_with_retry` is an `@asynccontextmanager` whose retry loop wraps `yield conn`. That violates the single-yield contract: when the caller's body raises a retryable-looking exception (e.g. `ConnectionError`, `asyncio.TimeoutError`), the loop iterates and tries to yield a second time, and Python surfaces the resulting generator-protocol error as `RuntimeError("generator didn't stop after athrow()")`, masking the real cause of the failure on every retryable inner error.
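For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch. It is not the code from `db_utils.py`: `buggy_acquire` and `FakeConn` are hypothetical stand-ins, and the retry condition is simplified to `ConnectionError` only.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConn:
    """Hypothetical stand-in for a pooled connection."""


@asynccontextmanager
async def buggy_acquire(max_retries: int = 3):
    # Anti-pattern: the retry loop wraps the yield. An exception raised in
    # the caller's body is thrown into the generator at the yield, caught
    # below, and the loop then reaches the yield a second time.
    for attempt in range(max_retries):
        try:
            yield FakeConn()
            return
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            # Fall through to the next iteration, i.e. a second yield.


async def main() -> str:
    try:
        async with buggy_acquire():
            raise ConnectionError("transient failure in user code")
    except RuntimeError as exc:
        # The original ConnectionError is masked by the protocol error.
        return str(exc)
    return "no error"


message = asyncio.run(main())
print(message)  # prints: generator didn't stop after athrow()
```

The caller never sees the `ConnectionError` it raised; it only sees the `RuntimeError` produced by `contextlib` when the generator yields again instead of finishing.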
Fix: wrap only the acquire in the retry loop via `contextlib.AsyncExitStack`. The `yield` runs exactly once, after a connection is successfully acquired. User-code exceptions propagate unchanged.

Why this matters
`_is_retryable` matches a fairly broad set of error classes by name (`InterfaceError`, `ConnectionDoesNotExistError`, `TooManyConnectionsError`, `DeadlockDetectedError`) plus `OSError` / `ConnectionError` / `asyncio.TimeoutError` / Oracle ORA-00060. Anything inside an `async with acquire_with_retry(backend) as conn:` block that raised one of those would surface as `RuntimeError("generator didn't stop after athrow()")`. In our deployment that single masked bug accounted for ~1.9k identical failed consolidation operations on one large bank over ~57 days before we caught it; diagnosis was hard because the visible error pointed at the generator machinery, not the underlying transient.

The prior retry-of-user-code branch was also already non-functional (it always crashed with the `RuntimeError` above before any retry could happen), so removing it is strictly an observability and correctness improvement, not a behaviour change for the happy path.

What changes
`hindsight-api-slim/hindsight_api/engine/db_utils.py` (backend path only; the legacy `asyncpg.Pool` path is untouched)

Before:
After:
Docstring extended to spell out the contract: the retry covers the acquire step; user-code exceptions are not retried.
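As a rough illustration of that contract (not the actual diff), the sketch below retries only the acquire step through `AsyncExitStack` and yields exactly once. `acquire_once`, `FakeBackend`, and `FakeConn` are hypothetical names, the backend's `acquire()` is assumed to return an async context manager, and the retry condition is simplified to `ConnectionError` rather than the real `_is_retryable` check.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


class FakeConn:
    """Hypothetical stand-in for a pooled connection."""


class FakeBackend:
    """Hypothetical backend whose acquire() fails transiently N times."""

    def __init__(self, failures_before_success: int = 0):
        self.failures_left = failures_before_success
        self.acquire_calls = 0
        self.release_calls = 0

    @asynccontextmanager
    async def acquire(self):
        self.acquire_calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("transient acquire failure")
        try:
            yield FakeConn()
        finally:
            self.release_calls += 1  # counts connection releases


@asynccontextmanager
async def acquire_once(backend, max_retries: int = 3, delay: float = 0.0):
    # Assumes max_retries >= 1: the loop either breaks with a connection
    # or re-raises on the final attempt.
    async with AsyncExitStack() as stack:
        for attempt in range(max_retries):
            try:
                # Only the acquire is retried. A failed enter registers
                # nothing on the stack, so there is nothing to release.
                conn = await stack.enter_async_context(backend.acquire())
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(delay)
        # Single yield, outside the retry loop: exceptions from the
        # caller's body pass through the stack and propagate unchanged.
        yield conn


async def demo():
    backend = FakeBackend(failures_before_success=2)
    try:
        async with acquire_once(backend):
            raise ConnectionError("transient failure in user code")
    except ConnectionError as exc:
        caught = type(exc).__name__
    return caught, backend.acquire_calls, backend.release_calls


result = asyncio.run(demo())
print(result)  # prints: ('ConnectionError', 3, 1)
```

In this sketch the two transient acquire failures are retried, the user-code `ConnectionError` reaches the caller with its real type, and the connection is released once when the stack unwinds.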
Test plan
- `tests/test_db_utils.py::test_retryable_user_code_exception_propagates_unchanged` asserts:
  - `ConnectionError` raised inside the `async with` body propagates as `ConnectionError` (not `RuntimeError`)
  - `__aexit__` is called exactly once (no double-release on `AsyncExitStack` cleanup)
- `uv run pytest tests/test_db_utils.py -v` → 1/1 passing
- `uv run ruff check hindsight_api/engine/db_utils.py tests/test_db_utils.py` → clean

Refs
Original fix landed in our fork at xsolla/hindsight-agentic-memory#29; this PR cherry-picks the same commit onto `vectorize-io/hindsight:main`.

🤖 Generated with Claude Code