
fix(worker): probe pg_proc before calling optional schemas_with_pending_work() #1503

Merged
nicoloboschi merged 2 commits into main from fix/optional-pg-routines
May 7, 2026


@nicoloboschi
Collaborator

Summary

  • Fixes #1408 ("Worker poller _scan_active_schemas calls non-existent function, causing noisy PG logs and breaking single-tenant"). Stops the worker poller from calling schemas_with_pending_work() unconditionally. On fresh deployments where the optional PL/pgSQL routine isn't installed, that call made Postgres log a server-side function does not exist error every ~30s.
  • Adds a small centralised registry hindsight_api/engine/db/optional_routines.py that probes pg_proc once per process and memoises the result. Single source of truth for the install SQL — anyone grepping the routine name lands on the canonical definition.
  • Wires WorkerPoller._scan_active_schemas through the registry: if is_installed("schemas_with_pending_work") returns True it uses the server-side path, otherwise it runs the existing per-schema EXISTS fallback. Non-PG backends short-circuit to False without touching the DB.
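The probe-and-memoise behaviour described in the bullets above can be sketched as follows. This is a minimal illustration and not the code in `optional_routines.py`: the class name `OptionalRoutinesSketch`, the `FakeConnection` stand-in, the extra `conn` argument to `is_installed`, and the exact probe query are all assumptions. Only the `is_installed` name, the `pg_proc` catalog, and the `fetchval` method name come from the PR description.

```python
import asyncio

# Assumed probe query; pg_proc holds both functions and procedures.
PROBE_SQL = "SELECT EXISTS (SELECT 1 FROM pg_proc WHERE proname = $1)"


class OptionalRoutinesSketch:
    """Hypothetical registry: probes pg_proc once per routine name, then memoises."""

    def __init__(self, is_postgres: bool = True) -> None:
        self._is_postgres = is_postgres
        self._cache: dict[str, bool] = {}

    async def is_installed(self, conn, name: str) -> bool:
        if not self._is_postgres:
            return False  # non-PG backends short-circuit without touching the DB
        if name not in self._cache:
            # First lookup for this name: run the probe and remember the answer.
            self._cache[name] = bool(await conn.fetchval(PROBE_SQL, name))
        return self._cache[name]


class FakeConnection:
    """Stand-in for a real connection; counts how often the probe runs."""

    def __init__(self, installed: set[str]) -> None:
        self.installed = installed
        self.probe_calls = 0

    async def fetchval(self, sql: str, *args):
        self.probe_calls += 1
        return args[0] in self.installed


async def demo() -> tuple[bool, bool, int]:
    conn = FakeConnection(installed={"schemas_with_pending_work"})
    registry = OptionalRoutinesSketch()
    first = await registry.is_installed(conn, "schemas_with_pending_work")
    second = await registry.is_installed(conn, "schemas_with_pending_work")
    # Two lookups, but the fake connection should have been probed only once.
    return first, second, conn.probe_calls
```

Running `asyncio.run(demo())` exercises two lookups against one connection; the second is served from the in-process cache.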

Notes

  • "Routine" is the SQL-standard umbrella covering both PG functions and procedures (both stored in pg_proc). Module/abstraction uses that term so a future CREATE PROCEDURE slots in without a rename.
  • Cache is permanent for the life of the process. Installing the routine on a running cluster requires a worker restart — intentional, since these routines are deploy-time concerns and probe-per-poll would defeat the optimisation. Tell me if you'd rather have a TTL.
  • Issue #1408 ("Worker poller _scan_active_schemas calls non-existent function, causing noisy PG logs and breaking single-tenant") also flags that the cloud's routine body only checks tenant_% schemas, which is broken for single-tenant. That's a property of the cloud Helm hook's SQL, not of this code path: the per-schema EXISTS fallback already handles single-tenant correctly. Out of scope for this PR.
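For comparison, the TTL alternative raised in the second note could look roughly like the sketch below. It is not part of this PR; `TtlProbeCache`, `ttl_seconds`, the synchronous `probe` callable, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class TtlProbeCache:
    """Hypothetical alternative: re-probe after ttl_seconds instead of never."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        # routine name -> (probe result, time the probe ran)
        self._entries: dict[str, tuple[bool, float]] = {}

    def is_installed(self, name: str) -> bool:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh: no DB round trip
        result = self._probe(name)
        self._entries[name] = (result, now)
        return result
```

With a TTL, a routine installed on a running cluster would be picked up once the cached entry expires, at the cost of one extra probe per TTL window rather than one per process.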

Test plan

  • New test_scan_uses_optional_routine_when_installed installs the routine, spies on PostgresConnection.fetch/fetchval, and asserts (a) pg_proc was probed, (b) schemas_with_pending_work() was invoked, (c) the per-schema EXISTS fallback did NOT run, (d) the cache records True. Drops the routine in finally.
  • Existing test_scan_finds_schemas_with_pending_work (uninstalled routine, fallback path) still passes.
  • Full TestClaimBatchRotation class (8 tests) passes locally.
  • ./scripts/hooks/lint.sh clean.
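The four assertions in the first bullet can be illustrated without a database. The sketch below is not the actual test: `RecordingConnection`, `scan_active_schemas`, the `probe_cache` dict, and every SQL string (including the placeholder for the fallback query) are assumptions standing in for the real poller and `PostgresConnection`.

```python
import asyncio

ROUTINE = "schemas_with_pending_work"
PROBE_SQL = "SELECT EXISTS (SELECT 1 FROM pg_proc WHERE proname = $1)"  # assumed
ROUTINE_SQL = "SELECT * FROM schemas_with_pending_work()"  # assumed call shape
FALLBACK_SQL = "-- placeholder for the per-schema EXISTS query"


class RecordingConnection:
    """Fake connection that records every statement it is asked to run."""

    def __init__(self, routine_installed: bool, pending: list[str]) -> None:
        self.routine_installed = routine_installed
        self.pending = pending
        self.statements: list[str] = []

    async def fetchval(self, sql: str, *args):
        self.statements.append(sql)
        if sql == PROBE_SQL:
            return self.routine_installed
        return args[0] in self.pending  # fallback: does this schema have work?

    async def fetch(self, sql: str, *args):
        self.statements.append(sql)
        return list(self.pending)


async def scan_active_schemas(conn, schemas: list[str], probe_cache: dict) -> list[str]:
    """Hypothetical poller step: server-side routine if installed, else fallback."""
    if ROUTINE not in probe_cache:
        probe_cache[ROUTINE] = bool(await conn.fetchval(PROBE_SQL, ROUTINE))
    if probe_cache[ROUTINE]:
        return await conn.fetch(ROUTINE_SQL)
    active = []
    for schema in schemas:
        if await conn.fetchval(FALLBACK_SQL, schema):
            active.append(schema)
    return active


async def check_installed_path() -> list[str]:
    conn = RecordingConnection(routine_installed=True, pending=["tenant_a"])
    cache: dict = {}
    result = await scan_active_schemas(conn, ["tenant_a", "tenant_b"], cache)
    assert PROBE_SQL in conn.statements         # (a) pg_proc was probed
    assert ROUTINE_SQL in conn.statements       # (b) routine was invoked
    assert FALLBACK_SQL not in conn.statements  # (c) fallback did not run
    assert cache[ROUTINE] is True               # (d) cache records True
    return result
```

Recording statements on a fake, rather than patching a live connection, keeps the sketch runnable anywhere; the real test described above spies on `PostgresConnection.fetch`/`fetchval` against an actual database.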

…ng_work() (#1408)

The poller called the optional PL/pgSQL routine `schemas_with_pending_work()`
unconditionally on every cycle. When the routine isn't installed (the default
for fresh deployments), Postgres logs a server-side `function does not exist`
error every ~30s even though the Python code silently caught the exception.

This adds a small `OptionalRoutines` registry/cache in
`hindsight_api/engine/db/optional_routines.py` that probes `pg_proc` once on
first lookup and memoises the result for the life of the process. The poller
now calls the routine only when it's actually installed and falls back to the
per-schema EXISTS path otherwise — without any spurious server-side errors.

The registry also carries the canonical install SQL for each routine inline,
so anyone touching the optimisation has a single source of truth (the previous
docstring lived only on `_scan_active_schemas`).

Tradeoffs:
- Probe is permanently cached: installing the routine on a running cluster
  requires a worker restart. Acceptable because these routines are expected
  to be installed once at deploy time, and a probe-per-poll would defeat the
  optimisation.
- Non-PG backends short-circuit to False without touching the DB.
…instead

Hindsight never installs schemas_with_pending_work() — operators do. Keeping
the SQL body in the API repo would drift from whatever is actually deployed
and falsely imply ownership. Replace the install_sql field on OptionalRoutine
with a contract docstring describing the expected signature, return shape,
and semantic constraints, so any operator-supplied implementation is
interchangeable as long as it matches.

The test installs a minimal contract-satisfying stub locally rather than
relying on a registry-supplied body.
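
A registry entry that carries a contract instead of install SQL might be shaped like the sketch below. The class name `OptionalRoutineSketch`, the `contract` field name, the `REGISTRY_SKETCH` mapping, and the contract wording are illustrative assumptions; the real contract text lives in the repository and is not reproduced here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OptionalRoutineSketch:
    """Hypothetical registry entry: describes a routine, never installs it."""

    name: str
    contract: str  # expected signature, return shape, semantic constraints


SCHEMAS_WITH_PENDING_WORK = OptionalRoutineSketch(
    name="schemas_with_pending_work",
    contract=(
        "Placeholder wording, not the real contract: called with no "
        "arguments; any operator-supplied implementation that matches the "
        "documented signature and return shape is interchangeable."
    ),
)

# Lookup by routine name, so grepping the name lands on the contract.
REGISTRY_SKETCH = {r.name: r for r in (SCHEMAS_WITH_PENDING_WORK,)}
```

Keeping only a description in the entry reflects the ownership point made above: operators supply the body, and the API repo states what that body must satisfy.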
@nicoloboschi nicoloboschi merged commit a1c1b7d into main May 7, 2026
135 of 138 checks passed
@nicoloboschi nicoloboschi deleted the fix/optional-pg-routines branch May 7, 2026 10:46
