feat(oracle): unify migrations under Alembic with dialect dispatcher by nicoloboschi · Pull Request #1330 · vectorize-io/hindsight

nicoloboschi · 2026-04-29T14:19:49Z

Summary

Bring Oracle migrations into Alembic so both PG and Oracle share a single revision tree, version table, and tenant-isolation story before we ship Oracle support.
Each migration file dispatches via run_for_dialect(pg=..., oracle=...); _pg_upgrade / _oracle_upgrade slots make per-dialect intent explicit and visible in review.
A pytest lint (tests/test_migration_shape.py) fails CI if a migration skips the dispatcher, which is the guardrail that keeps PG/Oracle drift from coming back.

What changed

New alembic/_dialect.py helper + dialect-aware alembic/env.py (PG keeps search_path / RW-session setup; Oracle uses CURRENT_SCHEMA + DDL_LOCK_TIMEOUT).
59 existing PG migrations mechanically refactored (bodies → _pg_upgrade, top-level dispatcher added). No behavioural change on PG.
New o1a2b3c4d5e6_oracle_baseline.py brings a fresh Oracle 23ai DB to current schema in one step. Dropped the legacy partition-conversion / memory_links dedup / observation_sources backfill — those existed only for pre-baseline Oracle installs we explicitly aren't supporting.
OracleBackend.run_migrations() now calls the shared Alembic pipeline. migrations.py skips the PG-only advisory-lock + pgvector setup on Oracle URLs.
migrations_oracle.py deleted (636 lines). Tests updated to call run_migrations().
script.py.mako scaffolds the new pattern; CLAUDE.md updated with the template and the dialect-asymmetry rule.

Open questions for review

Dropping the post-baseline patches. The old file converted memory_units to partitioning, deduped memory_links, and backfilled observation_sources on every run. Are we OK that the new baseline can't migrate any pre-existing Oracle install? My read was yes — Oracle hasn't shipped — but worth a confirm.
k6l7m8n9o0p1 conflict resolution. Resolved to keep the PG-no-op behaviour from main; updated the docstring to point at the new baseline. Worth a sanity check.
Concurrency. Oracle path skips PG advisory locks. Oracle DDL is autocommit and IF NOT EXISTS / 955-swallowing makes concurrent runs safe in practice, but if you want true cross-worker mutual exclusion later we should add DBMS_LOCK.

Test plan

./scripts/hooks/lint.sh — clean.
tests/test_migration_shape.py — 60/60 pass.
tests/test_db_abstraction.py — 140/140 pass.
Alembic tree resolves to a single head (o1a2b3c4d5e6).
Live PG migration run end-to-end (couldn't validate locally — pg0 fixture failed to bind a TCP socket; failure was unrelated to this change).
Live Oracle migration run against Oracle 23ai (needs a running instance — please verify before merge).
Manual eyeball of CREATE INDEX CONCURRENTLY migrations (e.g. c1a2b3d4e5f6); they use op.execute("COMMIT") which still works through the dispatcher but is the trickiest case.

Oracle DDL was a 636-line idempotent file (`migrations_oracle.py`) outside Alembic, which meant no version tracking, no per-tenant version table, and schema drift every time a PG migration was added without a corresponding Oracle change. This unifies both backends behind a single Alembic tree. - New `alembic/_dialect.py::run_for_dialect(pg=, oracle=)` helper. Each migration declares `_pg_upgrade` / `_oracle_upgrade` and dispatches based on the live connection's dialect. - `alembic/env.py` is dialect-aware: PG keeps the existing search_path / read-write session setup; Oracle uses `ALTER SESSION SET CURRENT_SCHEMA` and `DDL_LOCK_TIMEOUT`. - `alembic/script.py.mako` scaffolds the new pattern by default. - All 59 existing PG migrations refactored mechanically — bodies moved into `_pg_upgrade` / `_pg_downgrade`, top-level dispatchers added. - New `o1a2b3c4d5e6_oracle_baseline` migration brings a fresh Oracle 23ai database to the current schema in one step (PG = no-op). Drops the legacy partition-conversion / dedup / `observation_sources` backfill since those only existed for pre-baseline Oracle installs we explicitly are not supporting. - `OracleBackend.run_migrations()` now goes through the unified Alembic pipeline; `migrations.py` skips the PG-specific advisory lock + pgvector setup when the URL is Oracle. - `migrations_oracle.py` deleted; tests updated to use `run_migrations()`. - New `tests/test_migration_shape.py` lint fails CI if any migration omits `run_for_dialect` — keeps drift from re-emerging. - CLAUDE.md updated with the new template and dialect-asymmetry guidance.

Adds test-python-client-oracle and test-typescript-client-oracle. These mirror the existing test-python-client / test-typescript-client jobs but spin up Oracle 23ai as a service container and point the API server at it via HINDSIGHT_API_DATABASE_BACKEND=oracle + DATABASE_URL. Why a new job instead of matrixing the existing one: Oracle Free's image takes ~2min to start and is network-heavy, so we don't want to pay that cost on every PR — only when oracle-tests is opted in via the PR label, matching the existing test-api-oracle gate. Why client tests, not unit tests: the unit suite already runs against both backends via the abstraction layer. Only the client tests exercise full HTTP round-trips with real serialized payloads, so they catch API changes that work on PG but break on Oracle (or vice versa) in ways the abstraction can't see.

- Move is_oracle_url to db_url.py and import from there in env.py and migrations.py — was duplicated in both. - Type-annotate _configure_pg_session / _configure_oracle_session params (Engine, Connection); ty checks pass. - Update the Oracle baseline comment around vector + text index creation to make the hard requirement explicit: VECTOR + CTXSYS must be available, the migration fails hard if either is missing. The swallow-only-ORA-00955 behavior was already correct; the previous comment misleadingly called it "best-effort".

…ated-files green Three formatting-only changes prettier wants to make. They've been stale on main; CI's verify-generated-files runs lint with LINT_ALL=1 (vs the "only changed integrations" local default), which surfaces them on every unrelated PR. Folding them in here so this PR can land.

Line 312 of fact_storage.py references ``ops`` without ``handle_document_tracking`` declaring it as a parameter — straight NameError on every retain that walks the upsert path. Bug landed on main in d8ec2d7 (#1325) when ``delete_stale_observations_for_memories`` started taking a backend-aware ``ops`` to choose between the PG array operator and the Oracle junction table; the call site was added but the parameter wasn't threaded into the enclosing function. Fix: add ``ops=None`` to ``handle_document_tracking`` and pass ``pool.ops`` from each of the three call sites in orchestrator.py. This is unrelated to the Alembic dialect-dispatcher refactor in this PR but is what's blocking it — the NameError caused 17 retain tests to fail (and left a pytest-xdist worker in a state that hung the whole job at 99%).

The test calls fact_storage.handle_document_tracking directly, which delegates to delete_stale_observations_for_memories(ops=ops). With ops=None the helper falls back to the Oracle junction-table query and fails on PG with "relation public.observation_sources does not exist". Real callers (orchestrator, _delete_stale_observations_for_memories wrapper) all pass self._backend.ops; the test just needs to do the same.

Reserve the "oracle-tests" label for the heavy test-api-oracle (full unit suite). The two client integration jobs against Oracle should run on every API/client change just like their PG counterparts — the whole point is to catch PG/Oracle drift before merge, which doesn't work if you have to remember to label every PR. test-api-oracle keeps its label gate because the full suite is too slow to run on every push.

Oracle Free / Autonomous DB only register a service name with the listener, but SQLAlchemy's oracle+oracledb dialect interprets the URL path as a SID. That mismatch crashes alembic migrations on first connect: DPY-6003: SID "FREEPDB1" is not registered with the listener Rewrite ``oracle://user:pass@host:port/SERVICE`` to ``oracle+oracledb://user:pass@host:port/?service_name=SERVICE`` so the dialect uses the correct connect descriptor. ``?sid=`` and ``?service_name=`` already in the URL are passed through untouched. Also adds scripts/dev/start-oracle.sh / stop-oracle.sh that spin up the same Oracle 23ai Free image CI uses (``container-registry.oracle.com/database/free``) and bootstrap the HINDSIGHT_TEST user, so we can repro this kind of issue locally without round-tripping through GitHub Actions.

On Oracle, alembic runs each migration with transactional_ddl=False ("Will assume non-transactional DDL"). Each CREATE TABLE auto-commits, but the trailing ``UPDATE alembic_version SET version_num = ...`` is plain DML that needs an explicit COMMIT. Without it the connection close rolls the update back, leaving the schema fully created but the version row one revision behind — so ``run_migrations`` reports success while the head row sits at the previous revision. Caught locally with the new scripts/dev/start-oracle.sh harness running the same Oracle 23ai Free image CI uses; alembic_version was stuck at ``k6l7m8n9o0p1`` even though every table from the ``o1a2b3c4d5e6`` baseline existed. After the fix it correctly advances to ``o1a2b3c4d5e6``, and a second run is a no-op as expected. PG already needs the same commit (Supabase RW-mode SET), so just drop the ``if not is_oracle`` guard.

The python client pyproject.toml defaults to -n auto (pytest-xdist). Against Oracle that hits row-level deadlocks during retain cleanup — ORA-00060 is logged repeatedly in the API server output and most tests fail with "Internal Server Error" at fixture teardown. Same shape as the existing test-api-oracle issue, which is already pinned to -n0. Override to -n0 in the Oracle client job (only). The PG client job stays parallel since pgvector + advisory locks handle concurrent retain fine. TS client tests are unaffected — they run via vitest, not pytest.

nicoloboschi added 6 commits April 29, 2026 16:19

nicoloboschi added the oracle-tests Enable Oracle 23ai integration tests in CI label Apr 30, 2026

nicoloboschi removed the oracle-tests Enable Oracle 23ai integration tests in CI label Apr 30, 2026

nicoloboschi added 3 commits April 30, 2026 12:37

nicoloboschi merged commit d39a2ca into main Apr 30, 2026
61 of 63 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(oracle): unify migrations under Alembic with dialect dispatcher#1330

feat(oracle): unify migrations under Alembic with dialect dispatcher#1330
nicoloboschi merged 10 commits intomainfrom
feat/oracle-alembic-migrations

nicoloboschi commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nicoloboschi commented Apr 29, 2026

Summary

What changed

Open questions for review

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant