feat(oracle): unify migrations under Alembic with dialect dispatcher #1330
Merged: nicoloboschi merged 10 commits into main (Apr 30, 2026)
Conversation
Oracle DDL was a 636-line idempotent file (`migrations_oracle.py`) outside Alembic, which meant no version tracking, no per-tenant version table, and schema drift every time a PG migration was added without a corresponding Oracle change. This unifies both backends behind a single Alembic tree.

- New `alembic/_dialect.py::run_for_dialect(pg=, oracle=)` helper. Each migration declares `_pg_upgrade` / `_oracle_upgrade` and dispatches based on the live connection's dialect.
- `alembic/env.py` is dialect-aware: PG keeps the existing search_path / read-write session setup; Oracle uses `ALTER SESSION SET CURRENT_SCHEMA` and `DDL_LOCK_TIMEOUT`.
- `alembic/script.py.mako` scaffolds the new pattern by default.
- All 59 existing PG migrations refactored mechanically — bodies moved into `_pg_upgrade` / `_pg_downgrade`, top-level dispatchers added.
- New `o1a2b3c4d5e6_oracle_baseline` migration brings a fresh Oracle 23ai database to the current schema in one step (PG = no-op). Drops the legacy partition-conversion / dedup / `observation_sources` backfill, since those only existed for pre-baseline Oracle installs we explicitly are not supporting.
- `OracleBackend.run_migrations()` now goes through the unified Alembic pipeline; `migrations.py` skips the PG-specific advisory lock + pgvector setup when the URL is Oracle.
- `migrations_oracle.py` deleted; tests updated to use `run_migrations()`.
- New `tests/test_migration_shape.py` lint fails CI if any migration omits `run_for_dialect` — keeps drift from re-emerging.
- CLAUDE.md updated with the new template and dialect-asymmetry guidance.
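The dispatcher pattern can be sketched as follows. This is a minimal standalone version: the real helper in `alembic/_dialect.py` presumably reads the bind from Alembic's op context, while this sketch takes the connection explicitly so it can run outside a migration.

```python
# Minimal sketch of the run_for_dialect dispatcher. Assumption: the real
# helper reads the bind from Alembic's op context; here it is passed in
# explicitly so the sketch is self-contained.

def run_for_dialect(bind, *, pg=None, oracle=None):
    """Run the migration body matching the live connection's dialect."""
    name = bind.dialect.name
    if name == "postgresql":
        body = pg
    elif name == "oracle":
        body = oracle
    else:
        raise RuntimeError(f"unsupported dialect: {name!r}")
    # A backend where the migration is a no-op is expressed as pg=None
    # or oracle=None (e.g. the Oracle baseline is a PG no-op).
    if body is not None:
        body()
```

A migration module then declares `_pg_upgrade` / `_oracle_upgrade` and its top-level `upgrade()` just calls the dispatcher — the presence of that call is what the shape lint checks for.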
Adds test-python-client-oracle and test-typescript-client-oracle. These mirror the existing test-python-client / test-typescript-client jobs but spin up Oracle 23ai as a service container and point the API server at it via `HINDSIGHT_API_DATABASE_BACKEND=oracle` + `DATABASE_URL`.

Why a new job instead of matrixing the existing one: Oracle Free's image takes ~2 min to start and is network-heavy, so we don't want to pay that cost on every PR — only when oracle-tests is opted in via the PR label, matching the existing test-api-oracle gate.

Why client tests, not unit tests: the unit suite already runs against both backends via the abstraction layer. Only the client tests exercise full HTTP round-trips with real serialized payloads, so they catch API changes that work on PG but break on Oracle (or vice versa) in ways the abstraction can't see.
- Move `is_oracle_url` to db_url.py and import from there in env.py and migrations.py — it was duplicated in both.
- Type-annotate `_configure_pg_session` / `_configure_oracle_session` params (`Engine`, `Connection`); ty checks pass.
- Update the Oracle baseline comment around vector + text index creation to make the hard requirement explicit: VECTOR + CTXSYS must be available, and the migration fails hard if either is missing. The swallow-only-ORA-00955 behaviour was already correct; the previous comment misleadingly called it "best-effort".
…ated-files green

Three formatting-only changes prettier wants to make. They've been stale on main; CI's verify-generated-files runs lint with `LINT_ALL=1` (vs the "only changed integrations" local default), which surfaces them on every unrelated PR. Folding them in here so this PR can land.
Line 312 of fact_storage.py references ``ops`` without ``handle_document_tracking`` declaring it as a parameter — a straight NameError on every retain that walks the upsert path. The bug landed on main in d8ec2d7 (#1325) when ``delete_stale_observations_for_memories`` started taking a backend-aware ``ops`` to choose between the PG array operator and the Oracle junction table; the call site was added but the parameter wasn't threaded into the enclosing function.

Fix: add ``ops=None`` to ``handle_document_tracking`` and pass ``pool.ops`` from each of the three call sites in orchestrator.py.

This is unrelated to the Alembic dialect-dispatcher refactor in this PR but is what's blocking it — the NameError caused 17 retain tests to fail (and left a pytest-xdist worker in a state that hung the whole job at 99%).
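The shape of the fix, with hypothetical simplified signatures (the real functions in fact_storage.py take more arguments):

```python
# Sketch of the ops-threading fix. The signatures here are simplified
# illustrations, not the real ones from fact_storage.py.

def delete_stale_observations_for_memories(memory_ids, ops):
    # Backend-aware: ops selects the PG array-operator query or the
    # Oracle junction-table query.
    return ops.delete_stale(memory_ids)


def handle_document_tracking(memory_ids, ops=None):
    # Before the fix, `ops` was referenced in this body without being
    # declared as a parameter -> NameError on every retain that walked
    # the upsert path.
    return delete_stale_observations_for_memories(memory_ids, ops=ops)


# Call sites (orchestrator.py) now thread the backend's ops through:
#   handle_document_tracking(memory_ids, ops=pool.ops)
```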
The test calls fact_storage.handle_document_tracking directly, which delegates to delete_stale_observations_for_memories(ops=ops). With ops=None the helper falls back to the Oracle junction-table query and fails on PG with "relation public.observation_sources does not exist". Real callers (orchestrator, _delete_stale_observations_for_memories wrapper) all pass self._backend.ops; the test just needs to do the same.
Reserve the "oracle-tests" label for the heavy test-api-oracle (full unit suite). The two client integration jobs against Oracle should run on every API/client change just like their PG counterparts — the whole point is to catch PG/Oracle drift before merge, which doesn't work if you have to remember to label every PR. test-api-oracle keeps its label gate because the full suite is too slow to run on every push.
Oracle Free / Autonomous DB only register a service name with the listener, but SQLAlchemy's oracle+oracledb dialect interprets the URL path as a SID. That mismatch crashes alembic migrations on first connect:

    DPY-6003: SID "FREEPDB1" is not registered with the listener

Rewrite ``oracle://user:pass@host:port/SERVICE`` to ``oracle+oracledb://user:pass@host:port/?service_name=SERVICE`` so the dialect uses the correct connect descriptor. ``?sid=`` and ``?service_name=`` already in the URL are passed through untouched.

Also adds scripts/dev/start-oracle.sh / stop-oracle.sh that spin up the same Oracle 23ai Free image CI uses (``container-registry.oracle.com/database/free``) and bootstrap the HINDSIGHT_TEST user, so we can repro this kind of issue locally without round-tripping through GitHub Actions.
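The rewrite could look roughly like this — a standalone stdlib-only sketch (the real code presumably lives alongside `is_oracle_url` in db_url.py, and I'm assuming the driver prefix is still upgraded when ``?sid=`` / ``?service_name=`` pass through):

```python
from urllib.parse import parse_qs, urlsplit


def normalize_oracle_url(url: str) -> str:
    """Rewrite oracle://u:p@h:port/SERVICE to the service_name form.

    Sketch only: function name and exact pass-through behaviour for
    explicit ?sid=/?service_name= URLs are assumptions.
    """
    if not url.startswith("oracle://"):
        return url  # non-Oracle URLs are untouched
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "sid" in query or "service_name" in query:
        # Explicit connect descriptor already present: keep it, only
        # pinning the oracledb driver (assumption).
        return url.replace("oracle://", "oracle+oracledb://", 1)
    # Treat the URL path as a service name, not a SID.
    service = parts.path.lstrip("/")
    return f"oracle+oracledb://{parts.netloc}/?service_name={service}"
```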
On Oracle, alembic runs each migration with transactional_ddl=False
("Will assume non-transactional DDL"). Each CREATE TABLE auto-commits, but
the trailing ``UPDATE alembic_version SET version_num = ...`` is plain DML
that needs an explicit COMMIT. Without it the connection close rolls the
update back, leaving the schema fully created but the version row one
revision behind — so ``run_migrations`` reports success while the head row
sits at the previous revision.
Caught locally with the new scripts/dev/start-oracle.sh harness running the
same Oracle 23ai Free image CI uses; alembic_version was stuck at
``k6l7m8n9o0p1`` even though every table from the ``o1a2b3c4d5e6`` baseline
existed. After the fix it correctly advances to ``o1a2b3c4d5e6``, and a
second run is a no-op as expected.
PG already needs the same commit (Supabase RW-mode SET), so just drop the
``if not is_oracle`` guard.
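A sketch of the resulting shape of env.py's online runner with the guard dropped (the structure is assumed; the real file also does the per-dialect session setup described earlier):

```python
# Sketch only: assumes the general shape of an Alembic env.py online
# runner. The real file also configures search_path (PG) or
# CURRENT_SCHEMA / DDL_LOCK_TIMEOUT (Oracle) before running migrations.

def run_migrations_online(connectable, context):
    with connectable.connect() as connection:
        context.configure(connection=connection)
        with context.begin_transaction():
            context.run_migrations()
        # Previously guarded by `if not is_oracle:`. On Oracle the DDL
        # auto-commits, but the trailing UPDATE of alembic_version is
        # plain DML — without this COMMIT the connection close rolls the
        # version row back one revision. PG needs the commit anyway for
        # the Supabase RW-mode SET, so it now runs unconditionally.
        connection.exec_driver_sql("COMMIT")
```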
The python client pyproject.toml defaults to `-n auto` (pytest-xdist). Against Oracle that hits row-level deadlocks during retain cleanup — ORA-00060 is logged repeatedly in the API server output and most tests fail with "Internal Server Error" at fixture teardown. Same shape as the existing test-api-oracle issue, which is already pinned to `-n0`.

Override to `-n0` in the Oracle client job (only). The PG client job stays parallel since pgvector + advisory locks handle concurrent retain fine. TS client tests are unaffected — they run via vitest, not pytest.
Summary
- `run_for_dialect(pg=..., oracle=...)` with `_pg_upgrade` / `_oracle_upgrade` slots makes per-dialect intent explicit and visible in review.
- A shape lint (`tests/test_migration_shape.py`) fails CI if a migration skips the dispatcher — the guardrail that keeps PG/Oracle drift from coming back.

What changed
- New `alembic/_dialect.py` helper + dialect-aware `alembic/env.py` (PG keeps `search_path` / RW-session setup; Oracle uses `CURRENT_SCHEMA` + `DDL_LOCK_TIMEOUT`).
- All 59 existing PG migrations refactored mechanically (bodies moved into `_pg_upgrade`, top-level dispatcher added). No behavioural change on PG.
- New `o1a2b3c4d5e6_oracle_baseline.py` brings a fresh Oracle 23ai DB to current schema in one step. Dropped the legacy partition-conversion / `memory_links` dedup / `observation_sources` backfill — those existed only for pre-baseline Oracle installs we explicitly aren't supporting.
- `OracleBackend.run_migrations()` now calls the shared Alembic pipeline. `migrations.py` skips the PG-only advisory-lock + pgvector setup on Oracle URLs.
- `migrations_oracle.py` deleted (636 lines). Tests updated to call `run_migrations()`.
- `script.py.mako` scaffolds the new pattern; CLAUDE.md updated with the template and the dialect-asymmetry rule.

Open questions for review
- The deleted legacy Oracle code converted `memory_units` to partitioning, deduped `memory_links`, and backfilled `observation_sources` on every run. Are we OK that the new baseline can't migrate any pre-existing Oracle install? My read was yes — Oracle hasn't shipped — but worth a confirm.
- `k6l7m8n9o0p1` conflict resolution: resolved to keep the PG-no-op behaviour from `main`; updated the docstring to point at the new baseline. Worth a sanity check.
- `IF NOT EXISTS` / ORA-00955-swallowing makes concurrent runs safe in practice, but if you want true cross-worker mutual exclusion later we should add `DBMS_LOCK`.

Test plan
- `./scripts/hooks/lint.sh` — clean.
- `tests/test_migration_shape.py` — 60/60 pass.
- `tests/test_db_abstraction.py` — 140/140 pass.
- Fresh Oracle 23ai run migrates to head (`o1a2b3c4d5e6`).
- The `CREATE INDEX CONCURRENTLY` migrations (e.g. `c1a2b3d4e5f6`) use `op.execute("COMMIT")`, which still works through the dispatcher but is the trickiest case.
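The shape lint from the test plan might look something like this — a simplified text-scan sketch; the real `tests/test_migration_shape.py` may well inspect ASTs or import the revision modules instead:

```python
import pathlib

# Simplified sketch of a migration shape lint. Assumption: the real
# tests/test_migration_shape.py may parse ASTs or import the revision
# modules; this version just scans the source text for the dispatcher call.

def migrations_missing_dispatcher(versions_dir):
    """Return migration filenames that never call run_for_dialect."""
    return sorted(
        path.name
        for path in pathlib.Path(versions_dir).glob("*.py")
        if "run_for_dialect(" not in path.read_text()
    )


def test_every_migration_uses_dispatcher():
    # Fails CI with the offending filenames if any migration skips the
    # dispatcher, which is what keeps dialect drift from returning.
    assert migrations_missing_dispatcher("alembic/versions") == []
```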