fix(db): wrap withOrgContext in $transaction so session var survives …#97
Merged
Merged
Conversation
…pool (Phase 0a)
Previously withOrgContext called client.$executeRawUnsafe to SET
app.current_org_id, then ran fn(client) as a separate call. With
Prisma connection pooling and Railway's 2 replicas, these two
operations can land on different connections — the session
variable evaporates between the set_config and the user's
queries, and any RLS policy that reads app.current_org_id sees
NULL.
Today this bug is masked because the application connects as a
superuser (RLS bypassed at role level). Once Phase 0b introduces
the non-bypass app_user role, every tenant-scoped query under
this helper would silently return [].
The fix wraps the helper in client.$transaction(...), which pins
a single connection for the duration of the callback. set_config
is invoked on the tx client (not the outer pooled client) using
$executeRaw with a tagged template (safer than the previous
$executeRawUnsafe). Isolation, maxWait, and timeout are set per
the rollout plan.
The callback signature changes from
fn: (client: PrismaClient) => Promise<T>
to
fn: (tx: Prisma.TransactionClient) => Promise<T>
because callers MUST use the tx for their queries — using the
outer client would bypass the org context. withOrgContext is
not yet called from any production route, so this signature
change does not break any consumer in this repo.
Tests:
- 5 existing unit tests rewritten against the new
$transaction-based shape. They verify: options bag is
correct (ReadCommitted, maxWait=5000, timeout=30000),
set_config runs on the tx (not the outer client), the
callback receives the tx instance, the callback's return
value is propagated, errors propagate.
- 1 new regression-guard test ensures the callback receives
the tx, not the outer client — guards against accidentally
reverting to the pool-leak shape.
- 1 it.skip() integration placeholder with TODO for Phase 1+
real-Postgres harness. The mocked unit tests verify SHAPE
of the fix but cannot verify the underlying pool-survival
semantics — that requires a real connection pool.
- 3 registerRLSMiddleware tests left untouched (legacy
$use path, now marked @deprecated in JSDoc).
Refs:
- skill-rls-rollout-PLAN-V2.md §4.1 (Phase 0a, CRITICAL #1)
- skill-rls-rollout-FORENSIC-REPORT.md §1 bullet 1, §3 #1, HAL-1
Verification:
- vitest src/lib/db/__tests__/rls-middleware.test.ts: 8 passed, 1 skipped
- tsc --noEmit: exit 0
Owner
Author
Self-reviewSolo repo — no second reviewer available. Documenting due diligence Diff verification
Caller verification
Test verification
CI verification
Scope verification
Proceeding with admin merge (bypass rules) under solo-repo policy. |
This was referenced May 18, 2026
webdevcom01-cell
added a commit
that referenced
this pull request
May 18, 2026
Same root cause as Phase 0a (PR #97): a SET LOCAL Postgres session variable and the subsequent query were issued as separate Prisma calls. With pool connections, the SET LOCAL can land on one connection and the SELECT on another — the ef_search tuning silently reverts to the server default before the vector search runs. Phase 0a fixed this for the org-context session variable in withOrgContext. Phase 0e applies the same $transaction wrap to all three SET LOCAL hnsw.ef_search call sites surfaced during STEP 1 audit: 1. src/lib/knowledge/search.ts:201 (searchKnowledgeBase) - Hot path: 6+ production callers (KB chat/eval API routes, kb-search-handler runtime, MCP tool exposure, rag-inject, codebase-rag SDLC). - Dynamic ef_search (40 | 60 | 100) based on query word count. - Vector search on KBChunk. 2. src/lib/memory/hot-cold-tier.ts:113 (getColdMemories) - No production callers in src/ today, but exported and reachable dynamically. Patched for consistency with the other two sites so the pattern is uniform across the codebase. - ef_search = 40 (hardcoded). - Vector search on AgentMemory. 3. src/lib/runtime/handlers/memory-read-handler.ts:168 (memoryReadHandler vector search path) - Runtime handler for the memory_read node type. Triggered for every agent run that includes a memory_read node. - ef_search = 40 (hardcoded). - Vector search on AgentMemory. PLAN-V2 §4.5 named only search.ts; the audit instruction in that section ('Apply same pattern to other SET LOCAL callers if discovered during STEP 1 audit') is honored here with all three sites in one PR. FORENSIC HAL-9 covers the same finding. Tests: - 63 existing tests still pass (search.test.ts utility tests: 30; hot-cold-tier.test.ts: 16; memory-read-handler.test.ts: 17). - 6 new regression-guard tests added in two new files: src/lib/knowledge/__tests__/search-vector-shape.test.ts (3) src/lib/runtime/handlers/__tests__/memory-read-handler-vector-shape.test.ts (3) - hot-cold-tier.test.ts mock extended with $transaction delegating to a tx mock; existing assertions on $executeRawUnsafe still hold via tx-aliased fn references. - Each shape test asserts: (a) $transaction is invoked, (b) SET LOCAL runs on the tx, not the outer prisma, (c) callback receives tx, (d) outer prisma's direct exec/query paths are NOT used. - Mocked tests verify SHAPE of the fix. They cannot verify pool-survival semantics; that needs a real Postgres integration harness (TODO comments left for Phase 1+). Risk: - search.ts is on the KB hot path. $transaction holds a pool connection for the duration of the callback (one SET LOCAL + one SELECT, both microsecond-millisecond). Should not materially affect pool exhaustion under load; verify in staging if concerned. - hot-cold-tier.ts getColdMemories has no production callers, so functional risk is zero. - memory-read-handler.ts vector search is per-agent-run when memory_read node fires. Similar transaction profile to search.ts. Out of scope (intentional): - Migrating $executeRawUnsafe -> $executeRaw on hot-cold-tier.ts and memory-read-handler.ts. Both keep the existing Unsafe variant; SET LOCAL takes no parameters, no SQL injection vector. Refactoring to safer APIs is a separate workstream. Verification: - vitest run (search.test.ts + search-vector-shape.test.ts + hot-cold-tier.test.ts + memory-read-handler.test.ts + memory-read-handler-vector-shape.test.ts): 69 passed - tsc --noEmit -p tsconfig.json: exit 0 Refs: - skill-rls-rollout-PLAN-V2.md §4.5 (Phase 0e bundled bug fix) - skill-rls-rollout-FORENSIC-REPORT.md HAL-9 - PR #97 (Phase 0a — same fix pattern for withOrgContext) - PR #98 (docs/rls-tech-debt.md — tracking)
webdevcom01-cell
added a commit
that referenced
this pull request
May 18, 2026
Running list of issues surfaced during Phase 0a that were intentionally not addressed in PR #97. Each item is scoped to a specific RLS phase it must be resolved before, so the rollout itself doesn't accumulate hidden surprises. Open items: 1. CI Docker Build & Push timeout (Medium, resolve before Phase 0b) - 30-min cap was hit by 29s on PR #97 during Collecting build traces. Compile alone takes 15.3 min. 2. MCP pool shutdown noise during SSG (Low, resolve before Phase 0e for cleaner build logs). 3. Railway 'Wait for CI' toggle is OFF (HIGH, resolve before Phase 0b — hard requirement). - Phase 0b adds migrations and would deploy app code against a not-yet-migrated database without this gate. Plus a 'Future-watch' section: preview deploys, deployment status reporting back to GitHub, auto-rollback on healthcheck failure. This is a documentation-only change. No code paths affected. typecheck: exit 0 (no TS touched). Refs: - skill-rls-rollout-PLAN-V2.md - skill-rls-rollout-FORENSIC-REPORT.md HAL-7, §3 #6 - PR #97 (Phase 0a)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…pool (Phase 0a)
Previously withOrgContext called client.$executeRawUnsafe to SET app.current_org_id, then ran fn(client) as a separate call. With Prisma connection pooling and Railway's 2 replicas, these two operations can land on different connections — the session variable evaporates between the set_config and the user's queries, and any RLS policy that reads app.current_org_id sees NULL.
Today this bug is masked because the application connects as a superuser (RLS bypassed at role level). Once Phase 0b introduces the non-bypass app_user role, every tenant-scoped query under this helper would silently return [].
The fix wraps the helper in client.$transaction(...), which pins a single connection for the duration of the callback. set_config is invoked on the tx client (not the outer pooled client) using $executeRaw with a tagged template (safer than the previous $executeRawUnsafe). Isolation, maxWait, and timeout are set per the rollout plan.
The callback signature changes from
fn: (client: PrismaClient) => Promise
to
fn: (tx: Prisma.TransactionClient) => Promise
because callers MUST use the tx for their queries — using the outer client would bypass the org context. withOrgContext is not yet called from any production route, so this signature change does not break any consumer in this repo.
Tests:
Refs:
Verification:
Description
Briefly describe the changes in this PR and the motivation behind them.
Type of change
Checklist
any, Tailwind only)pnpm test)pnpm lint)pnpm typecheck)console.logstatements left in code