fix(db): wrap withOrgContext in $transaction so session var survives … by webdevcom01-cell · Pull Request #97 · webdevcom01-cell/agent-studio

webdevcom01-cell · 2026-05-18T09:28:13Z

…pool (Phase 0a)

Previously withOrgContext called client.$executeRawUnsafe to SET app.current_org_id, then ran fn(client) as a separate call. With Prisma connection pooling and Railway's 2 replicas, these two operations can land on different connections — the session variable evaporates between the set_config and the user's queries, and any RLS policy that reads app.current_org_id sees NULL.

Today this bug is masked because the application connects as a superuser (RLS bypassed at role level). Once Phase 0b introduces the non-bypass app_user role, every tenant-scoped query under this helper would silently return [].

The fix wraps the helper in client.$transaction(...), which pins a single connection for the duration of the callback. set_config is invoked on the tx client (not the outer pooled client) using $executeRaw with a tagged template (safer than the previous $executeRawUnsafe). Isolation, maxWait, and timeout are set per the rollout plan.

The callback signature changes from
fn: (client: PrismaClient) => Promise
to
fn: (tx: Prisma.TransactionClient) => Promise
because callers MUST use the tx for their queries — using the outer client would bypass the org context. withOrgContext is not yet called from any production route, so this signature change does not break any consumer in this repo.

Tests:

5 existing unit tests rewritten against the new $transaction-based shape. They verify: options bag is correct (ReadCommitted, maxWait=5000, timeout=30000), set_config runs on the tx (not the outer client), the callback receives the tx instance, the callback's return value is propagated, errors propagate.
1 new regression-guard test ensures the callback receives the tx, not the outer client — guards against accidentally reverting to the pool-leak shape.
1 it.skip() integration placeholder with TODO for Phase 1+ real-Postgres harness. The mocked unit tests verify SHAPE of the fix but cannot verify the underlying pool-survival semantics — that requires a real connection pool.
3 registerRLSMiddleware tests left untouched (legacy $use path, now marked @deprecated in JSDoc).

Refs:

skill-rls-rollout-PLAN-V2.md §4.1 (Phase 0a, CRITICAL chore(deps): bump actions/setup-node from 4.4.0 to 6.3.0 #1)
skill-rls-rollout-FORENSIC-REPORT.md §1 bullet 1, §3 chore(deps): bump actions/setup-node from 4.4.0 to 6.3.0 #1, HAL-1

Verification:

vitest src/lib/db/tests/rls-middleware.test.ts: 8 passed, 1 skipped
tsc --noEmit: exit 0

Description

Briefly describe the changes in this PR and the motivation behind them.

Type of change

Checklist

I have read the Contributing Guide
My code follows the project code standards (TypeScript strict, no any, Tailwind only)
I have added tests that cover my changes
All tests pass (pnpm test)
Linting passes (pnpm lint)
Type checking passes (pnpm typecheck)
I have updated documentation if needed
No console.log statements left in code
No hardcoded secrets or internal URLs

@deprecated

…pool (Phase 0a) Previously withOrgContext called client.$executeRawUnsafe to SET app.current_org_id, then ran fn(client) as a separate call. With Prisma connection pooling and Railway's 2 replicas, these two operations can land on different connections — the session variable evaporates between the set_config and the user's queries, and any RLS policy that reads app.current_org_id sees NULL. Today this bug is masked because the application connects as a superuser (RLS bypassed at role level). Once Phase 0b introduces the non-bypass app_user role, every tenant-scoped query under this helper would silently return []. The fix wraps the helper in client.$transaction(...), which pins a single connection for the duration of the callback. set_config is invoked on the tx client (not the outer pooled client) using $executeRaw with a tagged template (safer than the previous $executeRawUnsafe). Isolation, maxWait, and timeout are set per the rollout plan. The callback signature changes from fn: (client: PrismaClient) => Promise<T> to fn: (tx: Prisma.TransactionClient) => Promise<T> because callers MUST use the tx for their queries — using the outer client would bypass the org context. withOrgContext is not yet called from any production route, so this signature change does not break any consumer in this repo. Tests: - 5 existing unit tests rewritten against the new $transaction-based shape. They verify: options bag is correct (ReadCommitted, maxWait=5000, timeout=30000), set_config runs on the tx (not the outer client), the callback receives the tx instance, the callback's return value is propagated, errors propagate. - 1 new regression-guard test ensures the callback receives the tx, not the outer client — guards against accidentally reverting to the pool-leak shape. - 1 it.skip() integration placeholder with TODO for Phase 1+ real-Postgres harness. The mocked unit tests verify SHAPE of the fix but cannot verify the underlying pool-survival semantics — that requires a real connection pool. - 3 registerRLSMiddleware tests left untouched (legacy $use path, now marked @deprecated in JSDoc). Refs: - skill-rls-rollout-PLAN-V2.md §4.1 (Phase 0a, CRITICAL #1) - skill-rls-rollout-FORENSIC-REPORT.md §1 bullet 1, §3 #1, HAL-1 Verification: - vitest src/lib/db/__tests__/rls-middleware.test.ts: 8 passed, 1 skipped - tsc --noEmit: exit 0

webdevcom01-cell · 2026-05-18T10:15:30Z

Same root cause as Phase 0a (PR #97): a SET LOCAL Postgres session variable and the subsequent query were issued as separate Prisma calls. With pool connections, the SET LOCAL can land on one connection and the SELECT on another — the ef_search tuning silently reverts to the server default before the vector search runs. Phase 0a fixed this for the org-context session variable in withOrgContext. Phase 0e applies the same $transaction wrap to all three SET LOCAL hnsw.ef_search call sites surfaced during STEP 1 audit: 1. src/lib/knowledge/search.ts:201 (searchKnowledgeBase) - Hot path: 6+ production callers (KB chat/eval API routes, kb-search-handler runtime, MCP tool exposure, rag-inject, codebase-rag SDLC). - Dynamic ef_search (40 | 60 | 100) based on query word count. - Vector search on KBChunk. 2. src/lib/memory/hot-cold-tier.ts:113 (getColdMemories) - No production callers in src/ today, but exported and reachable dynamically. Patched for consistency with the other two sites so the pattern is uniform across the codebase. - ef_search = 40 (hardcoded). - Vector search on AgentMemory. 3. src/lib/runtime/handlers/memory-read-handler.ts:168 (memoryReadHandler vector search path) - Runtime handler for the memory_read node type. Triggered for every agent run that includes a memory_read node. - ef_search = 40 (hardcoded). - Vector search on AgentMemory. PLAN-V2 §4.5 named only search.ts; the audit instruction in that section ('Apply same pattern to other SET LOCAL callers if discovered during STEP 1 audit') is honored here with all three sites in one PR. FORENSIC HAL-9 covers the same finding. Tests: - 63 existing tests still pass (search.test.ts utility tests: 30; hot-cold-tier.test.ts: 16; memory-read-handler.test.ts: 17). - 6 new regression-guard tests added in two new files: src/lib/knowledge/__tests__/search-vector-shape.test.ts (3) src/lib/runtime/handlers/__tests__/memory-read-handler-vector-shape.test.ts (3) - hot-cold-tier.test.ts mock extended with $transaction delegating to a tx mock; existing assertions on $executeRawUnsafe still hold via tx-aliased fn references. - Each shape test asserts: (a) $transaction is invoked, (b) SET LOCAL runs on the tx, not the outer prisma, (c) callback receives tx, (d) outer prisma's direct exec/query paths are NOT used. - Mocked tests verify SHAPE of the fix. They cannot verify pool-survival semantics; that needs a real Postgres integration harness (TODO comments left for Phase 1+). Risk: - search.ts is on the KB hot path. $transaction holds a pool connection for the duration of the callback (one SET LOCAL + one SELECT, both microsecond-millisecond). Should not materially affect pool exhaustion under load; verify in staging if concerned. - hot-cold-tier.ts getColdMemories has no production callers, so functional risk is zero. - memory-read-handler.ts vector search is per-agent-run when memory_read node fires. Similar transaction profile to search.ts. Out of scope (intentional): - Migrating $executeRawUnsafe -> $executeRaw on hot-cold-tier.ts and memory-read-handler.ts. Both keep the existing Unsafe variant; SET LOCAL takes no parameters, no SQL injection vector. Refactoring to safer APIs is a separate workstream. Verification: - vitest run (search.test.ts + search-vector-shape.test.ts + hot-cold-tier.test.ts + memory-read-handler.test.ts + memory-read-handler-vector-shape.test.ts): 69 passed - tsc --noEmit -p tsconfig.json: exit 0 Refs: - skill-rls-rollout-PLAN-V2.md §4.5 (Phase 0e bundled bug fix) - skill-rls-rollout-FORENSIC-REPORT.md HAL-9 - PR #97 (Phase 0a — same fix pattern for withOrgContext) - PR #98 (docs/rls-tech-debt.md — tracking)

Running list of issues surfaced during Phase 0a that were intentionally not addressed in PR #97. Each item is scoped to a specific RLS phase it must be resolved before, so the rollout itself doesn't accumulate hidden surprises. Open items: 1. CI Docker Build & Push timeout (Medium, resolve before Phase 0b) - 30-min cap was hit by 29s on PR #97 during Collecting build traces. Compile alone takes 15.3 min. 2. MCP pool shutdown noise during SSG (Low, resolve before Phase 0e for cleaner build logs). 3. Railway 'Wait for CI' toggle is OFF (HIGH, resolve before Phase 0b — hard requirement). - Phase 0b adds migrations and would deploy app code against a not-yet-migrated database without this gate. Plus a 'Future-watch' section: preview deploys, deployment status reporting back to GitHub, auto-rollback on healthcheck failure. This is a documentation-only change. No code paths affected. typecheck: exit 0 (no TS touched). Refs: - skill-rls-rollout-PLAN-V2.md - skill-rls-rollout-FORENSIC-REPORT.md HAL-7, §3 #6 - PR #97 (Phase 0a)

webdevcom01-cell merged commit 35140d3 into main May 18, 2026
7 of 8 checks passed

webdevcom01-cell deleted the phase-0a/with-org-context-fix branch May 18, 2026 10:16

This was referenced May 18, 2026

docs(rls): track 3 deferred items from Phase 0a rollout #98

Merged

fix(db): wrap SET LOCAL hnsw.ef_search in $transaction (Phase 0e) #99

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(db): wrap withOrgContext in $transaction so session var survives …#97

fix(db): wrap withOrgContext in $transaction so session var survives …#97
webdevcom01-cell merged 1 commit into
mainfrom
phase-0a/with-org-context-fix

webdevcom01-cell commented May 18, 2026

Uh oh!

webdevcom01-cell commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

webdevcom01-cell commented May 18, 2026

Description

Type of change

Checklist

Uh oh!

webdevcom01-cell commented May 18, 2026

Self-review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant