Skip to content

PostgreSQL dialect refactor - PR1 of 4 - Dialect#276

Merged
wesm merged 5 commits intowesm:mainfrom
webgress:pr1-branch-dialect-extraction
Apr 21, 2026
Merged

PostgreSQL dialect refactor - PR1 of 4 - Dialect#276
wesm merged 5 commits intowesm:mainfrom
webgress:pr1-branch-dialect-extraction

Conversation

@webgress
Copy link
Copy Markdown
Contributor

Adding PostgreSQL as an opt-in database backend (SQLite stays the default).
Details in pg_refactor_docs/.
This is split into 4 PRs to make it easier to review:

  • PR1 — Dialect extraction (this PR): Introduces a Dialect interface and SQLiteDialect, moves all SQLite-specific SQL out of the store layer. No new dependencies, no behavior change.
  • PR2 — PostgreSQL dialect: Adds PostgreSQLDialect, pgx driver, PG schema with tsvector FTS, and dual-backend test support.
  • PR3 — PostgreSQL functional: Portable query layer (Rebind, RETURNING, parameterized FTS), PG-native schema migrations, connection pool config, and full test coverage across both backends.
  • PR4 — Data migration: migrate-db command for copying data between SQLite and PostgreSQL in either direction.

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 17, 2026

roborev: Combined Review (7f47f5c)

Verdict: the PR is not ready to merge; it introduces 1 high-severity regression risk and 4 medium-severity portability bugs.

High

  • internal/store/store.go:401 (InitSchema())
    Dialects that return "" from SchemaFTS() never mark FTS as available, so search indexing/backfill will silently stay disabled for those backends. This breaks the intended contract for dialects where FTS lives in the main schema.
    Fix: Set FTS capability independently of the SchemaFTS() branch, e.g. by probing via the dialect after schema init.

Medium

  • internal/store/api.go:179
    SearchMessages() combines dialect-provided FTS SQL with hardcoded ? placeholders for LIMIT/OFFSET, then executes the raw SQL without rebinding. That will fail on non-SQLite dialects.
    Fix: Build the full query with ? placeholders, then run s.Rebind() on the completed SQL before execution.

  • internal/store/api.go:249
    SearchMessagesQuery() always calls FTSSearchClause(1) even when earlier filters have already consumed parameters. For dialects using numbered placeholders, the FTS predicate will bind the wrong argument when prior conditions exist.
    Fix: Pass the next real parameter index, e.g. len(args)+1.

  • internal/store/messages.go (UpsertFTS())
    UpsertFTS() passes 7 arguments to s.db.Exec(), but Dialect.FTSUpsertSQL() is documented to take 6 parameters. That placeholder/argument mismatch is likely to break PostgreSQL integration.
    Fix: Pass exactly the 6 documented parameters, and adjust the SQLite SQL to reuse the first parameter via numbered placeholders if needed.

  • internal/store/store.go (OpenReadOnly())
    OpenReadOnly() creates the dialect but does not call dialect.InitConn(db). This is harmless for SQLite but will leave read-only connections for other dialects uninitialized.
    Fix: Call dialect.InitConn(db) immediately after dialect creation and return any error.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

webgress and others added 3 commits April 20, 2026 19:14
Introduce a Dialect interface that abstracts SQL generation and driver-
specific behavior, and wire the Store to delegate to it. SQLiteDialect
implements the existing SQLite behavior unchanged. No new dependencies.

Scope:
- Dialect interface (internal/store/dialect.go) covers Now(), Rebind(),
  InsertOrIgnore, UpdateOrIgnore, full-text search (upsert/search/delete/
  backfill/availability/clear), connection init, schema init, WAL
  checkpoint, migration probes, and driver error predicates.
- SQLiteDialect (internal/store/dialect_sqlite.go) returns the same SQL
  that was previously hardcoded.
- Store holds a Dialect field and routes calls through it; Open and
  OpenReadOnly both construct and InitConn the dialect.
- All datetime('now'), INSERT/UPDATE OR IGNORE/REPLACE, PRAGMA, FTS5,
  and isSQLiteError usages in the store package go through the dialect.
- subset.go is intentionally not migrated; it keeps using isSQLiteError
  directly and is flagged for follow-up.
- PostgreSQL URLs are rejected by Open() with a clear error.
- Design notes in pg_refactor_docs/ describe the PR1..PR4 plan.

Addressed review feedback:
- InitSchema probes FTSAvailable unconditionally so dialects without a
  separate FTS schema file (PostgreSQL) still set the availability flag.
- SearchMessages and SearchMessagesQuery Rebind the composed SQL before
  execution, and the FTS order fragment is taken from the dialect
  instead of hardcoding "rank".
- OpenReadOnly now calls dialect.InitConn.
- FTSSearchClause drops unused paramIndex; fragments use ? placeholders
  and callers Rebind the final query.
- FTSAvailable and FTSNeedsBackfill return bool; the always-nil error
  return was discarded by every caller.
- insertInChunks accepts a chunkInsert options struct instead of six
  positional parameters.
- FTSUpsertSQL docstring describes the actual SQLite 7-arg contract.
- Duplicate isSQLiteErrorMatch helper removed; dialect predicates share
  the canonical isSQLiteError helper that subset.go also uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-owned FTS upsert, Rebind in insertInChunks

- FTSSearchClause() now returns (join, where, orderBy, orderArgCount).
  Callers bind the search term orderArgCount additional times so a PG
  ts_rank() placeholder in ORDER BY binds correctly after Rebind.
  SQLite returns 0 because "rank" is an implicit FTS5 column.
- mergeLabelByName rewritten in portable SQL (DELETE conflicting
  associations, then plain UPDATE). UpdateOrIgnore dropped from Dialect
  — it was a SQLite-semantic leak that PG cannot mechanically emulate.
- FTSUpsertSQL replaced by FTSUpsert(q, FTSDoc): the dialect now owns
  both the SQL and the argument shape, so SQLite's rowid duplication
  stays out of callers and PG is free to do a tsvector column update.
- chunkInsert gains a rebind field; all three callers
  (replaceMessageRecipientsTx, replaceMessageLabelsTx, AddMessageLabels)
  pass s.dialect.Rebind so composed chunk SQL is renumbered for PG.
  replaceMessageRecipientsTx takes RecipientSet to stay under the
  ≤5-positional-parameter limit.
- ensureLabelWith / mergeLabelByName no longer take a Dialect parameter
  now that the UPDATE-OR-IGNORE workaround is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…helpers

- SearchMessagesQuery uses a new ftsEnabled bool (len(TextTerms) > 0)
  as the authoritative signal that FTS is active. The old ftsJoin != ""
  check skipped the relevance ORDER BY and the non-FTS fallback for
  dialects whose FTS clause needs no join (PG tsvector).
- ensureLabelWith and mergeLabelByName now take a Dialect and route
  every ?-placeholder statement through d.Rebind, so non-SQLite
  backends get numbered placeholders. The Phase 1 loop inside
  EnsureLabelsBatch uses s.dialect.Rebind for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wesm wesm force-pushed the pr1-branch-dialect-extraction branch from 7f47f5c to e291892 Compare April 21, 2026 00:46
@wesm
Copy link
Copy Markdown
Owner

wesm commented Apr 21, 2026

Just rebased and addressed some review findings

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (e291892)

Verdict: Changes are not ready to merge; the dialect abstraction currently leaves multiple SQL execution paths incompatible with non-SQLite backends.

High

  • Placeholder rebinding is applied inconsistently across dialect-aware queries

    • Location: internal/store/messages.go:214, internal/store/messages.go:1157, internal/store/messages.go:1193; internal/store/messages.go (upsertMessageWith, EnsureConversation, UpsertAttachment, backfillFTSBatch); internal/store/sources.go; internal/store/sync.go
    • Problem: Several updated queries still execute with raw ? placeholders after introducing dialect-specific SQL fragments. On PostgreSQL or any non-SQLite backend, these paths will fail unless the statements are passed through Rebind().
    • Fix: Ensure every shared SQL statement is run through d.Rebind/s.dialect.Rebind before execution, or move full statement construction into dialect methods so callers never emit raw placeholders directly.
  • InsertOrIgnore() call sites also appear to skip placeholder rebinding

    • Location: internal/store/messages.go (EnsureParticipantsBatch, EnsureConversationParticipant, UpsertReaction, EnsureParticipantByPhone)
    • Problem: The abstraction rewrites the INSERT form, but the resulting SQL still contains ? placeholders and is executed without a corresponding Rebind(). That leaves these paths non-portable for backends that require positional parameters.
    • Fix: Apply Rebind() to the SQL returned from InsertOrIgnore() before executing it.

Medium

  • Chunked delete helpers may still generate non-portable IN (...) queries
    • Location: internal/store/messages.go (MarkMessagesDeletedBatch, MarkMessagesDeletedByGmailIDBatch)
    • Problem: execInChunks appears to build dynamic IN (%s) queries without a rebinding step, unlike the refactored insert chunking path. If so, these batch delete/update paths will still break on non-SQLite dialects.
    • Fix: Refactor execInChunks to accept and apply the dialect Rebind() function to the final composed query.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Places rebind inside loggedDB.Query/QueryRow/Exec (and their Context
variants) and adds a loggedTx wrapper returned from withTx. Every SQL
statement reaching the driver now goes through the dialect's rebind
exactly once, without callers needing to wrap each string in
s.Rebind(...). Drops the Dialect parameter from tx-helper functions
that only needed it for rebind (replaceMessageRecipientsTx,
replaceMessageLabelsTx, ensureLabelWith, mergeLabelByName) and the
rebind field from chunkInsert.

Store.Rebind is kept for callers that use Store.DB() to reach the
raw *sql.DB, which bypasses the logged wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (baf576a)

Verdict: Changes are not ready to merge; 2 medium-severity issues need to be addressed.

Medium

  • pg_refactor_docs/msgvault_pg_implementation_prompt.md:3-6
    This adds a repo-resident prompt that tells users to pipe repository-controlled text into Claude Code with --dangerously-skip-permissions. That creates a prompt-injection path where any future PR changing this file, or referenced companion files, could steer a fully privileged agent into exfiltrating secrets, modifying the workspace, or pushing unauthorized changes.
    Suggested fix: Do not recommend feeding repo-controlled content into an agent with dangerous permissions enabled. Move the playbook outside the repo or another trusted location, require explicit human review before use, and keep normal sandbox/approval gates enabled.

  • internal/store/messages.go:740
    Dialect.InsertOrIgnore is documented as taking a complete SQL statement, but AddMessageLabels passes only a partial prefix (INSERT OR IGNORE INTO ... VALUES ). A PostgreSQL implementation that appends ON CONFLICT DO NOTHING would place it before the generated VALUES tuples, yielding invalid SQL.
    Suggested fix: Add a dedicated method for chunked insert prefixes, such as InsertOrIgnorePrefix(...), or otherwise separate conflict-handling syntax into prefix/suffix handling instead of running a complete-statement rewriter on a partial statement.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

… prompt doc

- Add Dialect.InsertOrIgnorePrefix for chunked inserts that build VALUES
  tuples incrementally. InsertOrIgnore is documented as taking a complete
  statement, but AddMessageLabels was passing only a prefix ending in
  "VALUES "; a PostgreSQL implementation of InsertOrIgnore would yield
  invalid SQL there. AddMessageLabels now uses InsertOrIgnorePrefix +
  InsertOrIgnoreSuffix so conflict handling sits in the right place for
  both dialects.

- Rework the PG implementation prompt doc to stop recommending
  --dangerously-skip-permissions on repo-controlled text. Any change to
  the prompt or its referenced companion docs could steer a fully
  privileged agent; keep normal approval gates enabled instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (e661000)

Verdict: No medium, high, or critical findings; the reviewed changes look clean.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm wesm merged commit 8e1448e into wesm:main Apr 21, 2026
6 checks passed
webgress added a commit to webgress/msgvault that referenced this pull request Apr 25, 2026
Upstream PR1 (merged as wesm#276) expanded the Dialect interface during
review: FTSUpsert replaces FTSUpsertSQL (dialect owns argument shape),
FTSSearchClause now returns `?` placeholders with an orderArgCount so
loggedDB can rebind uniformly, FTSAvailable/FTSNeedsBackfill no longer
return error, and four new methods (InsertOrIgnorePrefix,
FTSRebuildSchema, IsBusyError, plus the newLoggedDB rebind argument)
join the contract.

Update PostgreSQLDialect to implement the new shape. FTSRebuildSchema
returns an "unimplemented" error for now — proper tsvector rebuild is
deferred to PR3 alongside the rest of the functional work. Remove the
dead UpdateOrIgnore helper (not in the interface, no callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sternryan added a commit to sternryan/msgvault that referenced this pull request Apr 26, 2026
Reconciles 172 ahead / 37 behind state with upstream wesm/msgvault.

Strategy: accept upstream wholesale for connector code (M365, iMessage,
gvoice, IMAP XOAUTH2) where upstream's implementations are more
battle-tested and already cover the fork's bug fixes. Hand-resolve
store/sync/search/cmd/build files to union both feature sets.

Preserved from fork:
- SQLCipher encryption-at-rest (passphrase, AES-GCM token encryption)
- Advisory file locking (tryLock, lockFile, syscall.Flock)
- AI Archive Intelligence subsystem (internal/embedding/, vec_messages,
  pipeline_runs, --semantic search)
- Web UI (React/TypeScript SPA in web/)
- Hot-path search tokenizer (dispatchToken, toLowerFast, parseSizeFast)
- migrateAddContentID, InitVectorTable, content_id attachment column

Adopted from upstream:
- Dialect interface + loggedDB wrapper + structured logging pipeline
  (wesm#276 PostgreSQL dialect refactor foundation)
- OpenReadOnly() for MCP read-only access
- IsBusyError, SchemaStale helpers
- Unified text import (wesm#238) — M365 OAuth (wesm#228), iMessage (wesm#224),
  Google Voice (wesm#225) — all wholesale
- Search enhancements: regex, FTS5 snippets, sorting (wesm#252),
  domain normalization (normalizeAddr, looksLikeDomain, gTLDs)
- rebuild-fts command (wesm#287), 8 bug fixes from wesm#254
- IMAP date filtering (wesm#222), greeting wait (wesm#248)
- Vector subsystem (wesm#277) — coexists with fork's AI Archive
  Intelligence as parallel implementation; future cleanup needed

Build/runtime fixes applied during merge:
- Replaced mattn/go-sqlite3 imports with mutecomm/go-sqlcipher/v4
  (drop-in API-compatible) to resolve duplicate symbol linker errors
- Dropped sqlite_vec from default BUILD_TAGS (requires SQLite 3.38+
  APIs sqlcipher v4.4.2 does not expose; re-enable when sqlcipher
  upgrades)
- safeRowsAffected helper in db_logger.go: defer recover around
  RowsAffected() call (sqlcipher returns nil internal Result for
  multi-statement DDL)
- Wired normalizeAddr into hot-path tokenizer for from:/to:/cc:/bcc:

Stubbed under unreachable build tag (need follow-up decision):
- cmd/msgvault/cmd/sync_gvoice.go — fork's sync API obsolete vs
  upstream's import-based gvoice
- cmd/msgvault/cmd/sync_imessage.go — same situation

Verified: go build ./... passing, go vet clean, 45/45 test packages
pass with 0 failures. See MERGE_REPORT.md for file-by-file resolution
notes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants