PostgreSQL dialect refactor - PR1 of 4 - Dialect #276
roborev: Combined Review (
Introduce a Dialect interface that abstracts SQL generation and driver-
specific behavior, and wire the Store to delegate to it. SQLiteDialect
implements the existing SQLite behavior unchanged. No new dependencies.
Scope:
- Dialect interface (internal/store/dialect.go) covers Now(), Rebind(),
InsertOrIgnore, UpdateOrIgnore, full-text search (upsert/search/delete/
backfill/availability/clear), connection init, schema init, WAL
checkpoint, migration probes, and driver error predicates.
- SQLiteDialect (internal/store/dialect_sqlite.go) returns the same SQL
that was previously hardcoded.
- Store holds a Dialect field and routes calls through it; Open and
OpenReadOnly both construct and InitConn the dialect.
- All datetime('now'), INSERT/UPDATE OR IGNORE/REPLACE, PRAGMA, FTS5,
and isSQLiteError usages in the store package go through the dialect.
- subset.go is intentionally not migrated; it keeps using isSQLiteError
directly and is flagged for follow-up.
- PostgreSQL URLs are rejected by Open() with a clear error.
- Design notes in pg_refactor_docs/ describe the PR1..PR4 plan.
Addressed review feedback:
- InitSchema probes FTSAvailable unconditionally so dialects without a
separate FTS schema file (PostgreSQL) still set the availability flag.
- SearchMessages and SearchMessagesQuery Rebind the composed SQL before
execution, and the FTS order fragment is taken from the dialect
instead of hardcoding "rank".
- OpenReadOnly now calls dialect.InitConn.
- FTSSearchClause drops unused paramIndex; fragments use ? placeholders
and callers Rebind the final query.
- FTSAvailable and FTSNeedsBackfill return bool; the always-nil error
return was discarded by every caller.
- insertInChunks accepts a chunkInsert options struct instead of six
positional parameters.
- FTSUpsertSQL docstring describes the actual SQLite 7-arg contract.
- Duplicate isSQLiteErrorMatch helper removed; dialect predicates share
the canonical isSQLiteError helper that subset.go also uses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
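For readers skimming the thread, here is a minimal sketch of the delegation pattern this commit describes. It is not the code in internal/store/dialect.go: the three-method interface, the method signatures, `numberedDialect`, and the `labels` table are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Dialect is a hypothetical three-method slice of the interface described
// above; the real one is larger and its exact signatures may differ.
type Dialect interface {
	Now() string
	Rebind(query string) string
	InsertOrIgnore(stmt string) string
}

// sqliteDialect returns SQLite's native syntax unchanged.
type sqliteDialect struct{}

func (sqliteDialect) Now() string            { return "datetime('now')" }
func (sqliteDialect) Rebind(q string) string { return q }
func (sqliteDialect) InsertOrIgnore(stmt string) string {
	return strings.Replace(stmt, "INSERT INTO", "INSERT OR IGNORE INTO", 1)
}

// numberedDialect is an illustrative backend that wants $1, $2, ...
// placeholders. This naive Rebind does not skip '?' inside string literals.
type numberedDialect struct{}

func (numberedDialect) Now() string { return "NOW()" }
func (numberedDialect) Rebind(q string) string {
	var b strings.Builder
	n := 0
	for i := 0; i < len(q); i++ {
		if q[i] == '?' {
			n++
			b.WriteString("$" + strconv.Itoa(n))
			continue
		}
		b.WriteByte(q[i])
	}
	return b.String()
}
func (numberedDialect) InsertOrIgnore(stmt string) string {
	return stmt + " ON CONFLICT DO NOTHING"
}

// insertLabelSQL composes one statement without knowing which backend runs it.
// The labels table and its columns are made up for this example.
func insertLabelSQL(d Dialect) string {
	stmt := "INSERT INTO labels (name, created_at) VALUES (?, " + d.Now() + ")"
	return d.Rebind(d.InsertOrIgnore(stmt))
}

func main() {
	fmt.Println(insertLabelSQL(sqliteDialect{}))
	fmt.Println(insertLabelSQL(numberedDialect{}))
}
```

The caller writes the statement once with `?` placeholders; only the dialect decides how "now", conflict handling, and placeholder numbering are spelled.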
…-owned FTS upsert, Rebind in insertInChunks
- FTSSearchClause() now returns (join, where, orderBy, orderArgCount). Callers bind the search term orderArgCount additional times so a PG ts_rank() placeholder in ORDER BY binds correctly after Rebind. SQLite returns 0 because "rank" is an implicit FTS5 column.
- mergeLabelByName rewritten in portable SQL (DELETE conflicting associations, then plain UPDATE). UpdateOrIgnore dropped from Dialect: it was a SQLite-semantic leak that PG cannot mechanically emulate.
- FTSUpsertSQL replaced by FTSUpsert(q, FTSDoc): the dialect now owns both the SQL and the argument shape, so SQLite's rowid duplication stays out of callers and PG is free to do a tsvector column update.
- chunkInsert gains a rebind field; all three callers (replaceMessageRecipientsTx, replaceMessageLabelsTx, AddMessageLabels) pass s.dialect.Rebind so composed chunk SQL is renumbered for PG. replaceMessageRecipientsTx takes RecipientSet to stay under the ≤5-positional-parameter limit.
- ensureLabelWith / mergeLabelByName no longer take a Dialect parameter now that the UPDATE-OR-IGNORE workaround is gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
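The orderArgCount contract is easier to see with a toy query builder. The sketch below assumes hypothetical table and column names (`messages`, `messages_fts`, `search_vec`) and a PG-style clause built from `plainto_tsquery` and `ts_rank`; the actual fragments returned by the dialects in this PR may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// searchDialect carries only the method discussed in the commit message.
type searchDialect interface {
	FTSSearchClause() (join, where, orderBy string, orderArgCount int)
}

// sqliteFTS orders by FTS5's implicit rank column, so ORDER BY has no placeholder.
type sqliteFTS struct{}

func (sqliteFTS) FTSSearchClause() (string, string, string, int) {
	return "JOIN messages_fts ON messages_fts.rowid = m.id",
		"messages_fts MATCH ?",
		"rank",
		0
}

// pgFTS is an assumed PostgreSQL shape: no join, and one extra placeholder
// in ORDER BY that must be bound to the same search term.
type pgFTS struct{}

func (pgFTS) FTSSearchClause() (string, string, string, int) {
	return "",
		"m.search_vec @@ plainto_tsquery('english', ?)",
		"ts_rank(m.search_vec, plainto_tsquery('english', ?)) DESC",
		1
}

// buildSearch binds the term once for WHERE and orderArgCount more times
// for ORDER BY, keeping placeholder count and argument count in step.
func buildSearch(d searchDialect, term string) (string, []any) {
	join, where, orderBy, orderArgCount := d.FTSSearchClause()
	q := "SELECT m.id FROM messages m"
	if join != "" {
		q += " " + join
	}
	q += " WHERE " + where + " ORDER BY " + orderBy
	args := []any{term}
	for i := 0; i < orderArgCount; i++ {
		args = append(args, term)
	}
	return q, args
}

func main() {
	for _, d := range []searchDialect{sqliteFTS{}, pgFTS{}} {
		q, args := buildSearch(d, "invoice")
		fmt.Println(q, "| args:", len(args), "| placeholders:", strings.Count(q, "?"))
	}
}
```

In both cases the number of `?` placeholders equals the number of bound arguments, which is what lets a later Rebind renumber them safely.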
…helpers
- SearchMessagesQuery uses a new ftsEnabled bool (len(TextTerms) > 0) as the authoritative signal that FTS is active. The old ftsJoin != "" check skipped the relevance ORDER BY and the non-FTS fallback for dialects whose FTS clause needs no join (PG tsvector).
- ensureLabelWith and mergeLabelByName now take a Dialect and route every ?-placeholder statement through d.Rebind, so non-SQLite backends get numbered placeholders. The Phase 1 loop inside EnsureLabelsBatch uses s.dialect.Rebind for the same reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
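A small sketch of why the join-based check misfires for a join-less dialect. The function names, the `searchQuery` type, and the `m.sent_at DESC` fallback ordering are invented for the example and are not taken from SearchMessagesQuery.

```go
package main

import "fmt"

// searchQuery stands in for the parsed query; only TextTerms matters here.
type searchQuery struct{ TextTerms []string }

// oldOrderBy reproduces the previous heuristic: FTS counts as active only
// when the dialect returned a non-empty join fragment.
func oldOrderBy(ftsJoin, ftsOrder string) string {
	if ftsJoin != "" {
		return ftsOrder
	}
	return "m.sent_at DESC" // assumed non-FTS default ordering
}

// newOrderBy keys off the query itself, independent of the dialect's join.
func newOrderBy(q searchQuery, ftsOrder string) string {
	ftsEnabled := len(q.TextTerms) > 0
	if ftsEnabled {
		return ftsOrder
	}
	return "m.sent_at DESC"
}

func main() {
	q := searchQuery{TextTerms: []string{"invoice"}}
	// A tsvector-style dialect needs no join, so ftsJoin is empty.
	fmt.Println("old:", oldOrderBy("", "relevance DESC"))
	fmt.Println("new:", newOrderBy(q, "relevance DESC"))
}
```

With an empty join the old check silently falls back to date ordering even though the user supplied text terms; the new flag keeps relevance ordering.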
7f47f5c to e291892
Just rebased and addressed some review findings.
Places rebind inside loggedDB.Query/QueryRow/Exec (and their Context variants) and adds a loggedTx wrapper returned from withTx. Every SQL statement reaching the driver now goes through the dialect's rebind exactly once, without callers needing to wrap each string in s.Rebind(...).
Drops the Dialect parameter from tx-helper functions that only needed it for rebind (replaceMessageRecipientsTx, replaceMessageLabelsTx, ensureLabelWith, mergeLabelByName) and the rebind field from chunkInsert. Store.Rebind is kept for callers that use Store.DB() to reach the raw *sql.DB, which bypasses the logged wrapper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
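A rough sketch of the wrapper idea, under simplifying assumptions: the `execer` interface and `recorder` fake stand in for `*sql.DB` so the example runs without a driver, and `dollarRebind` is an illustrative rebind function rather than the dialect's real one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// execer is a hypothetical stand-in for the part of *sql.DB used here.
type execer interface {
	Exec(query string, args ...any) error
}

// recorder is a fake driver that remembers the SQL it was handed.
type recorder struct{ seen []string }

func (r *recorder) Exec(query string, args ...any) error {
	r.seen = append(r.seen, query)
	return nil
}

// loggedDB rebinds every statement on its way to the underlying connection,
// so callers can always write '?' placeholders.
type loggedDB struct {
	inner  execer
	rebind func(string) string
}

func (l *loggedDB) Exec(query string, args ...any) error {
	return l.inner.Exec(l.rebind(query), args...)
}

// dollarRebind naively turns each '?' into $1, $2, ... (it does not skip
// question marks inside string literals).
func dollarRebind(q string) string {
	var b strings.Builder
	n := 0
	for i := 0; i < len(q); i++ {
		if q[i] == '?' {
			n++
			b.WriteString("$" + strconv.Itoa(n))
			continue
		}
		b.WriteByte(q[i])
	}
	return b.String()
}

func main() {
	rec := &recorder{}
	db := &loggedDB{inner: rec, rebind: dollarRebind}
	_ = db.Exec("UPDATE labels SET name = ? WHERE id = ?", "inbox", 7)
	fmt.Println(rec.seen[0])
}
```

Centralizing the rebind in one wrapper is what allows the helper functions to drop their Dialect parameter: they no longer need to know a rebind exists.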
… prompt doc
- Add Dialect.InsertOrIgnorePrefix for chunked inserts that build VALUES tuples incrementally. InsertOrIgnore is documented as taking a complete statement, but AddMessageLabels was passing only a prefix ending in "VALUES "; a PostgreSQL implementation of InsertOrIgnore would yield invalid SQL there. AddMessageLabels now uses InsertOrIgnorePrefix + InsertOrIgnoreSuffix so conflict handling sits in the right place for both dialects.
- Rework the PG implementation prompt doc to stop recommending --dangerously-skip-permissions on repo-controlled text. Any change to the prompt or its referenced companion docs could steer a fully privileged agent; keep normal approval gates enabled instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
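To illustrate the prefix/suffix split, here is a sketch of a chunk builder. The method signatures, the `message_labels` table, and the PG-style `ON CONFLICT DO NOTHING` suffix are assumptions; the thread does not show the real definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkDialect holds the two hooks named in the commit message, with
// assumed signatures.
type chunkDialect interface {
	InsertOrIgnorePrefix(prefix string) string
	InsertOrIgnoreSuffix() string
}

// sqliteChunk expresses conflict handling at the front of the statement.
type sqliteChunk struct{}

func (sqliteChunk) InsertOrIgnorePrefix(p string) string {
	return strings.Replace(p, "INSERT INTO", "INSERT OR IGNORE INTO", 1)
}
func (sqliteChunk) InsertOrIgnoreSuffix() string { return "" }

// pgChunk expresses it at the end, after all VALUES tuples.
type pgChunk struct{}

func (pgChunk) InsertOrIgnorePrefix(p string) string { return p }
func (pgChunk) InsertOrIgnoreSuffix() string         { return " ON CONFLICT DO NOTHING" }

// buildChunk appends rows tuples of cols placeholders between the
// dialect-provided prefix and suffix.
func buildChunk(d chunkDialect, prefix string, rows, cols int) string {
	tuple := "(" + strings.TrimSuffix(strings.Repeat("?, ", cols), ", ") + ")"
	tuples := make([]string, rows)
	for i := range tuples {
		tuples[i] = tuple
	}
	return d.InsertOrIgnorePrefix(prefix) + strings.Join(tuples, ", ") + d.InsertOrIgnoreSuffix()
}

func main() {
	prefix := "INSERT INTO message_labels (message_id, label_id) VALUES "
	fmt.Println(buildChunk(sqliteChunk{}, prefix, 2, 2))
	fmt.Println(buildChunk(pgChunk{}, prefix, 2, 2))
}
```

Appending a suffix to a bare prefix would have put the conflict clause before the tuples, which is the invalid SQL the commit message warns about.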
Upstream PR1 (merged as wesm#276) expanded the Dialect interface during review: FTSUpsert replaces FTSUpsertSQL (dialect owns argument shape), FTSSearchClause now returns `?` placeholders with an orderArgCount so loggedDB can rebind uniformly, FTSAvailable/FTSNeedsBackfill no longer return error, and four additions (InsertOrIgnorePrefix, FTSRebuildSchema, IsBusyError, plus the newLoggedDB rebind argument) join the contract.
Update PostgreSQLDialect to implement the new shape. FTSRebuildSchema returns an "unimplemented" error for now; proper tsvector rebuild is deferred to PR3 alongside the rest of the functional work. Remove the dead UpdateOrIgnore helper (not in the interface, no callers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
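"Dialect owns argument shape" can be sketched as follows. The `FTSDoc` fields, the `querier` interface, the table names, and both SQL statements are hypothetical; in particular the real SQLite upsert is described earlier in this thread as taking seven arguments, while this simplified version takes four.

```go
package main

import "fmt"

// FTSDoc is an assumed, reduced document shape for the example.
type FTSDoc struct {
	MessageID int64
	Subject   string
	Body      string
}

// querier is a stand-in for whatever executes SQL (a DB or a transaction).
type querier interface {
	Exec(query string, args ...any) error
}

// recorder captures the last statement and its arguments.
type recorder struct {
	query string
	args  []any
}

func (r *recorder) Exec(query string, args ...any) error {
	r.query, r.args = query, args
	return nil
}

// sqliteFTS passes the message ID twice (rowid and message_id); callers
// never see that duplication.
type sqliteFTS struct{}

func (sqliteFTS) FTSUpsert(q querier, doc FTSDoc) error {
	return q.Exec(
		"INSERT OR REPLACE INTO messages_fts (rowid, message_id, subject, body) VALUES (?, ?, ?, ?)",
		doc.MessageID, doc.MessageID, doc.Subject, doc.Body)
}

// pgFTS updates a tsvector column instead, with a different argument list.
type pgFTS struct{}

func (pgFTS) FTSUpsert(q querier, doc FTSDoc) error {
	return q.Exec(
		"UPDATE messages SET search_vec = to_tsvector('english', ?) WHERE id = ?",
		doc.Subject+" "+doc.Body, doc.MessageID)
}

func main() {
	doc := FTSDoc{MessageID: 42, Subject: "Invoice", Body: "Payment due"}
	rec := &recorder{}
	_ = sqliteFTS{}.FTSUpsert(rec, doc)
	fmt.Println(len(rec.args), "args for SQLite")
	_ = pgFTS{}.FTSUpsert(rec, doc)
	fmt.Println(len(rec.args), "args for PG")
}
```

The caller hands over one `FTSDoc` in both cases; the differing argument counts stay inside each dialect.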
Reconciles 172 ahead / 37 behind state with upstream wesm/msgvault.
Strategy: accept upstream wholesale for connector code (M365, iMessage, gvoice, IMAP XOAUTH2) where upstream's implementations are more battle-tested and already cover the fork's bug fixes. Hand-resolve store/sync/search/cmd/build files to union both feature sets.
Preserved from fork:
- SQLCipher encryption-at-rest (passphrase, AES-GCM token encryption)
- Advisory file locking (tryLock, lockFile, syscall.Flock)
- AI Archive Intelligence subsystem (internal/embedding/, vec_messages, pipeline_runs, --semantic search)
- Web UI (React/TypeScript SPA in web/)
- Hot-path search tokenizer (dispatchToken, toLowerFast, parseSizeFast)
- migrateAddContentID, InitVectorTable, content_id attachment column
Adopted from upstream:
- Dialect interface + loggedDB wrapper + structured logging pipeline (wesm#276 PostgreSQL dialect refactor foundation)
- OpenReadOnly() for MCP read-only access
- IsBusyError, SchemaStale helpers
- Unified text import (wesm#238), M365 OAuth (wesm#228), iMessage (wesm#224), Google Voice (wesm#225): all wholesale
- Search enhancements: regex, FTS5 snippets, sorting (wesm#252), domain normalization (normalizeAddr, looksLikeDomain, gTLDs)
- rebuild-fts command (wesm#287), 8 bug fixes from wesm#254
- IMAP date filtering (wesm#222), greeting wait (wesm#248)
- Vector subsystem (wesm#277), which coexists with the fork's AI Archive Intelligence as a parallel implementation; future cleanup needed
Build/runtime fixes applied during merge:
- Replaced mattn/go-sqlite3 imports with mutecomm/go-sqlcipher/v4 (drop-in API-compatible) to resolve duplicate symbol linker errors
- Dropped sqlite_vec from default BUILD_TAGS (requires SQLite 3.38+ APIs sqlcipher v4.4.2 does not expose; re-enable when sqlcipher upgrades)
- safeRowsAffected helper in db_logger.go: defer recover around RowsAffected() call (sqlcipher returns nil internal Result for multi-statement DDL)
- Wired normalizeAddr into hot-path tokenizer for from:/to:/cc:/bcc:
Stubbed under unreachable build tag (need follow-up decision):
- cmd/msgvault/cmd/sync_gvoice.go: fork's sync API obsolete vs upstream's import-based gvoice
- cmd/msgvault/cmd/sync_imessage.go: same situation
Verified: go build ./... passing, go vet clean, 45/45 test packages pass with 0 failures.
See MERGE_REPORT.md for file-by-file resolution notes.
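The safeRowsAffected idea (a deferred recover around RowsAffected()) might look roughly like this. It is a guess at the shape, not the code in db_logger.go: the return values, the error message, and the `brokenResult` type used to simulate a nil internal result are all assumptions.

```go
package main

import (
	"database/sql"
	"fmt"
)

// safeRowsAffected calls RowsAffected but converts a panic inside the
// driver's Result into an ordinary error instead of crashing the caller.
func safeRowsAffected(res sql.Result) (n int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			n, err = 0, fmt.Errorf("RowsAffected panicked: %v", r)
		}
	}()
	if res == nil {
		return 0, nil
	}
	return res.RowsAffected()
}

// okResult is a well-behaved sql.Result.
type okResult struct{ n int64 }

func (r okResult) LastInsertId() (int64, error) { return 0, nil }
func (r okResult) RowsAffected() (int64, error) { return r.n, nil }

// brokenResult simulates a wrapper whose inner Result is nil; calling
// through it dereferences a nil interface and panics.
type brokenResult struct{ inner sql.Result }

func (b brokenResult) LastInsertId() (int64, error) { return b.inner.LastInsertId() }
func (b brokenResult) RowsAffected() (int64, error) { return b.inner.RowsAffected() }

func main() {
	fmt.Println(safeRowsAffected(okResult{n: 3}))
	n, err := safeRowsAffected(brokenResult{})
	fmt.Println(n, err != nil)
}
```

Named return values are what let the deferred function overwrite the result after a panic; with unnamed returns the recover would succeed but the caller would still see zero values with a nil error.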
Adding PostgreSQL as an opt-in database backend (SQLite stays the default). Details in pg_refactor_docs/.
This is split into 4 PRs to make it easier to review:
- Dialect interface and SQLiteDialect, moves all SQLite-specific SQL out of the store layer. No new dependencies, no behavior change.
- PostgreSQLDialect, pgx driver, PG schema with tsvector FTS, and dual-backend test support.
- … (Rebind, RETURNING, parameterized FTS), PG-native schema migrations, connection pool config, and full test coverage across both backends.
- migrate-db command for copying data between SQLite and PostgreSQL in either direction.