Skip to content

Pr2 branch postgresql dialect#298

Open
webgress wants to merge 12 commits intowesm:mainfrom
webgress:pr2-branch-postgresql-dialect
Open

Pr2 branch postgresql dialect#298
webgress wants to merge 12 commits intowesm:mainfrom
webgress:pr2-branch-postgresql-dialect

Conversation

@webgress
Copy link
Copy Markdown
Contributor

PR2 of the 4-PR PostgreSQL backend split (PR1 merged as #276).

Summary

  • Adds PostgreSQLDialect implementing the Dialect interface.
  • Wires the pgx driver into store.Open() for postgres:// URLs.
  • Adds schema_pg.sql with a tsvector FTS column and GIN index.
  • Adds PostgreSQLEngine scaffold parallel to SQLiteEngine.
  • Adds dual-backend test harness via MSGVAULT_TEST_DB (make test-pg).
  • Aligns PostgreSQLDialect with the post-review PR1 interface (FTSUpsert, ?-placeholder FTSSearchClause, InsertOrIgnorePrefix, FTSRebuildSchema, IsBusyError).

Status

PostgreSQL is not functionally usable after this PR — see docs/PG_STATUS.md for the full punch list. The blockers (schema type translation, Rebind threading, LastInsertId
RETURNING, query-engine parameterization) land in PR3.

SQLite remains the default and only production-ready backend; SQLite test suite is unchanged.

webgress and others added 9 commits April 24, 2026 22:38
go get github.com/jackc/pgx/v5/stdlib

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create schema_pg.sql with:
- search_fts TSVECTOR column on messages (ALTER TABLE IF NOT EXISTS)
- GIN index for fast full-text search

Unlike SQLite's separate FTS5 virtual table, PostgreSQL stores the
tsvector directly on the messages table, eliminating the need for
a JOIN during search queries.

Update embed directive to include schema_pg.sql.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full PostgreSQL dialect implementation:

- Rebind: converts ? to $1,$2,... (handles quoted strings correctly)
- Now: NOW() instead of datetime('now')
- InsertOrIgnore: strips OR IGNORE, relies on InsertOrIgnoreSuffix
  to append ON CONFLICT DO NOTHING
- UpdateOrIgnore: strips OR IGNORE (PG raises on conflict)
- FTS: tsvector column on messages with weighted search
  (subject='A', from='B', body/to/cc unweighted), GIN index,
  plainto_tsquery for search, ts_rank for ordering
- FTSAvailable: checks information_schema for search_fts column
- FTSNeedsBackfill: MAX(id) with NULL check on search_fts
- InitConn: sets statement_timeout
- SchemaStaleCheck: information_schema instead of pragma_table_info
- Error codes: 42701 (duplicate_column), 23505 (unique_violation),
  42P01 (undefined_table) via pgconn.PgError

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Open() and OpenReadOnly() now detect postgres:// and postgresql://
URLs and use PostgreSQLDialect + pgx driver. SQLite path remains
the default.

PostgreSQL connection pool settings:
- SetMaxOpenConns(25) vs SQLite's 4
- SetMaxIdleConns(5)

Read-only mode sets default_transaction_read_only = ON for safety.

Register pgx driver via side-effect import in dialect_pg.go.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create a scaffolded PostgreSQLEngine parallel to SQLiteEngine.
Implements the portable methods directly (GetMessage, GetMessageBySourceID,
GetAttachment, ListAccounts, GetTotalStats for non-search cases) using
PostgreSQL-specific SQL with $N placeholders.

Methods that depend on SQLite-specific constructs (strftime, FTS5 MATCH,
sqlite_master) return ErrNotImplemented pending follow-up work to
parameterize the shared query-building helpers in sqlite.go.

This provides a foundation for PostgreSQL read-path support that can be
extended incrementally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testutil.NewTestStore() now detects MSGVAULT_TEST_DB env var:
- unset/empty: SQLite (default, unchanged)
- postgres://... URL: PostgreSQL with per-test schema isolation

PostgreSQL test setup creates a random schema (msgvault_test_<hex>)
via setup connection, then opens the store with search_path pointing
to that schema. Cleanup drops the schema CASCADE on test end.

Also add dialect_pg_test.go with table-driven tests for:
- PostgreSQLDialect.Rebind (including quoted-string handling)
- PostgreSQLDialect.Now, InsertOrIgnore, InsertOrIgnoreSuffix
- PostgreSQLDialect.FTSSearchClause (paramIndex handling)
- SQLiteDialect.FTSSearchClause (equivalence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Correctness fix:
- PostgreSQLDialect.InsertOrIgnore now auto-appends ON CONFLICT DO NOTHING
  for complete statements (prefix-only callers still use InsertOrIgnoreSuffix).
  This ensures idempotency at call sites like UpsertReaction,
  EnsureConversationParticipant, EnsureParticipantsBatch where the full
  INSERT statement is passed in one piece.

Documentation:
- Add pg_refactor_docs/PG_STATUS.md spelling out what works, what doesn't,
  and the blockers that must resolve before PostgreSQL is functional
  end-to-end (schema type translation, Rebind threading, LastInsertId
  replacement, query engine parameterization).
- Add make test-pg target for running tests against PostgreSQL.
- Document MSGVAULT_TEST_DB and schema_pg.sql in CLAUDE.md.

PR2 is explicit scaffolding: the Dialect interface and PostgreSQLDialect
are validated, but substantial follow-up work is needed to make the store
layer and query engine actually usable with PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap raw *sql.DB with newLoggedDB() in openPostgres and
openPostgresReadOnly, matching the SQLite paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream PR1 (merged as wesm#276) expanded the Dialect interface during
review: FTSUpsert replaces FTSUpsertSQL (dialect owns argument shape),
FTSSearchClause now returns `?` placeholders with an orderArgCount so
loggedDB can rebind uniformly, FTSAvailable/FTSNeedsBackfill no longer
return error, and four new methods (InsertOrIgnorePrefix,
FTSRebuildSchema, IsBusyError, plus the newLoggedDB rebind argument)
join the contract.

Update PostgreSQLDialect to implement the new shape. FTSRebuildSchema
returns an "unimplemented" error for now — proper tsvector rebuild is
deferred to PR3 alongside the rest of the functional work. Remove the
dead UpdateOrIgnore helper (not in the interface, no callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@webgress webgress requested a review from wesm as a code owner April 25, 2026 18:50
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 25, 2026

roborev: Combined Review (97381ab)

Summary verdict: PostgreSQL support has several contract and test-runner issues that should be fixed before merge.

High

  • Location: Makefile:64
    Problem: make test-pg appears to have a malformed shell if statement: cmd/msgvault/cmd/verify.go was pasted where if should be, which will make the target fail.
    Fix: Change it to:
    @if [ -z "$$MSGVAULT_TEST_DB" ]; then \

Medium

  • Location: internal/query/postgres.go:91
    Problem: PostgreSQLEngine.GetMessage returns partial message details and does not exclude soft-deleted messages, unlike SQLite. Participants, labels, attachments, ReceivedAt, and raw-body fallback are missing, so callers using the shared Engine interface get incomplete or inconsistent results.
    Fix: Mirror SQLite’s getMessageByQuery behavior for PostgreSQL, or return ErrNotImplemented until the full contract is implemented.

  • Location: internal/query/postgres.go:216
    Problem: PostgreSQL GetTotalStats applies HideDeletedFromSource and related filters only to message counts. Attachment counts and sizes are queried from attachments directly or only filtered by source_id, so attachments from deleted or otherwise filtered-out messages can still affect stats.
    Fix: Build attachment stats by joining messages and applying the same message filters used for MessageCount.

  • Location: internal/store/store.go:213
    Problem: PostgreSQL read-only mode is set with SET default_transaction_read_only = ON on a pooled *sql.DB, which only affects the connection used for that statement. Later operations can run on newly opened writable connections despite OpenReadOnly.
    Fix: Configure the runtime parameter for every pgx connection, for example via connection config/runtime params or an AfterConnect hook.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Three fixes from the combined review; the high finding (Makefile
test-pg target) was already resolved in 1596d41 — verified with
`make -n test-pg`.

- internal/store/store.go: openPostgresReadOnly now sets
  default_transaction_read_only via pgx RuntimeParams + stdlib
  RegisterConnConfig, so the parameter is sent in the startup packet of
  every pooled connection. The previous `db.Exec("SET ...")` only
  affected whichever single connection happened to serve the Exec —
  later operations on a different pooled connection ran writable.

- internal/query/postgres.go (GetMessage / GetMessageBySourceID): the
  partial implementation only fetched scalar fields from messages and
  message_bodies. The Engine contract (matched by SQLite/DuckDB) also
  populates participants, labels, attachments, ReceivedAt, and a
  raw-MIME body fallback. Returning a partially-populated MessageDetail
  silently violated the contract for callers (cmd/show_message,
  api/handlers, mcp/handlers, etc.). Per the review's preferred fix,
  return ErrNotImplemented until the full message-detail path lands;
  drop the now-unused getMessageByQuery + rebindPg helpers.

- internal/query/postgres.go (GetTotalStats): attachment counts and
  sizes ignored HideDeletedFromSource, WithAttachmentsOnly, the
  message_type='email' restriction, and only filtered by source_id.
  Build attachment stats by joining messages and reusing the same
  WHERE clause as the message-stats query so deleted/non-email
  messages no longer leak into the totals.

docs/PG_STATUS.md: mark blocker wesm#8 (read-only enforcement) resolved
and update the "What works" entry to describe the RuntimeParams path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (8c540c1)

PostgreSQL support has a few blocking correctness issues: one broken test target and two schema/query bugs.

High

  • Location: Makefile:64
    Problem: The test-pg target has a typo/paste error: @cmd/msgvault/cmd/verify.go appears where @if should be, causing the make target to fail with a shell syntax error.
    Fix: Change:
    @cmd/msgvault/cmd/verify.go [ -z "$$MSGVAULT_TEST_DB" ]; then \
    to:
    @if [ -z "$$MSGVAULT_TEST_DB" ]; then \

Medium

  • Location: internal/query/postgres.go:159
    Problem: The initial message_type OR condition is joined with later filters without parentheses, so SourceID, WithAttachmentsOnly, and HideDeletedFromSource only apply to the final m.message_type = '' branch. Filtered PostgreSQL stats can overcount email rows from other sources or deleted rows.
    Fix: Wrap the OR group, for example:

    (m.message_type = 'email' OR m.message_type IS NULL OR m.message_type = '')

    and add a filtered stats test.

  • Location: internal/store/dialect_pg.go:130
    Problem: The information_schema.columns probes do not constrain table_schema, so FTSAvailable and SchemaStaleCheck can report success based on another schema in the same database. This conflicts with per-test-schema PostgreSQL isolation and deployments using search_path.
    Fix: Restrict these checks to the active schema, such as table_schema = current_schema(), or resolve the target table through the current search_path.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

webgress and others added 2 commits May 1, 2026 05:28
`argIdx` was incremented after the only usage point, leaving the final
write dead. golangci-lint (ineffassign) flagged this. Switch to the
standard `len(args)+1` pattern, which is lint-clean and stays
extensible if future conditions append more parameters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PG dialect work added github.com/jackc/pgx/v5 as a direct
dependency, which changes the vendored module set hash. CI's
nix-build job reported the new expected hash; apply it here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (184b148)

High-level verdict: changes need fixes before merge due to one broken test target and several PostgreSQL correctness/build issues.

High

  • Makefile:64
    The test-pg target appears to contain an accidental paste: @cmd/msgvault/cmd/verify.go instead of @if. This will cause make test-pg to fail with a syntax or execution error.
    Fix: Change it to:
    @if [ -z "$$MSGVAULT_TEST_DB" ]; then \

Medium

  • go.mod:19
    The diff adds github.com/jackc/pgx/v5 and related dependencies, but matching go.sum updates are missing. This can break readonly module builds or cause test/build commands to modify the workspace.
    Fix: Run go mod tidy and commit the generated go.sum entries.

  • internal/query/postgres.go:156
    The initial message_type condition is joined with later filters without parentheses, so SQL precedence makes filters like source_id, has_attachments, and deleted_from_source_at apply only to the last OR branch.
    Fix: Wrap the OR group in parentheses before appending additional AND filters.

  • internal/store/dialect_pg.go:134
    information_schema.columns checks only table_name and column_name, so isolated PostgreSQL test schemas or non-public deployments can see columns from another schema and report FTS or migration state incorrectly.
    Fix: Restrict these checks to the active schema, for example with table_schema = current_schema().


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Owner

wesm commented May 1, 2026

Thank you, I will review this and merge soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants