Skip to content

FTS5 + B-tree corruption on contentful index: recovery requires manual DROP+CREATE+backfill #287

@eddowding

Description

@eddowding

Summary

On a large archive (~34 GB, 422k messages), msgvault verify surfaced persistent SQLite corruption:

Tree 26 page 8231140 cell 2: Rowid 421177 out of order
Tree 58 page 8231146 cell 2: Rowid 421177 out of order
malformed inverted index for FTS5 table main.messages_fts

MIME sample verified 100% OK — only the FTS5 index and one message_bodies B-tree page were affected.

Environment

  • msgvault: v0.14.0 (corruption also present in v0.12.1)
  • OS: macOS 15.1 (Darwin 25.3.0), Apple Silicon
  • Archive: 422,009 messages, 34 GB, APFS, SQLite bundled
  • Archive grew over ~6 months of continuous sync; no specific incident pinpointed

What didn't work

  1. INSERT INTO messages_fts(messages_fts) VALUES('rebuild') — completes in ~2 min but leaves the same B-tree corruption. Presumably because it regenerates from within the corrupt shadow tables rather than from the source messages/message_bodies/message_recipients/participants tables.
  2. INSERT INTO messages_fts(messages_fts) VALUES('delete-all') — errors: 'delete-all' may only be used with a contentless or external content fts5 table. messages_fts is contentful (no content= clause).
  3. .recover — not feasible with <34 GB free.

What worked

Full manual DROP + recreate + batched backfill using the SQL already present in internal/store/dialect_sqlite.go (FTSBackfillBatchSQL):

  1. Stop daemon + MCP
  2. DROP TABLE messages_fts;
  3. CREATE VIRTUAL TABLE messages_fts USING fts5(message_id UNINDEXED, subject, body, from_addr, to_addr, cc_addr, tokenize='unicode61 remove_diacritics 1');
  4. Loop FTSBackfillBatchSQL over 5000-ID batches (mirrors Store.BackfillFTS)
  5. PRAGMA integrity_check — FTS5 error cleared.

Total time: ~5 min for 422k rows. Peak extra disk: ~3 GB (FTS5 shadow tables for this archive are ~2.5 GB).

The remaining Tree 26 ... Rowid out of order in message_bodies turned out to be a phantom-duplicate row on a corrupt page — resolved via DELETE FROM message_bodies WHERE message_id = <rowid>; INSERT ...; (hit a UNIQUE-constraint error on re-insert, which revealed a shadow duplicate; one copy deleted, page ordering healed).

Final PRAGMA integrity_check: ok. msgvault verify now exits clean.

Suggestions

  1. Expose a maintenance command such as msgvault rebuild-fts that runs the same DROP + CREATE + batched backfill — currently users must either know SQLite internals or do a full resync (or .recover, which requires 2× disk).
  2. Document recovery in docs/ — the contentful-FTS5 corruption path is non-obvious (the 'rebuild' and 'delete-all' pragmas behave differently than on contentless tables).
  3. Optional: serve --skip-db-check flag (mirroring verify --skip-db-check) so a corrupted-but-functional archive can still run the daemon while the user schedules a rebuild.

Happy to PR (1) if you agree with the approach — essentially Store.RebuildFTS() = drop + create-from-schema + call existing Store.BackfillFTS.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions