msgvault already ingests WhatsApp, iMessage, Google Voice, and SMS alongside email. The unified schema, the text-message import pipeline, the FTS indexing, the content-addressed attachment store, the Parquet analytics cache — all of it works for non-email message types today. Adding another chat source is now a well-worn path.
Facebook Messenger is the obvious next source and, after WhatsApp, the most requested one (#192).
Facebook gives you a "Download Your Information" export, which is a zip file full of JSON or HTML files organized by thread. The data is all there: timestamps, participants, reactions, photos, videos, call logs. It is yours.
I have been testing this. The current branch is jesserobbins:jesse/fbmessenger
What this adds
msgvault import-messenger ingests a Facebook DYI export directory and stores every conversation in the same schema as email, WhatsApp, and iMessage.
msgvault import-messenger ~/Downloads/facebook-your_activity --me jesse
msgvault import-messenger --format json ~/facebook-dyi --me jesse
msgvault import-messenger --limit 100 ~/facebook-dyi --me jesse
After import, Messenger conversations show up everywhere: TUI, MCP server, HTTP API, search, analytics. Stage old threads for deletion. Build a collection that spans email and Messenger and deduplicate across both.
DYI export formats
Facebook now ships three archive formats, which have changed over the years.
JSON is preferable. Millisecond timestamps, structured participants, typed message categories (Generic, Share, Call, Unsubscribe), reaction metadata. The catch: Facebook writes the UTF-8 bytes of every string as if they were Latin-1 characters, so every non-ASCII character in the parsed JSON is mojibake. The parser must decode this transparently.
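To make the mojibake problem concrete, here is a minimal Python sketch of the usual repair: re-encode each parsed string as Latin-1 to recover the original bytes, then decode those bytes as UTF-8. This is an illustration only, not the code on the branch; `fix_mojibake` and `fix_tree` are names I made up for the example, and the sample message is invented.

```python
import json

def fix_mojibake(text: str) -> str:
    """Reverse Latin-1-over-UTF-8 mojibake; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the string holds characters outside Latin-1, or the recovered
        # bytes are not valid UTF-8. In both cases assume it was never mangled.
        return text

def fix_tree(value):
    """Apply fix_mojibake to every string (keys included) in parsed JSON."""
    if isinstance(value, str):
        return fix_mojibake(value)
    if isinstance(value, list):
        return [fix_tree(item) for item in value]
    if isinstance(value, dict):
        return {fix_tree(key): fix_tree(item) for key, item in value.items()}
    return value

# Invented sample: each UTF-8 byte appears as its own \u00XX escape.
raw = r'{"sender_name": "Zo\u00c3\u00ab", "content": "caf\u00c3\u00a9 \u00f0\u009f\u0091\u008d"}'
message = fix_tree(json.loads(raw))
print(message["sender_name"], "|", message["content"])
```

The fallback matters: pure ASCII round-trips unchanged, and a string that is already correct either fails the Latin-1 encode or fails the UTF-8 decode, so it is returned as-is.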
HTML is what most people have. Less metadata, no reaction structure, timestamps that vary by locale and DYI version. The parser handles four known timestamp layouts and falls back to best-effort extraction. When a thread has both JSON and HTML, JSON wins.
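A rough Python sketch of the "try each known layout in order" approach for HTML timestamps. The four layouts below are hypothetical stand-ins I picked for illustration; they are not necessarily the four the branch handles, and this sketch simply returns None where the real parser does best-effort extraction.

```python
from datetime import datetime
from typing import Optional

# Hypothetical layouts, for illustration only.
LAYOUTS = (
    "%b %d, %Y, %I:%M %p",       # Jan 05, 2020, 3:45 PM
    "%b %d, %Y %I:%M:%S %p",     # Jan 05, 2020 3:45:10 PM
    "%A, %B %d, %Y at %I:%M%p",  # Sunday, January 5, 2020 at 3:45pm
    "%d %b %Y, %H:%M",           # 05 Jan 2020, 15:45
)

def parse_timestamp(text: str) -> Optional[datetime]:
    """Return a naive datetime for the first layout that matches, else None."""
    # Collapse any run of whitespace (including non-breaking spaces) to one space.
    cleaned = " ".join(text.split())
    for layout in LAYOUTS:
        try:
            return datetime.strptime(cleaned, layout)
        except ValueError:
            continue
    return None

print(parse_timestamp("Jan 05, 2020, 3:45 PM"))
print(parse_timestamp("05 Jan 2020, 15:45"))
print(parse_timestamp("not a timestamp"))
```

Note that `strptime` month and AM/PM names depend on the process locale, so a sketch like this only covers English-language exports.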
End-to-end encrypted (E2EE) threads (new) use a flat-file format that differs from both. One message per line, colon-delimited, no JSON structure. A separate parser handles these.
All three formats are auto-detected per thread. A single export can contain a mix.
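Per-thread detection with JSON taking priority could look roughly like this in Python. It assumes each thread is a directory holding `message_N.json` or `message_N.html` files; the `.txt` suffix for E2EE threads is a placeholder assumption, and `detect_format` and `detect_section` are illustrative names, not functions from the branch.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

def detect_format(thread_dir: Path) -> Optional[str]:
    """Pick one parser per thread directory; JSON wins over HTML."""
    names = [p.name for p in thread_dir.iterdir() if p.is_file()]

    def has_message_file(suffix: str) -> bool:
        return any(n.startswith("message_") and n.endswith(suffix) for n in names)

    if has_message_file(".json"):
        return "json"
    if has_message_file(".html"):
        return "html"
    if any(n.endswith(".txt") for n in names):  # assumed E2EE file suffix
        return "e2ee"
    return None

def detect_section(section_dir: Path) -> Dict[str, Optional[str]]:
    """Map each thread directory in a section (e.g. inbox) to its format."""
    return {d.name: detect_format(d) for d in sorted(section_dir.iterdir()) if d.is_dir()}

# Build a throwaway export with a mix of formats and run detection on it.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    layout = {
        "alice_1": ["message_1.json", "message_1.html"],  # both present: JSON wins
        "bob_2": ["message_1.html"],
        "carol_3": ["thread.txt"],
    }
    for thread, files in layout.items():
        thread_dir = inbox / thread
        thread_dir.mkdir(parents=True)
        for name in files:
            (thread_dir / name).touch()
    detected = detect_section(inbox)

print(detected)
```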
Key behaviors
- Discovery: walks the export directory for both the old messages/ layout and the newer your_activity_across_facebook/messages/ structure. Handles inbox, archived, filtered, and message-request sections.
- Identity: --me sets is_from_me on outbound messages. Participant addresses are synthesized as <slug>@facebook.messenger.
- Attachments: photos, videos, stickers, audio, and files are ingested into content-addressed storage.
- Reactions: stored as relational rows and appended to message body for FTS searchability.
- Resumable: checkpoints every 50 threads. If interrupted, picks up where it left off.
- Mojibake decoding: Latin-1-over-UTF-8 encoding in Facebook's JSON exports is decoded transparently.
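Two of the behaviors above, identity synthesis and reaction text for FTS, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the slug rules, the way --me is compared against sender names, and the `reaction` / `actor` field names are my guesses, not the branch's actual logic.

```python
import re
import unicodedata
from typing import Dict, List

def slugify(name: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with dots (assumed rules)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", ".", ascii_name.lower()).strip(".")
    return slug or "unknown"

def participant_address(name: str) -> str:
    """Synthesize an email-like address so Messenger fits the shared schema."""
    return f"{slugify(name)}@facebook.messenger"

def is_from_me(sender_name: str, me: str) -> bool:
    """Assumption: a message is outbound when the sender's slug equals the --me slug."""
    return slugify(sender_name) == slugify(me)

def body_with_reactions(body: str, reactions: List[Dict[str, str]]) -> str:
    """Append reactions as plain text so full-text search can match them."""
    if not reactions:
        return body
    suffix = " ".join(f"[{r['reaction']} {r['actor']}]" for r in reactions)
    return f"{body}\n{suffix}"

print(participant_address("Zo\u00eb M\u00fcller"))
print(is_from_me("Jesse", "jesse"))
print(body_with_reactions("see you at 8", [{"reaction": "\U0001F44D", "actor": "Alice"}]))
```

Appending reactions to the body duplicates data that already lives in relational rows, but it means a plain FTS query finds a message by who reacted to it without a join.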
Companion change
The query layer currently hard-codes message_type = 'email' filters in aggregate views, which hides all non-email messages from the TUI. This branch removes those filters so Messenger (and WhatsApp, iMessage, etc.) results participate in search and aggregation. If you archived it, you should be able to find it.
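The effect of that filter is easy to reproduce with a toy table. The Python sketch below uses an in-memory SQLite database with a made-up three-column `messages` table; the real schema, the view definitions, and the non-email `message_type` values are assumptions here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message_type TEXT, sender TEXT)")
conn.executemany(
    "INSERT INTO messages (message_type, sender) VALUES (?, ?)",
    [
        ("email", "alice"),
        ("email", "alice"),
        ("messenger", "bob"),   # hypothetical type value
        ("messenger", "bob"),
        ("messenger", "bob"),
        ("whatsapp", "alice"),  # hypothetical type value
    ],
)

def top_senders(conn: sqlite3.Connection, email_only: bool):
    """Aggregate message counts per sender, optionally with the hard-coded filter."""
    where = "WHERE message_type = 'email'" if email_only else ""
    query = f"SELECT sender, COUNT(*) AS n FROM messages {where} GROUP BY sender ORDER BY n DESC, sender"
    return conn.execute(query).fetchall()

print(top_senders(conn, email_only=True))   # bob is invisible
print(top_senders(conn, email_only=False))  # all message types participate
```

With the filter in place, a sender who only ever used Messenger never appears in the aggregate, which is exactly how imported threads end up hidden from the TUI.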
Scope
~4,400 lines across 40 files. Core importer is ~2,100 lines of implementation and ~1,600 lines of tests covering JSON parsing, HTML parsing, E2EE parsing, mojibake decoding, multi-file thread assembly, attachment resolution, FTS indexing, and Parquet cache integration.
Follows the same discover → parse → ingest pipeline as WhatsApp, iMessage, and Google Voice importers.
Related
Branch ready for PR: jesse/fbmessenger (6 commits on current main, tests pass).