Skip to content

[Sprint 3] SMTP intake, per-domain DKIM, storage helper, mailbox-key re-key#242

Merged
uzyn merged 2 commits into
epic/multi-domainfrom
sprint-3-multi-domain-data-plane
May 23, 2026
Merged

[Sprint 3] SMTP intake, per-domain DKIM, storage helper, mailbox-key re-key#242
uzyn merged 2 commits into
epic/multi-domainfrom
sprint-3-multi-domain-data-plane

Conversation

@uzyn
Copy link
Copy Markdown
Owner

@uzyn uzyn commented May 23, 2026

Sprint Goal

Wire multi-domain into the runtime data plane and absorb the deferred on-disk mailbox-key FQDN re-key from the prior upgrade-migration work.

Stories Implemented

[S3-1] recipient_domain_matches_any + resolve_mailbox_for_rcpt

  • recipient_domain_matches_any(addr, domains) in smtp/session.rs replaces the single-domain helper; SMTP RCPT TO accepts any address whose domain is in config.domains (case-insensitive).
  • Config::resolve_mailbox_for_rcpt does exact FQDN lookup with fallback to per-domain catchall (*@<domain>).
  • Config::resolve_mailbox_by_name accepts either the operator-friendly local-part or the canonical FQDN; wired into every consumer (MARK-*, MAILBOX-DELETE, HOOK-CREATE / DELETE, MCP tools, CLI hooks list / create, mailbox show).
  • Two-domain SMTP integration test (tests/multi_domain.rs) covers RCPT acceptance for each domain, rejection for unconfigured domains, and per-domain inbox routing.

[S3-2] Per-domain DKIM key map via ArcSwap

  • New src/dkim_keys.rs module. SharedDkimKeyMap = Arc<ArcSwap<HashMap<String, DkimKeyEntry>>> so future domain CRUD verbs can hot-swap without restarting the daemon.
  • Per-domain selector resolution: per-domain [domain.<d>] dkim_selector → top-level Config.dkim_selector → built-in "aimx".
  • Missing per-domain key for the default domain is fatal; missing for non-default domains warns and the daemon still starts.
  • Legacy <dkim_dir>/private.key shape still resolves for the default domain (covers fresh installs before the per-domain DKIM keygen flag lands).

[S3-3] send_handler signs per From: domain + carry-over re-key

  • send_handler.rs extracts From: domain from the submitted body, validates against config.domains, looks up the per-domain DKIM key + selector, and signs with the From: domain as the d= tag.
  • Sent copy persists to <data_dir>/<from-domain>/sent/<local>/ via the new storage helper.
  • Per-domain catchall (*@<domain> for any configured domain) is rejected as outbound sender.
  • Carry-over mailbox-key FQDN re-key fires on first start under the new binary via run_mailbox_key_rekey_at_startup in serve.rs. Idempotent — second start sees zero legacy keys and skips.
  • rewrite_config_to_canonical_shape now uses rekey_mailboxes_to_fqdn to write canonical FQDN-keyed [mailboxes."<local>@<domain>"] on disk.

[S3-4] aimx send --from accepts bare local-part + FQDN

  • Daemon-side: rewrite_bare_from_to_default_domain detects bare local-part From: (no @) and rewrites both the header value and the body bytes to <local>@<domains[0]> before validation and signing. Debug-logged.
  • FQDN form passes through unchanged.
  • Display-name forms ("Alice <alice>") preserve the display name while completing the FQDN.

[S3-5] mailbox_storage_path helper + CI grep job

  • New src/storage.rs with mailbox_storage_path(config, mailbox, folder) -> PathBuf and Folder enum.
  • Config::inbox_dir / sent_dir delegate to the helper. Every consumer (ingest, send_handler, mailbox CRUD handler, state handler, MCP tools, setup wizard) goes through the layout-aware helper.
  • CI grep job (Check storage path enforcement in .github/workflows/ci.yml) rejects new raw .join("inbox") / .join("sent") outside the helper module, the upgrade-migration module (which owns the legacy rename(2)), and the discover-mailbox-names path.

Technical Decisions

  • In-memory map keyed by FQDN. The upgrade migration's rewrite_config_to_canonical_shape now re-keys legacy local-part-keyed mailboxes to [mailboxes."<local>@<domain>"] on disk and returns the rewritten Config with FQDN keys. The daemon's Arc<Config> snapshot is FQDN-keyed throughout the request lifetime. resolve_mailbox_by_name accepts either form so callers (MCP, CLI, hooks, SMTP) keep working with the operator-friendly bare local-part on single-domain installs.
  • MAILBOX-LIST exposes FQDN names. Multi-domain installs return FQDN-shaped mailbox names from mailbox_list (e.g. info@a.com). Agents can disambiguate across domains without an extra round-trip. Bare-local-part input continues to resolve via the address-tail fallback in lookup_mailbox_row.
  • ArcSwap, not RwLock. The per-domain DKIM map sits behind ArcSwap<HashMap<String, DkimKeyEntry>> so the upcoming DOMAIN-ADD / DOMAIN-REMOVE verbs swap without blocking concurrent reads. Tests construct an ArcSwap with a single-domain map.
  • Daemon-side bare-from rewrite. The CLI passes the body as-is; the daemon owns the rewrite so the DKIM signature covers the canonical FQDN form even when the CLI submits --from research.
  • Discover-mailbox-names dedup. mailbox::discover_mailbox_names now skips on-disk directory names that correspond to an in-memory FQDN-keyed mailbox so MAILBOX-LIST doesn't double-count alice and alice@a.com.

Test Coverage

File Tests added
src/config.rs 5 (resolve_mailbox_for_rcpt exact FQDN, per-domain catchall, no-catchall reject, unknown-domain reject, legacy local-part keys; is_configured_domain case-insensitive)
src/dkim_keys.rs 5 (per-domain selector override, default selector, per-domain key load, missing-key warning, legacy fallback, case-insensitive lookup)
src/smtp/session.rs 4 (recipient_domain_matches_any multi-domain accept, unconfigured reject, case-insensitive, empty)
src/send_handler.rs 7 (two-domain DKIM d= tag, per-domain selector override, per-domain sent path, unconfigured-domain reject, per-domain catchall reject, bare-local-part rewrite, FQDN passthrough)
src/storage.rs 4 (v1 layout, v2 layout, two-domain routing, catchall dirname)
tests/multi_domain.rs 3 (RCPT acceptance per domain, unconfigured reject, per-domain inbox routing end-to-end)
tests/upgrade.rs 1 (carry-over re-key on already-migrated install with legacy mailbox keys)

Updated tests:

  • tests/integration.rs: the mailbox frontmatter field and MAILBOX-LIST name field now carry the canonical FQDN; updated CC / BCC / multi-recipient assertions accordingly.
  • tests/upgrade.rs::upgrade_migrates_v1_fixture_end_to_end: asserts [mailboxes."info@fixture.example"] lands on disk (replacing the legacy local-part-keyed assertion).
  • src/upgrade_migration.rs tests: updated to expect FQDN-keyed mailboxes after rewrite_config_to_canonical_shape.

Review Focus Areas

  • src/serve.rs run_mailbox_key_rekey_at_startup: holds CONFIG_WRITE_LOCK for the rewrite + Arc swap. Idempotency check is the cheap config_has_legacy_mailbox_keys text scan.
  • src/config.rs resolve_mailbox_by_name: handles three shapes (exact key, bare local-part for default domain, "catchall" legacy alias). Make sure the lookup order doesn't accidentally resolve a bare local-part to a foreign-domain mailbox.
  • src/send_handler.rs rewrite_bare_from_to_default_domain: rewrites both the header value (From: line in the body) and the bare token. The DKIM signer canonicalizes the rewritten body, so any mismatch between the rewrite and the validation paths would break DMARC alignment at receivers.
  • src/mailbox.rs::discover_mailbox_names: the new dedup path skips on-disk names whose local-part matches an in-memory mailbox. A regression here would cause MAILBOX-LIST to double-count.
  • .github/workflows/ci.yml::Check storage path enforcement: the awk script ignores #[cfg(test)] modules. Validate the production-code surface is fully covered.

Deferred Items

  • Sprint 2 layout-aware path shim retirement. The Config::storage_root_for_default_domain and storage_roots helpers still exist (used by discover_mailbox_names and doctor's orphan-storage scan). The shim is now mostly inert — every per-mailbox path goes through mailbox_storage_path. Fully retiring the shim is deferred to a follow-up cleanup PR.
  • MCP FQDN return-shape audit (FR-H1). This PR makes mailbox_list return FQDN names; a follow-up sprint will sweep the other MCP tools (email_list, email_read, email_send, email_reply, email_mark_*, hook_*) to make sure they all expose FQDN-shaped names consistently and accept bare local-part input via the resolver.

…re-key

Implements multi-domain runtime data plane: SMTP intake accepts any
configured domain, outbound signs with per-domain DKIM keys, sent copies
persist under per-domain paths, bare-local-part From: rewrites to the
default domain, and the deferred on-disk mailbox-key FQDN re-key fires
on first start under the new binary.

Stories implemented:
- S3-1: `recipient_domain_matches_any` + `resolve_mailbox_for_rcpt`.
  SMTP RCPT TO accepts any domain in `config.domains`; mailbox
  resolution falls back to per-domain catchall on local-part miss.
  Adds `Config::resolve_mailbox_by_name` so MARK-*, MAILBOX-DELETE,
  HOOK-CREATE / DELETE, MCP tools, and CLI `hooks list / create` accept
  either the operator-friendly local-part (`alice`) or the canonical
  FQDN (`alice@a.com`).
- S3-2: per-domain DKIM key map in `src/dkim_keys.rs`. Loaded at
  startup into `SharedDkimKeyMap = Arc<ArcSwap<HashMap<String,
  DkimKeyEntry>>>` so future DOMAIN-ADD / DOMAIN-REMOVE verbs can
  hot-swap without restarting the daemon. Per-domain selector
  resolution: per-domain override → top-level → built-in `"aimx"`.
  Missing per-domain key for a non-default domain warns and the daemon
  still starts; missing default-domain key is fatal.
- S3-3: `send_handler.rs` extracts From: domain from submitted body,
  validates against `config.domains`, signs with the per-domain key,
  and persists the sent copy under `<data_dir>/<from-domain>/sent/<local>/`.
  Per-domain catchall is rejected as outbound sender on any domain.
- S3-4: bare-local-part From: rewrite happens daemon-side. CLI passes
  `--from research` as-is; the daemon rewrites both the header value
  and the body bytes (the DKIM signature covers `From:`) before
  validation. Debug-logged.
- S3-5: new `src/storage.rs` with `mailbox_storage_path(config,
  mailbox, folder) -> PathBuf` and `Folder` enum. `Config::inbox_dir` /
  `sent_dir` delegate to the helper. CI grep job (`Check storage path
  enforcement`) rejects new raw `.join("inbox" / "sent")` outside
  `src/{storage,upgrade_migration,mailbox}.rs`.

Carry-over re-key absorbed from the previous upgrade migration:
`run_mailbox_key_rekey_at_startup` in `serve.rs` rewrites legacy
`[mailboxes.<local>]` to `[mailboxes."<local>@<domain>"]` on first
start under the new binary even when the `.layout-version: 2` marker
is already present. Idempotent: the second start sees zero legacy
keys and is a no-op. The upgrade-migration `rewrite_config_to_canonical_shape`
now re-keys on-disk too via `rekey_mailboxes_to_fqdn`.

Tests added:
- `src/config.rs`: 5 tests for `is_configured_domain` /
  `resolve_mailbox_for_rcpt` (exact FQDN, per-domain catchall,
  unconfigured-domain rejection, legacy local-part keys).
- `src/dkim_keys.rs`: 5 tests for selector resolution, per-domain key
  loading, missing-key warning path, legacy fallback for default
  domain, and case-insensitive lookup.
- `src/smtp/session.rs`: 4 tests for `recipient_domain_matches_any`
  (multi-domain accept, unconfigured reject, case-insensitive, empty).
- `src/send_handler.rs`: 7 tests for two-domain DKIM signing, per-
  domain `d=` tag, per-domain selector override, per-domain sent path,
  unconfigured-domain rejection, per-domain catchall rejection, bare-
  local-part rewrite, FQDN passthrough.
- `src/storage.rs`: 4 tests for v1 / v2 layout resolution and catchall
  dirname.
- `tests/multi_domain.rs`: 3 end-to-end SMTP intake tests against a
  two-domain `aimx serve` (RCPT acceptance per domain, unconfigured
  rejection, per-domain inbox routing).
- `tests/upgrade.rs`: `carry_over_rekey_fires_on_already_migrated_install_with_legacy_mailbox_keys`
  exercises the re-key on an install where storage + DKIM are already
  on v2 but the on-disk mailbox keys are still local-part shaped.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint Goal Assessment

Sprint 3 wires multi-domain into the runtime data plane (SMTP RCPT, outbound signing, sent persistence) and absorbs the deferred on-disk mailbox-key FQDN re-key from Sprint 2. The goal is met end-to-end: the SMTP listener accepts any configured domain via recipient_domain_matches_any, outbound signs with the per-message domain's DKIM key + selector via the new ArcSwap<HashMap> map, sent copies persist under <data_dir>/<from-domain>/sent/<local>/ via mailbox_storage_path, and the carry-over startup re-key rewrites legacy [mailboxes.<local>][mailboxes."<local>@<domain>"] on already-v2 installs (test passes). Five new test files + 29 new tests, full suite green locally (cargo test --release 116 integration + 50 send_handler + 5 upgrade + 3 multi_domain + 4 storage + 6 dkim_keys all pass; cargo clippy --all-targets -- -D warnings clean).

Acceptance Criteria Checklist

[S3-1] recipient_domain_matches_any + resolve_mailbox_for_rcpt

  • recipient_domain_matches_any replaces single-domain helper in smtp/session.rs.
  • Config::resolve_mailbox_for_rcpt does exact FQDN lookup → per-domain catchall fallback. Matches by mb.address (not key), so legacy local-part-keyed maps resolve too.
  • Every config.mailboxes.get(<local>) callsite migrated to FQDN-aware lookup (resolve_mailbox_by_name) in hook_handler, hooks, mailbox_handler::handle_delete, state_handler, mcp::set_read_status, mcp::lookup_mailbox_row, mailbox::build_show_lines. Survivors verified — send_handler.rs:224/466/731 operate on the canonical key returned by resolve_concrete_mailbox; ingest.rs:191/411/555 operate on the canonical key returned by resolve_recipient_mailboxresolve_mailbox_for_rcpt; hook_handler.rs:237/271 operate on the key returned by find_hook_owner; mailbox.rs:197/320/328 and mailbox_list_handler.rs:93 operate on map keys; setup.rs:4525 and mailbox.rs:1111-1132 are test-only.
  • Two-domain SMTP integration test in tests/multi_domain.rs covers RCPT per domain, unconfigured reject, per-domain inbox routing.

[S3-2] Per-domain DKIM key map via ArcSwap

  • SharedDkimKeyMap = Arc<ArcSwap<HashMap<String, DkimKeyEntry>>> with per-domain key + selector cached at load.
  • Per-domain selector resolution order matches FR-A7: per-domain → top-level → built-in "aimx".
  • Missing key for default domain is fatal; missing for non-default warns + continues.
  • Legacy <dkim_dir>/private.key fallback only for default domain (defensible — see dkim_keys::load_dkim_keys).
  • Six unit tests cover the load matrix + case-insensitive lookup.

[S3-3] send_handler signs per From: domain + carry-over re-key

  • From: domain extracted from body, validated against config.is_configured_domain, signs with per-domain key + selector. d= tag matches per-message domain (verified by two_domain_send_from_b_signs_with_b_dkim_key_and_selector against d=b.com + s=s2025).
  • Sent copy persists under <data_dir>/<from-domain>/sent/<local>/ via mailbox_storage_path (verified by two_domain_send_persists_sent_under_per_domain_path).
  • Per-domain catchall rejected as outbound sender (send_from_per_domain_catchall_is_rejected_as_outbound_sender).
  • Carry-over re-key fires once on already-v2 install with legacy keys, idempotent on second start (carry_over_rekey_fires_on_already_migrated_install_with_legacy_mailbox_keys — confirmed passing).
  • Log lines updated to reference per-message domain.

[S3-4] aimx send --from accepts bare local-part + FQDN

  • Daemon-side rewrite_bare_from_to_default_domain rewrites both header value AND body bytes before DKIM signing (so DMARC alignment stays valid).
  • FQDN form passes through unchanged.
  • Display-name form ("Alice <alice>") preserves the display name.
  • Debug-logged.

[S3-5] mailbox_storage_path helper + CI grep job

  • src/storage.rs exports mailbox_storage_path, storage_path_for, Folder, mailbox_dir_name.
  • Config::inbox_dir / sent_dir delegate to the helper.
  • CI grep job in .github/workflows/ci.yml rejects new raw .join("inbox"|"sent") outside the storage helper module, upgrade_migration.rs (legacy rename owner), and mailbox.rs (discover_mailbox_names walks storage_roots() + inbox/). The awk filter strips #[cfg(test)] modules.

Potential Bugs

Non-blocker: mailbox_handler::handle_create inserts the new mailbox with a bare local-part key, not the FQDN

src/mailbox_handler.rs:393-404:

let address = format!("{name}@{}", new_config.default_domain());
let mb_cfg = MailboxConfig { address, ... };
new_config.mailboxes.insert(name.to_string(), mb_cfg.clone());

On a multi-domain install with a post-re-key map (every existing key is FQDN), a freshly-created mailbox lands keyed by bare local-part. Effects within the same daemon lifetime:

  • resolve_mailbox_by_name("bob@<default>") returns None against the freshly-inserted key "bob" (the resolver's bare-local-part fallback only fires when the input has no @, never when the input is an FQDN against a local-part-keyed mailbox).
  • On-disk config.toml gets a [mailboxes.bob] section mixed in with [mailboxes."alice@<default>"] siblings — the very state config_has_legacy_mailbox_keys detects, so the carry-over re-key fires again on the next daemon restart (self-healing).

The bug is non-blocking because: (a) the next restart corrects it, (b) within the same lifetime, MCP / CLI callers using resolve_mailbox_by_name("bob") (bare) still resolve correctly. But it leaves an inconsistent state until the next restart and adds churn to the re-key path. Fix: rewrite line 404 to insert with mb_cfg.address.clone() as the key (always the FQDN).

Non-blocker: Config::inbox_dir / sent_dir fallback hardcodes the default domain

src/config.rs:1567-1576 (mailbox_folder_dir):

fn mailbox_folder_dir(&self, name: &str, folder: ...) -> PathBuf {
    if let Some(mb) = self.mailboxes.get(name) {
        return crate::storage::mailbox_storage_path(self, mb, folder);
    }
    crate::storage::storage_path_for(self, self.default_domain(), name, folder)
}

When name is not in the in-memory map, the fallback always picks default_domain() as the storage root. Callers that pass a bare local-part for a non-default-domain mailbox (e.g. Sprint 4's aimx domains add b.com + aimx mailboxes create support@b.com flow) would route to <data_dir>/<default_domain>/inbox/support@b.com/ rather than <data_dir>/b.com/inbox/support/. Today Sprint 3 doesn't ship a multi-domain mailbox-create flow (handle_create hardcodes address = "{name}@{default_domain}" at line 393), so the fallback is benign for Sprint 3 traffic. Flagging for Sprint 4 — when MAILBOX-CREATE learns to take a @<non-default-domain> address, this fallback will need to take the domain explicitly or extract it from the address parameter.

Non-blocker: rewrite_from_header_in_body uses raw substring find("from:") over the full body

src/send_handler.rs:665-682:

let lower = text.to_ascii_lowercase();
let from_idx = lower.find("from:")?;

find() matches the first case-insensitive from: substring anywhere in the body. Edge cases:

  • A body containing a quoted From: ... line in the message body (forwarded mail) could be matched, although in practice the header From: comes first per RFC 5322 ordering so the first match is normally the header.
  • RFC 5322 header folding (From:\r\n alice@a.com) — the rewrite finds the first \n and truncates everything after it, then prepends From: <new>\r. The folded continuation line gets orphaned into the headers (alice@a.com as a bare line is not a valid header), and the original From: address material is lost. Today's daemon-mediated CLI never produces folded headers, so this is theoretical.

Both edge cases are exotic; the function should at minimum (a) scan only the header block (up to the first blank line) and (b) match from: only at line start. Worth a quick defensive pass when there's time.

Test Coverage

Strong. Notable additions:

  • tests/multi_domain.rs (3 tests) exercises the full SMTP RCPT + per-domain inbox routing path against a real aimx serve subprocess with two domains and two DKIM keys.
  • tests/upgrade.rs::carry_over_rekey_fires_on_already_migrated_install_with_legacy_mailbox_keys (1 test) constructs a synthetic "v1 binary already migrated, but didn't re-key mailbox keys" fixture and validates that the new binary re-keys on first start, then no-ops on second start. This is the carry-over verification the sprint plan required, and it passes.
  • Per-domain DKIM d= tag + selector assertions in send_handler tests pin the multi-domain invariant: two_domain_send_from_b_signs_with_b_dkim_key_and_selector verifies d=b.com + s=s2025.
  • Bare-local-part rewrite has both an end-to-end test (bare_local_part_from_rewrites_to_default_domain — checks the delivered body's From: line carries the FQDN and the DKIM d= matches domains[0]) and a display-name unit test (rewrite_bare_from_handles_display_name_form).

Gaps worth tracking (non-blocker):

  • No test covers rewrite_bare_from_to_default_domain with folded From: header — the rewrite would silently truncate. Easy to add.
  • No test pins the MAILBOX-CREATE bare-local-part key behavior on a multi-domain post-re-key install (the non-blocker bug above).

Alignment with PRD

  • FR-D1 / D3 (SMTP RCPT): met. recipient_domain_matches_any accepts any configured domain, case-insensitive.
  • FR-B1 / B2 (per-domain DKIM): met. ArcSwap-wrapped map keyed by lowercase domain, per-domain selector resolved at load.
  • FR-E1–E4 (outbound from configured domains): met. From: domain validation, per-domain catchall reject, FQDN passthrough.
  • FR-J3 (log per-message domain): met. Log lines in send_handler and smtp/session.rs::handle_rcpt_to updated.
  • FR-C3 (storage helper): met. mailbox_storage_path + CI grep enforcement.
  • FR-G1 step 3 (on-disk mailbox-key re-key): met via the carry-over re-key in serve.rs::run_mailbox_key_rekey_at_startup. Idempotent, fires only when config_has_legacy_mailbox_keys returns true.
  • FR-H1 (MCP tools return FQDN): partially met for mailbox_list (the daemon-side MailboxListRow.name now carries the FQDN post-re-key; integration tests updated). The other MCP tools (email_list, email_read, email_send, email_reply, email_mark_*, hook_*) are not swept here. Sprint 6 S6-4 explicitly owns this sweep ("Audit every tool that returns a mailbox identifier... and ensure they return the FQDN") — the deferral is in the sprint plan and is therefore acceptable for this sprint. Input handling already accepts both FQDN and bare local-part via lookup_mailbox_row's address-tail fallback + resolve_mailbox_by_name's bare-to-FQDN expansion, so agents passing either form keep working today.

Disposition on the two flagged deferred items

  1. MCP FQDN return-shape auditapprove the deferral. Sprint 6 S6-4 explicitly owns the cross-tool sweep ("Audit every tool that returns a mailbox identifier..."), mailbox_list is the only tool whose return shape MUST be FQDN this sprint because discover_mailbox_names + the carry-over re-key both depend on it, and the input-acceptance side already handles bare local-parts uniformly. The deferral is a reasonable scope cut, not a scope failure.

  2. Layout-aware shim retirementapprove the deferral. S3-3's acceptance criterion says the shim "can simplify (or retire)", not "must retire". The shim is now read-only and dead-code-marked in the one helper (storage_root_for_default_domain) whose only caller is layout-aware reasoning. The surviving callsites (mailbox::discover_mailbox_names, doctor orphan-storage scan, serve.rs DKIM startup-check) need a per-storage-root iterator anyway, so retiring storage_roots() would require building a replacement iterator rather than just deleting code. Reasonable follow-up cleanup.

Code Quality

Blocker: planning-doc cross-reference in new code (CLAUDE.local.md hard rule)

src/dkim_keys.rs:88 introduces:

/// domain only** — pre-Sprint-3 fresh installs that ran `aimx setup

This violates CLAUDE.local.md's strict ban on planning-doc cross-references outside docs/:

Banned tokens (regex-checkable): \bSprint\s*\d+, \bS\d+[-.]\d+\b (e.g. S2-3, S6.1), \bFR-\d+[a-z]?\b (e.g. FR-13b, FR-50c), User Story, Acceptance criteria, PRD section citations like PRD §6.4 or Hardening PRD §6.4 / S3-1:.

Verification before completing scrum-autopilot, sprint-implementer, task-impl-pr, or any work that touches files outside docs/: run rg -n '\bSprint\b|\bS[0-9]+[-.][0-9]+\b|\bFR-[0-9]+[a-z]?\b|User Story|Acceptance criteria' --glob '!docs/**' --glob '!target/**' --glob '!.git/**'. Expect zero hits. If any appear, rewrite them per the rule above before opening or updating the PR.

The verification command yields one new hit introduced by this PR. The fix is the same rewrite rule: keep the technical content, drop the cross-reference. Suggested rewrite:

/// domain only** — fresh installs predating the per-domain DKIM layout that ran \aimx setup``

…or simply describe the layout (legacy single-key vs. per-domain) without the sprint identifier. CLAUDE.local.md PR #152 had to strip ~500 of these from 57 files; the rule is hard.

Non-blocker: PR title carries [Sprint 3] and the PR description carries S3-1...S3-5. The rule applies to PR descriptions and commit messages too. This matches the precedent established by PRs #239 / #241 for this track, so it's not a regression per se, but it's worth noting for the merge commit. The squash-merge body should drop the sprint IDs and describe the changes directly.

(Beyond the planning-doc reference, the diff is otherwise clean. unwrap_or_else(|poisoned| poisoned.into_inner()) is used consistently across the new lock acquisitions; ArcSwap::load_full() is taken once at the top of each request which is the right choreography; per-domain DKIM warning messages are operator-actionable.)

Summary and Recommended Actions

Overall verdict: Needs minor fixes.

Blockers (must fix before merge):

  • Planning-doc cross-reference at src/dkim_keys.rs:88 (pre-Sprint-3 text introduced by this PR). Per CLAUDE.local.md the verification regex must yield zero new hits; rewrite to drop the sprint identifier, keep the technical content.

Non-blockers (should fix but not blocking):

  • mailbox_handler::handle_create (src/mailbox_handler.rs:404) inserts the new mailbox with the bare local-part key on a post-re-key install. Self-healing on next restart but leaves an inconsistent in-memory state until then. Switch the insert key to mb_cfg.address.clone() (FQDN).
  • Config::inbox_dir / sent_dir fallback (src/config.rs:1567-1576) hardcodes default_domain() when the mailbox isn't in the map. Benign in Sprint 3 traffic; will need adjustment when Sprint 4's MAILBOX-CREATE learns about non-default domains.
  • rewrite_from_header_in_body (src/send_handler.rs:665-682) uses lower.find("from:") against the full body and finds the first \n as line end. Brittle on RFC 5322 folded headers and forwarded-mail body text. Restrict to the header block (above first blank line) and match from: at line start.

Nice-to-haves (low priority, worth tracking):

  • Add a defensive test for rewrite_bare_from_to_default_domain with a folded From:\r\n alice header (today it would truncate).
  • Add a regression test pinning MAILBOX-CREATE's in-memory key shape on a post-re-key multi-domain install.
  • The PR title's [Sprint N] convention propagates the planning-ID issue into commit history; consider switching to topic-prefixed titles for the remaining sprints in this track.

Carry-over verification: Confirmed. carry_over_rekey_fires_on_already_migrated_install_with_legacy_mailbox_keys passes locally — the fixture simulates an already-migrated v2 install with stale legacy mailbox keys, the new binary re-keys on first start ([mailboxes.info][mailboxes."info@fixture.example"]), and a second start produces a byte-identical config (idempotency confirmed via assert_eq!(after_first, after_second)).

- dkim_keys.rs: drop the `pre-Sprint-3` cross-reference and describe
  the layout contrast (legacy un-namespaced `<dkim_dir>/{private,public}.key`
  vs. per-domain `<dkim_dir>/<domain>/{private,public}.key`) directly.
- mailbox_handler::handle_create: key the in-memory and on-disk
  mailbox stanza by the canonical FQDN (`<name>@<default_domain>`)
  instead of the bare local-part, so a non-restart `resolve_mailbox_by_name`
  lookup against the FQDN succeeds immediately. Mirror the same fix in
  the CLI fallback `mailbox::create_mailbox` (root, daemon-down path)
  and route `mailbox::delete_mailbox` / `is_registered` through the
  resolver so either key shape continues to work.
- config::mailbox_folder_dir: walk the resolver (accepts bare + FQDN)
  and, when the caller asks about an unregistered name that itself
  carries an `@`, honour the parsed domain instead of always defaulting
  to `default_domain()`. Bare-local fallbacks still pick the default
  domain — there is no other reasonable host root.
- send_handler::rewrite_from_header_in_body: restrict the search to
  the header block (everything before the first blank line) so a
  `From:` literal in quoted/forwarded body text cannot be mistaken for
  the real header, and absorb RFC 5322 §2.2.3 folded continuation
  lines into the rewritten header. Add regression tests for both the
  folded-header and body-literal cases.
- Update post-create test assertions in mailbox_handler, setup, and
  integration.rs to match the new FQDN-keyed shape (and confirm the
  resolver still reaches the new mailbox by bare local-part).
@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented May 23, 2026

Addressed the review feedback in 27211b8.

What was fixed

  • Blocker — banned pre-Sprint-3 token (src/dkim_keys.rs:88) — rewritten to describe the layout contrast directly: legacy un-namespaced <dkim_dir>/{private,public}.key vs. per-domain <dkim_dir>/<domain>/{private,public}.key. No planning-doc cross-reference.
  • Non-blocker 1 — bare local-part insert (src/mailbox_handler.rs:404)handle_create now inserts the new mailbox under the canonical FQDN key (<name>@<default_domain>) so resolve_mailbox_by_name("<name>@<default>") succeeds immediately without waiting for the carry-over re-key on the next restart. Mirrored the same fix in the CLI fallback mailbox::create_mailbox (root, daemon-down path) and routed mailbox::delete_mailbox / is_registered through resolve_mailbox_by_name so either key shape continues to work. New regression coverage in mailbox_handler::tests::create_mailbox_creates_dirs_writes_config_swaps_handle confirms the FQDN key lands on both the in-memory handle and the on-disk file, and that the resolver reaches the new mailbox by both bare local-part and FQDN immediately after create.
  • Non-blocker 2 — Config::inbox_dir / sent_dir fallback (src/config.rs:1567) — picked option (b): the helper now routes through resolve_mailbox_by_name first (accepts both key shapes), and when the supplied name is itself an FQDN the fallback parses the domain out of it instead of always defaulting to default_domain(). Bare-local fallbacks still pick the default domain — there is no other reasonable host root.
  • Non-blocker 3 — rewrite_from_header_in_body brittle to folding + body literals (src/send_handler.rs:665) — rewritten to walk the header block line-by-line (stopping at the first blank line so a From: literal in quoted/forwarded body text cannot match), and absorbs RFC 5322 §2.2.3 folded continuation lines into the replaced logical header. Two new tests pin both invariants: rewrite_from_header_in_body_handles_folded_from_header and rewrite_from_header_in_body_ignores_from_literal_in_body.

What was intentionally left as-is

  • MCP FQDN sweep — the reviewer-approved deferral to Sprint 6. The MCP surface still accepts both shapes via resolve_mailbox_by_name, but the canonical-name normalisation pass through every tool is the Sprint 6 piece.
  • Layout-aware shim partial retirement — the reviewer-approved partial deferral. Where helpers still consult default_domain() as a defensive default (e.g. the catchall mailbox name "catchall" continues to map via the legacy alias branch in the resolver), the explicit domain comes through whenever the caller already knows it.

New concerns

None.

Verification

cargo fmt -- --check     # clean
cargo clippy --all-targets -- -D warnings   # clean
cargo test --bins        # 1246 passed; 0 failed; 8 ignored
cargo test --test integration         # 116 passed; 0 failed; 8 ignored
cargo test --test multi_domain --test upgrade --test uds_authz   # 9 passed; 0 failed

Banned-tokens regex on the branch diff vs origin/main:

git diff origin/main -- '*.rs' '*.toml' '*.md' '*.yml' '*.yaml' \
  | grep -E '^\+' | grep -vE '^\+\+\+' \
  | grep -E '\bSprint\s+[0-9]+|\bS[0-9]+[-.][0-9]+\b|\bFR-[0-9]+[a-z]?\b|User Story|Acceptance criteria'

returns zero hits.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of 27211b8

Verified each previously-flagged issue against the new commit. All four are resolved with no regressions surfaced. Re-ran cargo fmt -- --check, cargo clippy --all-targets -- -D warnings, cargo test --bins (1246 passed / 0 failed / 8 ignored), and cargo test --test integration --test multi_domain --test upgrade --test uds_authz (125 passed / 0 failed / 29 ignored — the ignored set is the root-only matrix).

Resolved issues

Blocker — banned pre-Sprint-3 token (src/dkim_keys.rs:88) — fixed. The doc comment now contrasts the legacy un-namespaced layout (<dkim_dir>/{private,public}.key) with the per-domain layout (<dkim_dir>/<domain>/{private,public}.key) directly, no planning-doc cross-reference. Verification regex git diff origin/epic/multi-domain..27211b8 -- '*.rs' '*.toml' '*.md' '*.yml' '*.yaml' | grep -E '^\+' | grep -vE '^\+\+\+' | grep -E '\bSprint\s+[0-9]+|\bS[0-9]+[-.][0-9]+\b|\bFR-[0-9]+[a-z]?\b|User Story|Acceptance criteria' yields zero hits across the full branch delta.

Non-blocker 1 — bare local-part insert (src/mailbox_handler.rs:404) — fixed. handle_create now new_config.mailboxes.insert(address.clone(), mb_cfg.clone()) where address = "<name>@<default_domain>". The mirrored fix in src/mailbox.rs::create_mailbox (root, daemon-down path) also inserts by the canonical FQDN, and delete_mailbox / is_registered route through resolve_mailbox_by_name so either key shape continues to work. The updated regression test mailbox_handler::tests::create_mailbox_creates_dirs_writes_config_swaps_handle now asserts both that [mailboxes."alice@example.com"] lands on disk AND that resolve_mailbox_by_name reaches the new mailbox via both bare local-part (alice) and FQDN (alice@example.com) immediately after create — proving the post-create state is consistent without waiting for the next-restart carry-over re-key. Companion test assertions across setup.rs, mailbox_handler.rs, and tests/integration.rs were updated to expect the FQDN-keyed shape.

Non-blocker 2 — Config::inbox_dir / sent_dir fallback (src/config.rs:1567) — fixed (option b). mailbox_folder_dir now (a) walks resolve_mailbox_by_name first so both key shapes hit the live MailboxConfig, and (b) for unregistered names, parses the domain out of an FQDN-shaped name via rsplit_once('@') and only falls back to default_domain() for bare local-parts. I traced this through storage::storage_path_for: when an unregistered support@b.com is passed, the fallback produces <data_dir>/b.com/inbox/support/ — matching what a freshly-registered support@b.com mailbox would produce via mailbox_storage_path (which in turn calls mailbox_dir_name to strip the local-part). The two paths are byte-identical, which is exactly the consistency property the follow-up sprints depend on.

Non-blocker 3 — rewrite_from_header_in_body brittle to folding + body literals (src/send_handler.rs:665) — fixed. The rewrite has been replaced with a byte-walker that (a) terminates on the first blank line so a From: literal in quoted / forwarded body text cannot match, and (b) absorbs RFC 5322 §2.2.3 folded continuation lines into the replaced logical header. Two new regression tests pin both invariants:

  • rewrite_from_header_in_body_handles_folded_from_header"From: Alice\r\n <alice@a.com>\r\nTo: ..." collapses to "From: alice@a.com\r\nTo: ..." with the folded continuation gone, body preserved verbatim.
  • rewrite_from_header_in_body_ignores_from_literal_in_body — confirms (a) a body literal alone returns None (no rewrite), and (b) when both a real header AND a body literal are present, only the real header is rewritten and the body literal survives.

Both tests pass locally.

Approved deferrals (unchanged)

  • MCP FQDN return-shape audit — deferred to Sprint 6 S6-4 per the explicit sprint plan ownership. Input handling remains tolerant of both key shapes via the resolver.
  • Layout-aware shim partial retirement — the surviving callsites (mailbox::discover_mailbox_names, doctor's orphan-storage scan, serve.rs DKIM startup-check) need a per-storage-root iterator anyway, so retiring storage_roots() would require building a replacement rather than just deleting code. Reasonable follow-up cleanup.

New issues

None. The fixes are minimal, targeted, and tested. The byte-walker rewrite in rewrite_from_header_in_body is more code than the prior find("from:") shortcut but it's straightforward, the control flow is linear, and the two new tests pin the previously-brittle edges. The mirrored FQDN-keyed insert in the CLI fallback (src/mailbox.rs::create_mailbox) keeps the daemon-down code path consistent with the daemon path so a sudo aimx mailboxes create while the daemon is stopped now also lands FQDN-keyed.

Test discipline

  • 6 new test assertions across mailbox_handler::tests::create_mailbox_creates_dirs_writes_config_swaps_handle, delete_mailbox_*_e2e, the multi-create concurrency test, and the MCP create test — all now assert the FQDN-keyed [mailboxes."<local>@<domain>"] shape on disk and via the resolver, replacing the prior bare-local-part assertions.
  • The bare-substring-match assertions in tests/integration.rs (e.g. !config_text.contains("task-mb") for post-delete) are slightly weaker than the prior exact-stanza-name match, but they're defensible — the assertion intent is "no reference at all", and the substring covers both potential key shapes (FQDN or bare) plus the address = "..." line. Acceptable.

Summary

Overall verdict: Ready to merge.

All three previously-flagged blockers and non-blockers are resolved with targeted fixes and pinning regression tests. Format / clippy / unit / integration suites are green. No new issues surfaced during this re-review. The two reviewer-approved deferrals (MCP FQDN sweep → Sprint 6, layout-aware shim full retirement → follow-up) remain appropriate.

Recommended merge commit message

SMTP intake, per-domain DKIM, storage helper, mailbox-key re-key

Wires multi-domain into the runtime data plane: SMTP RCPT TO accepts
any configured domain, outbound signs with the per-message domain's
DKIM key + selector, sent copies persist under
`<data_dir>/<from-domain>/sent/<local>/`, bare-local-part From:
rewrites to the default domain daemon-side, and the deferred on-disk
mailbox-key FQDN re-key fires on first start under the new binary.

- `recipient_domain_matches_any` replaces the single-domain helper in
  the SMTP session state machine; `Config::resolve_mailbox_for_rcpt`
  does exact FQDN lookup with per-domain catchall fallback.
- Per-domain DKIM key map via
  `Arc<ArcSwap<HashMap<String, DkimKeyEntry>>>` so future domain CRUD
  verbs can hot-swap without restarting. Selector resolution order:
  per-domain override → top-level → built-in `"aimx"`. Missing key
  for non-default domains warns and the daemon still starts; missing
  default-domain key is fatal. Legacy `<dkim_dir>/private.key`
  fallback applies only to the default domain.
- `send_handler` extracts the From: domain from the submitted body,
  validates against `config.domains`, signs with the per-domain key,
  and rejects per-domain catchall as outbound sender. Bare-local-
  part From: rewrites both header and body bytes before signing so
  DMARC alignment stays valid.
- New `src/storage.rs` with `mailbox_storage_path` / `Folder`;
  `Config::inbox_dir` / `sent_dir` delegate to the helper, and a CI
  grep job rejects new raw `.join("inbox" / "sent")` outside the
  storage / upgrade-migration / mailbox modules.
- Carry-over startup re-key rewrites legacy `[mailboxes.<local>]` to
  `[mailboxes."<local>@<domain>"]` on already-v2 installs. Idempotent.
- MAILBOX-CREATE (daemon + CLI fallback) inserts new mailboxes
  FQDN-keyed so the in-memory shape is consistent post-create
  without waiting for the next-restart carry-over.
- 29 new tests across `src/config.rs`, `src/dkim_keys.rs`,
  `src/smtp/session.rs`, `src/send_handler.rs`, `src/storage.rs`,
  `tests/multi_domain.rs`, and `tests/upgrade.rs`.

@uzyn uzyn merged commit 365e02d into epic/multi-domain May 23, 2026
4 checks passed
@uzyn uzyn deleted the sprint-3-multi-domain-data-plane branch May 23, 2026 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant