Skip to content

feat: Microsoft 365 OAuth2 IMAP support#228

Open
YourEconProf wants to merge 27 commits intowesm:mainfrom
YourEconProf:microsoft-imap
Open

feat: Microsoft 365 OAuth2 IMAP support#228
YourEconProf wants to merge 27 commits intowesm:mainfrom
YourEconProf:microsoft-imap

Conversation

@YourEconProf
Copy link
Copy Markdown

I do still need to make edits to README.md for the organizational O365 account. Personal hotmail account is easy.

Summary

  - Microsoft 365 IMAP support: New add-o365 command sets up OAuth2 authentication for Microsoft accounts,
  then syncs mail over IMAP using the XOAUTH2 SASL mechanism
  - XOAUTH2 auth layer: Added AuthMethod config field and XOAUTH2 SASL client so IMAP connections can
  authenticate with OAuth2 bearer tokens instead of passwords
  - Microsoft OAuth2 manager: Full browser-based OAuth2 flow with PKCE, token persistence, automatic refresh,
   and email validation via ID token claims
  - Personal vs. org account support: Automatically detects personal Microsoft domains (hotmail.com,
  outlook.com, live.com) and uses the correct IMAP scope (outlook.office.com vs outlook.office365.com)
  - Account lookup by email: sync-full and sync now resolve IMAP accounts by display name when the bare email
   is passed (e.g., sync-full matt@5.life finds imaps://matt@5.life@outlook.office365.com:993)
  - Config: New [microsoft] section in config.toml for Azure AD client_id and tenant_id

  Tested with

  - Organizational O365 account (custom domain)
  - Personal Hotmail account
  - Gmail account (no regression)
  - Full sync of ~10K messages over IMAP/XOAUTH2

  Test plan

  - msgvault add-o365 user@org.com --tenant — org account auth flow
  - msgvault add-o365 user@hotmail.com — personal account auth flow (auto-detects scope)
  - msgvault sync-full user@org.com --limit 100 — IMAP sync with XOAUTH2
  - msgvault sync-full user@hotmail.com --limit 100 — personal account sync
  - msgvault sync-full user@gmail.com --limit 100 — Gmail still works
  - Unit tests pass (make test)

wesm and others added 15 commits March 23, 2026 16:57
Covers XOAUTH2 SASL auth in IMAP client, Microsoft OAuth2 provider,
add-o365 CLI command, and sync routing changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10 bite-sized tasks: XOAUTH2 SASL client, auth method config, token
source in IMAP client, Microsoft OAuth manager, config section,
add-o365 CLI, sync routing, account removal, dependency, and final
verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… IMAP

Implements the XOAUTH2 SASL mechanism (sasl.Client interface) needed by
Microsoft Exchange Online IMAP, and adds the AuthMethod field to IMAP
Config for routing between password and xoauth2 authentication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bundles Microsoft OAuth2 browser flow + IMAP auto-configuration into a
single command. Configures outlook.office365.com with XOAUTH2 auth
method automatically after authorization succeeds.

Also includes remove-account Microsoft token cleanup from concurrent task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The MS Graph /me endpoint requires User.Read scope to return profile
data. Without it, the token validation step after OAuth authorization
would fail with HTTP 403.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
IMAP/O365 sources use imaps:// identifiers, so passing the bare email
(e.g. matt@5.life) to sync-full/sync found no match and fell back to
creating a fake Gmail source. Now falls back to display_name lookup,
which add-o365 populates with the email address.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Azure AD cannot issue a single access token valid for both
outlook.office365.com and graph.microsoft.com audiences. Using the
IMAP-scoped token to call Graph returned 401 "Invalid audience".

Switch to decoding the id_token JWT returned during the authorization
code exchange — the openid+email scopes already embed the user's email
claim, so no extra API call is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Personal accounts (hotmail.com, outlook.com, live.com, etc.) require
the outlook.office.com resource, not outlook.office365.com. Detect
personal vs. org accounts by email domain and request the appropriate
IMAP scope. Stored scopes are used for token refresh so existing
tokens continue to work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (e1748d1)

Verdict: Changes are not clean; 1 high-severity regression and 1 medium-severity robustness issue need attention.

High

  • internal/microsoft/oauth.go:243: resolveTokenEmail now treats the ID token as authoritative for mailbox identity, but extractEmailFromIDToken falls back to preferred_username when email is absent. In Azure AD, preferred_username is often the sign-in UPN rather than the primary SMTP address, so org accounts where those differ can now fail with TokenMismatchError. This is a regression relative to the prior /me.mail behavior.

Medium

  • internal/microsoft/oauth.go:30: scopesForEmail selects personal vs org IMAP scopes using a short hard-coded domain allowlist. Consumer Microsoft accounts using other Outlook/Hotmail aliases or regional domains can still be misclassified as org accounts and receive the wrong IMAP scope, so the auth failure this change is intended to fix can still occur.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

YourEconProf and others added 2 commits March 27, 2026 10:25
…crosoft OAuth

Two fixes for the Microsoft IMAP OAuth flow:

1. (High) When the ID token lacks an "email" claim and only has
   "preferred_username" (UPN), the UPN may differ from the SMTP address
   for org accounts. Previously this caused a TokenMismatchError blocking
   account setup. Now we log a warning but proceed, since the user
   authenticated interactively with login_hint set.

2. (Medium) Use the "tid" (tenant ID) claim from the ID token to
   authoritatively determine personal vs org account type, and correct
   the IMAP scope via refresh token if the domain-based guess was wrong.
   Also expanded the consumer domain allowlist with regional variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (6b128fe)

Verdict: Changes are not ready to merge due to 1 high-severity and 2 medium-severity issues in Microsoft OAuth account verification.

High

  • internal/microsoft/oauth.go:312 (also reported at internal/microsoft/oauth.go:305)
    resolveTokenEmail can accept the user-supplied email even when the ID token lacks an email claim and preferred_username does not match. Because login_hint is only advisory, a user can complete the browser flow with a different Microsoft account, and the code may then bind/store that token under the requested mailbox. This is an account-binding/authentication bypass relative to the prior verification path.
    Recommended fix: Require an authoritative identifier match before saving the token. If email is missing and preferred_username does not case-insensitively match the requested address, fail authorization or perform a trusted follow-up identity lookup.

Medium

  • internal/microsoft/oauth.go:150
    The new scope-correction flow attempts to switch from the O365 IMAP resource to the personal-account IMAP resource by using the refresh token after the initial browser flow. That does not establish consent for the new resource, so custom-domain consumer accounts can still fail authorization despite this change intending to fix that case.
    Recommended fix: When tid shows the initial scope guess was wrong, restart the interactive authorization flow with the corrected IMAP scope instead of trying to retarget the grant via refresh.

  • internal/microsoft/oauth.go:328
    extractIDTokenClaims decodes the JWT payload without validating signature, issuer, audience, expiry, or nonce, but those claims are then used for security-sensitive decisions including account verification and tenant/scope selection. Trusting unvalidated ID token claims is unsafe.
    Recommended fix: Validate the ID token against Microsoft OIDC metadata/JWKS and check at least signature, iss, aud, exp, and nonce; include a nonce in the auth request and verify it on return.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

YourEconProf and others added 2 commits March 27, 2026 10:59
Use source.DisplayName (the email address) when available, falling
back to source.Identifier for sources without a display name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rection

Three security fixes for the Microsoft OAuth flow:

1. (High) Account-binding bypass: When the ID token lacks an "email"
   claim and only has "preferred_username" (UPN), a mismatched UPN now
   returns TokenMismatchError instead of proceeding with a warning.
   Since login_hint is advisory, the prior behavior could bind a token
   to the wrong mailbox.

2. (Medium) Scope correction: Replace refresh-token-based scope
   correction with a second interactive browser flow. Refresh tokens
   cannot establish consent for a new IMAP resource, so the prior
   approach would fail for custom-domain consumer accounts.

3. (Medium) JWT validation: ID token claims are now validated using
   go-oidc (OIDC discovery + JWKS). Verifies signature, issuer,
   audience, expiry, and nonce. Adds nonce generation to the browser
   flow for replay protection. A verifyIDTokenFn hook allows tests
   to bypass OIDC validation with unsigned test JWTs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@YourEconProf YourEconProf requested a review from wesm as a code owner March 27, 2026 15:35
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (3839fbc)

Summary verdict: Changes are generally solid, but there are 3 medium-severity regressions that should be addressed before merge.

Medium

  • internal/microsoft/oauth.go:262
    resolveTokenEmail now hard-fails when the ID token has no email claim and preferred_username does not match the user-entered SMTP address. This breaks legitimate Entra/O365 setups where the mailbox address differs from the sign-in UPN, which the previous Graph-based lookup handled.
    Fix: fall back to a trusted mailbox-address source when email is absent (for example Graph /me), or bind to a stable directory identity and store the mailbox address separately instead of treating UPN mismatch as fatal.

  • internal/microsoft/oauth.go:191
    TokenSource reuses whatever scopes are already persisted in the token file. Accounts authorized before this change will keep stale organizational IMAP scopes, so existing personal/custom-domain Microsoft accounts will not benefit from the new scope-correction logic unless they manually reauthorize.
    Fix: add a migration or scope revalidation path when loading/refreshing stored Microsoft tokens, and force reauthorization when saved scopes are stale or mismatched.

  • cmd/msgvault/cmd/sync.go:103, cmd/msgvault/cmd/syncfull.go:98, internal/store/sources.go:38
    The new CLI fallback assumes display_name is unique, but GetSourcesByDisplayName returns all matches. With the updated add-o365 guidance to run sync-full <email>, duplicate display names can cause one command to target multiple sources unintentionally.
    Fix: treat multiple display-name matches as an ambiguity error and require the full identifier, or enforce uniqueness for display names used as CLI selectors.

Notes

  • The security review did not identify any new security issues in the diff.
  • No other medium-or-higher findings were reported.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm wesm changed the base branch from microsoft-imap to main March 27, 2026 18:01
@wesm
Copy link
Copy Markdown
Owner

wesm commented Mar 27, 2026

I'm closing my PR and will let this one be the definitive branch until it gets landed into main

@wesm wesm changed the title Microsoft IMAP Support feat: Microsoft 365 OAuth2 IMAP support Mar 27, 2026
@wesm wesm mentioned this pull request Mar 27, 2026
5 tasks
YourEconProf and others added 3 commits March 27, 2026 15:15
- Accept UPN/SMTP mismatch with a warning instead of hard error;
  Entra UPNs legitimately differ from mailbox SMTP addresses
- Persist tenant_id in token files and validate IMAP scope on load
  to detect and reject stale tokens from before scope-correction
- Treat ambiguous display-name matches as error in sync/sync-full

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use tf.AccessToken instead of tf.Token.AccessToken (QF1008)
- Check fmt.Fprintf return values with _, _ = (errcheck)
- Lowercase OAuth error string (ST1005)
- Use switch instead of if/else-if chain on callCount (QF1003)
- Check tmpFile.Close() return value (errcheck)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (aeed1b7)

Verdict: Changes are close, but there is 1 high-severity regression risk and 1 medium-severity concurrency issue that should be addressed before merge.

High

  • Tenant override is not persisted across refresh/sync flows
    Location: cmd/msgvault/cmd/addo365.go:37, internal/microsoft/oauth.go:184
    The add-o365 --tenant ... override is only applied during the initial interactive auth flow and is not stored per account. Later operations reconstruct the Microsoft manager from global config (cfg.Microsoft.EffectiveTenantID()), so refreshes can fall back to common instead of the tenant-specific authority required by some single-tenant app registrations. Accounts added with --tenant can therefore stop working after the initial access token expires.
    Suggested fix: Persist the authority tenant alongside the account/token and use that persisted tenant when building the token endpoint for TokenSource and subsequent sync/remove flows. Add a regression test that authorizes with a tenant override, then validates refresh behavior from a fresh manager instance.

Medium

  • TokenSource ignores caller context and updates shared token state unsafely
    Location: internal/microsoft/oauth.go:214-226
    The closure returned by Manager.TokenSource uses the outer ctx instead of the per-call callCtx, and it mutates tf.Token without synchronization. If multiple syncs or connections refresh concurrently, this can produce a data race and inconsistent token state.
    Suggested fix: Guard shared token updates with a sync.Mutex, and call cfg.TokenSource(callCtx, &tf.Token).Token() inside the closure so refreshes respect the caller’s context.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

YourEconProf and others added 2 commits March 27, 2026 15:48
…ent safety

TokenSource was rebuilding the OAuth config from m.tenantID ("common")
instead of tf.TenantID (the actual tenant persisted per account). This
caused token refreshes to hit the wrong tenant endpoint for Entra accounts.

- Add oauthConfigWithTenant to build tenant-specific OAuth configs
- TokenSource now uses tf.TenantID when present, falls back to m.tenantID
  for pre-migration tokens (backward compatible)
- Replace tf.Token mutation with a mutex-protected lastAccessToken variable,
  eliminating the data race in the closure
- Add three tests: oauthConfigWithTenant unit test, persisted tenant override
  test, and concurrent access test (run with -race)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…me query

Fresh databases created by init-db were missing the oauth_app column in
the sources table, causing "expected 10 destination arguments in Scan,
not 11" when syncing Microsoft accounts. Also fixed GetSourcesByDisplayName
which omitted oauth_app from its SELECT list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (7ca2071)

Verdict: Changes are not safe to merge yet due to one high-severity schema migration regression.

High

  • Schema migration missing for oauth_app column
    • Location: internal/store/schema.sql:22, internal/store/sources.go:43, internal/store/sources.go:46
    • Problem: oauth_app was added only in the CREATE TABLE IF NOT EXISTS definition, which does not update existing SQLite databases. The new read path now selects oauth_app, so older databases will fail with no such column: oauth_app when code paths like display-name source lookup run.
    • Fix: Add an explicit migration during schema initialization, such as ALTER TABLE sources ADD COLUMN oauth_app TEXT when the column is missing, and add a migration test covering upgrade from a pre-change database.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Verifies that InitSchema() correctly adds the oauth_app column to
existing databases created before the Microsoft IMAP feature, and that
the source read paths (GetSourcesByIdentifier, GetSourcesByDisplayName)
work after migration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (7c13655)

Verdict: Changes are not ready to merge due to 1 High and 2 Medium findings.

High

  • internal/microsoft/oauth.go
    • Problem: The OAuth callback handler and ListenAndServe goroutine can block forever when sending to errChan or codeChan if more than one request reaches the callback endpoint. With a buffer size of 1, a second send can hang the handler, which then causes server.Shutdown(ctx) to wait indefinitely and freeze the CLI process.
    • Fix: Make channel sends non-blocking, for example with select { case ch <- v: default: }.

Medium

  • Location: cmd/msgvault/cmd/sync.go:140, cmd/msgvault/cmd/syncfull.go:127

    • Problem: XOAUTH2 IMAP sources still consider a legacy password credential file sufficient because credential presence is checked before the source auth method. If an Office 365 source was migrated from password auth, a stale password file can make it appear usable even when the Microsoft token is missing, so sync proceeds and fails later instead of skipping or prompting for re-auth.
    • Fix: Parse sync_config first for IMAP sources and, when EffectiveAuthMethod() is AuthXOAuth2, ignore HasCredentials(...) and require msMgr.HasToken(imapCfg.Username).
  • Location: cmd/msgvault/cmd/sync.go:85, cmd/msgvault/cmd/syncfull.go:79

    • Problem: Display-name lookup only runs when GetSourcesByIdentifier(args[0]) returns no rows. If another source already uses that email as its identifier, an O365 IMAP source with the same email in display_name is excluded silently, making msgvault sync[-full] <email> unreliable in mixed setups.
    • Fix: Merge identifier and display-name matches, deduplicate by source ID, then sync the single resolved source or raise an explicit ambiguity error.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

YourEconProf and others added 2 commits March 27, 2026 18:51
…kups, messages-processed count

- browserFlow: make channel sends non-blocking to prevent goroutine deadlock
  on duplicate HTTP requests (favicon, retries) to the callback endpoint
- sync/syncfull: check EffectiveAuthMethod() before HasCredentials() so stale
  IMAP password files don't mask missing XOAUTH2 tokens for O365 sources
- sync/syncfull: always run both identifier and display-name lookups, merge
  and deduplicate by source ID for reliable mixed Gmail+IMAP setups
- incremental: count actual messages touched per history page, not records,
  so summary.MessagesFound and progress output reflect real message counts
- buildAPIClient: guard against empty ClientID before creating Microsoft manager
- sync.Options: remove CheckpointInterval field that was defined but never read

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Priority 1 – correctness:
- store_attachment: safe two-value type assertion on sync.Map cache hit
- deletion/executor: defensive nil check for manifest.Execution in finalizeExecution
- build_cache: only trigger zero-row export error when count query succeeds (avoids false-positive failures when DuckDB can't verify)

Priority 2 – resource / data safety:
- build_cache: log warning instead of silently discarding attachment stats query error
- deletion/executor: escalate checkpoint save failure from Warn to Error

Priority 3 – error visibility:
- repair_encoding: log tx.Rollback() failures at Warn instead of discarding
- mbox_import: log warning when st.FailSync() itself fails (via failSync helper)

Priority 4 – UX:
- export_attachments: fail early with a clear error when output directory is not writable
- addimap: link port flag help text to the package that owns the defaults

Tests:
- build_cache: assert state file is not written when buildCache returns an error
- root: add TokenMismatchError test case verifying recovery instructions in error message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (bd91b3b)

Verdict: Changes look mostly solid, but there are 2 issues that should be addressed before merge.

High

  • cmd/msgvault/cmd/build_cache.go:423
    If the Parquet verification query fails, countQueryOK is set to false and the zero-row guard is skipped, which allows _last_sync.json to still be written even though the export was not successfully verified. This can recreate the poisoned-state failure the guard is intended to prevent.
    Recommended fix: treat verification-query failure as a hard stop for state updates, and do not write the state file when the Parquet count/readback step fails.

Medium

  • cmd/msgvault/cmd/sync.go:152
  • cmd/msgvault/cmd/syncfull.go:139
    The IMAP auth precheck ignores ConfigFromJSON errors. If sync_config is malformed, the source is silently treated as having missing credentials and skipped, which masks configuration corruption behind a misleading recovery message.
    Recommended fix: handle parseErr explicitly and surface a configuration/parse error for that source instead of falling through to the missing-credentials path.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants