webhook: dedicated AEAD key (separate from auth.totp_key_b64) #203

Merged
mfwolffe merged 11 commits into trunk from feat/webhook-aead-key-separation
May 14, 2026

@espadonne
Contributor

Summary

shithub's webhook secrets table stores secrets encrypted at rest
via AEAD, but the AEAD key was the same key used for TOTP
secrets (auth.totp_key_b64). That's an internal hygiene issue,
not a feature gap: GitHub parity means "secrets are encrypted at
rest," which we already had. But:

  1. The docs/internal/runbooks/rotate-secrets.md Webhook AEAD
    section described a webhook.aead_key rotation procedure
    referencing config keys that didn't exist in code.
  2. Compromising auth.totp_key_b64 also exposed every webhook
    secret in the DB.
  3. Rotating the TOTP key would also force re-encryption of every
    webhook row, a coordination cost we didn't need.

This PR adds a dedicated webhook.aead_key (env
SHITHUB_WEBHOOK__AEAD_KEY) with a clean transition path:

  • OpenSecret takes (primary, legacy), tries primary first, and
    falls back to legacy. The deliverer wires
    primary=webhook.aead_key, legacy=auth.totp_key_b64 during
    the transition window.
  • New shithubd admin re-encrypt-webhooks walks the table and
    re-encrypts each row from legacy to primary. Idempotent +
    resumable + safe to interrupt.
  • Ansible templates render SHITHUB_WEBHOOK__AEAD_KEY when the
    inventory var is set (otherwise they keep the current behavior,
    so un-upgraded operators see zero change).
  • Runbook rewritten to match the actual implementation.

The deploy path is safe to take in either order: set the key
first and then re-encrypt, or run re-encrypt first (as a no-op)
and then set the key. The fallback keeps rows decryptable through
the transition. Operators who never run re-encrypt-webhooks
also keep working; they just don't get the separation benefit.
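The "safe to interrupt" and "resumable" claims hinge on the walker being idempotent: a row that already opens under the primary key is skipped, so a second or resumed run changes nothing. A minimal sketch of that loop, assuming the same hypothetical AES-256-GCM nonce-prefix format and an in-memory slice standing in for the webhook secrets table; `row`, `reencrypt`, and the helpers are illustrative names, not the code from 902f629:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// row is a hypothetical stand-in for one webhook secret row.
type row struct {
	ID     int64
	Sealed []byte
}

// gcmFor panics on a bad key; the sketch assumes valid 32-byte AES-256 keys.
func gcmFor(key []byte) cipher.AEAD {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// sealKey returns nonce||ciphertext||tag.
func sealKey(key, pt []byte) []byte {
	aead := gcmFor(key)
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	return aead.Seal(nonce, nonce, pt, nil)
}

func openKey(key, sealed []byte) ([]byte, error) {
	aead := gcmFor(key)
	n := aead.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return aead.Open(nil, sealed[:n], sealed[n:], nil)
}

// reencrypt moves rows from legacy to primary, skipping rows already on primary.
// With dryRun set it only counts, which is what a --dry-run flag would report.
func reencrypt(rows []row, primary, legacy []byte, dryRun bool) (migrated, skipped int, err error) {
	for i := range rows {
		if _, perr := openKey(primary, rows[i].Sealed); perr == nil {
			skipped++ // already on the dedicated key
			continue
		}
		pt, lerr := openKey(legacy, rows[i].Sealed)
		if lerr != nil {
			return migrated, skipped, fmt.Errorf("row %d: neither key opens it: %w", rows[i].ID, lerr)
		}
		if !dryRun {
			// One write per row: an interrupt leaves earlier rows already migrated.
			rows[i].Sealed = sealKey(primary, pt)
		}
		migrated++
	}
	return migrated, skipped, nil
}

func main() {
	primary, legacy := make([]byte, 32), make([]byte, 32)
	legacy[0] = 1 // distinct demo keys
	rows := []row{
		{ID: 1, Sealed: sealKey(legacy, []byte("a"))},
		{ID: 2, Sealed: sealKey(legacy, []byte("b"))},
		{ID: 3, Sealed: sealKey(primary, []byte("c"))}, // written after the new key was set
	}
	m, s, _ := reencrypt(rows, primary, legacy, true)
	fmt.Println("dry run:", m, "to migrate,", s, "already done") // dry run: 2 to migrate, 1 already done
	m, s, _ = reencrypt(rows, primary, legacy, false)
	fmt.Println("real run:", m, "migrated,", s, "skipped") // real run: 2 migrated, 1 skipped
	m, s, _ = reencrypt(rows, primary, legacy, false)
	fmt.Println("second run:", m, "migrated,", s, "skipped") // second run: 0 migrated, 3 skipped
}
```

Deciding per row by trial decryption keeps the sketch free of any key-version column; a real table might instead track which key sealed each row, which this sketch does not assume.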

Per-commit reading order

  1. 687effa config: add webhook.aead_key field
  2. ad01cd8 OpenSecret(primary, legacy, ...) signature change
  3. 80d5191 tests for the fallback chain
  4. bf95125 wire primary+legacy through worker → deliverer
  5. 243bca6 web.env.j2 renders new env var conditionally
  6. 8a860ff worker.env.j2 same
  7. baa53cf production.example documents inventory var
  8. dbf13c5 new ListWebhookSecretsForReencrypt sqlc query
  9. 902f629 admin command
  10. 9e6a3ba runbook rewrite
  11. feb6338 CHANGELOG entry

Test plan

  • go build ./... clean
  • go vet ./... clean
  • gofmt -l clean (touched files)
  • New TestOpenSecret_* cases pass:
    • primary-only happy path
    • primary fails → falls back to legacy
    • both keys nil → error
    • both keys wrong → error names the all-keys-failed condition
    • primary succeeds → legacy not consulted (short-circuit)
  • CI green
  • After deploy + re-encrypt: webhook deliveries still sign
    and verify at the receiver
  • shithubd admin re-encrypt-webhooks --dry-run against
    prod reports a non-zero row count (proves the migration
    walker sees existing rows)

Deploy plan

  1. Merge this PR. CI auto-deploys binary + units + migrations.
    (There are no new migrations in this PR; it's a pure code change.)
  2. Generate a new key:
    openssl rand -base64 32
  3. Add shithub_webhook_aead_key_b64=… to your local
    inventory/production. This enables the separation but
    doesn't activate it yet: Ansible still has to run to render
    the new env var into /etc/shithub/{web,worker}.env.
  4. make deploy ANSIBLE_INVENTORY=production (renders new env,
    restarts services). At this point: new webhook writes go
    under the dedicated key; existing rows still decrypt via the
    legacy (TOTP) fallback.
  5. ssh root@shithub.sh "sudo -u shithub /usr/local/bin/shithubd admin re-encrypt-webhooks --dry-run" to count the rows that still need migrating.
  6. Real run:
    ssh root@shithub.sh "sudo -u shithub /usr/local/bin/shithubd admin re-encrypt-webhooks"
  7. Verify by triggering a webhook delivery (e.g. push to a repo
    that has a webhook configured) and confirming the receiver
    sees a valid HMAC signature.

After step 6, every row is on the dedicated key. The legacy
fallback is then dead code in practice: operators can leave it
configured for safety, or remove auth.totp_key_b64 from the
deliverer's reach in a future PR if they want clean separation.

mfwolffe merged commit 377f57b into trunk on May 14, 2026
1 check passed