Skip to content

[i,l,-r] fix(slack): @Singleton producer, exclude requester, log root causes#8

Merged
phosphore merged 1 commit into
masterfrom
lorenzo/slack-handler-fixes
Apr 30, 2026
Merged

[i,l,-r] fix(slack): @Singleton producer, exclude requester, log root causes#8
phosphore merged 1 commit into
masterfrom
lorenzo/slack-handler-fixes

Conversation

@phosphore
Copy link
Copy Markdown
Member

Four small fixes from the first end-to-end test on the staged App Engine version (`rev-c2f171f7a17f0f51`).

What was happening

User submitted MPA on `0-test-mpa-flow` → JIT UI showed `HTTP 403: error`. Logs revealed:

```
Failed to DM reviewer adam.rapley@wave.com for jit-group:production.gcp.0-test-mpa-flow:
users.lookupByEmail failed: missing_scope
```

Root cause was a missing Slack scope (`users:read.email`) on the bot — fixed out-of-band (added scope, reinstalled, rotated Secret Manager). But the bug surfaced three pieces of latent bad behaviour worth fixing.

Fixes

1. `produceProposalHandler` is now `@Singleton`

Was `@Dependent` (CDI default), so every request rebuilt the factory output — including a fresh `Firestore` client + gRPC `ManagedChannel`. Each abandoned channel surfaced as an ERROR-severity `ManagedChannel allocation site` stack trace in stderr when gRPC's leak detector fired. `@Singleton` = one construction per JVM, no leak.

2. Filter the requester out of the expanded reviewer set

`JoinOperation.propose` drops the requester at the principal level (group ≠ user). When the policy ACL names a group that the requester is in (`0-test-mpa-flow` is exactly this — security ↔ security), the requester re-appears after `GroupResolver.expand`. They'd get a DM about their own request. Filter post-expansion with case-insensitive email match.

3. Log root causes before re-throwing in `onOperationProposed`

`AbstractProposalHandler.propose` catches `AccessException | IOException` and re-throws a generic `AccessDeniedException("Creating a proposal failed", e)` → 403 in the UI with no detail. The first E2E test hit Slack 403 `missing_scope` — per-reviewer `slack.dm.failed` log captured the symptom, but the `IOException("Slack DM delivery failed for every reviewer")` that propagated up was silent. Add explicit error logs at the three throw sites:

  • `slack.fingerprint.failed` (AccessException from GroupResolver — typical: missing Cloud Identity perms)
  • `slack.noReviewers` (zero individuals after expansion — typical: misconfigured policy)
  • `slack.allDmsFailed` (every per-reviewer DM failed — typical: missing Slack scope, bad token)

4. Bump pom version to `2.3.0-wavemm.2`

Forces a new App Engine `version_id` on the next deploy, so the rotated bot token (with `users:read.email` scope added) actually takes effect. App Engine doesn't allow mutating env vars on an existing version + our `version_id` is derived from the source ZIP hash + same-id replace = 409 → we keep bumping the trailing patch.

Test plan

  • `mvn test` (CI on this PR).
  • After merge + wavemm-iam submodule bump + apply: re-submit `0-test-mpa-flow` on the new versioned URL. Expected: DMs to Adam + Alex (not Lorenzo himself), one approves, sibling DM updates, beneficiary confirmation DM.

🤖 Generated with Claude Code

…ot causes

Four small fixes shaken out of the first end-to-end test on the staged
version:

1. produceProposalHandler is now @singleton

   Previously @dependent (CDI default for @produces with no scope), so
   every request that needed a ProposalHandler was rebuilding everything
   the factory constructs — including a fresh Firestore client and its
   gRPC ManagedChannel. Each abandoned channel surfaced as an ERROR-
   severity "ManagedChannel allocation site" stack trace in stderr when
   gRPC's leak detector eventually fired, polluting logs. Singleton-
   scope means one construction per JVM, no leak.

2. Filter the requester out of the expanded reviewer set

   AbstractProposalHandler / JoinOperation.propose drops the requester
   at the *principal* level (group ≠ user), so when the policy ACL
   names a group that the requester is in (the typical "team approves
   team" case used by 0-test-mpa-flow), the requester re-appears after
   GroupResolver.expand. We'd send them a DM about their own request.
   Drop them with a case-insensitive email match.

3. Log the root cause before re-throwing in onOperationProposed

   AbstractProposalHandler.propose catches AccessException | IOException
   and rethrows a generic AccessDeniedException("Creating a proposal
   failed", e), which surfaces as a 403 in the JIT UI with no detail.
   The first E2E test hit a Slack 'missing_scope' on
   users.lookupByEmail — the per-reviewer slack.dm.failed log captured
   the symptom, but the IOException("Slack DM delivery failed for
   every reviewer") that propagated up was emitted with no log of its
   own. Add explicit error logs at the three throw sites:
     - slack.fingerprint.failed (AccessException from GroupResolver)
     - slack.noReviewers (zero individuals after expansion)
     - slack.allDmsFailed (every per-reviewer DM failed)

4. Bump pom version to 2.3.0-wavemm.2

   Forces a new App Engine version_id on the next deploy, so the bot
   token re-fetched from Secret Manager (rotated to add
   users:read.email scope) actually takes effect. App Engine doesn't
   allow mutating env vars on an existing version, and our version_id
   derives from the source ZIP hash — without the bump TF would fall
   into the same delete-and-recreate-with-same-id 409 trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@phosphore phosphore changed the title fix(slack): @Singleton producer, exclude requester, log root causes [i,l,-r] fix(slack): @Singleton producer, exclude requester, log root causes Apr 30, 2026
@phosphore phosphore merged commit 281f59d into master Apr 30, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant