docs(rfc): lock RFC-21 Phase-3 design decisions #3967

Merged
mswilkison merged 1 commit into feat/frost-schnorr-migration-scaffold from docs/rfc-21-coordinator-aggregation-update-2026-05-22
May 23, 2026
@mswilkison

Summary

Promotes the Phase-3 design decisions settled in the 2026-05-22
cross-team review into a dedicated Resolved Decisions section
of RFC-21. Doc-only; +180/-38.

Why

The previous draft listed Phase-3 questions under "Open questions"
with a recommended-entering-Phase-3 path that turned out, on
review, to have a critical safety gap: the all-to-all signed-evidence
gossip recommendation silently assumed gossip is synchronously
consistent across the signer set. In practice gossip is eventually
consistent, so two honest signers can hold divergent evidence sets
at the moment the deterministic `NextAttempt` boundary triggers,
producing divergent next-attempt contexts and fracturing the group.
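
The failure mode can be illustrated with a toy sketch. It assumes, purely for illustration, that the next attempt's context is a hash over the locally known evidence set; `nextContextHash` and the evidence strings are hypothetical and are not RFC-21's actual derivation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// nextContextHash is a hypothetical stand-in for deriving the next attempt's
// context from a signer's locally known evidence. The evidence is sorted
// first, so the result depends only on set membership, not arrival order.
func nextContextHash(evidence []string) [32]byte {
	sorted := append([]string(nil), evidence...)
	sort.Strings(sorted)
	return sha256.Sum256([]byte(strings.Join(sorted, "\n")))
}

func main() {
	// Signer A has received evidence about peers 1, 2, and 3.
	a := []string{"peer-1:timeout", "peer-2:timeout", "peer-3:reject"}
	// Signer B has not yet received the peer-3 evidence when the
	// deterministic boundary fires.
	b := []string{"peer-2:timeout", "peer-1:timeout"}

	// Both signers are honest, yet they derive different contexts.
	fmt.Println("contexts agree:", nextContextHash(a) == nextContextHash(b))
}
```

Determinism of the derivation does not help here: the function is deterministic, but its input is each signer's local view, and eventually consistent gossip gives no guarantee that those views match at the boundary.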

This PR locks the replacement design before Phase 3 implementation
PRs begin landing.

What the resolved-decisions section pins

| Decision | Resolution |
|---|---|
| Cross-process coordinator agreement | Coordinator-proposed aggregation on a dedicated evidence topic, signed with the operator key, with receiver-side bundle verification for censorship detection. All-to-all gossip + local union is rejected with rationale. |
| Source of `DkgGroupPublicKey` for seed | Extracted from FFI signer material at attempt construction time. No wallet-registry lookup on the hot path. |
| `AttemptContext` ↔ `NativeExecutionFFISigningRequest` | Field on the request struct; Go-side orchestration only; does not cross the CGO boundary. |
| `SelectCoordinator` retention | Keep as helper; `BeginAttempt` bridges the `[32]byte` seed to the legacy `int64` via a sterile, named adapter. |
| Evidence-signing key | Reuse the existing operator key. |
| Evidence message format | JSON wrapped in the existing `pkg/net/gen/pb` envelope; routed via `net.Message`. |
| Maximum evidence-message size | Single `TransitionMessage` per transition, ~10-20 KiB at 100-signer saturation. No chunking. |
| Silence-parking transience (risk mitigation) | Strictly single-attempt skip, no escalation. A peer falsely labelled silent is reinstated by the very next attempt. |

Layer-B exclusion-policy strengthening

The exclusion-policy list in Layer B is extended with explicit
"no escalation" wording for the silence/parking case. The risk
Gemini's review surfaced (late-arriving evidence weaponised into
permanent exclusion) is bounded by:

  • Silence parking ≤ 1 attempt.
  • Permanent exclusion only fires on overflow (transport-blamable)
    or non-transport reject (validation-blamable). Neither can
    trigger on a slow-but-honest peer.
  • Receiver-side bundle verification catches a coordinator that
    tries to censor an honest peer's signed snapshot.
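
The first two bounds can be sketched as a small policy type. This is a minimal illustration under stated assumptions, not the RFC's implementation: `EventKind`, `Policy`, `Record`, and `Included` are hypothetical names, and the sketch interprets "single-attempt skip" as skipping exactly the attempt after the one in which silence was observed.

```go
package main

import "fmt"

// EventKind is a hypothetical label for the evidence categories the
// exclusion policy distinguishes.
type EventKind int

const (
	Silence            EventKind = iota // peer did not respond in time
	Overflow                            // transport-blamable
	NonTransportReject                  // validation-blamable
)

// Policy tracks transient parking and permanent exclusion per peer.
type Policy struct {
	parkedFor map[string]uint64 // peer -> the single attempt it must skip
	excluded  map[string]bool   // peer -> permanently excluded
}

func NewPolicy() *Policy {
	return &Policy{parkedFor: map[string]uint64{}, excluded: map[string]bool{}}
}

// Record applies evidence observed during the given attempt.
func (p *Policy) Record(peer string, kind EventKind, attempt uint64) {
	switch kind {
	case Silence:
		// Park for exactly one attempt. There is no counter, so repeated
		// silence can never escalate into permanent exclusion.
		p.parkedFor[peer] = attempt + 1
	case Overflow, NonTransportReject:
		p.excluded[peer] = true
	}
}

// Included reports whether the peer participates in the given attempt.
func (p *Policy) Included(peer string, attempt uint64) bool {
	if p.excluded[peer] {
		return false
	}
	skip, parked := p.parkedFor[peer]
	return !parked || skip != attempt
}

func main() {
	p := NewPolicy()
	p.Record("slow-peer", Silence, 1)
	p.Record("bad-peer", NonTransportReject, 1)
	for attempt := uint64(2); attempt <= 3; attempt++ {
		fmt.Println(attempt, p.Included("slow-peer", attempt), p.Included("bad-peer", attempt))
	}
}
```

Because the silence branch stores only a single attempt number, a late-arriving silence report can at worst move that one parked attempt; it has no path to the `excluded` set.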

Open questions reduced to three

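What remains in the Open Questions section is genuinely open:

  • Persistence across restart (Phase 5+).
  • FFI surface guidance (follows the L5 pattern from PR #425 / #3961).
  • `AttemptContextHash` backward-compat horizon (Phase 6+).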

Test plan

  • Reviewer reads the Resolved decisions section end-to-end.
  • Reviewer confirms the coordinator-aggregation flow as
    documented matches the agreed design.
  • AsciiDoc renders cleanly (CI step `Publish contracts
    documentation` covers this).

No code change; no behaviour-test surface.

Promotes the resolved Phase-3 design decisions (settled in the
2026-05-22 cross-team review) from the Open Questions section
into a dedicated Resolved Decisions section. Four targeted edits:

1. Cross-process coordinator agreement -- replaces the
   all-to-all-with-local-union recommendation (which silently
   assumed synchronous gossip) with coordinator-proposed
   aggregation on a dedicated topic, signed with the operator
   key, with receiver-side bundle verification for censorship
   detection. Documents the rejected alternatives and the
   liveness/safety properties.

2. AttemptSeed source -- the DkgGroupPublicKey input to the
   seed derivation comes from the FFI signer material at
   attempt construction time, not from a wallet registry
   lookup. Removes hot-path async coupling and respects
   layering between core signing and application state.

3. SelectCoordinator seed bridging -- BeginAttempt wraps the
   legacy int64-seeded SelectCoordinator with a sterile,
   named adapter that folds the new [32]byte AttemptSeed into
   the legacy parameter shape. Bridge is exhaustively tested
   so later edits cannot accidentally desynchronise it.

4. Silence-parking transience -- Layer B exclusion policy now
   states explicitly that silence-based parking is
   single-attempt only with no escalation, so a peer falsely
   labelled silent (late delivery, coordinator censorship) is
   reinstated by the very next attempt. Permanent exclusion
   only follows from overflow or non-transport reject events,
   neither of which can fire on a slow-but-honest peer.
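
The receiver-side check from item 1 can be sketched as follows. All names here (`Snapshot`, `KeySet`, `checkBundle`, `newSignedSnapshot`) are hypothetical, and Ed25519 is used only as a standard-library stand-in for the operator key; this PR does not specify the operator key's signature scheme or the bundle layout.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"errors"
	"fmt"
)

// Snapshot is a hypothetical signed evidence snapshot from one signer.
type Snapshot struct {
	Signer    string
	Payload   []byte
	Signature []byte
}

// KeySet maps signer identifiers to their (stand-in) operator public keys.
type KeySet map[string]ed25519.PublicKey

var (
	ErrBadSignature = errors.New("bundle entry has unknown signer or invalid signature")
	ErrCensored     = errors.New("own snapshot missing from coordinator bundle")
)

// checkBundle verifies a coordinator-proposed bundle from the receiver's
// side: every entry must be validly signed by a known signer, and the
// receiver's own snapshot must be present unmodified.
func checkBundle(bundle []Snapshot, keys KeySet, own Snapshot) error {
	found := false
	for _, s := range bundle {
		key, ok := keys[s.Signer]
		if !ok || !ed25519.Verify(key, s.Payload, s.Signature) {
			return ErrBadSignature
		}
		if s.Signer == own.Signer && bytes.Equal(s.Payload, own.Payload) {
			found = true
		}
	}
	if !found {
		return ErrCensored
	}
	return nil
}

// newSignedSnapshot generates a fresh key pair and signs the payload.
// It exists only to make the sketch runnable.
func newSignedSnapshot(signer string, payload []byte) (Snapshot, ed25519.PublicKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return Snapshot{Signer: signer, Payload: payload, Signature: ed25519.Sign(priv, payload)}, pub
}

func main() {
	own, ownKey := newSignedSnapshot("signer-a", []byte("saw peer-3 reject"))
	other, otherKey := newSignedSnapshot("signer-b", []byte("saw peer-3 reject"))
	keys := KeySet{"signer-a": ownKey, "signer-b": otherKey}

	fmt.Println("full bundle:    ", checkBundle([]Snapshot{own, other}, keys, own))
	fmt.Println("censored bundle:", checkBundle([]Snapshot{other}, keys, own))
}
```

A signer that submitted a snapshot and then receives a bundle without it holds direct evidence of censorship by the coordinator, which is what makes the single-proposer design auditable from each receiver's position.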

Also: removes a stale "(see open question 1)" reference in
Layer A, and adds compact decision blocks for the remaining
Phase-3 questions (signer-material binding, key reuse, JSON
format, message-size budget).

Open questions reduced to three: persistence across restart
(Phase 5+), FFI surface guidance (follows L5 pattern from
PR #425 / #3961), and AttemptContextHash backward-compat
horizon (Phase 6+).

No code changes. Implementation PRs reference these decisions
in their descriptions.
@mswilkison mswilkison merged commit 6214eec into feat/frost-schnorr-migration-scaffold May 23, 2026
15 checks passed
@mswilkison mswilkison deleted the docs/rfc-21-coordinator-aggregation-update-2026-05-22 branch May 23, 2026 00:15
mswilkison added a commit that referenced this pull request May 23, 2026
…idge (#3968)

## Summary

First Phase-3 implementation PR for **RFC-21**. Introduces the ROAST
coordinator state-machine surface (`Coordinator` interface, in-memory
implementation, attempt-handle identity, state enum) plus the sterile
seed-folding adapter that lets the new `[32]byte` `AttemptSeed` drive
the legacy `SelectCoordinator` helper without modifying it.

**No production code path uses the new `Coordinator` yet.** Phase 3
"ships unused" per the RFC. Phase 4 wires it into receivers behind the
`frost_roast_retry` build tag.

## What lands

### `pkg/frost/roast/coordinator_state.go`

| Surface | Role |
|---|---|
| `AttemptState` enum | `Pending / Collecting / Aggregating / Succeeded / Transitioned` with `String()`. |
| `AttemptHandle` | Opaque per-attempt identity. `ContextHash()` accessor cross-checks the bound context. |
| `Coordinator` interface | `BeginAttempt(ctx) → handle`, `State(handle) → state`, `SelectedCoordinator(handle) → member`. Later Phase-3 PRs (3.2 / 3.3 / 3.4) extend with `TransitionMessage`, `AggregateBundle`, `VerifyBundle`, and `NextAttempt`. |
| `NewInMemoryCoordinator()` | Concurrent-safe via `sync.Mutex` + `atomic.Uint64` next-id counter. |
| `ErrUnknownAttempt` | Sentinel for handle/instance mismatch. |
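
A simplified sketch of how such a surface might fit together is shown below. It is not the code from this commit: the field names, the `owner` pointer used to detect instance mismatch, and a `BeginAttempt` that takes the context hash directly (instead of an `AttemptContext`) are all assumptions, and `SelectedCoordinator` is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

type AttemptState int

const (
	Pending AttemptState = iota
	Collecting
	Aggregating
	Succeeded
	Transitioned
)

func (s AttemptState) String() string {
	names := [...]string{"Pending", "Collecting", "Aggregating", "Succeeded", "Transitioned"}
	if s < 0 || int(s) >= len(names) {
		return "Unknown"
	}
	return names[s]
}

var ErrUnknownAttempt = errors.New("unknown attempt handle")

// AttemptHandle is an opaque per-attempt identity bound to one coordinator.
type AttemptHandle struct {
	owner       *InMemoryCoordinator // assumption: used to detect instance mismatch
	id          uint64
	contextHash [32]byte
}

func (h AttemptHandle) ContextHash() [32]byte { return h.contextHash }

type InMemoryCoordinator struct {
	mu     sync.Mutex
	nextID atomic.Uint64
	states map[uint64]AttemptState
}

func NewInMemoryCoordinator() *InMemoryCoordinator {
	return &InMemoryCoordinator{states: make(map[uint64]AttemptState)}
}

// BeginAttempt registers a new attempt and moves it straight to Collecting.
func (c *InMemoryCoordinator) BeginAttempt(contextHash [32]byte) AttemptHandle {
	id := c.nextID.Add(1) // unique even under concurrent callers
	c.mu.Lock()
	defer c.mu.Unlock()
	c.states[id] = Collecting
	return AttemptHandle{owner: c, id: id, contextHash: contextHash}
}

// State returns ErrUnknownAttempt for handles this instance did not issue.
func (c *InMemoryCoordinator) State(h AttemptHandle) (AttemptState, error) {
	if h.owner != c {
		return Pending, ErrUnknownAttempt
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.states[h.id]
	if !ok {
		return Pending, ErrUnknownAttempt
	}
	return s, nil
}

func main() {
	c := NewInMemoryCoordinator()
	h := c.BeginAttempt([32]byte{0xAB})
	s, err := c.State(h)
	fmt.Println(s, err)

	_, err = NewInMemoryCoordinator().State(h) // handle from another instance
	fmt.Println(err)
}
```

Splitting the work between an atomic counter for identity and a mutex for the state map keeps handle allocation lock-free while still serialising map access.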

### `pkg/frost/roast/seed_bridge.go`

| Surface | Role |
|---|---|
| `foldAttemptSeed(seed [32]byte) int64` | First 8 bytes BE → int64 reinterpretation. Sterile, named, non-cryptographic adapter. Documented contract: byte-identical input must produce byte-identical output on every honest signer. |

`BeginAttempt` calls `foldAttemptSeed` and forwards to the existing
`SelectCoordinator` to elect the attempt's coordinator. The legacy
helper itself is **not modified** -- the bridge is the only thing
between RFC-21 contexts and the legacy seed format.

## Why the seed bridge

The legacy `SelectCoordinator` takes `(seed int64, attemptNumber uint)`
and is correct in isolation. RFC-21 widens `AttemptSeed` to
`[32]byte` for the canonical-hash binding. We could rewrite the
shuffle, but rewriting cryptographic-consensus logic that already
agrees across the network is the wrong trade-off; the audit and
behaviour are settled.

The bridge satisfies the resolved decision in RFC-21:
> "BeginAttempt wraps it with a sterile bridge that folds the new
> [32]byte AttemptSeed into the legacy parameter shape... The bridge
> is named, isolated, and exhaustively tested so later edits cannot
> accidentally desynchronise it."

## Test coverage

### `coordinator_state_test.go` (9 tests)

- `TestBeginAttempt_ReturnsHandleWithMatchingContextHash`
- `TestBeginAttempt_HandlesAreDistinctAcrossAttempts`
- `TestBeginAttempt_RejectsEmptyIncludedSet` (defence-in-depth)
- `TestState_ReturnsCollectingAfterBegin`
- `TestState_UnknownHandleReturnsSentinel`
- `TestSelectedCoordinator_ReturnsMemberFromIncludedSet`
- `TestSelectedCoordinator_IsDeterministicForSameContext` -- two
  independent `Coordinator` instances agree on the elected member
- `TestSelectedCoordinator_DifferentAttemptNumbersCanProduceDifferentLeaders`
  -- 16 attempts produce ≥2 distinct leaders, defending the ROAST
  leader-rotation property
- `TestSelectedCoordinator_UnknownHandleReturnsSentinel`
- `TestInMemoryCoordinator_ConcurrentBeginAttemptsAreRaceSafe` --
  16 goroutines × 50 calls each, all handles unique
- `TestAttemptState_String` -- all enum values + unknown sentinel

### `seed_bridge_test.go` (5 tests)

- `TestFoldAttemptSeed_IsDeterministic`
- `TestFoldAttemptSeed_TakesFirst8BytesBigEndian` -- specific
  byte pattern verified
- `TestFoldAttemptSeed_IgnoresBytesAfterIndex7` -- documents the
  contract: bytes 8..31 don't influence output (still bound at
  the `AttemptContext.Hash()` layer)
- `TestFoldAttemptSeed_FirstByteSwept` -- 256-value sweep of the
  high byte produces 256 distinct outputs (no collisions)
- `TestFoldAttemptSeed_GoldenFixture` -- literal int64 value
  locks the wire-format reduction; literal drift caught at
  code review

### Verification

| Command | Result |
|---|---|
| `go build ./...` | clean |
| `go test ./pkg/frost/roast/...` | pass (14 cases) |
| `go test -race ./pkg/frost/roast/...` | pass |
| `go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...` | pass (5 packages) |
| `staticcheck -checks '-SA1019' ./pkg/frost/roast/...` | silent |
| `go vet ./pkg/frost/roast/...` | clean |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the seed bridge's discard of bytes 8..31 is
  acceptable. (Bytes 8..31 still appear in `AttemptContext.Hash()`,
  so any mutation is detected at the protocol-message layer in
  Phase 1B; the bridge merely reduces 256-bit input to the 64-bit
  width `SelectCoordinator` needs.)
- [ ] Reviewer confirms the `Coordinator` interface scope is
  appropriate for Phase 3.1 (state surface only). Phase 3.2 will
  extend with `TransitionMessage` types.

Refs RFC-21 Phase 3 (`docs/rfc/rfc-21-*`). Stacked at the integration
tip after #3967 merged.