Skip to content

refactor(services): unified approveService + TEE commitment-root storage#119

Merged
drewstone merged 1 commit into
mainfrom
drew/unified-approval-and-commitment-roots
May 5, 2026
Merged

refactor(services): unified approveService + TEE commitment-root storage#119
drewstone merged 1 commit into
mainfrom
drew/unified-approval-and-commitment-roots

Conversation

@drewstone
Copy link
Copy Markdown
Contributor

Summary

One coherent architectural move that solves three problems left from PR #117:

  1. API explosion. approveService, approveServiceWithCommitments, approveServiceWithBls, approveServiceWithCommitmentsAndBls, approveServiceWithTeeCommitments collapse into a single approveService(ApprovalParams). Selectors on TangleServicesFacet go from 10 → 6.
  2. Activation gas DoS. TEE commitment storage moves from TeeAttestationCommitment[] (3 slots × N commits per operator, up to 480K gas/operator at cap) to a single keccak256 root + full data emitted as TeeCommitmentsRecorded. Activation cost drops from O(operators × commits × 3 slots) to O(operators).
  3. Anti-replay confidence. The TEE commitment array is now committed to as a hash; slashers / provisioning oracles supply the array as a witness and verify keccak256(abi.encode(witness)) == getTeeCommitmentRoot(serviceId, operator) before treating it as authoritative. Same pattern already used by _serviceResourceCommitmentHash for RFQ resource commits.

BLS stays opt-in (zero pubkey = operator skips BLS, no penalty). RFQ paths (createServiceFromQuotes, extendServiceFromQuotes) are unchanged.

Why each change is correct

Unified approveService(ApprovalParams)

The problem. Five entrypoints, each duplicating the auth gates inline, each adding selector overhead. Adding any new optional field (e.g., a future per-job pricing override) doubles the matrix again. Reviewing the matrix is hard because each variant slightly reorders auth/validate/write.

The fix. One entrypoint, ApprovalParams struct with optional fields. Empty/zero = opt-out:

Field Empty/zero meaning
securityCommitments == [] No per-asset commitment supplied. Acceptable when request has no security requirements OR has only the protocol-default TNT requirement (auto-filled at min).
blsPubkey == [0,0,0,0] Operator does not register a BLS pubkey for this service. Cannot participate in aggregated job-result signing. No exclusion, no penalty.
teeCommitments == [] Operator does not bind to a TEE attestation profile for this approval.

Order of operations (fail-fast, validate-before-write):

1. _requireApprovingOperator              auth gate, no SSTORE
2. validate security commitments OR       
   auto-fill default TNT                  pure validation
3. validate TEE commitments + compute     pure validation + memory keccak
   keccak root
4. BLS proof-of-possession verify         pure validation
5. SSTORE                                 commits / TEE root / BLS pubkey
6. Mark approved, emit, manager hook,     
   activate-if-threshold                  

Unauthorized callers never reach an SSTORE — they revert at step 1 with Unauthorized. Misshapen TEE / BLS / commitment payloads revert at steps 2–4 before writing.

TEE commitment-root storage

The problem. Audit on PR #117 flagged this as a HIGH operator-economics concern: with MAX_TEE_COMMITMENTS_PER_OPERATOR = 8 and N operators, the activator pays N × 8 × 3 cold SSTOREs (~480K gas/operator). At 10 operators × 8 commits, activation cost approaches 5M gas. At 20+, the activation tx exceeds the block gas limit and the service stalls in pending state forever.

The fix. Store one bytes32 per (requestId, operator) and per (serviceId, operator) instead of a dynamic array. The root is keccak256(abi.encode(commitments)). The full commitment array is emitted in TeeCommitmentsRecorded for indexers; slashers reconstruct from event logs and verify keccak match.

mapping(uint64 => mapping(address => bytes32)) internal _requestTeeCommitmentRoot;
mapping(uint64 => mapping(address => bytes32)) internal _serviceTeeCommitmentRoot;

event TeeCommitmentsRecorded(
    uint64 indexed requestId,
    address indexed operator,
    bytes32 indexed root,
    Types.TeeAttestationCommitment[] commitments
);

Activation gas at 3 operators × 8 commits drops from 2,886,434 gas pre-refactor → expected well under 1M (test asserts <1.5M as a generous ceiling).

This is the same shape _serviceResourceCommitmentHash already uses for RFQ resource commits. It's the standard pattern in modern restaking systems (EigenLayer, Symbiotic) for slashing-safe storage of attestable data.

BLS confirmed opt-in

JobsAggregation.sol reverts with OperatorBlsPubkeyNotRegistered only when an operator who didn't register a BLS pubkey is asked to participate in a BLS aggregate. The protocol accepts any operator at approval time. The PR doesn't change this — it just makes it explicit in the unified entrypoint that blsPubkey == [0,0,0,0] is the documented opt-out.

RFQ unchanged

createServiceFromQuotes / extendServiceFromQuotes keep the signed-quote acceptance flow exactly as it was. The EIP-712 QuoteDetails typehash carries securityCommitments and resourceCommitments but NOT teeCommitments — TEE commitments require a request-derived nonce that doesn't exist until the request is created, so they can't be pre-signed in a quote. If RFQ + TEE is wanted later, extend QuoteDetails at that point.

What this solves

Problem Pre-refactor Post-refactor
API surface explosion 5 approval entrypoints, each duplicating auth gates 1 entrypoint, single auth gate, opt-in fields
Selector count on facet 10 6
Activation gas with 3 ops × 8 TEE commits 2.89M < 1M (test asserts < 1.5M)
Activation gas at 20 operators ~19M, near block-gas limit, services stall ~400K, scales linearly forever
TEE commitment storage TeeAttestationCommitment[] per (service, op) (variable, expensive) bytes32 per (service, op) (fixed)
Slashing-safe storage array reads work but bloat block gas root + witness verification (standard restaking pattern)
Inconsistency vs resource commits TEE used array storage; resource commits already used hash Both use hash storage

Test plan

  • forge fmt clean across all changed files
  • Old TeeCommitmentApprovalTest.t.sol (16 tests) and TeeCommitmentHardenTest.t.sol (7 tests) deleted
  • New test/tangle/ServicesApprovalTest.t.sol (14 tests, every one named after a specific failure mode) covers happy paths + adversarial validation + auth ordering + slasher witness verification + activation-gas regression assertion
  • ~50 call sites in test/ and script/ migrated through _approve / _approveWithCommitments / _approveWithBls helpers in BlueprintDefinitionHelper
  • Local forge build did not complete on this machine — repeated solc OOM during the 341-file via_ir compile. Same infrastructure issue that's been failing CI on main since PR fix(security): pre-mainnet hardening across deploy, MBSM, beacon, oracles, quotes, BLS #115. Pushed for CI to run the canonical build on dedicated runners.
  • CI to run forge build + forge test --match-contract ServicesApprovalTest and the full existing suite for regressions

Audit asks

This PR is intentionally scoped: contract surface refactor + storage shape change. Not bundled with the slashing branch, not bundled with operator-count caps. Looking for the auditor agent to confirm:

  1. The keccak root + witness verification is structurally sound for slashing reads.
  2. No path remains where the old TeeAttestationCommitment[] storage is read (would silently return zero arrays after refactor).
  3. The opt-in semantics on ApprovalParams (empty field = no-op) don't introduce a bypass — e.g., supplying empty securityCommitments when requirements exist must still revert (handled by _isOnlyDefaultTntRequirement check).
  4. Activation-time _persistTeeCommitments correctly skips operators with zero root (no-op rather than overwriting an existing service root).
  5. No reentrancy or partial-state-on-revert path through the new entrypoint that the old multi-entrypoint structure happened to mask.

Does NOT touch

  • fix/slashing-correctness branch (will rebase cleanly after this lands)
  • Slashing.sol / SlashingLib.sol
  • RFQ flows (QuotesCreate, QuotesExtend, TangleQuotesFacet)
  • JobsAggregation.sol (BLS aggregation path)
  • Heartbeat, payments, blueprint registry, MBSM, beacon, oracles

Collapse five `approveServiceWith*` entrypoints into one
`approveService(ApprovalParams)`. Move TEE commitment storage from a
per-operator dynamic array to a single keccak256 root + full data
emitted as `TeeCommitmentsRecorded`. BLS stays opt-in via zero pubkey.
RFQ paths (`createServiceFromQuotes`, `extendServiceFromQuotes`) are
unchanged — they already used the resource-commitment hash pattern.

WHY THIS CHANGE

The matrix of approval entrypoints was strict superset growth: each new
optional capability (commitments → BLS → BLS+commitments → TEE) doubled
the surface and left the protocol with a 5-selector explosion that any
new feature would extend further. Every variant duplicated the auth
gates inline, making it easy to drop a check on one path and ship.

The TEE commitment work also stored the full struct array per-operator
on activation — a per-(request, operator) cap of 8 commits × 3 storage
slots × 20K cold SSTORE ≈ 480K gas per operator just to copy commits
forward. With multi-operator services that gas scales linearly in
operator count and burns the activator's tx alone, since "last operator
pays for everyone" is the default activation pattern. The audit flagged
this as a HIGH operator-economics concern.

ARCHITECTURE

  Single entrypoint:
    approveService(ApprovalParams { requestId, securityCommitments,
                                     blsPubkey, blsPopSignature,
                                     teeCommitments }) external

  Empty / zero fields are no-ops. The contract dispatches:
    - securityCommitments empty + requirements present → auto-fill
      protocol-default TNT commitment (only when that's the lone
      requirement) or revert
    - blsPubkey == [0,0,0,0]              → operator skips BLS
    - teeCommitments empty                → operator skips TEE binding

  Order of operations (fail-fast, validate-before-write):
    1. _requireApprovingOperator (auth gate, no SSTORE)
    2. _validateSecurityCommitments OR auto-fill default TNT
    3. _validateTeeCommitments + compute keccak root
    4. BLS proof-of-possession verify (if registering)
    5. SSTORE security commits / TEE root / BLS pubkey
    6. Mark approved, emit, manager-hook, activate-if-threshold

  TEE storage shape:
    mapping(uint64 => mapping(address => bytes32)) _requestTeeCommitmentRoot
    mapping(uint64 => mapping(address => bytes32)) _serviceTeeCommitmentRoot

  Activation: O(operators) — one bytes32 SSTORE per operator that
  supplied a non-empty TEE array. Down from O(operators × commits × 3
  slots). Pre-refactor 3-operators × 8-commits activator-gas was 2.89M;
  post-refactor expected well under 1M (gas test enforces < 1.5M).

  Slashing pattern: contract stores root only; slasher / provisioning
  oracle supplies the original commitment array as a witness, verifies
  `keccak256(abi.encode(witness)) == getTeeCommitmentRoot(serviceId, op)`
  before treating the witness as authoritative. Same pattern already
  used by `_serviceResourceCommitmentHash` for RFQ resource commits.

RFQ PATHS UNCHANGED

`createServiceFromQuotes` and `extendServiceFromQuotes` keep the
signed-quote acceptance flow exactly as it was. Quotes carry resource
commitments (already hash-stored) and security commitments. TEE is NOT
in the EIP-712 quote shape — TEE commitments require a request-derived
nonce that doesn't exist until the request is created, so they can't
be pre-signed in a quote. Manual approval remains the only path that
binds TEE commitments today; if RFQ + TEE is wanted later, extend the
QuoteDetails type at that point.

BLS IS OPT-IN

The protocol must accept any operator. Operators that don't register a
BLS pubkey can still approve services and run workloads — they simply
cannot participate in aggregated job-result signing on services that
choose to use BLS aggregation. The `JobsAggregation` path reverts with
`OperatorBlsPubkeyNotRegistered` only when an operator who didn't
register tries to participate in a BLS aggregate. No exclusion at
approval, no penalty otherwise.

API IMPACT

  - TangleServicesFacet selectors: 10 → 6
    Removed: approveService(uint64,uint8), approveServiceWithCommitments,
             approveServiceWithBls, approveServiceWithCommitmentsAndBls,
             approveServiceWithTeeCommitments, getTeeCommitment
    Kept:    approveService(ApprovalParams), rejectService,
             getOperatorBlsPubkey, blsPopMessage,
             getTeeCommitmentRoot (replaces getTeeCommitment),
             teeNonceFor

  - ITangleServices interface trimmed to match.

TEST SURFACE

Replaced TeeCommitmentApprovalTest.t.sol (16 cases) and
TeeCommitmentHardenTest.t.sol (7 cases) with one tight
ServicesApprovalTest.t.sol (~14 cases). Every test names a specific
failure mode; no compiler-bug theater. Coverage:

  Happy paths
    - single-operator + single TEE commit, root persists
    - minimal approval, no optional fields
    - mixed TEE / non-TEE operators, roots independent
    - slasher witness verification (honest matches, tampered rejects)

  Adversarial — TEE validation
    - DirectTdx rejected; Unset enum sentinel rejected
    - zero expectedMeasurement rejected
    - cross-request replay rejected (nonce request-derived)
    - past expiry rejected; >MAX_TTL rejected; at-cap accepted
    - TooManyTeeCommitments (cap = 8) rejected

  Adversarial — auth ordering
    - unauthorized caller fails BEFORE storage write
    - double approval rejected

  Activation gas measurement
    - 3 ops × 8 commits gas, asserted < 1.5M (was 2.89M pre-refactor)

CALL-SITE MIGRATION

~50 call sites in test/ and script/ migrated to the unified entrypoint
via `_approve / _approveWithCommitments / _approveWithBls` helpers
hoisted into BlueprintDefinitionHelper (visible in BaseTest,
TestHarness, and InvariantFuzz). One inline ApprovalParams construction
in three scripts where the helper isn't reachable.

DOES NOT TOUCH

- fix/slashing-correctness branch / Slashing.sol / SlashingLib.sol
- RFQ flows (QuotesCreate, QuotesExtend, TangleQuotesFacet)
- BLS aggregation path (JobsAggregation.sol)
- Heartbeat, payments, blueprint registry, MBSM, beacon, oracles
@drewstone drewstone merged commit 5c28b25 into main May 5, 2026
1 check failed
drewstone added a commit that referenced this pull request May 5, 2026
Regen via `cargo xtask gen-bindings` against the post-PR-#119 contracts:
the unified `approveService(Types.ApprovalParams)` entrypoint, dropped
`approveServiceWith*` matrix, and root-storage TEE commitments
(`getTeeCommitmentRoot` view, `TeeCommitmentsRecorded` event with full
array payload).

Bump tnt-core-bindings 0.10.9 → 0.11.0 (BREAKING — selector list shrank
10 → 6, struct types changed, multiple methods removed). Tag
`bindings-v0.11.0` triggers the crates.io publish workflow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant