feat(server): device-authorization for P2P gRPC (SA token + fabric-device check)#1

Draft
wseaton wants to merge 2 commits into main from weaton/p2p-grpc-auth

wseaton (Owner) commented May 25, 2026

Draft PR for visibility against my fork's main. Tracks the upstream RFC:
ai-dynamo#298 — [RFC] Coordinator gRPC authorization using device assertions.

Summary

  • Server-side device authorization layer for P2pService gRPC: verifies the caller's bound projected ServiceAccount token via Kubernetes TokenReview, then confirms the caller's pod actually holds a fabric device (RDMA/IB, RoCE, EFA via device-plugin resources, or DRA ResourceClaim by device class).
  • Three modes: off / permissive (verify+log, never block) / enforce (reject).
  • Helm chart wiring under p2pAuth.* plus cluster-scoped RBAC template; Python client attaches a projected token on each RPC and re-reads on rotation.
  • E2E coverage: GPU-less cluster e2e (ci/k8s/auth/cluster-e2e.sh) and a real-transfer GPU e2e (ci/k8s/auth/gpu-e2e.sh).
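As a rough illustration of the "attaches a projected token on each RPC and re-reads on rotation" behaviour, here is a minimal stdlib-only sketch. It is not the client code from this PR: the class name `RotatingTokenSource`, the `authorization: Bearer …` metadata key, and the change-detection strategy (comparing mtime, size, and inode) are all assumptions made for the example.

```python
import os
from typing import List, Optional, Tuple


class RotatingTokenSource:
    """Caches a token file and re-reads it only when the file changes on disk.

    Hypothetical helper for illustration; the real client may detect rotation
    differently (for example with a TTL).
    """

    def __init__(self, path: str) -> None:
        self._path = path
        self._token: Optional[str] = None
        self._stamp: Optional[Tuple[int, int, int]] = None

    def token(self) -> str:
        # os.stat follows symlinks, so a symlink swap of the mounted token
        # file shows up as a different inode/mtime.
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size, st.st_ino)
        if stamp != self._stamp:
            with open(self._path, "r", encoding="utf-8") as f:
                self._token = f.read().strip()
            self._stamp = stamp
        assert self._token is not None
        return self._token

    def metadata(self) -> List[Tuple[str, str]]:
        # Assumed header shape; a client interceptor would append this
        # to the metadata of each outgoing call.
        return [("authorization", f"Bearer {self.token()}")]
```

Calling `metadata()` per RPC keeps the hot path to a single `stat` in the common case and picks up a rotated token on the next call without restarting the client.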

Notes

  • TLS for the gRPC channel is intentionally not in this PR — split out for separate review.
  • Helm value keys (p2pAuth.*) kept stable; user-facing prose/log lines renamed to "device authorization".

Test plan

  • cargo build -p modelexpress-server clean
  • cargo test -p modelexpress-server auth:: passes
  • ci/k8s/auth/cluster-e2e.sh against a real cluster: enforce path asserts no-token Unauthenticated, device-holding pod OK, device-less pod PermissionDenied
  • ci/k8s/auth/gpu-e2e.sh end-to-end: target logs RDMA transfer complete
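The outcomes asserted on the enforce path, together with the three modes from the summary, can be read as a small decision table. The sketch below is a Python model of that table, not the Rust server code: the function names and the string status values are hypothetical, and treating an invalid token the same as a missing one (Unauthenticated) is an assumption.

```python
import logging
from enum import Enum

log = logging.getLogger("device_authorization_sketch")


class Mode(Enum):
    OFF = "off"
    PERMISSIVE = "permissive"
    ENFORCE = "enforce"


def verdict(token_valid: bool, holds_fabric_device: bool) -> str:
    """What the checks conclude, independent of the configured mode."""
    if not token_valid:
        # Missing token per the test plan; invalid token by assumption.
        return "UNAUTHENTICATED"
    if not holds_fabric_device:
        return "PERMISSION_DENIED"
    return "OK"


def decide(mode: Mode, token_valid: bool, holds_fabric_device: bool) -> str:
    """Status the caller would see for a given mode."""
    if mode is Mode.OFF:
        return "OK"  # no verification at all
    v = verdict(token_valid, holds_fabric_device)
    if mode is Mode.PERMISSIVE:
        if v != "OK":
            log.warning("device authorization would reject: %s", v)
        return "OK"  # verify and log, never block
    return v  # enforce: reject on failure
```

For example, `decide(Mode.ENFORCE, True, False)` yields `"PERMISSION_DENIED"`, while the same inputs under `Mode.PERMISSIVE` yield `"OK"` plus a warning log line.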

…ice check

  Gates the P2pService route on a bound projected SA token (verified via
  TokenReview) and the caller pod requesting a fabric device. Modes: off /
  permissive / enforce. Helm + ClusterRBAC + Python client interceptor + docs.

  See the security RFC for design rationale.

  Assisted-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Will Eaton <weaton@redhat.com>