feat: add baremetal TDX/SNP and NVIDIA GPU attestation support #21

Merged
butler54 merged 7 commits into validatedpatterns:main from butler54:feature/baremetal-attestation
May 5, 2026
@butler54 butler54 commented May 5, 2026

Summary

This PR adds bare metal Intel TDX and AMD SEV-SNP attestation support, plus NVIDIA GPU attestation via the NRAS remote verifier, to trustee-chart.

These changes enable trustee to validate attestation evidence from:

  • Bare metal Intel TDX hosts (using init_data hash verification)
  • Bare metal AMD SEV-SNP hosts (using init_data hash verification)
  • NVIDIA H100/H200 GPUs in confidential VMs (via NRAS remote verifier)

This is a dependency for coco-pattern PR validatedpatterns/coco-pattern#73 which adds bare metal support.

Changes

Baremetal TDX/SNP Attestation (commits a00c494, 45eb36b, 04b7904)

  • Add attestation policies for tdx and snp TEE types using init_data verification
  • Make pcr-stash secret lookup conditional (bare metal lacks Azure vTPM PCRs)
  • Support both Azure vTPM PCR-extended hashes and bare metal raw init_data hashes
  • Disable TLS cert verification for PCCS (uses self-signed cert)

NVIDIA GPU Attestation (commit ad42001)

  • Add kbs.gpu.enabled value (default false)
  • Configure NRAS remote verifier when GPU attestation is enabled
  • Add default_gpu.rego policy for NVIDIA attestation claims
  • Update resource policy to require both CPU and GPU attestation when enabled

ACM Compatibility (commit cfda641)

  • Fix RVPS reference value construction to use chained single-item append calls
  • ACM ConfigurationPolicy template engine rejects variadic append operations

Backwards Compatibility

All changes are backwards compatible:

  • New attestation policies are guarded by TEE type checks (only apply to matching evidence)
  • GPU features require explicit kbs.gpu.enabled: true opt-in
  • pcr-stash lookup wrapped in conditional check (no-op if secret missing)
  • No existing values removed or renamed

Testing

Tested on:

  • Intel TDX bare metal (Dell PowerEdge with TDX)
  • AMD SEV-SNP bare metal
  • Azure CVM with vTPM (regression test)

Related

🤖 Generated with Claude Code

butler54 and others added 5 commits May 5, 2026 20:08
Add direct TEE attestation rules for baremetal Intel TDX and AMD SEV-SNP.
These use init_data hash verification (platform-independent) rather than
Azure vTPM PCR measurements.

Make pcr-stash secret lookup conditional in RVPS policy so baremetal
deployments (which lack pcr-stash) don't fail. The init_data reference
value is always included for both Azure and baremetal platforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The PCCS service uses a self-signed certificate which causes
SGX_QL_ROOT_CA_UNTRUSTED errors during TDX quote verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The init_data RVPS entry now includes four values:
- PCR8_HASH (secure + debug): SHA256(zeros || SHA256(toml)) for Azure vTPM
- RAW_HASH padded (secure + debug): SHA256(toml) zero-padded to 48 bytes for baremetal TDX/SNP

This allows a single attestation server to validate both Azure vTPM
attestation (which presents PCR-extended hashes) and baremetal TDX/SNP
attestation (which presents raw SHA-256 initdata hashes in the quote's
mr_config_id field, zero-padded to SHA-384 width).
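The two hash forms can be sketched in Python to make the difference concrete. This is an illustration only, not the code used by the chart: the function names and the sample TOML are hypothetical, and the 32-byte all-zero initial PCR value is an assumption based on the `zeros` term in the formula above.

```python
import hashlib


def raw_hash(toml_bytes: bytes) -> bytes:
    """RAW_HASH: plain SHA-256 of the initdata TOML (32 bytes)."""
    return hashlib.sha256(toml_bytes).digest()


def pcr8_hash(toml_bytes: bytes) -> bytes:
    """PCR8_HASH: SHA256(zeros || SHA256(toml)), a single PCR extend.

    Assumes the PCR starts as 32 zero bytes (SHA-256 bank).
    """
    initial_pcr = bytes(32)
    return hashlib.sha256(initial_pcr + raw_hash(toml_bytes)).digest()


def raw_hash_padded(toml_bytes: bytes, width: int = 48) -> bytes:
    """RAW_HASH right-padded with zero bytes to SHA-384 width (48 bytes)."""
    return raw_hash(toml_bytes).ljust(width, b"\x00")


if __name__ == "__main__":
    sample = b'version = "0.1.0"\n'  # hypothetical initdata content
    print("PCR8_HASH      :", pcr8_hash(sample).hex())
    print("RAW_HASH padded:", raw_hash_padded(sample).hex())
```

Running both functions over the secure and the debug initdata would yield the four values mentioned above.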

Long-term, veritas (https://github.com/confidential-devhub/veritas)
should be integrated for comprehensive reference value generation
including firmware, kernel, and RTMR measurements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ACM ConfigurationPolicy template engine rejects variadic append
(want 2 got 11). Chain individual append calls instead.
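To show what "chaining" means here, the following Python helper generates a nested template expression in which every `append` call receives exactly two arguments (a list and one item). It is a hypothetical illustration: the helper name and the sample items are invented, and the actual expression in the chart may look different.

```python
import json
from typing import Iterable


def chained_append(items: Iterable[str]) -> str:
    """Build a nested `append` expression with one item per call.

    Starts from an empty `(list)` and wraps it once per item, so no
    call ever takes more than two arguments.
    """
    expr = "(list)"
    for item in items:
        # json.dumps is used only to produce a double-quoted literal;
        # this assumes simple ASCII strings.
        expr = f"(append {expr} {json.dumps(item)})"
    return expr


if __name__ == "__main__":
    print(chained_append(["a", "b", "c"]))
```

For two items the result has the shape `(append (append (list) "a") "b")`, as opposed to a single `append` call carrying every item at once.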

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add kbs.gpu.enabled value (default false) for GPU attestation support
- Configure NRAS remote verifier when GPU enabled (kbs-config-map)
- Add default_gpu.rego policy for NRAS x-nvidia-* claims
- Add GPU-aware resource policy requiring both cpu0 and gpu0 affirming
- Existing GPU rules in default_cpu.rego handle CPU-class + GPU evidence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@butler54 butler54 force-pushed the feature/baremetal-attestation branch from ad42001 to a2aa3cc May 5, 2026 11:08
butler54 and others added 2 commits May 5, 2026 20:15
Previously, when GPU attestation was enabled, the policy would still
allow access with only CPU attestation due to the first rule being
unconditionally present. This fix ensures the CPU-only rule only
applies when GPU is disabled, preventing the bypass.
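The bypass and its fix can be modeled as plain boolean logic. The sketch below is in Python, not the Rego used by the real resource policy, and the function and parameter names are hypothetical.

```python
def allowed_before_fix(cpu_affirming: bool, gpu_affirming: bool, gpu_enabled: bool) -> bool:
    """Model of the old behavior: the CPU-only rule is always present."""
    cpu_only_rule = cpu_affirming
    gpu_rule = gpu_enabled and cpu_affirming and gpu_affirming
    # Rules are alternatives, so either one granting access is enough.
    return cpu_only_rule or gpu_rule


def allowed_after_fix(cpu_affirming: bool, gpu_affirming: bool, gpu_enabled: bool) -> bool:
    """Model of the fixed behavior: the CPU-only rule applies only when GPU is disabled."""
    cpu_only_rule = (not gpu_enabled) and cpu_affirming
    gpu_rule = gpu_enabled and cpu_affirming and gpu_affirming
    return cpu_only_rule or gpu_rule


if __name__ == "__main__":
    # GPU attestation enabled, but only the CPU evidence is affirming.
    print(allowed_before_fix(True, False, True))  # access granted: the bypass
    print(allowed_after_fix(True, False, True))   # access denied
```

With GPU attestation disabled, both versions behave identically, which matches the backwards-compatibility claim for existing deployments.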

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Chris Butler <chris.butler@redhat.com>
@butler54 butler54 merged commit f928983 into validatedpatterns:main May 5, 2026
4 checks passed
butler54 added a commit to butler54/coco-pattern that referenced this pull request May 5, 2026
Replace butler54/trustee-chart.git fork reference with upstream
chart reference now that validatedpatterns/trustee-chart#21 has
merged and released as v0.3.3.

The 0.3.3 release includes baremetal TDX/SNP attestation support
and NVIDIA GPU attestation via NRAS remote verifier.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
butler54 added a commit to validatedpatterns/coco-pattern that referenced this pull request May 5, 2026
* feat: add bare metal support for Intel TDX and AMD SEV-SNP

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: update baremetal values to use released charts

Replace git branch references (repoURL/targetRevision/path) with
released Helm chart references (chart/chartVersion) for trustee,
sandboxed-containers, and sandboxed-policies in values-baremetal.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add TDX kernel flag and enable intel-dcap for baremetal

Add tdx.enabled flag (default true) to baremetal chart to conditionally
set kvm_intel.tdx=1 kernel argument. Without this, the kvm_intel module
does not activate TDX and NFD cannot detect it.

Enable intel-dcap application in values-baremetal.yaml for PCCS/QGS
attestation services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused runtime class, kernel params, and commented-out templates

Address PR review feedback:
- Remove detect-runtime-class.yaml (OSC operator manages RuntimeClass)
- Remove bm-kernel-params.yaml and kernel-params-mco.yaml (config should
  be provided via initdata or pod annotations to avoid inconsistencies)
- Remove commented-out runtimeclass templates for AMD SNP and Intel TDX

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: update to OSC 1.12 / Trustee 1.1.0

Signed-off-by: Chris Butler <chris.butler@redhat.com>

* feat: integrate Kyverno and update trustee config for baremetal

- Add Kyverno chart and coco-kyverno-policies to baremetal values
- Update trustee chart to 0.3.* with kbs.admin.format v1.1
- Remove bypassAttestation (proper attestation via init_data)
- Remove explicit runtimeClassName overrides (auto-detected by platform)
- Add syncPolicy prune to hello-openshift and kbs-access
- Reset default clusterGroupName to simple

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set clusterGroupName to baremetal for deployment testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add UPDATE operation to initdata injection policy

The policy only fired on Pod/Deployment CREATE, so pods created before
the initdata ConfigMap existed never got the cc_init_data annotation.
Adding UPDATE allows Kyverno to inject the annotation when a Deployment
is updated (e.g. by ArgoCD sync), triggering a rolling restart with
the correct initdata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add intel-device-plugins-operator subscription for SGX/TDX quote generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: enable TDX config in trustee to point QCNL at local PCCS service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store raw SHA-256 hash alongside PCR8 hash in initdata ConfigMaps

Adds RAW_HASH field to both initdata and debug-initdata ConfigMaps.

PCR8_HASH = SHA256(zeros || SHA256(toml)) — used by Azure vTPM attestation
RAW_HASH = SHA256(toml) — used by baremetal TDX/SNP attestation

Both are needed because Azure and baremetal present initdata differently
in their attestation evidence. A single Trustee attestation server must
accept both formats to support multi-platform deployments.

Future: integrate veritas for comprehensive reference value generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: point trustee at feature branch for baremetal attestation testing

Temporarily uses butler54/trustee-chart feature/baremetal-attestation
branch instead of released chart. This branch includes:
- Baremetal TDX and SNP attestation rules
- Conditional pcr-stash (no error on baremetal without vTPM)
- Raw init_data hash (zero-padded) for baremetal attestation
- TDX QCNL config with use_secure_cert: false for local PCCS

Revert to chartVersion after merging and releasing trustee chart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: increase kata VM memory for kbs-access to 8192MB

The kbs-access-app container image is ~1GB which causes container
creation timeouts with the default 2GB kata VM memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: target Pods only for cc_init_data injection, disable autogen

The autogen Deployment rule causes admission failures when the initdata
ConfigMap hasn't been propagated to the workload namespace yet. By
targeting Pods only (autogen-controllers: none), Deployments are admitted
without ConfigMap resolution. Pods get cc_init_data injected at creation
time when the ConfigMap is available. A rollout restart picks up new
initdata values.

Also removes UPDATE operation — only CREATE is needed since a rollout
restart creates new Pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use ${initial_pcr} braces in PCR8 hash computation

Without braces, bash treats $initial_pcr followed by the hex hash
as a single undefined variable name, producing SHA-256 of empty
string instead of the correct PCR extend value.
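A cheap guard against this failure mode is to compare the computed value with the well-known SHA-256 digest of empty input. The sketch below is in Python rather than the bash used by the script, and the function name is hypothetical; it assumes both inputs are hex strings.

```python
import hashlib

# SHA-256 of zero bytes of input; a result equal to this means the
# hash was computed over nothing.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def pcr_extend_hex(initial_pcr_hex: str, measurement_hex: str) -> str:
    """Return SHA256(initial_pcr || measurement) as a hex string."""
    data = bytes.fromhex(initial_pcr_hex + measurement_hex)
    if not data:
        # Mirrors the bug: an unexpanded variable leaves nothing to hash.
        raise ValueError("refusing to hash empty input")
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    initial_pcr = "00" * 32
    measurement = hashlib.sha256(b"example initdata").hexdigest()
    value = pcr_extend_hex(initial_pcr, measurement)
    assert value != EMPTY_SHA256
    print(value)
```

A stored PCR8_HASH that equals `EMPTY_SHA256` is a strong hint that the variable expansion went wrong.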

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: address PR #73 review comments and merge PR #75 documentation

This commit addresses all review comments from bpradipt and pawelpros on
PR #73, merges documentation from PR #75, and updates container images.

Documentation changes:
- README: Replace "peer-pod infrastructure" wording to clarify Azure vs bare metal
- README: Update OCP version requirements from 4.17+ to 4.19.28+ (OSC 1.12 requirement)
- README: Clarify PCR collection differs for Azure (get-pcr.sh) vs bare metal (manual)
- README: Distinguish Azure (kata-remote) from bare metal (kata-cc) runtime classes
- values-secret.yaml.template: Add missing kbsPrivateKey secret
- values-secret.yaml.template: Reorganize with clear section headers and improved docs
- gen-secrets.sh: Add prominent alert when values-secret file is created
- Merge docs/nfd-matchall-bug.md from PR #75 (NFD matchAll bug report)
- Merge docs/pcr-reference-values-bare-metal.md from PR #75 (PCR collection guide)

Code cleanup:
- Delete obsolete qgs-config-cm.yaml (QGS args now inline)
- Delete obsolete qgs-sgx-cm.yaml (QCNL config via downwardAPI)
- Remove commented-out detect-runtime-class reference in values-baremetal.yaml

Image updates:
- intel-dpo-sgx.yaml: Update intel-sgx-plugin to sha256:4ac8769c (v0.35.0)
- pccs-deployment.yaml: Update osc-pccs to sha256:edf57087 (v1.12)
- qgs-ds.yaml: Update osc-tdx-qgs to sha256:308d66da (v1.12)

Resolves review comments from:
- bpradipt: peer-pod wording, OCP versions, PCR clarification
- pawelpros: obsolete ConfigMaps, image digests, PCR requirements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: revert clusterGroupName to simple for main branch merge

The clusterGroupName was changed to 'baremetal' in commit a601af0 for
deployment testing. Reverting to 'simple' as the default so existing
users are not affected when this PR merges to main.

The baremetal clusterGroup remains available by setting
clusterGroupName: baremetal in user overrides or CI.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: update trustee chart to use upstream 0.3.3 release

Replace butler54/trustee-chart.git fork reference with upstream
chart reference now that validatedpatterns/trustee-chart#21 has
merged and released as v0.3.3.

The 0.3.3 release includes baremetal TDX/SNP attestation support
and NVIDIA GPU attestation via NRAS remote verifier.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Chris Butler <chris.butler@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>