
feat: add bare metal support for Intel TDX and AMD SEV-SNP #73

Open
butler54 wants to merge 18 commits into validatedpatterns:main from butler54:baremetal-tp-releases-squashed

butler54 (Collaborator) commented Mar 9, 2026

Summary

  • Adds a new baremetal clusterGroup for deploying CoCo on bare metal with Intel TDX or AMD SEV-SNP hardware
  • NFD auto-detects CPU TEE capabilities and labels nodes accordingly
  • RuntimeClasses for kata-tdx and kata-snp created automatically
  • MachineConfigs for kernel parameters (TDX) and vsock device access
  • Intel DCAP chart with PCCS and QGS services for TDX attestation
  • Storage support via HPP, LVMS, or external providers
  • PCCS secrets generation added to gen-secrets.sh
  • Platform override files for BareMetal and None platforms
  • Documentation for Dell TDX configuration, NFD notes, and bare metal PCR reference values

Test plan

  • Deploy baremetal clusterGroup on Intel TDX hardware
  • Deploy baremetal clusterGroup on AMD SEV-SNP hardware
  • Verify NFD correctly labels nodes with TEE capabilities
  • Verify kata-tdx/kata-snp RuntimeClasses are created
  • Verify PCCS and QGS services deploy on Intel nodes
  • Verify existing Azure deployments (simple, trusted-hub, spoke) are unaffected

🤖 Generated with Claude Code

butler54 requested a review from a team on March 9, 2026 06:30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
butler54 force-pushed the baremetal-tp-releases-squashed branch from b4eaf36 to bad2552 on March 10, 2026 02:22
Comment thread on ansible/detect-runtime-class.yaml (outdated)
Comment thread on charts/all/baremetal/bm-kernel-params.yaml (outdated)
butler54 and others added 4 commits March 10, 2026 15:01
Replace git branch references (repoURL/targetRevision/path) with
released Helm chart references (chart/chartVersion) for trustee,
sandboxed-containers, and sandboxed-policies in values-baremetal.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tdx.enabled flag (default true) to baremetal chart to conditionally
set kvm_intel.tdx=1 kernel argument. Without this, the kvm_intel module
does not activate TDX and NFD cannot detect it.

Enable intel-dcap application in values-baremetal.yaml for PCCS/QGS
attestation services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
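A quick way to confirm that the `kvm_intel.tdx=1` argument actually reached a node's kernel is to inspect its command line (for example, the contents of `/proc/cmdline` read from a debug shell on the node). The minimal Python sketch below only parses that text; `has_tdx_kernel_arg` is a hypothetical helper, not part of the pattern, and treating the last occurrence as authoritative is an assumption of this sketch.

```python
def has_tdx_kernel_arg(cmdline: str) -> bool:
    """Return True if the kernel command line enables kvm_intel.tdx.

    Hypothetical helper: it treats 1/y/Y/on as enabled and, if the argument
    appears more than once, uses the last occurrence (an assumption).
    """
    enabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if key.replace("-", "_") == "kvm_intel.tdx" and sep:
            enabled = value in ("1", "y", "Y", "on")
    return enabled
```

For example, pass it the text of `/proc/cmdline`; a `False` result means NFD is unlikely to see TDX on that node, whatever the hardware supports.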
…mplates

Address PR review feedback:
- Remove detect-runtime-class.yaml (OSC operator manages RuntimeClass)
- Remove bm-kernel-params.yaml and kernel-params-mco.yaml (config should
  be provided via initdata or pod annotations to avoid inconsistencies)
- Remove commented-out runtimeclass templates for AMD SNP and Intel TDX

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
butler54 requested a review from bpradipt on March 23, 2026 07:50
butler54 and others added 12 commits April 20, 2026 08:44
Signed-off-by: Chris Butler <chris.butler@redhat.com>
Conflicts resolved:
- _helpers.tpl: kept runtimeClassName override support from baremetal
- kbs-access/values.yaml: merged main's structure with runtimeClassName param
- kbs-access/secure-pod.yaml: accepted deletion (replaced by secure-deployment.yaml)
- kbs-access/secure-deployment.yaml: added runtimeClassName values override support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Kyverno chart and coco-kyverno-policies to baremetal values
- Update trustee chart to 0.3.* with kbs.admin.format v1.1
- Remove bypassAttestation (proper attestation via init_data)
- Remove explicit runtimeClassName overrides (auto-detected by platform)
- Add syncPolicy prune to hello-openshift and kbs-access
- Reset default clusterGroupName to simple

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The policy only fired on Pod/Deployment CREATE, so pods created before
the initdata ConfigMap existed never got the cc_init_data annotation.
Adding UPDATE allows Kyverno to inject the annotation when a Deployment
is updated (e.g. by ArgoCD sync), triggering a rolling restart with
the correct initdata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds RAW_HASH field to both initdata and debug-initdata ConfigMaps.

PCR8_HASH = SHA256(zeros || SHA256(toml)) — used by Azure vTPM attestation
RAW_HASH = SHA256(toml) — used by baremetal TDX/SNP attestation

Both are needed because Azure and baremetal present initdata differently
in their attestation evidence. A single Trustee attestation server must
accept both formats to support multi-platform deployments.

Future: integrate veritas for comprehensive reference value generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
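The two reference values above can be reproduced with a few lines of Python. This sketch assumes that "zeros" means a 32-byte all-zero starting PCR value (the reset state of a SHA-256 PCR bank) and that `toml` is the exact byte content of the initdata file; `initdata_hashes` is a hypothetical helper, not one of the pattern's scripts.

```python
import hashlib


def initdata_hashes(toml_bytes: bytes) -> dict:
    """Compute RAW_HASH and PCR8_HASH for an initdata TOML document.

    Assumes PCR8 starts at 32 zero bytes and is extended exactly once.
    """
    raw = hashlib.sha256(toml_bytes).digest()        # RAW_HASH = SHA256(toml)
    pcr8 = hashlib.sha256(bytes(32) + raw).digest()  # PCR8_HASH = SHA256(zeros || RAW_HASH)
    return {"RAW_HASH": raw.hex(), "PCR8_HASH": pcr8.hex()}
```

Because PCR8_HASH is the result of one extend operation on top of RAW_HASH, the two values always differ for the same TOML, which is why a Trustee instance serving both Azure and bare metal has to accept both.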
Temporarily uses butler54/trustee-chart feature/baremetal-attestation
branch instead of released chart. This branch includes:
- Baremetal TDX and SNP attestation rules
- Conditional pcr-stash (no error on baremetal without vTPM)
- Raw init_data hash (zero-padded) for baremetal attestation
- TDX QCNL config with use_secure_cert: false for local PCCS

Revert to chartVersion after merging and releasing trustee chart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
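The "zero-padded" raw init_data hash can be illustrated as follows. This is a sketch under assumptions the commit does not state: that the SHA-256 digest is right-padded with zero bytes to the width of the report field that carries it, taken here as 48 bytes for TDX (MRCONFIGID) and 32 bytes for SNP (HOST_DATA). The exact layout should be checked against the trustee chart branch; `padded_initdata_hash` and `FIELD_WIDTH` are hypothetical names.

```python
import hashlib

# Assumed widths (bytes) of the report fields that carry the initdata digest.
FIELD_WIDTH = {"tdx": 48, "snp": 32}  # TDX MRCONFIGID, SNP HOST_DATA


def padded_initdata_hash(toml_bytes: bytes, tee: str) -> str:
    """Return SHA256(toml) right-padded with zero bytes to the field width."""
    digest = hashlib.sha256(toml_bytes).digest()  # always 32 bytes
    width = FIELD_WIDTH[tee]
    if len(digest) > width:
        raise ValueError("digest is larger than the report field")
    return (digest + bytes(width - len(digest))).hex()
```

Under these assumptions the SNP value is identical to RAW_HASH, while the TDX value is RAW_HASH followed by 16 zero bytes.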
The kbs-access-app container image is ~1GB, which causes container-creation
timeouts with the default 2GB Kata VM memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The autogen Deployment rule causes admission failures when the initdata
ConfigMap hasn't been propagated to the workload namespace yet. By
targeting Pods only (autogen-controllers: none), Deployments are admitted
without ConfigMap resolution. Pods get cc_init_data injected at creation
time when the ConfigMap is available. A rollout restart picks up new
initdata values.

Also removes UPDATE operation — only CREATE is needed since a rollout
restart creates new Pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without braces, bash treats $initial_pcr followed by the hex hash
as a single undefined variable name, producing SHA-256 of empty
string instead of the correct PCR extend value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
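The commit above fixes a classic shell pitfall: after `$`, bash reads the longest run of letters, digits and underscores as the variable name, so hex text written directly after `$initial_pcr` is absorbed into the name, whereas `${initial_pcr}` delimits it explicitly. The computation the script is meant to perform, plus a cheap guard against this class of regression, can be sketched in Python. The function names are hypothetical, and the hex-concatenation approach is assumed to mirror the script.

```python
import hashlib

# SHA-256 of zero bytes of input: what the buggy expansion ended up hashing.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def pcr_extend_hex(initial_pcr_hex: str, measurement_hex: str) -> str:
    """PCR extend on hex strings: SHA256(bytes(initial) || bytes(measurement))."""
    # Same idea as the braced shell form: concatenate the two hex strings,
    # then decode to raw bytes before hashing.
    return hashlib.sha256(bytes.fromhex(initial_pcr_hex + measurement_hex)).hexdigest()


def looks_like_empty_input_bug(value_hex: str) -> bool:
    """Flag a computed reference value that is just SHA-256 of nothing."""
    return value_hex.lower() == EMPTY_SHA256
```

Running `looks_like_empty_input_bug` over generated reference values is a cheap sanity check: a legitimate PCR extend result will essentially never equal the empty-input digest.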
pawelpros commented:

Today I tested this PR with @butler54.

Tests performed:

  • [SUCCESS] Deploy baremetal clusterGroup on Intel TDX hardware
  • [SUCCESS] Verify NFD correctly labels nodes with TEE capabilities
  • [SUCCESS] Verify kata-tdx RuntimeClasses are created
  • [SUCCESS] Verify PCCS and QGS services deploy on Intel nodes

    metadata:
      name: sgxdeviceplugin-sample
    spec:
      image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:f2c77521c6dae6b4db1896a5784ba8b06a5ebb2a01684184fc90143cfcca7bf4

Suggested change

    - image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:f2c77521c6dae6b4db1896a5784ba8b06a5ebb2a01684184fc90143cfcca7bf4
    + image: registry.connect.redhat.com/intel/intel-sgx-plugin@sha256:4ac8769c4f0a82b3ea04cf1532f15e9935c71fe390ff5a9dc3ee57f970a65f0b

    privileged: true # Required for chcon to work on host files
    containers:
    - name: pccs
      image: registry.redhat.io/openshift-sandboxed-containers/osc-pccs@sha256:de64fc7b13aaa7e466e825d62207f77e7c63a4f9da98663c3ab06abc45f2334d

    dnsPolicy: ClusterFirstWithHostNet
    initContainers:
    - name: platform-registration
      image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs@sha256:86b23461c4eea073f4535a777374a54e934c37ac8c96c6180030f92ebf970524

    mountPath: /sys/firmware/efi/efivars
    containers:
    - name: tdx-qgs
      image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs@sha256:86b23461c4eea073f4535a777374a54e934c37ac8c96c6180030f92ebf970524

    @@ -0,0 +1,9 @@
    apiVersion: v1

This file can be dropped, since its settings are already passed as args in qgs-ds.yaml:

        - name: tdx-qgs
          image: registry.redhat.io/openshift-sandboxed-containers/osc-tdx-qgs:latest
          args:
            - -p=4050
            - -n=4

    @@ -0,0 +1,16 @@
    apiVersion: v1

This file is also obsolete; see qgs-ds.yaml.

The values actually used are here:

 qcnl-conf: '{"pccs_url": "https://pccs-service:8042/sgx/certification/v4/", "use_secure_cert": false, "pck_cache_expire_hours": 168}'
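The `qcnl-conf` value is a single-line JSON document embedded in a YAML string, where hand-editing the quoting is error-prone. A minimal Python sketch that regenerates the same string from parameters follows; `qcnl_conf` and its parameter names are hypothetical, with defaults taken from the value quoted above. `use_secure_cert: false` matches the earlier commit note that the local PCCS is used without a secure certificate.

```python
import json


def qcnl_conf(pccs_host: str = "pccs-service", port: int = 8042,
              secure_cert: bool = False, cache_hours: int = 168) -> str:
    """Build the QCNL JSON string consumed by QGS (hypothetical helper)."""
    cfg = {
        "pccs_url": f"https://{pccs_host}:{port}/sgx/certification/v4/",
        "use_secure_cert": secure_cert,        # False: local PCCS, no secure cert
        "pck_cache_expire_hours": cache_hours, # 168 hours = 7 days
    }
    # Default json.dumps separators (", " and ": ") reproduce the quoted value.
    return json.dumps(cfg)
```

With the defaults, `qcnl_conf()` returns exactly the string shown in the review comment, so it can be pasted between the single quotes of the YAML value.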

Comment thread on README.md (outdated)
2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`)
3. Review and customise `~/values-secret-coco-pattern.yaml` — this file is loaded into Vault and provides secrets to the pattern
1. `bash scripts/gen-secrets.sh` — generates KBS key pairs, PCCS certificates/tokens (for bare metal), and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`). **Not required for bare metal deployments.**

In our testing this step was also required on bare metal: deployment failed because the following secret references a file that did not exist:

  - name: pcrStash
    vaultPrefixes:
    - hub
    fields:
    - name: json
      path: ~/.coco-pattern/measurements.json

Comment thread on README.md (outdated)
Validated pattern for deploying confidential containers on OpenShift using the [Validated Patterns](https://validatedpatterns.io/) framework.

Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure.
Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure and bare metal.

Suggested change

    - Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure and bare metal.
    + Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service) operator, and Kata infrastructure — on Azure cloud instances and bare metal.

I removed "peer-pod infrastructure" because it gave the impression that peer pods are used on both Azure and bare metal.

Comment thread on README.md (outdated)

**Bare metal deployments:**

- OpenShift 4.17+ cluster on bare metal with Intel TDX or AMD SEV-SNP hardware

bpradipt (Collaborator) commented:

Some minor nits; the rest looks good to me.

…lidatedpatterns#75 documentation

This commit addresses all review comments from bpradipt and pawelpros on
PR validatedpatterns#73, merges documentation from PR validatedpatterns#75, and updates container images.

Documentation changes:
- README: Replace "peer-pod infrastructure" wording to clarify Azure vs bare metal
- README: Update OCP version requirements from 4.17+ to 4.19.28+ (OSC 1.12 requirement)
- README: Clarify PCR collection differs for Azure (get-pcr.sh) vs bare metal (manual)
- README: Distinguish Azure (kata-remote) from bare metal (kata-cc) runtime classes
- values-secret.yaml.template: Add missing kbsPrivateKey secret
- values-secret.yaml.template: Reorganize with clear section headers and improved docs
- gen-secrets.sh: Add prominent alert when values-secret file is created
- Merge docs/nfd-matchall-bug.md from PR validatedpatterns#75 (NFD matchAll bug report)
- Merge docs/pcr-reference-values-bare-metal.md from PR validatedpatterns#75 (PCR collection guide)

Code cleanup:
- Delete obsolete qgs-config-cm.yaml (QGS args now inline)
- Delete obsolete qgs-sgx-cm.yaml (QCNL config via downwardAPI)
- Remove commented-out detect-runtime-class reference in values-baremetal.yaml

Image updates:
- intel-dpo-sgx.yaml: Update intel-sgx-plugin to sha256:4ac8769c (v0.35.0)
- pccs-deployment.yaml: Update osc-pccs to sha256:edf57087 (v1.12)
- qgs-ds.yaml: Update osc-tdx-qgs to sha256:308d66da (v1.12)

Resolves review comments from:
- bpradipt: peer-pod wording, OCP versions, PCR clarification
- pawelpros: obsolete ConfigMaps, image digests, PCR requirements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

3 participants