@Jakob-Naucke
Contributor

Run a job that executes confidential-clusters/compute-pcrs. Supersedes the reference value input workflow. Hard-coded for FCOS 42. Supersedes #11.

@alicefr

  • Permissions might be overly broad; any hints, or is this fine?
  • We could move to an official registry for the compute-pcrs image once "Add Containerfile" (compute-pcrs#19) is merged.

cc @bgartzi

use crate::macros::info_if_exists;

const BOOT_IMAGE: &str = "quay.io/fedora/fedora-coreos:42.20250705.3.0";
const COMPUTE_IMAGE: &str = "quay.io/jnaucke/compute-pcrs:latest";
Contributor

We should probably build the image as part of this repo

@alicefr
Contributor

alicefr commented Aug 25, 2025

@Jakob-Naucke instead of reading the job pod output, what if we include the compute-pcrs binary in a new container image, and you add a new binary that launches the calculation and then updates the Trustee config maps? I think it would be cleaner.

@Jakob-Naucke
Contributor Author

Jakob-Naucke commented Aug 25, 2025

add a new binary that launches the calculation and then updates the Trustee config maps? I think it would be cleaner.

Hmm, I have two concerns about it being cleaner.

new binary

Extra Rust crate or jq-manglery in Shell? Neither seems very low friction.

update the config maps

Is API access nearly as easy as what we have from the operator?

@alicefr
Contributor

alicefr commented Aug 25, 2025

add a new binary that launches the calculation and then updates the Trustee config maps? I think it would be cleaner.

Hmm, I have two concerns about it being cleaner.

new binary

Extra Rust crate or jq-manglery in Shell? Neither seems very low friction.

Well, if we convert compute-pcrs into a library, it removes all of that friction. IMO, a library is what we should get from the compute-pcrs repository. It should also be pretty straightforward to convert the binary into a library, and then the new binary can rely on this library.

update the config maps

Is API access nearly as easy as what we have from the operator?

Well, the operator would need to fetch the logs and patch the reference values config maps, so why not let the job do it immediately?

@Jakob-Naucke
Contributor Author

Extra Rust crate or jq-manglery in Shell? Neither seems very low friction.

Well, if we convert compute-pcrs into a library, it removes all of that friction. IMO, a library is what we should get from the compute-pcrs repository. It should also be pretty straightforward to convert the binary into a library, and then the new binary can rely on this library.

oh yeah that's cleaner indeed. Didn't realize you meant that based on

include the compute-pcrs binary in a new container image


Is API access nearly as easy as what we have from the operator?

Well, the operator would need to fetch the logs and patch the reference values config maps, so why not let the job do it immediately?

What I meant was: The job needs all the k8s API interaction logic again. That's probably fine though.

@alicefr
Contributor

alicefr commented Aug 25, 2025

Is API access nearly as easy as what we have from the operator?

Well, the operator would need to fetch the logs and patch the reference values config maps, so why not let the job do it immediately?

What I meant was: The job needs all the k8s API interaction logic again. That's probably fine though.

Yes, it requires the k8s client and the RBAC for updating the config map. But it is a self-contained job with a clear scope, so I personally prefer it to parsing the pod output.
You could also let the job create a config map in the operator namespace with the reference values and then have the operator update the reference values in the trustee namespace, but IMO that is a bit overkill.

@bgartzi

bgartzi commented Aug 25, 2025

@alicefr:

Well, if we convert compute-pcrs into a library, it removes all of that friction. IMO, a library is what we should get from the compute-pcrs repository. It should also be pretty straightforward to convert the binary into a library, and then the new binary can rely on this library.

The source code is already divided into cli and lib. Is that what you meant? Is there something else we should do on the compute-pcrs side?

@Jakob-Naucke
Contributor Author

The source code is already divided into cli and lib. Is that what you meant? Is there something else we should do on the compute-pcrs side?

I plan to make a new binary crate that uses compute-pcrs-lib. I think that side is fine as is.

@Jakob-Naucke
Contributor Author

untested, pushed to the wrong remote, bound to happen. converting to draft. sorry for the noise.

@Jakob-Naucke Jakob-Naucke marked this pull request as draft August 26, 2025 13:49
@Jakob-Naucke
Contributor Author

@alicefr now tested and ready for review, comment on permissions still standing though

@Jakob-Naucke Jakob-Naucke marked this pull request as ready for review August 26, 2025 15:51
Makefile Outdated
REGISTRY ?= quay.io
OPERATOR_IMAGE=$(REGISTRY)/confidential-clusters/cocl-operator:latest
COMPUTE_PCRS_IMAGE=$(REGISTRY)/confidential-clusters/compute-pcrs:latest
PCRS_BOOT_IMAGE=quay.io/fedora/fedora-coreos:42.20250705.3.0
Contributor

I wouldn't make this image part of the manifests. For now, it is fine to have it hardcoded, but eventually it is something we need to infer from the cluster.

Contributor Author

Are you saying

  1. cluster inference should be implemented before merge, or
  2. we can merge this but it will need to be changed, or
  3. cluster inference is not necessary now, but even then we should have something else (what?)?

Contributor

This logic will be part of the operator, not the manifests. So I would avoid putting this into the manifest generation. For now, you can just hardcode the Fedora version in the operator.

Makefile Outdated
--trustee-namespace operators
--trustee-namespace operators \
--pcrs-compute-image $(COMPUTE_PCRS_IMAGE) \
--pcrs-boot-image $(PCRS_BOOT_IMAGE)
Contributor

Same comment as above.

Comment on lines +5 to +7
# Hack: Set compute-pcrs as sole member to avoid needing to copy other crates.
# In that case, a rebuild would be triggered upon any change in those crates.
RUN sed -i 's/members =.*/members = ["compute-pcrs"]/' Cargo.toml && \
Contributor

Can't you copy the Cargo.toml from compute-pcrs instead?

Contributor Author

No, it has workspace dependencies and I prefer workspace dependencies over potentially deviating versions

Comment on lines 15 to 16
"--efivars", "/reference-values/efivars/qemu-ovmf/fcos-42", \
"--mokvars", "/reference-values/mok-variables/fcos-42"]
Contributor

nit: usually the args are specified at container creation

Contributor Author

No, this is intentional. These arguments depend on the path of reference-values, which is "known" in this Containerfile, not at the point where the other arguments are fed.

Contributor

Kubernetes overwrites the entrypoint in any case. What I don't like is that there is fcos-42 hardcoded there

Contributor Author

Kubernetes overwrites the entrypoint in any case.

Not when you use args instead of command 🙂

What I don't like is that there is fcos-42 hardcoded there

I can see that. I'm considering adding a "reference values base directory" flag to the compute-pcrs binary though so this information isn't spread out. Or we clone with an init container. WDYT?
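The command/args distinction discussed above can be sketched as a small pure function. This is only an illustration of the rules Kubernetes documents for combining an image's ENTRYPOINT/CMD with a pod's `command`/`args`; the function name `effective_argv` and the example values in the usage note are hypothetical and not taken from this PR.

```rust
/// Resolve the argv a container ends up running.
/// Hypothetical helper; it models the documented Kubernetes rules:
/// - neither `command` nor `args` set: image ENTRYPOINT + image CMD
/// - only `command` set: `command` alone (image ENTRYPOINT and CMD are ignored)
/// - only `args` set: image ENTRYPOINT + `args`
/// - both set: `command` + `args`
fn effective_argv(
    image_entrypoint: &[&str],
    image_cmd: &[&str],
    pod_command: Option<&[&str]>,
    pod_args: Option<&[&str]>,
) -> Vec<String> {
    let empty: &[&str] = &[];
    // The pod's `command` replaces the image ENTRYPOINT when present.
    let head = pod_command.unwrap_or(image_entrypoint);
    // The image CMD only survives when the pod sets neither field.
    let tail = match (pod_command, pod_args) {
        (_, Some(args)) => args,
        (Some(_), None) => empty,
        (None, None) => image_cmd,
    };
    head.iter().chain(tail.iter()).map(|s| s.to_string()).collect()
}
```

With an image ENTRYPOINT of `["compute-pcrs", "--efivars", "/reference-values/efivars/qemu-ovmf/fcos-42"]` and a pod that sets only `args`, the paths baked into the image are kept and the pod's arguments are appended; setting `command` instead would drop them.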

Copy link

Has a further decision been taken on this topic? I see this is one of the only references to reference-values/mok-variables/fcos-42 which I would like to update to reference-values/mok-variables/fedora-42 for the sake of simplicity.

Is it safe for me to move forward with trusted-execution-clusters/reference-values#3?

Contributor Author

@bgartzi go ahead.

git clone --depth 1 https://github.com/confidential-clusters/reference-values && \
cargo build --release -p compute-pcrs

FROM docker.io/library/debian:trixie
Contributor

can we use fedora as base image?

Contributor Author

Hmm, I used Debian because docker.io/library/rust only has Debian (or Alpine) as base and a Debian base will be 100% ABI compatible for execution. I can check if Fedora works though (or base the build container on Fedora too, probably requires an explicit Rust installation).

Contributor

Well, it can be built on the Debian Rust image, but you should copy it into a Fedora base image. Or are you afraid that the dynamic libraries won't match?

Contributor Author

Or are you afraid that the dynamic library won't match?

Yes I was, but maybe it's fine. I'll test.

Copy link

Nobody asked, but, would a similar approach to the one proposed in trusted-execution-clusters/compute-pcrs/pull/19 work for this?

i.e. fedora as builder, install needed dependencies (a subset of the compute-pcrs https://github.com/confidential-clusters/compute-pcrs/blob/main/.github/Containerfile.buildroot), build binary, then copy the binary to a clean fedora image?

Contributor Author

or use compute-pcrs's image? 🙃

Comment on lines +94 to +105
let client = Client::try_default().await?;
let config_maps: Api<ConfigMap> = Api::namespaced(client, &args.namespace);
match config_maps
    .create(&PostParams::default(), &config_map)
    .await
{
    Ok(_) => info!("Create ConfigMap {}", args.configmap),
    Err(kube::Error::Api(ae)) if ae.code == 409 => {
        info!("ConfigMap {} already exists", args.configmap)
    }
    Err(e) => return Err(e.into()),
}
Contributor

What if the config map already exists? You probably want to retrieve its value, check whether it differs from the reference values, and skip the update if it does not. Right now this makes little sense, but once we have more CoreOS versions to handle, the logic will become useful.
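The check suggested here could look roughly like the sketch below. It is not code from this PR: it uses a plain `BTreeMap<String, String>` as a stand-in for a ConfigMap's `data` field, and the names `ConfigMapAction` and `plan_config_map_write` are hypothetical.

```rust
use std::collections::BTreeMap;

/// What the job should do with the reference-values ConfigMap.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum ConfigMapAction {
    Create,
    Update,
    Skip,
}

/// Decide between create, update, and no-op by comparing the data
/// already stored in the cluster (if any) with the freshly computed data.
fn plan_config_map_write(
    existing: Option<&BTreeMap<String, String>>,
    desired: &BTreeMap<String, String>,
) -> ConfigMapAction {
    match existing {
        // No ConfigMap yet: create it.
        None => ConfigMapAction::Create,
        // Same content: nothing to write.
        Some(current) if current == desired => ConfigMapAction::Skip,
        // Content differs: write the new reference values.
        Some(_) => ConfigMapAction::Update,
    }
}
```

In a real job the `existing` value would come from a GET on the ConfigMap, and `Update` would map to a replace or patch call; both are left out here to keep the sketch free of cluster dependencies.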

Contributor Author

As per #13, the RVs will be computed statelessly, and the config map will be overwritten. The code that I wrote for this, which I refuse to delete for now, is at Jakob-Naucke:shelved-append-rvs. I'm in favor of merging this PR first and moving on from there.

)]
pcrs_compute_image: String,

#[arg(long, default_value = "quay.io/fedora/fedora-coreos:42.20250705.3.0")]
Contributor

I would remove this and hardcode it directly in compute-pcrs. We will then add logic to detect which CoreOS version to calculate for.

Contributor Author

hardcode it directly in compute-pcrs

This info is required when defining the container and its image volume, it's nothing that compute-pcrs lib/bin/image can influence

Contributor

Yes, but the job with the image volume is created by the operator, so you don't need this in the manifests

name: Some(name.to_string()),
namespace: Some(namespace.to_string()),
let pod_spec = PodSpec {
service_account_name: Some("cocl-operator".to_string()),
Contributor

I would create a separate service account just for the job, with a separate Role that only allows creating and modifying the config maps in the trustee namespace.
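To make the least-privilege idea concrete, here is a deliberately simplified model of how a namespaced Role's rules gate requests. It ignores API groups, wildcards, and resource names that real Kubernetes RBAC supports, and every name in it (`PolicyRule`, `job_role`, `allows`, the exact verb list) is an assumption for illustration, not the Role that was eventually merged.

```rust
/// Simplified stand-in for one RBAC rule: which verbs are allowed
/// on which resources. Hypothetical type, not the Kubernetes API object.
struct PolicyRule {
    resources: Vec<&'static str>,
    verbs: Vec<&'static str>,
}

/// A narrow rule set for the compute-pcrs job: it may only create
/// and modify ConfigMaps. The verb list is an assumption.
fn job_role() -> Vec<PolicyRule> {
    vec![PolicyRule {
        resources: vec!["configmaps"],
        verbs: vec!["create", "update", "patch"],
    }]
}

/// A request is allowed if at least one rule names both the resource and the verb.
fn allows(rules: &[PolicyRule], resource: &str, verb: &str) -> bool {
    rules.iter().any(|rule| {
        rule.resources.iter().any(|r| *r == resource)
            && rule.verbs.iter().any(|v| *v == verb)
    })
}
```

Under this model `allows(&job_role(), "configmaps", "patch")` holds, while deleting ConfigMaps or creating Jobs is rejected, which is the separation from the operator's own service account that the comment asks for.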

@alicefr
Contributor

alicefr commented Aug 27, 2025

@Jakob-Naucke regarding the permissions, I think you are referring to the job RBAC, right? I think you should split out the permissions for the job only, as I already mentioned here

fi
done
kubectl delete deploy cocl-operator -n confidential-clusters || true
kubectl delete job compute-pcrs -n confidential-clusters || true
Contributor

I think this should be handled by the operator.

Contributor

If the job completes, the operator can then remove it

let create = jobs.create(&PostParams::default(), &job).await;
info_if_exists!(create, "Job", job_name);
let completed = await_condition(jobs.clone(), job_name, is_job_completed());
let _ = timeout(Duration::from_secs(900), completed).await?;
Contributor

Is this blocking until it completes?
The usual way in Kubernetes is to watch objects with a controller and trigger a reconciliation loop when the state changes. Can we implement something like this here? See the Rust documentation for the Controller.
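The difference from blocking on a timeout can be sketched as a reconcile step that looks at the job's current state and returns what to do next, rather than waiting. This is a dependency-free model, not the kube runtime `Controller` API; `JobPhase`, `Step`, `reconcile_job`, and the 60-second requeue interval are all assumptions for illustration.

```rust
use std::time::Duration;

/// Coarse view of the compute-pcrs job's state (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobPhase {
    Running,
    Succeeded,
    Failed,
}

/// What the operator should do after looking at the job once (hypothetical type).
#[derive(Debug, PartialEq)]
enum Step {
    /// Check again later instead of blocking the operator.
    Requeue(Duration),
    /// The job is done; the operator can remove it.
    DeleteJob,
    /// Surface the failure rather than waiting for a timeout.
    ReportFailure,
}

/// One reconciliation pass: inspect the observed state and return immediately.
fn reconcile_job(phase: JobPhase) -> Step {
    match phase {
        JobPhase::Running => Step::Requeue(Duration::from_secs(60)),
        JobPhase::Succeeded => Step::DeleteJob,
        JobPhase::Failed => Step::ReportFailure,
    }
}
```

A controller would call something like `reconcile_job` each time the watched Job changes, so the operator never sits in a 900-second wait and job cleanup happens in the same place as job creation.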

Jakob-Naucke and others added 6 commits September 9, 2025 19:29
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Based-on-patch-by: Alice Frosi <alicefr@redhat.com>
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Based-on-patch-by: Alice Frosi <afrosi@redhat.com>
in place of ClusterRoleBindings

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Based-on-patch-by: Alice Frosi <afrosi@redhat.com>
and some extra dependency cleanups

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Create new binary to use
confidential-clusters/compute-pcrs-lib and write the configmap. Build
image with it and extend cocl spec with the respective image fields. Run
with a job. Supersedes the reference value input workflow.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Action::requeue(Duration::from_secs(60))
}

pub async fn launch_rv_job_controller(client: Client, namespace: &str) {
Contributor

Very nice, yes I meant exactly this, thx!

@alicefr alicefr merged commit ed73907 into trusted-execution-clusters:main Sep 11, 2025
5 checks passed
@Jakob-Naucke Jakob-Naucke deleted the compute-pcrs branch September 11, 2025 07:22

3 participants