Conversation

@Jakob-Naucke
Contributor

  • Create new config map to cache known PCR values and parts.
  • Fill with PCR label if present in boot image.
  • Create new binary as fallback to compute values using an image volume.
  • Recompute reference values based on config map.

Supersedes the reference value input workflow and #12.

Uses a config map for value caching instead of a CRD because identifying a CRD with an image ref is a little weird (character type & length limits). I'm not a huge fan either though because this means rewriting the entire map on every new image (as we do with the reference values…), so I'm open to other suggestions.
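A minimal sketch of that whole-map rewrite with the kube crate (hypothetical names: an image-pcrs config map with a single image-pcrs.json key; serde_json::Value stands in for the real entry type):

use std::collections::BTreeMap;

use k8s_openapi::api::core::v1::ConfigMap;
use kube::{
    api::{Patch, PatchParams},
    Api, Client,
};

// Sketch only: caching a new image means re-serializing the entire map.
async fn upsert_image_pcrs(
    client: Client,
    namespace: &str,
    image_ref: &str,
    entry: serde_json::Value,
) -> anyhow::Result<()> {
    let cms: Api<ConfigMap> = Api::namespaced(client, namespace);
    let cm = cms.get("image-pcrs").await?;
    // Parse the existing cache out of the one JSON value, defaulting to empty.
    let mut cache: BTreeMap<String, serde_json::Value> = cm
        .data
        .as_ref()
        .and_then(|d| d.get("image-pcrs.json"))
        .map(|s| serde_json::from_str(s))
        .transpose()?
        .unwrap_or_default();
    cache.insert(image_ref.to_string(), entry);
    // Write the whole map back under the same key.
    let patch = serde_json::json!({
        "data": { "image-pcrs.json": serde_json::to_string(&cache)? }
    });
    cms.patch("image-pcrs", &PatchParams::default(), &Patch::Merge(&patch))
        .await?;
    Ok(())
}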

fi
done
kubectl delete deploy cocl-operator -n confidential-clusters || true
kubectl delete job compute-pcrs -n confidential-clusters || true
Contributor

Already left the comment in the other PR, but the job deletion should be handled by the operator
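A sketch of that operator-side cleanup with kube (job name and namespace are assumptions; background propagation also removes the job's pods):

use k8s_openapi::api::batch::v1::Job;
use kube::{
    api::{DeleteParams, PropagationPolicy},
    Api, Client,
};

// Sketch: delete the finished compute-pcrs job from the reconcile loop
// rather than from an uninstall script.
async fn cleanup_pcr_job(client: Client, namespace: &str) -> kube::Result<()> {
    let jobs: Api<Job> = Api::namespaced(client, namespace);
    let dp = DeleteParams {
        propagation_policy: Some(PropagationPolicy::Background),
        ..DeleteParams::default()
    };
    jobs.delete("compute-pcrs", &dp).await?;
    Ok(())
}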

@alicefr
Contributor

alicefr commented Sep 2, 2025

Uses a config map for value caching instead of a CRD because identifying a CRD with an image ref is a little weird (character type & length limits). I'm not a huge fan either though because this means rewriting the entire map on every new image (as we do with the reference values…), so I'm open to other suggestions.

The other alternative is to have a file on a PVC and append the values there, but then it requires a CSI provider; not a big deal, though.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
@Jakob-Naucke Jakob-Naucke requested review from alicefr and removed request for alicefr September 11, 2025 16:24
..Default::default()
},
]),
restart_policy: Some("Never".to_string()),
Contributor

nit: do they have this constant already defined in the crate?

Contributor Author

no :(
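(For context: k8s_openapi types restart_policy as a plain Option<String>, so the best one can do is a local constant — hypothetical sketch:)

// Hypothetical local constant; k8s_openapi ships no such value.
const RESTART_POLICY_NEVER: &str = "Never";

// ... in the PodSpec literal:
// restart_policy: Some(RESTART_POLICY_NEVER.to_string()),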

Comment on lines +166 to +171
if status.completion_time.is_none() {
    info!("Job {name} changed, but had not completed");
    return Ok(Action::requeue(Duration::from_secs(300)));
}
let jobs: Api<Job> = Api::namespaced(ctx.client.clone(), &ctx.operator_namespace);
Contributor

I think we are missing what we do if the job fails. Do we restart it? Currently, you are deleting it both when it succeeded and when it failed.

Contributor Author

@Jakob-Naucke Jakob-Naucke Sep 12, 2025

Currently, you are deleting it both when it succeeded and when it failed.

no, completion_time is only Some on success. New pods for that job will be created within the default back-off limit (6); after that, the job will stick around as failed.
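A small sketch of that distinction against the k8s_openapi JobStatus fields (the helper job_outcome is hypothetical):

use k8s_openapi::api::batch::v1::JobStatus;

// completion_time is only set on success; a failed job instead carries a
// Failed condition once the back-off limit is exhausted.
fn job_outcome(status: &JobStatus) -> &'static str {
    if status.completion_time.is_some() {
        "succeeded"
    } else if status
        .conditions
        .as_deref()
        .unwrap_or_default()
        .iter()
        .any(|c| c.type_ == "Failed" && c.status == "True")
    {
        "failed" // back-off limit reached; the job object sticks around
    } else {
        "running" // pods may still be retried within the back-off limit
    }
}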

let context = Arc::new(ctx);
// refine watcher if we handle more than one type of job
tokio::spawn(
    Controller::new(jobs, watcher::Config::default())
Contributor

Does the Rust crate allow monitoring only a subset of jobs, like filtering by label? We are only interested in the PCR calculation jobs. If there is such a filtering, it should reduce the number of objects that are monitored.

Contributor Author

This is already constrained to the namespace, but would you prefer a label anyway?

Contributor

If we add other kinds of jobs, it will be better, yes
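For reference, kube's watcher config accepts a label selector; a sketch assuming a hypothetical app=compute-pcrs label on the PCR jobs:

use k8s_openapi::api::batch::v1::Job;
use kube::{
    runtime::{controller::Controller, watcher},
    Api, Client,
};

// Sketch: restrict the watch to labelled jobs so other jobs in the
// namespace are never cached or reconciled.
fn pcr_job_controller(client: Client, namespace: &str) -> Controller<Job> {
    let jobs: Api<Job> = Api::namespaced(client, namespace);
    Controller::new(jobs, watcher::Config::default().labels("app=compute-pcrs"))
}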

crds/src/lib.rs Outdated
Comment on lines 26 to 29
pub struct ImagePcr {
    pub first_seen: DateTime<Utc>,
    pub pcrs: Vec<Pcr>,
}
Contributor

Is this a new CRD? I thought it should be some kind of metadata in the image manifest

Contributor Author

I thought it should be some kind of metadata in the image manifest

Do you mean the label content? I assume the label content to be this. ImagePcrs is what is expected in the operator-managed config map ("Create new config map to cache known PCR values and parts"), as an interpretation of this suggestion from #13:

CRD to store for each image seen in the cluster:

  • image sha256sum / tag
  • Date first seen
  • PCR values (i.e. the label from the container above)

where I ultimately didn't go for a CRD for reasons described in the original PR comment (counter-suggestions still welcome)

Contributor

Ah so, IIUC, this will be the content of the config map?

Contributor Author

Yes. It could look like this (most recently tested with a labelled image that I created, thus the tag):

data:
  image-pcrs.json: '{"quay.io/jnaucke/fedora-tagged-pcrs:v2":{"first_seen":"2025-09-12T08:06:49.324536935Z","pcrs":[{"id":4,"value":"551bbd142a716c67cd78336593c2eb3b547b575e810ced4501d761082b5cd4a8","parts":[{"name":"EV_EFI_ACTION", ...

Contributor

Then, please don't define it in the CRDs. Just move it into the reference values module/file.

Contributor Author

Ah now I see where you're coming from. The issue is that the fallback compute-pcrs job also writes to that config map (alternative would be job output parsing which we previously disregarded) and needs to know its structure. Alternatives that I'd see:

  • rename the crds crate (to k8s_resources? meh)
  • define the map in operator like you suggested, and import that from compute-pcrs instead. I'm not a fan because that would trigger a rebuild of compute-pcrs upon any change of operator (hence the Cargo.toml edit in its Containerfile), which with the current Podman setup is not incremental.
  • make it a CRD after all (would require some RFC1035-compatible identification of images which, FWIW, I already ended up making for job naming in the meantime)

Contributor

Or define another crate. This seems the cleanest option to me
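Such a crate could stay tiny; a sketch with the field set guessed from the ImagePcr struct and the JSON excerpt above (names hypothetical):

// image_pcrs/src/lib.rs — shared by operator and compute-pcrs so both
// agree on the config map layout without depending on each other.
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct PcrPart {
    pub name: String, // e.g. "EV_EFI_ACTION"; further fields elided
}

#[derive(Serialize, Deserialize)]
pub struct Pcr {
    pub id: u32,
    pub value: String,
    pub parts: Vec<PcrPart>,
}

#[derive(Serialize, Deserialize)]
pub struct ImagePcr {
    pub first_seen: DateTime<Utc>,
    pub pcrs: Vec<Pcr>,
}

// Image ref -> cached PCR data, serialized under the image-pcrs.json key.
pub type ImagePcrs = HashMap<String, ImagePcr>;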

@alicefr
Contributor

alicefr commented Sep 12, 2025

@Jakob-Naucke when @6-dehan has implemented the first unit tests in #20, please add the tests covering this PR. The logic is becoming slightly more complex.

- Create a new config map to cache known PCR values and parts.
- Fill with PCR label if present in boot image.
- Recompute reference values based on config map.

Technicalities:
- Pivot compute-pcrs job from setting reference values directly to
  writing to the new config map. It is now used as a fallback.
- Move reference value-specific, non-Trustee-interacting operations to a
  new module.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
@alicefr
Contributor

alicefr commented Sep 12, 2025

Looks good!

@alicefr alicefr merged commit 15d245e into trusted-execution-clusters:main Sep 12, 2025
5 checks passed
@Jakob-Naucke Jakob-Naucke deleted the labels branch September 12, 2025 13:17